• AWS EBS Snapshots alone are not a Backup Strategy
You’ve heard the marketing messages from AWS: EBS Snapshots take a point-in-time image of the EBS volume, and the images are stored incrementally, saving you storage space and money. The technology isn’t new, as the same outcome can be had from LVM and ZFS, although AWS probably has some proprietary mechanism in place for scalability.
That said, drive image snapshots are what they are: a snapshot of the drive’s data, structure and all. If your drive has picked up some malware, or has some on-disk data corruption, the snapshot will capture that as part of the image. I can format the EBS volume with a customised (aka proprietary) encrypted file system, and the EBS Snapshot will take an image of it, regardless of whether the drive contains actual data or garbage.
Another thing to note: EBS Snapshots have no understanding of file systems, much less the files themselves. There is no index of files tied to each EBS Snapshot, so you cannot search for a file or tell whether it exists in a particular snapshot.
Nor does the snapshot tell you anything about the system it was hosted or mounted on previously. Sure, you can re-create a drive volume from a snapshot, but what’s the system configuration that the drive is supposed to be mounted to?
So the crux of the backup matter is: if your backup strategy is based purely on EBS Snapshots, that’s a time bomb waiting to go off.
Any enterprise IT admin will gladly tell you that 80% of the time, the restores they make are because some user has lost a file, or accidentally deleted it off the server. Some of those users can’t remember when they deleted it. All they can remember is that it’s a PowerPoint presentation, and it had *demo* as part of its filename. If EBS Snapshots are your only means of backup, you have to hunt for the appropriate snapshot, restore it to a volume, mount it, and then do some magic directory searching.
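To make the missing piece concrete, here is a minimal sketch of the kind of file index that EBS Snapshots don't give you. It assumes you walk the volume while it is mounted, at the time the snapshot is taken, and store the listing keyed by snapshot ID somewhere of your own. The function names, the catalog layout, and the snapshot IDs are all hypothetical; nothing here is an AWS API.

```python
import fnmatch
import os


def index_volume(mount_point):
    """Return the relative path of every file under a mounted volume."""
    paths = []
    for root, _dirs, files in os.walk(mount_point):
        for name in files:
            full = os.path.join(root, name)
            paths.append(os.path.relpath(full, mount_point))
    return sorted(paths)


def find_file(catalog, pattern):
    """Search a {snapshot_id: [paths]} catalog for filenames matching a glob.

    Matching is case-insensitive, because users rarely remember the case.
    """
    hits = []
    for snapshot_id, paths in catalog.items():
        for path in paths:
            name = os.path.basename(path).lower()
            if fnmatch.fnmatchcase(name, pattern.lower()):
                hits.append((snapshot_id, path))
    return hits


# Hypothetical catalog, built by calling index_volume() at snapshot time.
catalog = {
    "snap-aaa": ["home/amy/Q3-Demo-final.pptx", "etc/fstab"],
    "snap-bbb": ["etc/fstab"],
}
matches = find_file(catalog, "*demo*.pptx")
```

With something like this in place, the "it had demo in the name" request becomes a lookup that tells you which snapshot to restore, instead of a mount-and-search exercise across every snapshot you own.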
What about Disaster Recovery? Sure, with drive images you can restore back to a point in time, and if you only have a couple of servers, perhaps tagging them in AWS will suffice. Woe to those system administrators who have hundreds of servers to deal with.
In those situations, you will need some form of metadata system that tracks which snapshot goes to which EBS drive to which system. An EBS Snapshot doesn’t contain any configuration data about the system its volume was attached to.
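As a rough illustration of such a metadata system, the sketch below keeps one record per snapshot and answers the DR question "which snapshots do I restore for this server, as of this date?". The record fields, the `restore_plan` function, and every ID in the example are my own assumptions for illustration; a real setup would persist this in a database and capture far more system configuration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SnapshotRecord:
    snapshot_id: str   # which snapshot
    volume_id: str     # which EBS drive it was taken from
    instance_id: str   # which system that drive was attached to
    device: str        # where it was attached, e.g. /dev/sdf
    mount_point: str   # where the system mounted it
    taken_at: datetime


def restore_plan(records, instance_id, as_of):
    """For one instance, pick the newest snapshot per volume taken at or before as_of."""
    newest = {}
    for rec in records:
        if rec.instance_id != instance_id or rec.taken_at > as_of:
            continue
        current = newest.get(rec.volume_id)
        if current is None or rec.taken_at > current.taken_at:
            newest[rec.volume_id] = rec
    # Order by device so the volumes can be re-attached predictably.
    return sorted(newest.values(), key=lambda r: r.device)


# Hypothetical records for two servers.
records = [
    SnapshotRecord("snap-1", "vol-a", "i-web", "/dev/sdf", "/data", datetime(2024, 1, 1)),
    SnapshotRecord("snap-2", "vol-a", "i-web", "/dev/sdf", "/data", datetime(2024, 2, 1)),
    SnapshotRecord("snap-3", "vol-b", "i-web", "/dev/sdg", "/logs", datetime(2024, 1, 15)),
    SnapshotRecord("snap-4", "vol-c", "i-db", "/dev/sdf", "/var/lib/db", datetime(2024, 1, 20)),
]
plan = restore_plan(records, "i-web", as_of=datetime(2024, 1, 31))
```

Here `plan` contains one record per volume of `i-web`, each carrying the device and mount point needed to put the drive back where it belongs. That is exactly the information the snapshot itself does not hold.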
And I can’t stress this enough: architect and design your DR strategy and process, but please make sure you actually perform the recovery process and check data integrity at least once a year. Most organisations ignore this, thinking that as long as their automated routine backups and snapshots are running, they are OK. Then crunch day comes, and they find out that there is a recovery password involved (or some other factor), and the IT admin who had that information has already left the company without passing it on.
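For the data-integrity part of that yearly drill, one simple approach is a checksum manifest: record a hash of every file when the backup is taken, then compare against the restored copy. The sketch below assumes the source and the restored volume are both mounted as ordinary directories; the function names are hypothetical, and this is one way to do the check rather than a complete DR test.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify_restore(manifest, restored_root):
    """Return a sorted list of (path, problem) for missing or altered files."""
    problems = []
    for rel, expected in manifest.items():
        full = os.path.join(restored_root, rel)
        if not os.path.isfile(full):
            problems.append((rel, "missing"))
        elif sha256_of(full) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems and sorted(problems)
```

An empty result from `verify_restore` means every file recorded at backup time came back byte-for-byte. It says nothing about whether you still have the passwords and the people needed to get that far, so the drill itself still has to be run.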