Hello, and welcome back to CS615 System Administration! This is week 9, segment 3, and we're ready to join Marty McFly and the Doc in their time traveling adventures, so hop into our DeLorean here and buckle up: our flux capacitor is charged with one point twenty one jigowatts, ready to go. Remember how in our last video we talked about the difference between simply backing up all files and being able to review changes from a given point in time? We had noticed that just having a copy of files in the past is somewhat different from seeing what the filesystem actually looked like at that time, especially when it comes to file or directory deletions. If we have the ability to snapshot a filesystem, then we gain the ability to time travel -- well, at least into the past. So hey, why not give it a try? --- Here we have our NetBSD EC2 instance, and if you paid attention to the dump(8) manual page after our previous video, you might have seen a passing reference to so-called "snap-backups" - backups to a snapshot. The default filesystem on NetBSD - FFS - supports so-called "file system snapshots", which it implements using a pseudo-device that, when accessed, gives you the view of the filesystem at the time the snapshot was taken. You can create and configure such snapshots using the fssconfig(8) utility, as shown here in the Examples section of the manual page. So let's give this a try: Here's our root filesystem on the disk /dev/xbd0a. We configure fss0 as a snapshot of the root filesystem -- "slash" -- with the backing store under "/backup". Hey now, that went quickly! And that's already one of the main points of snapshots -- unlike a full level 0 backup, they are near instantaneous. We'll take a closer look at how this works in a minute, but for now let's continue to see how we can use this snapshot. So this file looks pretty large here -- around 10 gigs in size. Which... seems weird, in that our entire filesystem is 10 gigs in size, so... how did we create a file that large? Maybe we can compress it to save some space... 
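The setup so far, expressed as commands -- a sketch loosely following the fssconfig(8) manual page, with the device and file names from the demo:

```shell
# Configure /dev/fss0 as a snapshot of the root filesystem ("/"),
# using "/backup" as the backing-store file (names as in the demo).
fssconfig fss0 / /backup

# This returns almost immediately -- no data is copied up front --
# yet the backing file appears to span the entire filesystem:
ls -lh /backup
```

Later on, `fssconfig -u fss0` unconfigures the pseudo-device again.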
Oh, hmm, no, can't do that. But permissions should allow me to read the file? Let's make extra sure... Nope, no can do. But I'm root, goddammit! Let's look at the file flags. Oh, I see... this is no ordinary file, but a special "snapshot" file. Let's try to use it as per the manual page and mount it instead via the fss0 device: There it is - looks like it's just about the same as our root filesystem. Let's change into that directory and see what we find. Hey, neat, that looks just like our root filesystem! But we can't manipulate the files here -- the snapshot is read-only, a frozen moment in time. Which is really all for the best, since we know that time travel is dangerous and before you know it you end up dating your mother and start to disappear, so probably best to keep the past immutable. But now if we were to suffer some unexpected data loss... Nooooooo! ...then we can, of course, trivially restore it from the snapshot. Yay. And when we're done, we can unmount the snapshot, and if we no longer need it, we unconfigure the pseudo-device. Note that the special "backup" file here remains in place, so we could re-configure the fss device and re-mount it if needed; if we want to get rid of it completely, we nuke the file. So how did this approach differ from the previous backup mechanisms? --- In our last video, we saw that dump(8), tar(1), and rsync(1) all take some time, as they create true copies of the input data set. - In contrast, a filesystem snapshot is immediate, near instantaneous. So that's a pretty big advantage. Similarly, snapshots do not - take up any additional space up front -- we're not creating another _copy_ of the data, so we don't have to write additional data blocks. - We can mount the snapshot and then have a complete filesystem hierarchy view of all data in the snapshot, which makes it very convenient to use. 
- At the same time, we can't accidentally overwrite data in the snapshot -- it remains immutable, and even root can't make any changes. And with all that -- an immutable filesystem that's mounted and can be accessed like any other part of the filesystem -- you - really get a self-restore mechanism: any user who accidentally deleted a file can go and fetch the preserved copy from the snapshot. So you could configure your system such that it takes a snapshot every hour, and thereby offer your users a self-restore backup solution with a Recovery Point Objective of an hour and an instantaneous Recovery Time Objective. The only disadvantage here is that - the snapshot is invariably bound to the system on which it was taken. That is, you can't really take a snapshot and move it offsite, nor can you snapshot the filesystem to a separate disk -- it's a snapshot of this very filesystem on this very storage device. But the general idea here is indeed what underlies some common backup systems, --- perhaps most common among them macOS's "Time Machine", which, perhaps surprisingly, operates entirely without flux dispersal, but instead uses a similar concept to filesystem snapshots. That is, Time Machine starts out by creating an - expensive, time-consuming, full level 0 backup of the filesystem in question. This is usually done to a network-attached storage device, although as a consumer-targeted backup solution this often works with whatever local storage you might have as well. Then, - every hour, it creates a second... "view" of the filesystem, but it does that in a manner that preserves disk space and time: instead of creating a copy of all the files, it looks at what files have not changed and creates hardlinks for those, only copying the new or modified files. So in a way this uses hardlinks and a modification delta to create a variation of a differential backup, overlaying changes over the references to the unchanged files. 
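We can approximate that hardlink trick ourselves with plain shell commands. This is a simplification, of course -- Time Machine has its own implementation -- and it assumes GNU coreutils, where `cp -al` creates a tree of hardlinks instead of copying data:

```shell
#!/bin/sh
# Hardlink-based incremental "view": unchanged files are hardlinked
# into the new view, only changed files get a real copy.
set -e
mkdir -p src
echo "unchanged content" > src/keep.txt
echo "version 1"         > src/notes.txt

cp -a src backup.0                 # the expensive initial full copy

echo "version 2" > src/notes.txt   # a file changes before the next run

cp -al backup.0 backup.1           # new view: every file is a hardlink
rm backup.1/notes.txt              # break the link for the changed file...
cp src/notes.txt backup.1/         # ...and store a real copy of it only
```

Now `backup.0` and `backup.1` each present a complete view of the tree, but the unchanged file exists on disk only once.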
- Time Machine sacrifices accuracy for speed: it'll copy any file that is in a folder with a newer last-modified timestamp, even if it hasn't changed. This is somewhat more efficient than comparing individual files. But the end result is that you have multiple, space-saving views of the filesystem across time - as it performs this logic hourly, daily, weekly, and monthly, allowing you to go back in time to restore your files, which is really quite useful. --- A slightly different approach is used by the Write Anywhere File Layout, or WAFL -- and who doesn't like waffles? This filesystem is used by - NetApp's "Data ONTAP" operating system on their industry-standard, high-scale data storage devices. These devices have a high reliability requirement, and they use the concept of filesystem snapshots and filesystem checkpoints to provide data security. In particular, they perform these snapshots - every ten seconds as consistency checkpoints to guarantee a small Recovery Point Objective as well as a small Recovery Time Objective. Similar to what we've seen a minute ago, - the filesystem utilizes near instantaneous snapshots, with modifications of the root filesystem taking place in new blocks. This utilizes a mechanism often referred to as "Copy on Write", but is more accurately described as a "Redirect on Write". Here, let's take a look at how such snapshots work: --- In WAFL, we use concepts close to what we discussed in our week 3 videos about the Unix filesystem, but the structure is somewhat different. The system uses a root inode, but does not have inode blocks and data blocks separated the way we saw. Instead, all the metadata and data of the files can be reached via indirection from the root inode, meaning you can rebuild the whole filesystem from that root inode. So when you want to create a new filesystem snapshot, all you need to do is - copy the root inode, - and the new snapshot root inode will reference all the same data and metadata blocks as the original. 
This operation is obviously fast and cheap. But now suppose we want to perform an update on some files, say, --- we want to make changes to blocks C and D here. In the "Copy on Write" model, we'd now first create a copy of blocks C and D, then update the references the snapshot uses --- and then write the modified blocks C-prime and D-prime. But this requires multiple I/O operations -- namely the copy and the write. In a --- Redirect-on-Write approach, when we want to make changes to blocks C and D, we --- simply write the new data and keep the snapshot references pointing to the original blocks. So this is a bit more efficient, even though oftentimes the details of the implementation are glossed over and the mechanism is still referred to as "copy on write". But anyway, so we now have this second snapshot root inode here -- how do we roll back a change? Well, all we really have to do is --- copy the snapshot root inode back over the original root inode, and we're magically back to the point in time where the snapshot was taken, and the now unreferenced, modified data blocks - can be freed. In this manner both the initial snapshot creation as well as the rollback are near instantaneous. Pretty clever, huh? Other filesystems implement the same concept, and one of them is ZFS, so let's illustrate the use of such filesystem snapshots on such a system: --- So here we have an OmniOS EC2 instance using ZFS. But ZFS is a bit weird -- look, there are no regular files here starting with .zfs, but there is a hidden directory, and in that directory, we see a subdirectory named "snapshot". That is, this system here already contains a ZFS snapshot named "kayak" -- which happens to be the OmniOS install system, but we can ignore that snapshot here and focus on creating our own. So let's look at our root filesystem, which we have in previous videos already observed to be located on a ZFS pool. 
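Before we get to the ZFS commands, the redirect-on-write mechanics we just walked through can be simulated with a toy shell script: a directory of hardlinks stands in for the block references held by a root inode. This is purely an illustration of the concept, not how WAFL is actually implemented:

```shell
#!/bin/sh
# Toy redirect-on-write: "blocks" holds the data blocks; "live" holds
# the current root inode's block references, as hardlinks.
set -e
mkdir -p blocks live
echo "original C" > blocks/C.v1
echo "original D" > blocks/D.v1
ln blocks/C.v1 live/C
ln blocks/D.v1 live/D

# Taking a snapshot = copying the root inode: we duplicate only the
# references (hardlinks), never the data blocks themselves.
cp -al live snap

# Redirect on write: new data goes into a NEW block, and only the live
# reference is re-pointed; the snapshot still sees the old block.
echo "modified C" > blocks/C.v2
ln -f blocks/C.v2 live/C

cat live/C   # modified C
cat snap/C   # original C

# Rollback = copying the snapshot's references back over the live ones;
# blocks/C.v2 is now unreferenced by "live" and could be freed.
rm -rf live
cp -al snap live
cat live/C   # original C
```

Both the snapshot and the rollback touch only the references, which is why they are near instantaneous no matter how much data the filesystem holds.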
So let's create a ZFS snapshot using this command: You can specify any label here, but perhaps we'll use a date string so we can easily identify when this snapshot was taken. As you can see, the snapshot again is instantaneous, and we can now see it here in the special /.zfs directory. You can also list all snapshots using the 'zfs' command itself, of course, and just like in our earlier FFS example, we can change the directory here and browse the snapshot. So now... if we make some changes to the live filesystem, then we can observe the differences simply by comparing the two, and can easily self-restore the file we lost. But note that of course this is for individual file recovery -- if we wanted to really travel back in time to when we took the filesystem snapshot, then we'd also want to have newly created files disappear, for example. So let's suppose we encountered some tragedy and wanted to completely roll back all changes since the snapshot. For that, we use the "zfs rollback" command and specify the snapshot in question. Again, this is fast, as all it has to do is restore the original block references, and just like that, we're back in time to two minutes ago, before we lost data and before we created these files in this directory here. And if we don't want the snapshot any longer, then we can destroy it --- All right, let's take a look at what we've observed here in this video. We saw that snapshots are really fast and cheap, so if your filesystem supports them, go right ahead and use them. Snapshots are really quite useful - since they allow you to give your users the ability to restore files on their own without involving you at all. The concept of snapshots can then be used - for various purposes. We've seen - the example of consumer backup solutions like macOS Time Machine (although that's not a true filesystem snapshot); - the example of NetApp's WAFL, and - just now, our example of using ZFS snapshots. 
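That last example, expressed as commands, looked roughly like this; note that the dataset name 'rpool/ROOT/omnios' is an assumption for illustration -- check the output of 'zfs list' on your own system for the actual name:

```shell
# Snapshot the root dataset, labeled with today's date
# (dataset name is illustrative; adjust to your 'zfs list' output).
SNAP="rpool/ROOT/omnios@$(date +%Y-%m-%d)"
zfs snapshot "$SNAP"

# List all snapshots, and browse this one via the hidden /.zfs directory.
zfs list -t snapshot
ls "/.zfs/snapshot/$(date +%Y-%m-%d)/"

# Roll the live filesystem back to the snapshot -- fast, since it only
# restores the original block references...
zfs rollback "$SNAP"

# ...and destroy the snapshot once we no longer need it.
zfs destroy "$SNAP"
```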
As mentioned earlier, snapshots do remain bound to the filesystem from which they were taken, so all by themselves they do not offer a comprehensive backup and recovery strategy, but - you should combine them with some of the other approaches we discussed. Whatever solution you come up with, though, please do make sure that - it is an automated solution that runs regularly and frequently, that meets your Recovery Point Objective, and, most importantly, that you verify that it works on a regular basis. So why don't you go ahead right now and check on your backups. You do have working backups, don't you? What's that? If I...? Uhm... gotta go. See you next time - cheers!