Hello, and welcome back to CS615 System Administration! This is week 3, segment 2. In our last video, we spent a fair amount of time looking at the boot sector and the master boot record, in which we defined the BIOS partition. In the video before, we had discussed the filesystem or OS partition, on which a filesystem would then be created. But before we go ahead and do that, let's first think a bit about what it means to be a filesystem. What is the fundamental goal of a filesystem? What is its purpose? --- As we all know, the primary function of a filesystem is to store cat pictures, right? So how do we do that? Well, we can simply try to stash all the cat photos on the disk, one next to the other. - That doesn't sound too complicated, so let's give it a try. --- Here, we again start with our default NetBSD instance with an attached volume, as before. We'll now pretend to be a filesystem for the disk xbd1, and we have 1 gig of space to manage. So let's write our first cat picture to the disk. We use 'printf' and 'dd' to write the bytes right to the beginning of the disk. If we now look at the data on the disk, we see that our cat picture consists of four bytes -- f0 9f 98 b8. We successfully stored our first file! To read this file from disk, we simply read the first four bytes. There we go, there's our cat! Ok, now it's time to store our second cat photo. Since we don't want to waste space, we'll write it right after the first photo. And we can see here our 8 bytes of data. If we read all of those bytes, we get back both of our "files", but we can of course also fetch them from disk one by one. Ok, so far, so good. Now let's create our third cat photo: two cats in snow! We'll write this data right after the second picture. If we read all the data in a single 512 byte read, we get all three photos. ...and of course we can grab the second photo individually... but if we're trying to grab the third photo... ...we realize we need to know how large it is.
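The back-to-back approach can be replayed without a real disk. What follows is only a minimal Python sketch, not the 'printf' and 'dd' commands from the demo: an in-memory buffer stands in for xbd1, and the helper names `write_at` and `read_at` are made up for illustration. Only the first photo's bytes come from the walkthrough; the second and third are hypothetical stand-ins of the right length (4 and 14 bytes).

```python
import io

# An in-memory buffer stands in for the raw disk (xbd1 in the demo).
disk = io.BytesIO(bytes(512))

CAT1 = b"\xf0\x9f\x98\xb8"  # the four bytes named in the walkthrough
CAT2 = b"\xf0\x9f\x98\xba"  # hypothetical stand-in for the second photo
# Hypothetical 14-byte stand-in for the third photo.
CAT3 = b"\xf0\x9f\x90\xb1\xe2\x9d\x84\xef\xb8\x8f\xf0\x9f\x90\xb1"

def write_at(offset, data):
    """Write data at a byte offset, roughly what 'dd' with a seek does."""
    disk.seek(offset)
    disk.write(data)

def read_at(offset, length):
    """Read length bytes starting at a byte offset."""
    disk.seek(offset)
    return disk.read(length)

# Store the photos one next to the other, with no gaps.
write_at(0, CAT1)
write_at(4, CAT2)
write_at(8, CAT3)

first = read_at(0, 4)
second = read_at(4, 4)
third_guess = read_at(8, 4)   # assuming 4 bytes cuts the photo short
third = read_at(8, 14)        # works only because we happen to know the size

print(first.hex(" "))
print(third_guess == CAT3, third == CAT3)
```

Nothing on the disk itself says where the third photo ends, so the reader has to bring that knowledge along.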
Just assuming 4 bytes per photo will not work, so we kinda need to count... here's our first pic, there's our second pic, ... so... nope. There, 14 bytes. So this illustrates a problem with our very naive approach: in order to be able to retrieve files individually, we need to know where they start and where they end. That is, we probably want to assign specific areas for each file. --- Let's set aside one area on the disk for each cat photo. - And let's make sure that we only put one cat into each bucket. Then we'll always know exactly where each cat pic starts. --- So let's illustrate _that_ approach. Let's zero out our disk... ...and start from scratch. Again, we can write our first cat photo directly to the beginning of the disk. Now our second cat photo, however, we'll write to offset 512. That is, we declare that each of our buckets will be 512 bytes in size, so we always know exactly where the next photo will start: at offsets that are multiples of 512 bytes. Now we can easily retrieve a file by simply reading a full 512 bytes and adjusting the offset. So in hex, this is what the data looks like on our disk now. First bytes right here, second bytes at offset 512, and the bytes for the third file over here. But note that of course now we're wasting a fair bit of space: our photos are less than 512 bytes in size, but we have all this empty space here between the 512 byte offsets. In addition, to read the files, we actually did read a full 512 bytes, even though we only needed 4 bytes for the first two files, and 14 bytes for the third file. So maybe we should add some metadata to our files. --- So let's start over once more. This time, we're prefixing the file data with the file number and the size of the file in bytes. That way, we can later easily determine exactly how many bytes we need to read. Here's our second file, also four bytes in size. And here's our third file, 14 bytes in size. So now on disk our files look like so.
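A minimal Python sketch of this prefixed layout, under a few assumptions: each file still lives in its own 512 byte bucket (the transcript does not say whether the buckets were kept for this step), the prefix is a one-byte file number followed by a one-byte size, and the names `store` and `load` are invented for the example. The second and third photos are hypothetical stand-ins of 4 and 14 bytes.

```python
import io

BUCKET = 512  # bucket size used in the walkthrough

# An in-memory buffer stands in for the zeroed-out disk.
disk = io.BytesIO(bytes(4 * BUCKET))

CAT1 = b"\xf0\x9f\x98\xb8"  # bytes from the walkthrough
CAT2 = b"\xf0\x9f\x98\xba"  # hypothetical stand-in
CAT3 = b"\xf0\x9f\x90\xb1\xe2\x9d\x84\xef\xb8\x8f\xf0\x9f\x90\xb1"  # hypothetical, 14 bytes

def store(slot, number, data):
    """Write a one-byte file number, a one-byte size, then the contents."""
    if len(data) > 255:
        raise ValueError("a one-byte size field cannot describe more than 255 bytes")
    disk.seek(slot * BUCKET)
    disk.write(bytes([number, len(data)]) + data)

def load(slot):
    """Read the two-byte prefix, then exactly as many bytes as it announces."""
    disk.seek(slot * BUCKET)
    number, size = disk.read(2)  # unpacking bytes yields two integers
    return number, disk.read(size)

store(0, 1, CAT1)
store(1, 2, CAT2)
store(2, 3, CAT3)

print(load(2))
print(disk.getvalue()[2 * BUCKET:2 * BUCKET + 16].hex(" "))
```

With the size stored next to the data, a reader no longer has to guess at the length or pull in a full 512 bytes per file.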
The file data for each file is prefixed with a one-byte field indicating the file number and a one-byte field for the size, followed by the actual contents of the file. And so we begin to associate _metadata_ with our files. The file number and size in this case, but --- of course we will probably want to add additional file attributes, additional information about our cat photos. --- So we can define a new format, where we specify that we use 16 bytes of metadata: - a two byte identifier - 4 bytes for permissions, such as user/group/other read/write/execute as we're used to from our Unix filesystem - a single byte representing the numeric owner of the file - one byte for the group - and then we reserve four bytes for the size. Four bytes seemed a bit better than just one byte, since with just one byte the maximum file size we could represent would be 255 bytes. And then we decide that it might be even better to separate the metadata from the file data altogether, since the 16 bytes here will be consistent for every file, but the file data will be variable in size. - So we decide to place the metadata at the beginning of the disk and the actual file data we'll write somewhere else on the disk. Now in order to be able to map that data to the file in question, we then include the offset here as part of the metadata. Now this starts to look a bit more like a filesystem, doesn't it? Let's see what our simple filesystem would look like in our simulation: --- So let's begin: We write two bytes file number, 0 1 in this case, followed by four bytes of permissions, 0744 here, followed by the userid and group id -- both zero in this case -- followed by... four bytes file size, which in this case is just four, followed by the offset of where we find the actual data. So this is just the metadata for the first file now, and with that committed to disk, we can now write the actual file data to the offset we had specified in the metadata. There.
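The metadata entry and the separate data area can be sketched with Python's `struct` module. Several details here are assumptions rather than facts from the video: the fields are packed big-endian with no padding, the permissions are stored as the integer value of octal 0744, the offset takes the four bytes left over after the other fields (2 + 4 + 1 + 1 + 4 = 12 of the 16), and the data area starts at 1000 hex, which is at least consistent with the second file's data landing at 1004 hex. The names `ENTRY`, `write_file` and `read_file` are hypothetical.

```python
import io
import struct

# Assumed layout, big-endian, no padding: id (2 bytes), permissions (4),
# owner (1), group (1), size (4), offset (4) = 16 bytes per entry.
ENTRY = struct.Struct(">HIBBII")
DATA_START = 0x1000  # assumed start of the data area

# An in-memory buffer stands in for the raw disk.
disk = io.BytesIO(bytes(2 * DATA_START))

def write_file(slot, file_id, mode, uid, gid, offset, data):
    """Write one metadata entry at the front of the disk, contents at offset."""
    disk.seek(slot * ENTRY.size)
    disk.write(ENTRY.pack(file_id, mode, uid, gid, len(data), offset))
    disk.seek(offset)
    disk.write(data)

def read_file(slot):
    """Look up the entry, then read exactly 'size' bytes from its offset."""
    disk.seek(slot * ENTRY.size)
    file_id, mode, uid, gid, size, offset = ENTRY.unpack(disk.read(ENTRY.size))
    disk.seek(offset)
    return file_id, mode, uid, gid, disk.read(size)

# First file: id 1, permissions 0744, owner 0, group 0, four bytes of data.
write_file(0, 1, 0o744, 0, 0, DATA_START, b"\xf0\x9f\x98\xb8")

print(disk.getvalue()[:ENTRY.size].hex(" "))
print(read_file(0))
```

Because every entry has the same fixed size, entry number n always sits at byte n * 16, while the variable-sized contents can live anywhere the offset field points to.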
Our second file now gets its metadata: file id two and, let's say, file owner and group id one. Because we know the offset and the size of the previous file, we can write the file data directly after it, so at offset 1004 hex, and we append this metadata directly after the metadata of the first file. And here come the file contents, written to offset 4100, which is 1004 hex in decimal. So if we now look at the first 32 bytes on the disk, we should find the metadata for the two files. File 1 over here and file 2 over here. Having determined the offset of the file from the metadata, we can then retrieve the file contents. And there you have it - a really simple filesystem, with a separation of metadata from file data. And this is, to some degree and at least conceptually, how filesystems work. But before we take a break, here's just one more thing I want to show you, and that's an illustration that a storage _archive_ format really is rather similar to a filesystem: --- Here, consider this directory, /tmp. Let's create a few normal files. When we run 'ls', we can see all the metadata of the files: the inode numbers as well as the permissions and ownerships etc. And of course we can display the file contents of our cats using... well, cat. Excellent. So now we can use the 'tar' utility to create an archive of these files with all their metadata, but instead of writing this archive to a file as we might usually do, we'll simply write it directly to the raw disk. And this is what _that_ looks like when we inspect the bytes in hex. And here we see the structure of the archive, and I think that you will find that this looks somewhat similar to the format we had created for our trivial filesystem. That is, at certain offsets we find metadata associated with the file, including ownership and file names, before we then over here find the actual file data, our hex bytes f0 9f 98 b8. And here's the second file... ...and the third file.
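The layout seen in the hex dump can be reproduced with Python's `tarfile` module instead of the 'tar' utility. This is a sketch under assumptions: the archive is written in the ustar format (the tar on the NetBSD instance may use a different variant), an in-memory buffer stands in for the raw disk, and the file names `cat1` and `cat2` as well as the second file's bytes are hypothetical. In a ustar header the name occupies bytes 0 to 99, the mode starts at byte 100 and the size at byte 124, both as octal text, and the file data begins in the next 512 byte block.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks

disk = io.BytesIO()  # stands in for the raw disk

# Hypothetical file names; only the first file's bytes come from the walkthrough.
files = {
    "cat1": b"\xf0\x9f\x98\xb8",
    "cat2": b"\xf0\x9f\x98\xba",
}

# Write the archive straight to the "disk" rather than to a regular file.
with tarfile.open(fileobj=disk, mode="w", format=tarfile.USTAR_FORMAT) as archive:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        info.mode = 0o644
        archive.addfile(info, io.BytesIO(data))

raw = disk.getvalue()

def header_fields(block):
    """Pick a few metadata fields out of one ustar header block."""
    return {
        "name": block[0:100].rstrip(b"\0").decode(),
        "mode": int(block[100:107].decode(), 8),
        "size": int(block[124:135].decode(), 8),
    }

first = header_fields(raw[0:BLOCK])
second = header_fields(raw[2 * BLOCK:3 * BLOCK])

print(first, raw[BLOCK:BLOCK + first["size"]].hex(" "))
print(second)
```

Each file is a block of metadata followed by its contents padded to a block boundary, so the next header is found from the size field alone, much like the hand-made formats earlier in this segment.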
And just like we could write the data to the raw disk, so can we read the data from the disk and pipe it directly into the tar utility, which can then extract the files with all their metadata. So this illustrates that you don't need a filesystem on the disk device so long as you write the data in a format that you understand and have defined. Or rather, that there is no meaningful distinction between an archive file and a filesystem snapshot. --- But ok, let's recap before we take a break here. In our next video, we will no longer have to pretend to be a filesystem after what we just covered here. But we did learn that - well, the filesystem is responsible for storing the data on the disk, obviously, and that, in order - to read or write files, we need to know where on the disk we're writing the actual data. But we also know that we need some metadata, and this metadata - may actually be stored separately from the regular file data. And with that, the bottom line is that on - a high level, a filesystem really just describes a data storage format. Now of course there are other considerations, especially with respect to efficiency, but roughly speaking, we now know how the filesystem works. - In our next video, we'll then look at the traditional UNIX filesystem, or UFS, and how that filesystem implements some of the things we discussed here in this video. Before we go on to that segment, though, do play around with your virtual disks and your cat photos, and replay the "tar as a filesystem format" example to make sure you understood these concepts, ok? Great, until the next time, and thanks for watching - cheers!