Hello, and welcome back to CS615 System Administration! As we're moving on to Week 2, where we are going to be talking about Storage Models and Filesystems, I thought it might be useful to start out with a quick exercise to warm up. For this, we're going to try to follow every SysAdmin's favorite train of thought: "Hmm, I wonder what happens if..." I recommend that you use this video to follow along and run the same commands, but also to then think beyond just what you've seen and explore more. So make sure to open a terminal and play along, pausing the video as we run through this episode of Full House, entitled "No space left on device". Let's begin by starting a new screen session. If you're not familiar with 'screen' or 'tmux', I recommend you check out the Tool Tip video I've linked from the course website. But let's continue... We log in on linux-lab and take a look at how much disk space we have on the local filesystem. Looks like we have plenty of space - 11 gigs. Let's create a large file to use up all that space using the 'dd' command shown here. This is going to take a few seconds, but eventually, you're going to run out of space. The file we created is 11 gigs in size, so not surprisingly, our filesystem is now full, preventing us from writing any data or creating any new files. But this can have other negative side effects that may be quite a bit less obvious. With the filesystem now completely full, let's try to log in again: Again, we run 'ssh' and... Uh-oh, we get some error messages. But... we're still logged in, aren't we? Let's try to run a few commands. Uhm... what? Nothing's working! Let's check what's going on in the other shell where we logged in at the beginning. First, let's take a look at the '/dev/stdin', '/dev/stdout', and '/dev/stderr' devices. They all point into the procfs, a pseudo-filesystem in which, among other things, each process's open file descriptors are represented. Next, let's see what processes we're running. 
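The fill-the-disk step can be sketched like this. The file name '/tmp/big_demo' and the tiny 16 MB size are stand-ins of my choosing, so you can run it without actually exhausting your /tmp; in the video, the file is around 11 GB:

```shell
# Check how much space we have on the filesystem backing /tmp:
df -h /tmp

# Create a file full of zeros with dd. In the video the file is large
# enough to fill the disk; 16 MB here is a safe stand-in.
dd if=/dev/zero of=/tmp/big_demo bs=1M count=16 2>/dev/null

# The file exists and actually occupies disk space:
ls -lh /tmp/big_demo
```

To really fill the disk as in the video, you'd drop the 'count=' limit and let 'dd' run until it fails with ENOSPC. Just be sure to remove the file afterwards.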
Well, _this_ shell appears to be process ID 20115; the other shell running must then be the one that's so broken, PID 20157. So we can look at the file descriptors associated with that PID under /proc/20157/fd. Here, we find a few open file descriptors, with file descriptors 0 and 2 -- stdin and stderr -- pointing to the pseudo terminal pts/36, but we notice the absence of a valid file descriptor 1 -- stdout! This explains why the commands we ran didn't produce any output -- stdout was apparently closed! As it turns out, my login shell -- ksh, the so-called Korn shell -- tries to open a few files and redirect I/O when you log in, but if the file system is full, it can't do that, and we end up in an impressively busted state. Let's try another shell instead: The 'ssh' command lets you specify a command to run when you connect, so let's run "bash". Hmmm... now what? No error, but... are we logged in? Turns out we are! When we run a command directly via 'ssh', then we don't get a pseudo-terminal allocated, but the command -- bash in this case -- is still connected to stdin and stdout of the ssh command. So we can run commands: Look, there's that big file we created. But we'd be more comfortable having a "normal" shell when we log in, so let's ask 'ssh' to allocate a pseudo-terminal for us, by specifying the '-t' flag. There, that looks more normal, doesn't it? So apparently 'bash' has no problem with the full filesystem at login time, and we can now remove that big file that's using up all the disk space and caused the problems. Ok, with that cleaned up, we can exit here. Now the disk space that was taken up by the file /tmp/big has been made available again. ...but our connection in the broken shell remains broken, because just removing the file does not miraculously allocate a proper file descriptor for the shell we have sitting here. So let's exit this shell and reconnect. There, everything's perfectly normal again. 
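You can inspect any shell's standard streams the same way; here's a sketch using the current shell rather than the PIDs from the video, which of course will differ on your system:

```shell
# $$ expands to the PID of the current shell; fds 0, 1, and 2 are
# stdin, stdout, and stderr. In a healthy interactive shell all three
# are symlinks to the same pseudo-terminal (e.g. /dev/pts/36).
ls -l /proc/$$/fd/0 /proc/$$/fd/1 /proc/$$/fd/2

# For another process -- say the broken shell from the video -- you'd
# substitute its PID:
#   ls -l /proc/20157/fd

# And the two ssh variants shown in the video:
#   ssh linux-lab bash       # run bash without a pseudo-terminal
#   ssh -t linux-lab bash    # force allocation of a pseudo-terminal
```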
Now if you try to recreate this scenario, please do make sure to remove any large files you create, so that the system does not get negatively impacted by your playing around with this exercise. You will likely see slightly different behavior, depending on whether you have bash as your login shell, but I specifically wanted to illustrate that the act of filling up the file system can have an effect on seemingly unrelated processes with possibly confusing or inconsistent results: one user might complain, while another user might not notice anything odd right away. Alright, so we've seen what happens when we fill up all our disk space with one giant file. Let's try something else: Again, let's take a look at how much disk space we have available here. Let's create a directory under /tmp and specify a pathname for a large file we want to work with. If we look at the output of the 'df' command, we can extract _exactly_ how much disk space is available from the fourth field. Now, we'll use the 'truncate' command to create a file of a specific size. Specifically, we'll try to create a file that's many times larger than the available disk space, so we copy this command here and append a few zeros. There. Wait, what? We were able to create a file that's thousands of times larger than the available disk space? How does that work? Look at the size of this file! That's impossible! And 'df' even tells us that we have plenty of room to spare! This makes _no_ sense. What does 'du' tell us about this file? Oooookay.... 'du' tells us this file uses zero blocks. What does 'stat' say? 'stat' shows us the file _size_ as being really large, but still using zero blocks. So what on earth is going on here? The file we created clearly appears to both have a huge size and use no disk space. This is because it was created by simply setting the file size, but not by writing any data to it. It's a so-called "sparse" file. Not all file systems support sparse files, but this one does. 
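The sparse-file experiment can be sketched like so; the name '/tmp/sparse_demo' and the 10G size are illustrative stand-ins, and the 'stat -c' format strings assume GNU coreutils as found on a Linux lab machine:

```shell
# Extract the available 1K-blocks from the fourth field of 'df':
avail=$(df -P /tmp | awk 'NR==2 {print $4}')
echo "available: ${avail} 1K-blocks"

# Set a file's size to 10 GB without writing any data to it:
truncate -s 10G /tmp/sparse_demo

# Huge size, yet (almost) no blocks in use -- a sparse file:
stat -c 'size=%s blocks=%b' /tmp/sparse_demo
du -k /tmp/sparse_demo
```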
What happens when we copy this file? Ok, that seems to work - now we have two of these weirdo files here, both seemingly huge, yet using no disk space. This is because the 'cp' command is smart: it detects that this is a sparse file and then creates a true copy of this file. But if we try to _read_ the file using 'cat' and then redirect the output to another file... ...this suddenly takes a really long time, and eventually we get back the error that we're out of disk space! And now let's look at these two files! The second file now uses lots of blocks on disk, so many that it filled up the file system. This is because when the kernel tries to read a sparse file, it notices that there's no data there, so it supplies NULL bytes instead, and the reading process will then see NULL bytes, which in this case it writes out to the second file. Weird, huh? As you can tell, our disk is now actually full again, and only after we remove the files do we get back our disk space. So this is an illustration of surprising behavior that may depend on the filesystem in question. If you run these commands on some other operating system or using a different file system, you may get a different result. But alright, let's move on... We've now seen how our system behaves if you create really large files: when you actually write data, it obviously uses up the disk space, but you are also able to create a file that _looks_ like it's huge but doesn't take up disk space. But what happens if instead of creating one huge file you create lots and lots of small files? Let's give it a try: Here, we use the 'df -i' command to inspect the inode usage of the filesystem. We'll go into much more detail in a future video segment about what exactly an inode is, but for now suffice it to say that each file is associated with an inode. So in this case, we have 923,261 available free inodes. If we create a directory... 
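Here's a small, safe sketch of the copy-versus-read difference. The file names are illustrative, and a 64 MB sparse file stands in for the 11 GB one from the video so that the 'cat' step doesn't actually fill your disk:

```shell
# A sparse file: 64 MB in size, no data blocks allocated.
truncate -s 64M /tmp/sparse_a

# GNU cp detects the holes and preserves them in the copy:
cp /tmp/sparse_a /tmp/sparse_b
stat -c '%n blocks=%b' /tmp/sparse_a /tmp/sparse_b

# cat, by contrast, reads the holes as NUL bytes and the shell writes
# them out for real, so the destination actually consumes 64 MB:
cat /tmp/sparse_a > /tmp/dense_c
stat -c '%n blocks=%b' /tmp/dense_c
```

With an 11 GB sparse file on a disk with less free space, that last step is exactly what produced the "No space left on device" error in the video.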
...then we have used up one inode and if we create one file, then we have used up another inode. Likewise, if we remove the file, then we get back the inode. And likewise for the directory. Makes sense. So now let's see what happens if we use up all our inodes. For that, we need to create... 923,261 files. That's going to be tedious if we run individual commands, so I wrote a simple program to create new files for us. You can fetch it from our course website, and it looks like so. We create a directory, and then we loop forever, creating new files in that directory until we fail. Let's compile and run this program. Ok, so after some time, the program will fail, as we expect. It reports that it has created 923,260 files - plus that one directory under which it created the files. Note the error message: "no space left on device". Sounds like "full house" - I mean, "disk full". Let's take a look at the size of the directory. Whoops, wrong path. Over here, in /tmp. There. This directory is pretty large. Let's look at the files we created in there. Notice how even running 'ls' on it takes a long time? That's because the directory is so large! Let's confirm how many files are found in the directory. Yep, 923,260 files. So now let's try to create a new file. Nope, no can do. No space left on device. But I can move a file from the directory into another one! This is because moving a file does not create a new file. Again, we'll get into the details of that in a future video. So the file we moved is zero bytes in size. And so are the other 923,259 files. But how can we be out of disk space, then? Didn't we have 11 gigs of space available? Well, turns out we _aren't_ out of disk space. We actually _can_ still write data to the disk. _So long as we write it to an existing file._ We are out of inodes, meaning we can't create new files, but we certainly still have disk space available, as shown here. In fact, we can easily write a gigabyte of data to the existing file. 
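The course program is written in C; as a rough, bounded stand-in, the same create-until-failure loop might look like this in shell. The directory name and the safety cap of 1,000 files are my additions so this won't actually exhaust your inodes:

```shell
# Check free inodes first (IFree is the fourth field of 'df -i'):
df -i /tmp | awk 'NR==2 {print "free inodes:", $4}'

# Create empty files until we fail -- or, for safety, until we hit
# a cap of 1000. Drop the cap to reproduce the real inode exhaustion.
mkdir -p /tmp/inode_demo
i=0
while [ "$i" -lt 1000 ]; do
    touch "/tmp/inode_demo/f.$i" 2>/dev/null || break
    i=$((i + 1))
done
echo "created $i files"
```

When the cap is removed and 'touch' finally fails, the error is the familiar but misleading "No space left on device".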
See, no problem. But let's clean up and remove the almost one million files we created here. Note again that this takes quite some time. Let's suspend the process for a second and check if we've made any progress. Yep, looks like we've freed up almost 200 thousand inodes already. Let's continue... Oh, also let's take a look at the directory size, while we're removing all those files. Huh, look at that: same size as when it contained almost a million files. Alright, more on that later, too, let's continue for now. There we go. Back to where we started. Alright, did you run the same examples? Did you play around with what happens when we use up disk space or inodes? Here are the key things this warm-up exercise was intended to illustrate: 1) Running out of disk space can lead to odd side effects. We saw that when _some_ users were unable to log in on the system because the disk was full, but other users had no problems. 2) File sizes are not always what they seem to be. There's a difference between a file _size_ and how many blocks of disk space a file uses, and this difference can be significant, depending on the file and file system. 3) Error messages aren't always what they seem to be! When we ran out of inodes, the error message was "no space left on device", but that's misleading. If you saw this error message and then ran 'df', it would have shown you many gigabytes of free disk space. It's important to know the different error scenarios that could lead to this error message when you're troubleshooting your systems. and 4) -- and this is something we'll get back to time and time again: All resources are finite. It may seem that nowadays we have a lot of disk space, but if it's possible to exhaust it, some process, somehow, will. A common unix file system may seem to have a near infinite number of files it can store, but the file system is restricted by the number of available inodes, and those can be used up, too. 
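Point 3 above suggests a simple troubleshooting habit; whenever a write fails with ENOSPC, check both views of the filesystem before drawing conclusions:

```shell
# "No space left on device" can mean either of these is exhausted:
df -h /tmp    # block (byte) usage
df -i /tmp    # inode usage
```

If 'df -h' shows gigabytes free but 'df -i' shows 100% IUse%, you've run out of inodes, not bytes.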
We'll see many other examples throughout the semester. We'll also revisit a lot of what we touched upon here in the next couple of videos, but I hope that this warm-up exercise helped to get you thinking about filesystems and the resources we're managing in this fashion. Make sure to check out the links in the slides, and read up on the suggested reading material. Next time, we'll talk about storage models, and I hope that you will keep the limitations we've seen here in the back of your mind when we do. Until then, thanks for watching - cheers!