Hello, and welcome back to CS631 "Advanced Programming in the UNIX Environment". In our last segment, we looked at creating links, both hard links and symbolic links. We discussed how hard links, that is, filenames, are merely entries in a directory, so now let's take a look at how we create directories. --- It's really quite straightforward: you simply call mkdir(2), which - will create a new, empty directory containing only the necessary entries 'dot' and 'dot dot'. - The permissions on the new directory are set the same way as when we create a new file: as specified by the mode_t argument, but modified by the process's current umask, as we illustrated in a previous video segment. - Similarly, ownership of the new directory follows the same semantics we previously discussed for files, which depend on the version of Unix in question. --- Likewise, removing a directory is not very complicated, either: call rmdir(2). - If the directory is empty -- that is, it only contains 'dot' and 'dot dot' -- then it is removed and its link count (st_nlink) is decremented. If that link count is now 0 - and no other process has the directory open, then the space occupied by the directory is freed. --- Now with directories, there's also the issue of a process having one open as its current working directory, and removing such a directory can lead to confusing situations: Here, we create a directory and change into it. Then, from another terminal, we yank the directory out from under our feet, which works just fine, since the directory is empty ...but our existing process still thinks that /tmp/dir is its current working directory. If we try to list its contents -- that is, we try to open it and iterate over the entries -- that fails, and creating a new file in the directory can't work either, since the directory doesn't really exist any more. If we re-create the directory, we won't magically be back in it -- we'll have created a brand-new directory, and '..' still can't work, since '..' 
is an entry in the current directory - which doesn't exist! An absolute path, however, does work, and all operations now behave as expected. --- Reading directories is something that should look familiar from our simple ls(1) clone from our very first lecture. First, we call opendir(3) to open a new handle on the directory, then we iterate over the entries found in the directory by calling readdir(3) repeatedly, and finally we call closedir(3) when we're done. --- The functions look like so: One thing to note here is that the order in which we get entries from the directory is opaque to us: the entries are not sorted in any way, or if they are, then in a filesystem-implementation-dependent way that we cannot rely on. --- Ok, so opening a directory and listing its contents requires read permission on the directory, but accessing any files inside a directory will, as we discussed in a previous segment, require _exec_ or search permission on the directory. - Reading a directory should always be done using readdir(3) or getdents(2), since the on-disk format of directory entries is filesystem dependent. The structures involved are documented in the 'dirent' manual page. - When you open a directory, you get back a 'DIR' handle, which represents the directory as a stream, meaning the entries are returned to you one at a time, in sequence, but you must not assume that this sequence follows an order you can predict, such as alphabetical order or the order in which the entries were created. The ordering inside the directory is opaque to you. - Whether or not an open directory counts towards your open file or file descriptor limit is not something you can rely on; the implementations and standards are unclear or vary here. See the COMPATIBILITY section of the opendir(3) manual page for more information. - Finally, while you _can_ perform filesystem hierarchy traversal with a careful sequence of opendir(3), readdir(3), and closedir(3) calls, this gets hairy very quickly. 
Recall that we do not get the directory entries in any particular order, so sorting them and recursing into subdirectories -- both tasks that are reasonably common when operating on filesystem hierarchies -- requires you to juggle complexities that we'd rather outsource to a library. So instead of doing this yourself, I'd recommend that you look at the fts(3) library functions, especially for your 'ls(1)' midterm project. As you might have guessed, the fts(3) library functions do in fact call opendir(3) / readdir(3) under the hood, but provide a lot of additional convenience. These file hierarchy traversal functions are not guaranteed to be available on all Unix versions, but fortunately for you, they _are_ available on your target platforms, so please give the manual pages a read and use them to handle recursive listing of directories. --- Here, let's take a quick look at some permission edge cases when handling directories: Here's a file inside a directory. We remove execute permission from the directory, but we are still able to list its contents. That is, opening the directory and calling readdir(3) works without exec permission. However, accessing a file inside of the directory fails, since we do not have permission to exec / search the directory. If we change the permissions to allow exec and remove read permission, then we can access the file inside the directory (despite not being able to "read" the directory), but listing the directory contents fails. Perhaps unexpectedly, recursively removing the directory fails, since we are unable to open it to see if there are any files in it. With the flipped permissions, we _can_ open the directory and see that there's a file in it, but we can't remove that file because we can't exec the directory. Since we can't remove the file, the directory won't be empty, so we can't remove it. Weird, huh? In other words, to recursively remove a directory, we need both read _and_ exec permissions. Fun. 
--- Ok, we've already seen that we may have an open file handle on a directory after we changed into it, which introduces the concept of a "current working directory", which, as we mentioned earlier in the semester, every process has. To get the current working directory, you can call getcwd(3). This is what, e.g., the pwd(1) command does; most shells also provide 'pwd' as a builtin and keep track of the current working directory in a special shell variable, $PWD. --- We also just saw how to change the current working directory -- via the cd(1) command. Now how does that work? To change the current working directory of the current process, you call chdir(2). As we saw earlier, you need to have exec permission on the directory in question, otherwise you can't change into it. Note, though, that the current working directory is set on a per-process basis. That is, if you recall our simple shell from lecture 01, when we execute commands in a shell, we generally fork a new process, exec the binary in the child, and then wait for it to finish before returning to the prompt. But this has some interesting implications for commands like "cd"... --- Here, let's implement 'cd(1)'. Not particularly difficult, right? Let's see if it works: Ok, changing the current working directory into /tmp seems to work. Let's confirm via pwd(1). Wait, what's that? Why are we still in the directory we started out in? Let's try something else: '..'. Seems to work. Except... it didn't. But we _are_ able to change directories when running 'cd', aren't we? Here, 'cd /tmp' - done. No problem. [pause] What's going on here? Well, as explained a minute ago, when we run our program 'a.out', we begin in our current working directory "/home/jschauma/04". Then our shell will fork a new process. The new process will have a current working directory of '/home/jschauma/04'. Then that process calls chdir(2), and the new current working directory will be "/tmp". 
Then our program exits, and our parent process, which never changed its current working directory, prints our command prompt again, where we then run 'pwd', which reports the unchanged current working directory. As I said, chdir(2) can only influence the current process, not its parent process. But then why does 'cd' work? [continue] Well, there isn't actually a 'cd' executable. Rather, the 'cd' command is built into the shell, meaning that the shell does _not_ fork a new process to run it; it is executed within the current shell process. In fact, 'cd' _has to be_ a shell builtin for this reason. But now it gets weird -- let's look at a macOS system. There, we _do_ have a /usr/bin/cd command. Let's give that a try! Nope, still no go. Same problem. But what the hell good is this program then? Well, it turns out that POSIX requires a standalone utility named 'cd' to exist, so if you want to be POSIX compliant and certified as a trademarked UNIX, then you have to provide this command -- even if it doesn't work and thus is completely useless. You _still_ have to provide a 'cd' builtin in your shell so that you can actually change your current working directory; you just also ship a useless utility to appease the standard. Ah, yes, computers. They're great, because things always make sense with computers! --- On that note, let's break here. To recap: To create a directory, call mkdir(2); to remove a directory, call rmdir(2). To traverse file system hierarchies, opendir/readdir _can_ be used, but for your midterm project you should use the fts(3) library. The current working directory is specific to the current process, and changing the current working directory only affects that same process, which is why 'cd' must always be a shell builtin and cannot work as a standalone executable, even if your OS may ship one. In our next segment, we'll take a look at directory sizes. You should find that they are a bit less obvious than you might initially think. Thanks for watching - until next time. 
Cheers!