Hello, and welcome back to CS615 System Administration! This is week 4, segment 3, and after we just finished up installing an operating system on our disks, we realized that there is more to that than just extracting a few tarballs into a filesystem. In particular, we noted that for our system to come up and be useful, we often need to install additional software, bringing us back to the earlier discussion of what types of software there are, and whether or not we can easily say "yes, this belongs to the operating system" and "Nope, that's an add-on". And as we install such software, we may end up with our attempt at the manual "./configure; make; make install" invocation being foiled by an unmet dependency, sending us on a wild goose chase of downloading dependencies of dependencies in order to finally get our add-on software installed. The conclusion? There has to be a better way. Enter Package Management, which we'll discuss in this video. But before we get to that, let's first take a look at a few pieces of software and attempt to define their place within the software stack. --- Here I've listed a few examples with a table to let you fill in whether you think that each entry belongs in the category of "system software" or otherwise inherently being part of the operating system, or whether we'd consider the entry more of an add-on, provided by some third party. Why don't you try to pencil in what you think fits where. Now as you do that, you will probably need to take a step back and ask yourself "wait a second, just what exactly _is_ the operating system"? We had earlier said that clearly the kernel itself is not equivalent to an operating system, but that yes, you do kinda need one of those. An OS without a kernel is... not really useful. --- Ok, so a kernel is not add-on software. 
After all, this is how GNU/Linux came to be a useful operating system: all the other bits were already available, but a kernel was missing until Linus Torvalds released his "linux" kernel under the GNU General Public License or GPL. But then if we do need a kernel, what else do we need to have an operating system? In a modularized system, you can extend the functionality of the kernel via loadable kernel modules, and for certain pieces of hardware, you do need these special drivers. So those ought to be part of the OS as well. --- But you also sometimes need drivers that are supplied to you by the hardware manufacturer. Of course the company that makes, say, your RAID controller needs to make sure their driver works with your specific kernel version, so there is still a tight integration, but as it may be proprietary or in any case separate from the OS, we have to put a checkmark over here under '3rd party'. Now what about firmware? Well, perhaps we should specify what we mean by "firmware": the BIOS, sure, but also the actual firmware running _on_ the RAID controller, or your NIC. Now those... --- don't need to be integrated with the OS, and while there may be _tools_ provided by the OS to manipulate different types of firmware, that would generally be considered to be an "add-on". Next up: libc, the standard C library. Now _that_ --- is something that really needs to be quite well integrated with the kernel and all the other applications and services offered by your OS. Without this library, a lot of things will not work, and upgrading this library has the potential to break just about everything on your system, so clearly a check in the OS column. Now while it's conceivable to install a separate C-library implementation as an add-on, I think that would be rare, so we put an X into the "add-on" column here. What about the shell? --- Hah, trick question. 
Now of course just about any Unix operating system comes with a shell, but you also might be aware that there are all sorts of different shells you could install on top, if you liked. So this one's a bit weird: the POSIX standard requires _a_ shell to be present -- a Bourne-compatible shell, no less -- but you could conceivably also build an OS without a shell, but ok, let's put checkmarks in both columns here. How about ssh, both the client and the server? In a way, that's similar to a shell: --- just about every Unix version around ships with ssh -- and basically all ship the same version of ssh, OpenSSH -- but it's also possible to install it as an add-on or to use different implementations. It's rare, but hey. Ok, next: a mail server. --- Again, most Unix systems come with a mail server out of the box, which is a bit weird, because nowadays there really is no obvious reason why your standalone system or server should come with a full SMTP server, but this is an example of the legacy of the Unix system being primarily a very generic server OS for multiple users. And of course you can have multiple implementations, so it's also add-on software. How about an http server, then? --- Well, you'd think it'd be the same as a mail server, but traditionally an http server has _not_ been included in the Unix OS. However, more and more versions have _added_ an http server to the base system, simply because it's become such a common use case. Now the http servers included in many base operating systems might be good enough for low-performance low-traffic sites, but if you want to serve some serious traffic, you'd likely install another version. Databases, then --- are fortunately not yet included in every OS that you install and remain a primarily stand-alone add-on application. And finally, I listed 'python' here, as a stand-in for just about any programming language interpreter, compiler, or environment. 
And that's another funny one, because --- obviously it's an add-on: you can download and install python on any system, and even have multiple versions installed (although the resulting dependencies for your software can get really annoying here), but also many operating systems do ship with it as part of the base OS. This goes back to the standard Unix versions, too, which shipped with a C compiler, even though nowadays it really is not needed for most systems. But the python interpreter may also be a _required_ component of the OS, if some of the tools the OS provides are _written_ in python, so again, it illustrates just how difficult it is to draw the distinctions here. But... seeing how most things are third-party or add-ons anyway, why do we even care? One of the reasons is that we want to understand the dependencies between different components. A coherent operating system that comprises specific core components and ships them as one coherent unit is easier to manage, to use, or to upgrade than a hodgepodge of independent packages that just happen to be deployed on the same system. --- And to resolve these dependencies, we often times use a so-called Package Manager. So which of these components here would commonly be managed via a package manager? --- Welcome to the wonderful world of system administration, where the correct answer, as so often, is "it depends". It depends on the operating system in question, as well as on the flavor of the operating system. Some operating systems do not use a package manager for the core OS and only use a package manager for add-on software, while other systems use the same package manager for _all_ components of the system, including the individual kernel modules or the kernel itself. --- But so let's look at how we manage software from a different perspective. Here, at the bottom of the stack we're about to build sits the hardware. 
--- On top of that sits the firmware, managing and interacting with the hardware in some capacity. --- Next is the kernel, managing the hardware. --- We then have some "system software" sitting on top of the kernel or otherwise being tightly coupled with it: device drivers, kernel modules, core libraries etc. --- But we also have a bunch of utilities and applications sitting here, such as the shell, all the common unix tools we're used to -- sed, awk, grep, etc. --- These interact with each other, but may also be used to control some of the firmware on a lower layer. --- Finally, on top of all that sits what we call "add-on software". Things like a web browser, or a web server, or a database, or a different programming language environment, the AWS utilities, and so on, and so on. Now at some point -- and that point is a bit arbitrary, as we've seen -- --- we declare that some of these components are part of the Operating System, and others are not. It really depends on the operating system provider to define what they believe is part of the OS and what is not. But even for those components that are _not_ part of the OS, we are _still_ looking for a sane way of managing them and all the dependencies between them, so --- we find that "package management" spans all of these layers. So we see that even if our operating system uses a package manager for its own components, we may want to use that to manage add-on software as well. But as you'll find out soon enough once you've been running a system for longer than a week, there is _always_ some software that's not available via your preferred package manager, and you'll have to be careful how to manage _that_ software. And so again, the distinction between what is part of the OS and what is "add-on" becomes more meaningful: --- For example, when you - upgrade your OS, will that lead to a system that's incompatible with your add-on software? 
- The add-on software might install its configuration files in a location that you considered static when designing your partition schema, - or the software you're adding may conflict with some of the core OS components, or you end up with multiple versions of the same software; - you may have to adjust your system startup scripts to launch the third-party service at system boot - or generally keep track of where this software is located, - what dependencies it has, and how you tell it that you meet the requirements, meaning - whether you can install the software via a package manager or by hand. And it's often times easy to say "well, I _always_ and _only_ install _all_ my software via the package manager", but - sometimes you don't have a choice, because the software is available to you only as an opaque blob that you don't have much control over. So as a general recommendation, it's usually the case --- that you simply don't have a choice. Software provided to you in source form needs to be packaged up and installed to ensure you can express dependencies correctly, and even proprietary software can and should be turned into a package for your environment. After all, package management is really just a way of delivering files to a system while expressing requirements and dependencies to yield a final state. We'll get back to this concept of asserting state, of defining the outcome ("package X is installed") rather than focusing on how to do that later in the semester, though. Anyway, since package managers are a critical tool in any SysAdmin's toolchest, let's look at a few examples of the functionality they provide: --- Here we are on the Stevens linux-lab, which happens to be a Debian based Linux variant, so we can use the 'dpkg' tools to manage our software. By running 'dpkg -l' we can see all the software that is installed on the system, which turns out to be quite a bit. 1319 packages in total. [pause] But this is quite useful now, isn't it? 
Being able to list all the software that's installed together with the version of the package is rather important. So the first excellent thing a package manager provides is thus: - a software inventory hooray! - Ok, next, for any given package that we have here, we can [continue] list the contents of the package. For example [pause] the tcpdump package contains all of these files. Not only that, we can also go the other way around and ask the package manager: "hey, I've got this file here - what package does this belong to?" [continue] [pause after 'dpkg -L dnsutils'] So we get here a rather convenient method of looking up which file belongs to what package, forward and reverse. So this is the second excellent thing our package manager provides: - a file listing and lookup tool yay - [continue] But unfortunately... this only works if the software in question was installed using the package manager. So for consistency, you _really_ want to make sure you go through the trouble of packaging up the software you add here. Just think, suppose you want to upgrade python on this host, but let's say the AWS tools require a different version and will break. Since the AWS tools were installed outside the package manager, 'dpkg' can't know this, and will happily let you upgrade python and break AWS for all of us. So that's really 'no bueno'. So take note: - When you use a package manager, use it consistently, or you lose many of its benefits. And there are several, besides just having a convenient inventory. To illustrate another advantage of having such an inventory, let's --- take a look at a different system. Here, we have a Fedora instance, which uses the RedHat Package Manager or rpm tool. Like before, we can get the listing of all installed packages... ...344 in this case... ...and get the contents of a given package like so. Now suppose we change one of the files of this package... let's say... /etc/pam.d/sudo to simulate a system compromise. 
That is, if an attacker gained write access to this file, they could change how authentication for 'sudo' is done - not something we want! So now the file looks like this. Let's also change the owner and group here. [pause] Now how would we go about finding out whether or not a file like this has been manipulated? Remember that our package manager provides a full inventory of all files, but it has even more information than that, doesn't it? In order to install all the files, the package manager also must know the ownership and permissions, right? And it probably also knows what the file size is, and even what the contents should be. So if we have all this information in the package manager's database, then we should be able to look this up and _verify_ the integrity of a given package. And indeed, the rpm command offers an option for this: [continue] If you run 'rpm -V', you get output like this: [pause] This means that the 'rpm' command detected that the _size_, the MD5 checksum, the user, the group, and the last modified time of the file differ from what it recorded at installation time. How cool is that? - We get an implicit intrusion detection kit here via the package manager! Nice. - Now let's change back everything to normal... there. Well, ok, the last modified timestamp still shows, but we know that everything else is in order. We can of course also run the validation check across _all_ packages, and you'll find that there are some changes here. This may be entirely normal, since you may change the configuration files for several packages, logfiles will necessarily change during normal operation etc. This hints at the fact that in order to properly monitor a system you need to actually understand the context and know what the normal status is, so that you can identify whether any reported changes are expected or not. We'll get back to _that_ discussion later in the semester when we talk more generally about system monitoring. 
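If you want to convince yourself of how little magic is involved in such a verification, here is a minimal sketch in Python of the general idea: record a file's attributes at "install" time, then compare later. To be clear, this is not how 'rpm' is implemented. The function names 'record' and 'verify', the dictionary layout, and the choice of SHA-256 as the digest are all assumptions made for illustration, and the temporary file merely stands in for /etc/pam.d/sudo.

```python
import hashlib
import os
import tempfile


def record(path):
    """Capture the attributes a package database might store at install time.

    Hypothetical helper: the field names and the use of SHA-256 are
    illustrative assumptions, not rpm's actual database format.
    """
    st = os.stat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {
        "size": st.st_size,
        "digest": digest,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": st.st_mode,
        "mtime": st.st_mtime_ns,
    }


def verify(path, expected):
    """Return the names of all attributes that differ from the recorded ones."""
    current = record(path)
    return [name for name in expected if current[name] != expected[name]]


def demo():
    """Record a file, tamper with it, and report what verification finds."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sudo")  # stand-in for /etc/pam.d/sudo
        with open(path, "w") as fh:
            fh.write("auth include system-auth\n")
        manifest = record(path)  # what the "package manager" remembers

        unchanged = verify(path, manifest)  # nothing has been touched yet

        # Simulate the compromise by appending a line to the file.
        with open(path, "a") as fh:
            fh.write("auth sufficient pam_permit.so\n")
        changed = verify(path, manifest)
        return unchanged, changed


if __name__ == "__main__":
    unchanged, changed = demo()
    print("before tampering:", unchanged)
    print("after tampering: ", changed)
```

After the simulated tampering, the list of mismatches includes at least the size and the digest. Whether the modification time also shows up depends on the timestamp granularity of your filesystem, which is a small reminder that you need to understand what your verification tool is actually comparing.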
--- But all right, we get an intrusion detection mechanism, which is quite neat, but let's see what else we can do when we have a proper inventory of all packages on our system. In this example, we'll get back to a NetBSD system, where we use the 'pkg*' tools to manage add-on software from the 'pkgsrc' system. As before, we get the listing of the packages... 177 [pause] in this case, but what I want to do now is figure out whether or not any of these packages have any known vulnerabilities. After all, if I have an inventory of the packages on my system, then I should be able to automatically compare that to a list of known vulnerabilities and see what things I have to patch. So that's the next really neat feature provided via a good package manager: - Implicit analysis of which packages need to be patched by analysis of known vulnerabilities! - All right, for that, we do need to get this list of known vulnerabilities, and in this case, it is provided by the NetBSD project and can be retrieved via the 'pkg_admin' utility: [continue] Ok, let's take a look at what type of file that is... gzip compressed text, it seems. Let's have a look at the data: Looks like a plain text file, easy to read and parse. It contains a listing of package name and version, mapping that to the type of vulnerability and a URL with more information about the specific issue. As you can tell, there's quite a few entries here. So let's try to run an audit. We use the 'pkg_admin' tool for this and... wait, what's this? It tells us something about not being able to verify a signature... Let's look at the file one more time: We note that right here, at the top, it tells us that this information is cryptographically signed using PGP, which is really rather useful when you want to be assured that the data is authentic and unmodified. But we can't verify this signature because we're missing the key with which this file was signed. So let's fetch the key from the pkgsrc website and import it. 
There, now we can see if we can verify the data. Ok, we got a valid signature now. That is, we can verify that the file we have was the one that the pkgsrc security team published. But we haven't verified this key, meaning the gpg tool does not know whether this key, with which the file was indeed signed, is trustworthy. Let's change that: since we fetched the key from the website via https, and we know that the location is indeed where the pkgsrc team publishes its key, we can sign it: All right, there we go. Let's try again. Aha, now we have a good signature that we trust. Great! So _now_ we can run the pkg audit and... ...here we go. Whoof, that's quite a few issues the tool identified. Well, welcome to System Administration, where just about any package older than a few weeks is likely to have a security vulnerability. Which is why it's so useful to have a tool that can tell us what they are, so that we can then decide whether we need to address them. Ok, so let's see... bash, over here, appears to have a privilege escalation vulnerability, so we probably want to fix that. So we can use our package tools to pull in an updated version. The tool calculates what, if any, dependencies it needs and then downloads the updated package, removes the old version and installs the new version. So now when we run the pkg audit... we note that the vulnerability for bash no longer shows up here. Hooray! --- Okay, so let's break here before we move on to further discussions involving package management and some of the more hairy security aspects involved therein. What have we covered in this video? We've seen that - What comprises an OS, what is 'System' vs. Add-on is not an obvious distinction. - We've seen that some dependencies are more tightly coupled (e.g., kernel + libc) than others, meaning updating them requires coordination and compatibility. 
- For others, however, there are multiple options, and what is grouped together as the "operating system" really largely depends on the provider. But either way, we've illustrated that all of this software could -- and should -- be managed in some coherent fashion, and a - good package manager provides a list of excellent features, as we illustrated using Debian's 'dpkg', the 'rpm' tool used by Red Hat, Fedora, and CentOS, and NetBSD's pkgsrc tools. Those features include: - the ability to easily install software - and have the package manager automatically resolve the various dependencies amongst the packages. After installation, and through consistent use of the package manager, we thus gain - a complete package and file inventory, which we can then use to build - package and file integrity checks as well as a comprehensive mechanism to - check your software for known vulnerabilities. Now all of these features apply to OS as well as "add-on" software, so long as it's packaged consistently, so - you probably want to make sure that you have integrated your package manager with the OS, meaning the package manager itself -- another piece of software in our stack, really -- becomes _part of the OS_. Now we aren't quite done yet with the topic of package management, and in our next video we'll continue with a closer look at language specific packages as well as a range of security related concepts in this area, but before we move on to that discussion, --- let me leave you with a few exercises to consider: To begin with, - it's useful to get some practice in _building_ packages, since, as I mentioned earlier, as a SysAdmin you often times have to do just that for the various software components that are not already packaged. So why don't you go and find one of your favorite tools, check if it's available as a package for your preferred Unix version, and if not.... change that. I'm sure the upstream project will be happy to accept your contribution! 
- Another useful exercise is to compare the different package managers, similarly to how we've run through some examples on different platforms in this video. Identify the basic commands to perform the most important tasks in package management. Compiling a clear cheat-sheet like this can be invaluable for your future efforts when you switch from one Unix version to another. - Next, think about how we can manage firmware. How does your preferred OS handle this? - Research the concept of "reproducible builds" and think about how that relates to our discussion here. That is, if you run the same package manager commands on different systems, will you always, necessarily, get the same result? What differences might occur? - Then, think about how the management of packages overlaps or intersects with the configuration of the software. Can you use a package manager to assert the desired state of a system down to its specific configuration? We'll get back to this discussion towards the end of the semester, but it's still a good exercise for you to think about this already. - And finally, here's a more detailed exercise you may want to run through. It includes a comparison of installing software by hand and via a package manager and will likely give you a number of important insights into what we have discussed here in this video as well as in the next. I know, this looks like a lot of work, but as I keep saying: it's up to you how much you get out of this class, and if you're interested in System Administration, these exercises can really deepen your understanding and in addition be practically useful to you. With that in mind, thanks for watching and I'll see you next time. Cheers!