Hello, and welcome back to CS615 System Administration! This is week 3, segment 1, and we're picking up where we left off in our last video, when we talked about partitions. In that video, we had discussed the _operating system specific_ partitions, for which we used tools like disklabel(8) on NetBSD, format(8) on OmniOS, and fdisk(8) on Linux. We also hinted at the difference between these partitions and so-called BIOS partitions used by the Master Boot Record or MBR, which gets us to today's topic: The boot process on a high level as well as a fairly detailed look at the MBR. So when we boot up a system, we'll likely see messages like these on the console. Here's the display of a NetBSD bootloader. As you can see, it is a "BIOS boot" loader, and it offers you an interactive menu to select different ways of booting. If we let it time out without a selection, then it will simply begin the normal boot process and boot the NetBSD kernel, which generates these green messages shown here, displaying the hardware as it initializes it. At some point then, it hands control off to the init(8) process, which we can see here when the boot process switches colors. init(8) then continues the bootstrapping process, mounting the filesystem, bringing up networking, and launching whatever daemons the system is configured to start before leaving us with the login prompt. So we can break down this process that we just saw into these individual steps: When we power on the physical server, what is the first thing that happens? Well, you may hear a series of beeps, which may indicate to you that there was a hardware failure of some sort. Or you may not, if you're lucky. That is, before anything else, the system performs a "Power-On Self Test" where it checks for basic sanity of the world: that there's some memory, a CPU, what storage devices may be present and which ones the system is configured to boot from. 
Once that's complete, the system will look for the so-called "first stage boot loader". Traditionally, that'd mean looking for a specific signature and executable code in the first sector of the disk it's supposed to boot off, the _boot sector_. You've probably heard the term "MBR" or Master Boot Record in this context, which is exactly what we're talking about here. This first stage boot loader may then hand off control to a second stage loader. This might be needed because, as we will see in a minute, the first stage boot loader is necessarily very limited in size, so that if you want to perform something more complex than a simple boot you may need some special code somewhere else to jump to before you can even find and load the kernel in question. Now after the second stage boot loader, eventually you'll load your kernel. Now in the case of a virtual machine such as on AWS, that'd be a special kernel: for example, if you're using Xen for virtualization, you'd be booting the dom0 hypervisor kernel. All of these steps here are happening on the physical host, which is a worthwhile distinction, since, after _that_ kernel has booted, it would then start the respective guest domain, which goes through the same process and loads _its_ kernel, which then, as we saw in our opening sequence, initializes the hardware it finds -- virtual hardware in this case. So these last three steps here are thus happening within the virtual host in kernel space, before the kernel hands off control to init(8) or systemd(8) or whatever is used to start the various services. If the system you're bringing up happens to be a web server, that application would then run, bind the right network port, and finally serve content. These last few steps are still executing within the virtual host, but now are running in unprivileged mode, or user space. So this is the simplified, complete boot process from power on to serving traffic. 
But today we only care about the first few steps, where we look at how we even get to the kernel. So when you power on a physical server, this is what you might see first. This is an example of an American Megatrends BIOS -- the Basic Input Output System -- showing a successful Power-On Self Test. Normally, this would now simply jump into the boot sector of the configured boot disk, but it may also let you configure certain aspects. That is, even though this piece of software is necessarily simple, it still allows for some flexibility. Since the BIOS often sits on a read-only memory chip on the motherboard and thus isn't quite so... well, soft, we don't even call it _software_. It's harder to change, but it's not quite hardware, so we call it... "firmware" instead. So this screen shows an example of a BIOS configuration menu, allowing us to select the boot order of devices, for example. But the BIOS dates back to the CP/M operating system from the 70s and was originally proprietary to IBM PCs, then reverse-engineered by others, but not standardized and often specialized for a specific motherboard or other hardware. The BIOS also had a few technical limitations, which we saw in our previous videos when we talked about the problems certain BIOSes had in addressing large disks. So we now have the Unified Extensible Firmware Interface, which provides a modern and standardized way of interfacing between the operating system and the underlying firmware. But as so often, it's important and useful to understand the history and how things were done in the olden days, which... it turns out is still very frequently how things are done today, such as for backwards compatibility reasons. Which is why we're now going to look not at UEFI, but instead at the traditional Master Boot Record, which the traditional BIOS expects to find in the first sector -- that is, in the first 512 bytes -- of the boot disk. That's right, the Master Boot Record is only 512 bytes in size. 
Within these 512 bytes we have to fit a fair bit of information as well as the code needed to boot the system, so let's see what that looks like: At the end of the first sector, we expect to find the magic bytes 0x55 and 0xAA. The presence of these two bytes indicates to the BIOS that this is a valid boot sector. The 64 bytes right before hold the partition table. Now this partition table is different from the partition table we discussed in our last video: this is the BIOS partition table, which describes which OS partitions the BIOS can see. A BIOS partition table entry, then, is 16 bytes in size, so we can have exactly 4 of these partitions. That is, a disk with an MBR can have at most four BIOS partitions. But we know that our operating system may want to carve up the disk into more than just four partitions if it needs to, and that alone illustrates the need for a distinction between the BIOS partitions and the OS partitions. That is, the BIOS partition really only defines which parts of the disk are allocated for a given operating system; what the operating system then does with that slice of the disk is entirely up to it. Remember that in our previous video we showed that the BSD systems use the 'd' partition in their OS disklabel to reference the entire physical disk, and the 'c' partition to reference the part of the disk dedicated to this OS? This 'c' partition is effectively what we're defining here, and the OS can then create additional partitions therein. Anyway, so with 2 bytes for the magic number and 64 bytes for the partition table, we're left with 446 bytes. That is, everything needed to bootstrap the system needs to fit into these 446 bytes here. Which is why we are often talking about this being the "stage 1" boot loader. It's just enough code to bring up the system to a point where it can perhaps reach into other sectors on the first track to then find more complex code to transfer control to. 
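To make that layout a bit more concrete, here's a minimal Python sketch that carves a 512-byte sector into the three regions we just described. The function names and the in-memory sample sector are made up for illustration; nothing here touches a real disk.

```python
# Layout of a classic MBR sector, as described above:
# 446 bytes of boot code + 4 * 16 bytes of partition table + 2 signature bytes.
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446
ENTRY_SIZE = 16
NUM_ENTRIES = 4
SIGNATURE = b"\x55\xaa"


def split_mbr(sector: bytes):
    """Split a 512-byte sector into (boot_code, partition_entries, signature)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    boot_code = sector[:BOOT_CODE_SIZE]
    table = sector[BOOT_CODE_SIZE:BOOT_CODE_SIZE + NUM_ENTRIES * ENTRY_SIZE]
    entries = [table[i:i + ENTRY_SIZE] for i in range(0, len(table), ENTRY_SIZE)]
    signature = sector[510:512]
    return boot_code, entries, signature


def has_valid_signature(sector: bytes) -> bool:
    """True if the last two bytes of the sector are 0x55 0xAA."""
    return sector[510:512] == SIGNATURE


# A hypothetical all-zero sector, and the same sector with the signature added.
blank = bytes(SECTOR_SIZE)
signed = blank[:510] + SIGNATURE

boot_code, entries, signature = split_mbr(signed)
print(len(boot_code), len(entries), signature.hex())
print(has_valid_signature(blank), has_valid_signature(signed))
```

Note how the sizes add up: 446 + 64 + 2 is exactly 512, so there's no slack anywhere in the sector.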
That code is then known as the "stage 2" boot loader; GNU Grub is an example of a boot loader that may contain multiple stages. But back to partitions. So we have a partition table consisting of 64 measly bytes, leaving us with an even punier 16 bytes to describe each partition. How do we organize these 16 bytes? The first byte tells us whether this partition is active or not. Then we get 3 bytes to address the first sector of the partition using the Cylinder-Head-Sector addressing schema we discussed in a previous video. These three bytes are defined like so: The first byte identifies the head. At 8 bits, that means we can address at most 256 heads, which was a limitation we had previously mentioned. Now the second byte is a bit less straight-forward. Rather than giving us a whole byte for the sector, we only get 6 bits, with the two high bits of this byte being reserved for the high bits of the Cylinder address. Six bits give us 64 possible values, but since sector numbers start at 1, the largest sector number we can address is thus 63. And in the third byte, we get 8 bits for the cylinder address. Plus the two bits stashed in byte 2 above, we get 10 bits for the cylinder, meaning a maximum of 1024 cylinders. After this, we get one byte to identify the partition type, for the operating system in question, NetBSD or Linux, for example. Then we get another 3 bytes for the CHS address of the last sector, which... ...we also chop up just like before. But now let's recall that with the limitations we have here, we couldn't address disks above a certain -- by today's standards pretty small -- limit. So we had changed our addressing schema to use logical block addressing, so ...here we get 4 bytes for the LBA of the first sector and 4 bytes for the number of sectors in the partition. So now we have two ways of addressing sectors. How do we get from one to the other? We can start with the LBA of the address and then determine the C, H, and S values from that using this formula... ...and then use those values here. 
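Since that second byte is the fiddly one, here's a small Python sketch of how the three CHS bytes might be decoded. The function name is made up for illustration, and the sample bytes 9f 58 87 are the ones we derive by hand later in this video.

```python
def unpack_chs(raw: bytes):
    """Decode the 3 CHS bytes of an MBR partition entry into (cylinder, head, sector)."""
    if len(raw) != 3:
        raise ValueError("a CHS address is exactly 3 bytes")
    head = raw[0]                               # byte 1: all 8 bits are the head
    sector = raw[1] & 0x3F                      # byte 2: low 6 bits are the sector
    cylinder = ((raw[1] & 0xC0) << 2) | raw[2]  # byte 2 high bits + byte 3 = 10 bits
    return cylinder, head, sector


# Limits that follow from the bit widths described above.
MAX_HEADS = 2 ** 8        # head values 0..255
MAX_SECTOR = 2 ** 6 - 1   # sectors are numbered 1..63
MAX_CYLINDERS = 2 ** 10   # cylinder values 0..1023

# Largest disk addressable via CHS, assuming 512-byte sectors.
max_bytes = MAX_CYLINDERS * MAX_HEADS * MAX_SECTOR * 512

print(unpack_chs(bytes([0x9F, 0x58, 0x87])))
print(max_bytes)
```

The last number works out to a little under 8 GiB, which is the "pretty small by today's standards" limit that pushed everybody towards logical block addressing.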
Ok, so now that we know what these 16 bytes look like, we should be able to create a boot sector with a valid partition table entry for any disk simply by writing the required bytes to the correct offsets, right? Let's give it a try! We start out again as usual with a new NetBSD instance and then create a new volume to play around with. Let's make it 3 Gig in size this time. We wait for the instance to come up... ...and attach the volume using another shell function to save ourselves some typing. Ok, let's log in on the instance... ...and inspect the disks via dmesg(8). There's our root disk and our newly attached disk, xbd1. Let's look at the BIOS partition table of the root disk via the fdisk(8) command. We see the logical geometry of the drive as well as the BIOS geometry. The partition table shows the first partition to be of type NetBSD and active, starting at sector 2048 and with the shown cylinder-head-sector addresses. So fdisk(8) shows us all this information, but what does it actually look like? Let's use the dd(1) command to retrieve the first 512 bytes of the disk and display them as hex via the hexdump(1) utility. Here we go. So all these bytes here are the actual boot code with... ...the first partition being defined in these 16 bytes over here. And all the way at the end here, we see the MBR signature, 0x55 0xAA. So let's take a look at the first partition alone. We know it's 16 bytes in size at offset 446, so here it is. And so these 16 bytes describe the first partition entry shown by fdisk like this. Ok, so far, so good. Now what does our second disk look like? fdisk(8) shows that there are no partitions defined, no active partition. If we look at the bytes on the drive... ...then, no surprise, they're all null. Ok, so how do we turn this... ...into this? Let's start at the beginning. Well, the end, I suppose. Note how fdisk(8) tells us that the partition table here is invalid, because there's no magic number at the end of the sector? 
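If you'd rather not squint at hexdump(1) output, the same 16 bytes can be picked apart programmatically. Here's a hedged Python sketch using the standard struct module; the function name and the sample entry are assumptions for illustration (an active NetBSD partition starting at sector 2048), not the actual bytes from the root disk in the demo.

```python
import struct


def parse_entry(entry: bytes) -> dict:
    """Decode one 16-byte MBR partition table entry into its raw fields."""
    if len(entry) != 16:
        raise ValueError("a partition entry is exactly 16 bytes")
    # "<" = little-endian, no padding: status, 3 CHS bytes, type,
    # 3 CHS bytes, 4-byte starting LBA, 4-byte sector count.
    status, chs_first, ptype, chs_last, lba_start, num_sectors = struct.unpack(
        "<B3sB3sII", entry
    )
    return {
        "active": status == 0x80,
        "type": ptype,
        "chs_first": chs_first,
        "chs_last": chs_last,
        "lba_start": lba_start,
        "num_sectors": num_sectors,
    }


# Hypothetical sample entry, made up for illustration.
sample = bytes.fromhex("80 20 21 00 a9 9f 58 87 00 08 00 00 00 f8 5f 00")
info = parse_entry(sample)
print(info["active"], info["type"], info["lba_start"], info["num_sectors"])
```

On a real system you'd read the entry from offset 446 of the first sector of the disk, just like we did with dd(1).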
Let's fix that. We use printf(1) to write the bytes 0x55 0xaa to offset 510 of /dev/xbd1. Ok, so now we note that fdisk(8) no longer complains about the partition table being invalid, but it still says that there is no active partition. So let's create a NetBSD partition here. For that, we need to get the partition type identifier for NetBSD, which is... 169. What's 169 in hex? Let's use the bc(1) tool for that. Ok, "A9" is the hex value we want. Let's write that to offset 450. And then let's mark this partition as 'active' by writing 0x80 to offset 446. Ok, let's see what fdisk(8) says now... There. Partition 0 is now a NetBSD partition, and is marked as "active". But we're still missing the size and definition of this partition. As we see here, the cylinder-head-sector definitions are zero. So let's define those. The beginning of this partition is sector 2048, because the entire first track is often reserved for additional bootstrap code as we mentioned before. So now we can use our formula to calculate the CHS address from the LBA address, with our LBA address in this case being 2048. So 2048 divided by 16065, the number of sectors per cylinder (255 heads times 63 sectors per track), is obviously zero, which will be our cylinder value. We then have a remainder of 2048, so let's divide that by the number of sectors per track, 63, which gets us 32 with a remainder of 32. 32 decimal is 20 hex. So our head value is 20 hex. Finally the sector value is the remainder plus one, so 33, or 21 hex. So we can now write these three bytes -- head, sector, cylinder -- to offset 447 for the starting sector. Let's check: There we go, starting sector at cylinder 0, head 32, sector 33. So now we need the last sector. We know we have a total of this many sectors, but the first sector contains the MBR here, so let's subtract that and divide it by the number of sectors per cylinder to give us the cylinder address number in decimal, 391 in this case. The remainder is 10040, which we now... divide by the number of sectors per track per the BIOS, 63, giving us decimal 159. 
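The arithmetic we've been doing with bc(1) can be written down as a short function. This is a sketch under a couple of assumptions: the BIOS geometry is 255 heads and 63 sectors per track (which is where the 16065 sectors per cylinder come from), and the 3 Gig volume has 3 * 1024^3 / 512 = 6291456 sectors. The function and variable names are made up for illustration.

```python
HEADS = 255              # assumed BIOS geometry: heads per cylinder
SECTORS_PER_TRACK = 63   # assumed BIOS geometry: sectors per track


def lba_to_chs(lba: int, heads: int = HEADS, spt: int = SECTORS_PER_TRACK):
    """Convert a logical block address into a (cylinder, head, sector) tuple."""
    cylinder, remainder = divmod(lba, heads * spt)  # heads * spt = sectors per cylinder
    head, sector_index = divmod(remainder, spt)
    return cylinder, head, sector_index + 1         # sectors are numbered from 1


# Assumption: a 3 GiB volume with 512-byte sectors.
TOTAL_SECTORS = 3 * 1024 ** 3 // 512

first = lba_to_chs(2048)
last = lba_to_chs(TOTAL_SECTORS - 1)
print(first)
print(last)
```

With these assumptions, the first tuple matches the cylinder 0, head 32, sector 33 that fdisk(8) reported, and the second gives cylinder 391, head 159, and the sector value we'll need in the next step.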
So our head byte in hex is then.... 9F. Now the value for our sectors is... 24 decimal. But remember we now need to puzzle together the bits for the cylinder and sector, so we need to convert this to binary. Which gives us 011000 for the sector bits. For the cylinder now we need 391 in binary.... which looks like this. So now we combine the binary bits back to hex. For this, we now grab the first two high order bits of the cylinder number -- 0 and 1 in this case -- followed by the 6 bits representing the sector, giving us 58 hexadecimal.... ...and the last 8 bits of the cylinder number converted to hex 87. So now we have our CHS bytes: 9f, 58, and 87. Let's write those to offset 451. Ok, what does fdisk tell us? Looks good so far. We have a starting CHS address and an ending CHS. But our MBR also requires the LBA addresses, so let's go ahead and add those. We know the LBA of the starting address is 2048, which is 800 in hex. But the LBA field needs 4 bytes, and it uses little-endian byte ordering, so we turn 00000800 into 00 08 00 00 and write those bytes to offset 454. The final field then holds the size of the partition in sectors, which is the total number of sectors minus 2048... ... in hex... ...and converted to least-significant byte first... ...and written to offset 458. And now, fdisk shows us a proper partition with the right size and the right addresses. But of course there's an easier way to do this. Let's wipe out the MBR by overwriting it with zeros. There, all zeros. Then we can use fdisk(8) to specify that we want to activate partition 0 with the specification of 169 for NetBSD, beginning at sector 2048, and of the given size... ...which, then, looks just like we had manually written to the disk. This shows that there really is no magic to any of the tools we use to manipulate our disks: it's all just a question of knowing which bits to write to which place. That, and that it's rather useful to be able to use dd(1), hexdump(1), and bc(1) to manipulate and write individual bits and bytes. 
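To tie the whole exercise together, here's a Python sketch that assembles the same 16 bytes in memory instead of poking them onto the disk one printf(1) at a time. The helper names are made up for illustration, the 6291456 total sectors assume a 3 GiB volume with 512-byte sectors, and the sketch only builds a buffer; it doesn't write to any device.

```python
import struct


def pack_chs(cylinder: int, head: int, sector: int) -> bytes:
    """Pack (cylinder, head, sector) into the 3-byte MBR encoding."""
    if not (0 <= cylinder < 1024 and 0 <= head < 256 and 1 <= sector <= 63):
        raise ValueError("CHS value out of range")
    high_cylinder_bits = (cylinder >> 8) & 0x03       # top 2 of the 10 cylinder bits
    return bytes([head, (high_cylinder_bits << 6) | sector, cylinder & 0xFF])


def build_entry(active, ptype, first_chs, last_chs, lba_start, num_sectors) -> bytes:
    """Build one 16-byte partition entry; the integer fields are little-endian."""
    return struct.pack(
        "<B3sB3sII",
        0x80 if active else 0x00,
        pack_chs(*first_chs),
        ptype,
        pack_chs(*last_chs),
        lba_start,
        num_sectors,
    )


TOTAL_SECTORS = 6291456  # assumption: 3 GiB volume, 512-byte sectors
START_LBA = 2048

entry = build_entry(
    active=True,
    ptype=0xA9,                    # 169, the NetBSD partition type
    first_chs=(0, 32, 33),
    last_chs=(391, 159, 24),
    lba_start=START_LBA,
    num_sectors=TOTAL_SECTORS - START_LBA,
)

# Place the entry and the signature into an otherwise empty 512-byte sector.
sector = bytearray(512)
sector[446:462] = entry
sector[510:512] = b"\x55\xaa"
print(entry.hex())
```

If you compare the printed bytes with what we wrote by hand, you should find the same offsets: the active flag at 446, the starting CHS at 447, the type at 450, the ending CHS at 451, and the two little-endian fields at 454 and 458.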
Alright, let's recap. Remember, we started out discussing the typical boot sequence? We said that it all starts with some basic _firmware_, which may be the BIOS, or UEFI, etc., which... may perform a POST check and initialize the hardware it sees before it transfers execution to the first stage bootloader, such as the MBR, which then _may_ continue the bootstrapping process by handing control to the second stage bootloader, which then loads the kernel, which then, ultimately hands control off to a userland process like init(8). Now one thing to note here is that in virtualized hardware, some of these steps repeat, some may be skipped, and some may be simulated as our virtual host initializes virtual hardware as it boots up. But ultimately, we arrive at a running system. As we're looking at the boot process here, and having understood the MBR in some detail, let's consider some additional exercises for you: AWS instances allow you to get the output sent to the virtual serial console. Different operating systems display different levels of detail, and of course the boot process is different for each OS. So a good exercise for you would be to compare the output of different operating systems. Spin up a few instances -- say, a NetBSD, FreeBSD, Ubuntu, Fedora, or OmniOS instance -- and compare the output on the console. In particular, pay attention to the filesystem or disk specific messages. Make sure you understand what the output means. I've put sample output into individual files at this URL, but I still also recommend that you run the AWS commands to get more familiar with that part. Next, I want you to think about the security of the boot process. As we've seen, there are several layers to this process, and different pieces of software -- the firmware from the ROM on the motherboard, the bits written to the boot sector -- that we normally don't think about. How can we remain assured that nobody has tampered with the software or firmware? 
The concept of "trusted computing" comes into play here; look up the terms "remote attestation", for example, or "secure boot" in this context. Finally, you may have noticed that we _still_ haven't gotten to a point where we're actually using the _file system_ or what that might actually look like. Let's fix that in our next video, shall we? Alright, that's it for today - see you next time, and thanks for watching! Cheers!