Hello, and welcome back to CS615 System Administration. This is week 2, segment 3. In our previous video, we talked a fair bit about different disk devices and how to connect them. In the process, we briefly mentioned a few configurations for how to combine individual disks into larger storage pools, so in this video we'll pick up where we left off there and cover this concept of storage virtualization in a bit more detail. --- Now when we hear the word "virtualization", we often jump to certain assumptions and think about virtual machines and perhaps cloud services such as, obviously, AWS, but on a basic level, "storage virtualization" really boils down to simply separating individual, physical storage media from a logical storage unit, meaning we might populate a large disk array with hard drives and then carve that up into virtual disks as needed. This is, in a way, the inverse of dividing a single hard drive into separate partitions, a concept we'll discuss in more detail in the _next_ video. Here, we will take a quick look at two different approaches to storage virtualization: a hardware-based approach, where a physical device may support and offer means to combine individual drives, such as RAID appliances; and host-based solutions, whereby the operating system manages the physical storage devices and combines them using logical volume management. We'll also see two practical examples of logical volume management: the device mapper, commonly used on modern Linux systems, and ZFS, a filesystem that includes a storage virtualization layer. But let's start with... --- RAID. As mentioned in the last video, what you see here is a now obsolete piece of hardware, an Apple XRaid appliance. I keep using this example picture, because it's easy to immediately see how RAID works: a device offering a Redundant Array of Independent Disks -- or a Redundant Array of Inexpensive Disks, as it was originally known -- provides housing for several individual hard drives, adds some firmware to manage the drives, and connects to your storage area network via, for example, Fibre Channel. But just stashing the drives into the case is not quite enough. And this is where this differs from the JBOD approach we mentioned in our last video: this is not just a bunch of disks; rather, RAID allows you to - combine all the disks into a single virtual disk, so that you can create a large filesystem that spans _all_ disks. Another advantage offered by RAID devices is to improve I/O efficiency by - striping writes across all drives, thereby minimizing, for example, seek time on the hard drives. And thirdly, RAID provides for a certain amount of data redundancy - when, instead of striping data across all drives, it mirrors data between subsets of the drives, thereby providing fault tolerance. Which is important, since we know that all things eventually fail, and a damaged hard drive should not lead to data loss if we can avoid it. --- So we see that we have several ways of combining and utilizing the disks in a RAID array. We use the following numeric levels to describe the different approaches: Level 0 through level 6 represent the most common solutions, and you can see the distinction in how data is written as described here. That is, at times we consider I/O performed on a physical block level, at other times on a byte or bit level.
Don't worry, we'll go into the details of what exactly a "block" is in this context in our next video, but perhaps it might be good to briefly summarize the most popular RAID levels to illustrate the benefits they provide. --- So the simplest way to use multiple disks is to put them next to each other and then issue writes not to just one drive at a time, but to distribute the data across both drives. That is, when the filesystem issues a write operation, the RAID device will chop it up on a per-block level and write the first block to - disk 1, the second block to - disk 2, the third block again to - disk 1, the fourth block again to - disk 2, and so - on and so - on. In this fashion, I/O performance is increased, as you can read or write data twice as fast as you could otherwise. However, there is one problem: if either of these two disks fails -- and from experience as well as simple probability we understand that the more disks you have, the more likely it is that a disk _will_ inevitably fail -- well, if this disk goes away, then you have lost half your data, and you're unlikely to be able to reconstruct what you lost. So RAID-0 provides I/O performance benefits, but no fault tolerance. --- If you want to ensure that you don't lose your data in the case of disk failure, you could use a RAID-1 configuration. In this case, we have the RAID duplicate every block of data that your filesystem is asking it to write to disk, and it will write _two_ copies -- - one - to - each - disk. Now in this case, if you lose one of the disks - there's no problem -- you still have all the data available on the other disk. Your RAID will alert you that it's now in degraded mode, you pop out the broken disk, replace it with a shiny new disk, and when you pop that disk in, - your RAID will automatically copy all the data from disk one onto disk two, and you're back in business *without any downtime*. So that's pretty neat! But of course in this configuration, you're (a) forgoing the benefit of I/O improvements, and (b) effectively only getting half the disk space of what you put into the system: for every 1 TB of filesystem space you want to have available, you need to put in 2 TB of storage. --- So instead of doing either a RAID level 0 or level 1 configuration, you might decide to do a RAID level 5, which combines I/O performance with fault tolerance. It does that by striping the writes across multiple disks - with each write going to - separate disks, but then _also_ writing - parity bits. Now these parity bits allow you to reconstruct the data in the case of disk corruption. Furthermore, RAID-5 distributes the parity bits across all disks, so that you can at any point lose any one of the disks without losing any of the data. Now you _could_ write all the parity bits onto one of the drives and still retain the same fault tolerance, but doing so (known as a RAID-4 configuration) incurs a performance penalty, which RAID-5 overcomes via this distribution. Now with RAID-5, you do need to have at least three disks, and you do not get a linear increase in disk space as you add disks, as some space is reserved for the parity information, but by and large, this is a popular and efficient solution to increase performance while at the same time retaining fault tolerance. Now of course you see where this is going, right? You can combine these approaches in different ways, so if you want both increased fault tolerance and increased performance --- you don't have to choose any longer.
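By the way, if you'd like to experiment with these RAID levels without buying a hardware appliance, Linux's software RAID subsystem (md) lets you do so via the 'mdadm' tool. Here's a minimal sketch, assuming /dev/sdb, /dev/sdc, and /dev/sdd are spare disks on a scratch machine you don't mind wiping -- the device names are made up for illustration, and the three variants are alternatives, so pick one at a time:

    # RAID-0: stripe across two disks for performance, no redundancy
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

    # or RAID-1: mirror two disks for fault tolerance
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

    # or RAID-5: striping plus distributed parity, needs at least three disks
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

    # watch the array state and any rebuild progress
    cat /proc/mdstat

You can then create a filesystem on /dev/md0 just as you would on a single disk.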
So you can combine different RAID levels to get - a mirrored array of stripes, a striped array of mirrors, and so on. The firmware in the controller handles all of this for you, and swapping disks while the system is running - so-called hot-swapping - is one of those things that makes your little SysAdmin heart happy as you watch the blinkenlights of the device while it rebuilds the array and saves your data. Quite useful, such a redundant array of independent disks, I tell ya. But the concepts of RAID need not be implemented solely via special hardware devices. And, more generally speaking, any kind of storage virtualization can be performed --- using software, meaning the kernel exposes the hardware and then allows a piece of low-level software to manage it. This is often referred to as "logical volume management", and, in general terms, it breaks down into - the management of the physical storage units -- for example, hard drives or storage devices connected via Fibre Channel. These units are divided into _physical volumes_, which then - can be combined to form so-called _logical volumes_. These logical volumes may then span multiple physical devices, providing a layer of abstraction on top of which - the LVM, or logical volume manager, can further combine and create logical volumes so that you can - combine individual disks to create a larger volume in a JBOD-like fashion; - allow for redundancy and fault tolerance by letting faulty disks be replaced without downtime; - allow a filesystem to be resized immediately and automatically, growing when a new disk is added; - provide the same RAID functionality we already discussed; or - automatically perform periodic snapshots of the filesystem, thereby offering a convenient live-backup mechanism. We'll discuss filesystem snapshots in particular and backups more generally later in the semester. For now, let's briefly demonstrate what the use of a logical volume manager on a typical Linux system might look like: --- For that, we log in to linux-lab, where we observe two filesystems mounted: /dev/sda1 mounted under /boot, and what appears to be a mapped device as the root file system. This root file system thus does not sit on a regular disk partition, but appears to be managed via the device mapper, the basis underlying the logical volume manager in Linux. Let's look at the dmesg output relating to the disk. Here, we see that /dev/sda appears to be a SCSI disk that contains several partitions. The 'lsblk' command shows us a bit more information about this block device: we have one 20 GB disk, with the first partition mounted under /boot; the second partition, /dev/sda2, is an extended partition, so it only contains the meta information for /dev/sda5, which in turn is managed by the LVM and divided into two logical volumes: one for slash, and one for swap. 'swapon' confirms that the swap space available on this system is made available via the device mapper device /dev/dm-1. We'll look into the concepts around partitions in more detail in our next video, but as shown here, we can see the use of an LVM even for a very simple system with just a single disk. (If you'd like to build such a setup by hand, there's a short sketch of the LVM commands just before the ZFS demo below.) --- Another thing I wanted to demonstrate here was the use of ZFS to manage storage resources. ZFS is a filesystem originating from Sun's -- well, Oracle's, nowadays -- Solaris operating system. It is fairly different from other filesystems in that it includes all the bits for logical volume and storage pool management, as we'll see.
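Before we get to the ZFS demo, here's the promised sketch of what building such an LVM setup by hand might look like on a Linux system. It assumes a couple of spare disks -- I'm using the made-up names /dev/sdb, /dev/sdc, and later /dev/sdd -- and a made-up volume group name, 'vg0'; adjust everything for your own machine:

    # register the spare disks as LVM physical volumes
    pvcreate /dev/sdb /dev/sdc

    # combine them into a volume group
    vgcreate vg0 /dev/sdb /dev/sdc

    # carve a logical volume out of all the available space
    lvcreate -n data -l 100%FREE vg0

    # put a filesystem on it and mount it
    mkfs.ext4 /dev/vg0/data
    mount /dev/vg0/data /mnt

    # later, when you add another disk, you can grow the volume group,
    # the logical volume, and then the filesystem -- without downtime
    vgextend vg0 /dev/sdd
    lvextend -l +100%FREE /dev/vg0/data
    resize2fs /dev/vg0/data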
Let's start with a new screen session... ...and spin up an OmniOS instance. OmniOS is a distribution of Illumos, an open source Unix system based on OpenSolaris, making it the easiest way for you to run a Solaris variant. This AMI here lets us get started, and our custom `ec2wait` function, which you may recall from one of the warmup exercise videos, lets us know when the system is up and running. Alright, so let's log in. Here we are. Let's again look at the disk devices reported via dmesg. Here, the disk is called 'xdf'. Let's try out the 'format' utility to look at the partition table for this disk. We find one disk, identified using the historical SCSI addressing scheme as controller 1, target 0, disk 0, with individual partitions -- or "slices" in Solaris lingo -- then being referenced after this prefix. When we select this disk, however, we get a warning message: "slice 0 of this disk is part of an active ZFS pool". So we can't use the 'format' utility here. 'df' confirms that our root filesystem appears to be located on the 'rpool' ZFS pool. Let's take a look at that, via the 'zpool' tool. Here, we see that indeed the 'rpool' appears to be backed by that very disk, c1t0d0, giving us roughly 7.5 gigabytes of disk space. 'zfs list' then shows us the filesystems created on this pool, in a way similar to how the extended partition on the Linux system contained the root filesystem. Now, let's pretend that we're adding new physical disks to our server here and want to create a second storage pool from those disks. For that, we run the 'aws ec2 create-volume' command in a separate screen session and create a 1 gigabyte volume. Then we attach that volume to the instance... wait, what was our instance-id? Let's check what instances we have running with one of our aliases. There, let's grab this instance-id... ...and continue. Now let's repeat the same thing for a second volume, so we can pretend that we just hooked up two separate hard drives to our server. Ok, now back to our server. Looking at the dmesg output again, we now see that we have two more disks showing up: xdf0 remains the root file system, but now we also have xdf1 and xdf2. Note that when we used the 'aws ec2 attach-volume' command, we specified a different device name: this is a bit confusing and annoying, because the device name we choose there may not be what the OS on the instance uses, but so be it. Note also that we didn't have to reboot our instance -- we could simply plug our new hard drives into the live system, et voilà, here are our disks. That's pretty cool. Anyway, let's see what 'diskinfo' tells us: ah, here they are. Our root disk, eight gigs in size, and two new disks, c1t5d0 and c1t6d0. Now let's create a new pool from these two disks. Let's call it "extra". 'zpool list' shows us that our new 'extra' pool combines the space from the two disks, yielding roughly 1.9 gigabytes. There's always a little bit of overhead involved, so we didn't get the full 2 gigs of space, but we do see how we can combine storage into a single pool. 'zfs list' shows our filesystems, but we haven't even created one on our 'extra' pool yet, so let's do that real quick. 'zfs create extra/space' - there. Now let's tell the system where to mount this new filesystem... ...and, here we go. New disk space, now available under /mnt. We can write data into this filesystem now as we'd expect. Ok, so far, so good.
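For reference, here's roughly the sequence of commands from this demo. This is a sketch only: the volume-id, instance-id, device name, and availability zone are placeholders and need to match your own setup, and the mountpoint step shown here uses 'zfs set mountpoint', which is one way to do it:

    # on the AWS side: create and attach two 1 GB EBS volumes
    aws ec2 create-volume --size 1 --availability-zone us-east-1a
    aws ec2 attach-volume --volume-id vol-XXXXXXXX --instance-id i-XXXXXXXX --device /dev/sdf
    # (repeat for the second volume, using a different device name)

    # on the OmniOS instance: combine the two new disks into a pool
    zpool create extra c1t5d0 c1t6d0
    zpool list

    # create a filesystem on the pool and tell the system where to mount it
    zfs create extra/space
    zfs set mountpoint=/mnt extra/space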
But now let's pretend that we got our hands on yet another disk drive and want to add it to this pool. So we create another volume... ...attach that as before... and immediately, 'diskinfo' shows the new disk present, c1t7d0. Our filesystem mounted on /mnt has about 1.8 GB of space. We add the newly attached disk to our existing zpool and... bam, just like that our mounted filesystem is now 2.7 gigs in size. So that's really cool, right? We didn't have to shut down the system to attach the disk, and we didn't have to partition the disk or recreate the filesystem or anything. Simply adding the disk to the pool underlying the filesystem gets us extra space! I hope that this gave you an impression of the flexibility and power of ZFS, and of how it illustrates the concept of storage _virtualization_: we're now using storage units in a very flexible manner to combine and create file systems. --- Alright, time for a break again. I want to make sure that you follow along with these examples, so here's another set of exercises I recommend for you: create an OmniOS instance and use ZFS to create different types of pools. ZFS supports all the concepts we talked about, and you can use it to increase performance by striping data, increase fault tolerance by mirroring data, or get a combination of the two via something called RAID-Z. (A short sketch of the relevant 'zpool create' commands follows at the very end, below, to get you started.) Once you've created such a pool and mounted the filesystem, simulate a hard drive failure by detaching the EBS volume -- how does the system handle this? Next - think about how the system behaves if we add a disk. Adding a disk adds space, so growing the filesystem seems an easy enough thing to do. But what if you were to _remove_ a disk? Can you _shrink_ a filesystem in this manner? That seems possible so long as the data on the filesystem still fits into the new pool, but what if that's not the case? As you can tell, there are a number of things you can play around with here, and I hope that you will explore the concepts from this video in this way. If you run into problems or have questions, don't hesitate to ask for help! Ok, that's it for today. - Next time we'll talk about the physical structure of traditional hard drives and continue the discussion of partitions that we already hinted at a little bit today. Until then - thanks for watching. Cheers!
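As promised, here's a rough sketch of the 'zpool' invocations you might use as a starting point for those exercises. The disk names are the placeholder ones from the demo above and the pool names are made up -- substitute your own:

    # striped pool (RAID-0 style): maximum space and speed, no redundancy
    zpool create fast c1t5d0 c1t6d0

    # mirrored pool (RAID-1 style): survives the loss of one disk
    zpool create safe mirror c1t5d0 c1t6d0

    # raidz pool (roughly comparable to RAID-5): striping plus parity, needs at least three disks
    zpool create tank raidz c1t5d0 c1t6d0 c1t7d0

    # grow an existing pool by adding another disk, as we did in the demo
    zpool add extra c1t7d0

    # check pool health, e.g. after you detach an EBS volume
    zpool status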