Hello, and welcome back to CS615 System Administration. This is week 2, segment 2, and after we covered conceptual storage models in our previous video, we'll next be talking briefly about storage devices and interfaces. As usual, this is merely scratching the surface of a much larger topic, but I hope you get a bit of a taste of the many layers and concepts involved as we talk about storage media. --- Let's begin with SCSI, one of the older standards describing how to connect devices or peripherals to computers and transfer data between them. The "Small Computer System Interface" has been around for over 30 years and exists in a confusing number of implementations and variations. SCSI used to be the default method to connect any peripheral device, using long, wide, and generally unwieldy ribbon cables and different types of connectors. --- Here's a SCSI drive, but - different devices may use different connectors and require - different cables. SCSI has now been largely obsoleted by the Advanced Technology Attachment or "ATA" standards, but still lives on in the iSCSI standard, which specifies storage connections using the SCSI command protocol over IP-based networks, a common choice in storage area networks. Another variation we'll see in a few minutes is the modern "Serial Attached SCSI" (SAS) or the SCSI-over-Fibre-Channel Protocol. --- The ATA standard, on the other hand, is often equated with the "Integrated Drive Electronics" interface, or IDE. You may have seen parallel ATA, using flat, wide ribbon cables that make it a real pain to wire multiple drives inside a server case, although nowadays -- fortunately for sysadmins and their hands -- _serial_ ATA or SATA is more common. These are your typical hard drives, which are called "Integrated Drive Electronics" because the drive itself includes the controller, integrating some of the complexity that SCSI kept on the motherboard and a separate controller card.
That is, the drive includes a controller circuit as well as a few bits of firmware to facilitate the access. Now we're talking about a fairly low-level connection here, but now is as good a time as any to remind you that security affects everything. --- A few years ago it became public that the NSA was capable of implanting malware in the firmware of hard drives, which included an API and the ability to read and write arbitrary information into hidden sectors on the disk. This is a really difficult threat to protect against, and a good reminder of the fact that just about anything can be compromised. Specifically, it helps us consider that - words have meaning, and that the name "Integrated Drive Electronics" implies that there is more than just a bit of rather helpful magic sitting there. But don't fret -- not every hard drive is necessarily compromised, and you are not necessarily likely to be on the NSA's target list. However, for several reasons, including security but more likely performance, you may wish to move away from IDE drives towards --- Solid State Drives, or SSDs. An SSD does away with the mechanical, rotating, magnetic platters on which data is stored and instead uses integrated circuits, such as flash memory, to store data persistently. These drives are significantly more resistant to physical shock, quieter, and have lower latency than IDE drives, which is why your cell phone most likely uses solid state storage. While still more expensive than traditional, mechanical hard drives, you nowadays find more SSDs or flash memory in use even in the server market. These will still use the SATA standard to connect, but may also be combined into larger storage devices, which then might be connected externally or over a storage area network using --- for example, the Fibre Channel Protocol.
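As a quick practical aside: if you're curious which interface and media type the disks in a machine actually use, a modern Linux kernel exposes both. Here's a minimal sketch, assuming a Linux host -- the device names and transports you see will of course differ per machine:

```shell
# List each block device, the transport it is attached over (sata, sas,
# nvme, usb, ...), and whether it is a rotating disk or solid state.
list_disks() {
    # lsblk's TRAN column reports the transport the device uses.
    lsblk -o NAME,TRAN,SIZE,MODEL 2>/dev/null

    # sysfs reports "1" for a rotational (mechanical) disk and "0" for
    # a non-rotational (SSD or similar) device.
    for f in /sys/block/*/queue/rotational; do
        [ -e "$f" ] || continue
        disk=${f#/sys/block/}
        disk=${disk%%/*}
        if [ "$(cat "$f")" = "0" ]; then
            echo "$disk: non-rotational (SSD or similar)"
        else
            echo "$disk: rotational (mechanical HDD)"
        fi
    done
}

list_disks
```

On a SATA-attached SSD you'd expect `sata` in the TRAN column together with a `non-rotational` line; an NVMe device shows up with its own transport entirely.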
Fibre Channel is usually used in a switched fabric -- meaning it looks and behaves a whole lot like your normal switched Ethernet network -- and utilizes optical fiber cables (shown here on the top right), although you can also run it over copper wires. Now if all these different technologies are not enough for you, consider that in just about any but the simplest environments all of these are combined in some form when Storage Area Networks are created. That is, you will likely find --- a multi-layered stack of protocols, building on top of the Fibre Channel Protocol, which might be utilized in a pure optical Fibre Channel network, an Ethernet network, on top of regular TCP/IP, and so on. Similarly, the SCSI protocol may be used on top of any one of those, or over something like Remote Direct Memory Access over, say, InfiniBand. That is, we have all sorts of means to piggyback storage protocols on top of other protocols: --- ATA over Ethernet, for example, allows us to reuse our existing Ethernet network and turn it into a storage area network by encapsulating ATA frames in Ethernet frames; - Fibre Channel over Ethernet does the same, but for the Fibre Channel Protocol, and thus illustrates a trend whereby folks realized that, hey, we already have a working network here, let's just make it carry block-level instructions as well. So you really get - anything over Ethernet, which makes things quite easy, but also notably implies that the storage area network you build this way is restricted to the same layer-two network segment and, as it runs on that layer, has no inherent security properties. So you can then try to push things up the stack by using - for example, iSCSI, which includes authentication and can be wrapped in IPsec, for example. - And of course we can take it up a notch and move on to "Serial Attached SCSI", which we mentioned earlier and which nowadays is used in huge storage arrays...
--- such as this one, offering efficient, high-speed storage access using the SCSI commands and protocol on modern hardware. - But, true to its SCSI heritage, SAS of course _also_ suffers from no shortage of confusing variations and connectors. As you can tell, the opportunities to broaden your understanding or to specialize in storage technologies and protocols are not limited in any way. --- As a quick look at how technologies have advanced, note how throughput has increased over time and by protocol: - You see all the technologies we mentioned so far on this slide, with bit rates increasing over time: note the introduction of Fibre Channel at around 100 megabytes per second, then the various "over Ethernet" variations running over Gigabit Ethernet, and eventually over 100G Ethernet, which really is pretty slick. So that addresses the bit rate for throughput -- but just how many bytes can we store on the different storage media? --- Well, the individual IDE drive has come a long way from the roughly 5 megabyte drive costing around 1500 dollars in 1980, hasn't it? I remember well when we built servers with 500 megabyte drives and when eventually a 10 gigabyte drive was considered huge, but of course nowadays that's nothing. In fact, the price of storage for IDE drives has gone down so much that you can now buy an 18 terabyte drive for - just about 600 dollars. 18 terabytes in a single IDE drive! That's just amazing! So yeah, in a way upgrading your individual disk drive is an example of - scaling vertically, as we discussed in a previous video. --- But if even 18 terabytes is not enough for you because, as we established, disk usage expands to fill all available space, you can consider just getting a whole bunch of them and hooking them up one by one. That'd be a disk configuration commonly referred to as a "JBOD" -- just a bunch of disks, which is exactly what it sounds like. You'd buy a bunch of drives and there you go.
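To put those two data points side by side -- using just the rough figures mentioned above, a ~5 MB drive for ~$1500 in 1980 versus an 18 TB drive for ~$600 today -- the price per gigabyte works out to something like this:

```shell
# Back-of-the-envelope price-per-gigabyte comparison, 1980 vs. today,
# using the rough figures from the discussion above.
awk 'BEGIN {
    price_1980 = 1500 / (5 / 1024)   # dollars per GB in 1980 (5 MB drive)
    price_now  = 600 / (18 * 1024)   # dollars per GB today (18 TB drive)
    printf "1980: ~$%.0f per GB\n", price_1980
    printf "now:  ~$%.3f per GB\n", price_now
    printf "ratio: roughly %.0f million\n", (price_1980 / price_now) / 1e6
}'
```

That's about seven orders of magnitude cheaper per byte -- one reason vertical scaling of individual drives has remained attractive for so long.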
Now there's a fair bit of overhead on your server, because you now have 15 individual disks, but it's certainly an approach that might represent - horizontal scaling, with all its implied drawbacks. --- A perhaps better approach might be to take all these - beefy Exos drives - and put them into a RAID controller, to then combine them into a single volume, in effect combining the - - vertical and the horizontal scaling approaches. We'll talk a bit more about RAID in our next video, but of course nothing requires us to use IDE drives for this approach. We could instead... --- ...use SSDs, right? Like this one. This is a one hundred terabyte solid state drive. 100 terabytes in a tiny 3.5 inch form factor. Only problem: this will cost you a cool - 40 thousand dollars. Yep, you heard that right: 40 thousand dollars for a 100 terabyte SSD. Well, SSDs are more expensive, but they are also a lot faster and more reliable than HDDs, so - talk about scaling vertically here! And now imagine scaling _that_ horizontally as well, into a storage appliance that combines SSDs or --- flash memory, like the NetApp "All Flash" array shown here. Again, as you can see, there are countless ways to combine things to provide you with the right solution for your storage needs, each carrying its own advantages and disadvantages. And so there is no single, simple answer like "you should buy X" or "you should use Y" -- what the right solution is for you depends on your specific requirements as well as, as we've just seen, your budget. Sky's the limit... --- Ok, I think we're going to take a quick break here. Next time, we're going to be talking a bit more about RAID and logical volume management as well as the physical aspects of a typical HDD. But before we go there, I'd like to leave you again with another exercise: - Let's pretend you're a SysAdmin working at Stevens, and you have to replace the current storage system we use for the home directories on linux-lab.
What solution would you propose? Now obviously, you're lacking a lot of information to make this call, but on the other hand, you can probably make an educated guess on the storage needs based on the observed usage. Try to spec a solution and then see how much that would cost you. Then consider that as an academic institution, you are likely bound by a budget with certain limitations. Finally, consider what implications your choice might have on the rest of the compute environment. Ok, I think this should keep you busy until the next video. See you then - cheers!
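One small aid for this exercise: when comparing candidate solutions, it helps to normalize prices to dollars per terabyte. A quick sketch, using the example figures from earlier in this video (an 18 TB HDD for about 600 dollars, a 100 TB SSD for about 40 thousand dollars):

```shell
# Price-per-terabyte comparison using the example figures from this video.
awk 'BEGIN {
    hdd = 600 / 18        # ~$33 per TB for the 18 TB hard drive
    ssd = 40000 / 100     # $400 per TB for the 100 TB solid state drive
    printf "HDD: ~$%.0f/TB  SSD: ~$%.0f/TB  (SSD costs ~%.0fx as much per TB)\n",
           hdd, ssd, ssd / hdd
}'
```

Plug in current prices for whatever drives you are actually considering, multiply by the capacity you estimated for linux-lab, and you have a first rough budget number to sanity-check your proposal against.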