Hello, and welcome back to CS615 System Administration. This is week 11, segment 4, and it's about time we set things straight: "crypto" means "cryptography". That's it. None of this crypto-currency nonsense, ok? Ok. But how does "cryptography" help us? We've talked a bit about system security by now, covering risk assessment, threat models, the attack life cycle, and Zero Trust concepts. Now I understand that most of you have a good understanding of some of the core concepts in cryptography, but it's still useful to revisit them here in this class to ensure that there are no misunderstandings when it comes to how cryptography can provide us with some level of security, and in so doing let us choose the right mechanism for the right threats as identified in our risk assessment. --- So let's start out with something terrible that we nevertheless still see a lot: a trivial 'curl | sudo bash' command. Now this is often criticized as being a terrible idea, but just why exactly is that? --- Well, on the one hand, if we have a passive attacker here in the middle, then of course they can observe all the traffic - since everything is being transmitted in the clear. We're using plain http here, and as we've illustrated in previous videos, capturing the network packets thus lets you observe all the data. But this is not the only issue here, right? --- We also know that if, instead of a passive attacker, we have an active attacker in the middle, then they can not only _see_ all the traffic, but they can intercept it and - reply in place of the server we were trying to reach and thus slip in whatever nefarious commands they want. And since we're piping this directly into 'sudo bash', boom, our host is compromised. So this is pretty insecure, right? And what do we do to fix this? --- Let's ask some random people on the street -- if you saw this command, how would you fix this? 
Oh, ok, true - most random people from the street would not have a clue what the hell you're talking about, but let's pretend we asked some of the smart cookies from Stevens. Pretty sure the most common answer would be: - rub some crypto on it! That's right - if you're on the internet, and something's insecure, then you rub some crypto on it and it becomes secure - right? Perhaps even "military grade" encryption. But just how exactly does cryptography help us in this scenario? --- It's true that cryptography can help us, but we have to be clear about what areas cryptography can help us with and in which it cannot provide any security at all. That is, - cryptography _can_ provide some risk reduction or even threat elimination to some degree in the following areas: - Secrecy or Confidentiality, helping you answer the question "Did or could anybody else see (parts of) the message?". Some things are best kept a secret, right? And this is perhaps the most obvious use case, and what most people think about when they hear cryptography. Secondly, cryptography may help us assert - Accuracy or Integrity, answering the question "Was the message (or could it have been) modified before I received it?" This is also really important, but different from secrecy. For example, if you issue a bank transaction, it really makes a difference whether it says that you get 10 dollars or 10,000 dollars, and this difference may be much more important to you than the assurance that nobody could see what the value was. Finally, the last area where cryptography can help us is that of - Authenticity, answering the question "Is the party I'm talking to actually who I think it is or who they claim they are?" So with these three broad areas in mind, let's go back to our example and see how rubbing some crypto on our commands might help us mitigate some of the risks we've identified. --- First comes confidentiality, because that's easiest and most obvious. 
As we've seen in our previous exercises when trying to capture packets that were encrypted in transit using TLS, - the data exchanged can no longer be read without possession of the encryption key - meaning that our adversary can no longer - eavesdrop on our communications. Hooray! --- But of course there are some problems that might occur. One of them is - that if we have no assurance of authenticity of the other party, then the confidentiality we gained really isn't buying us much. Sure, the data cannot be seen by somebody in transit, but if we don't know who we're talking to, this doesn't do us much good. The other issue here is that of course in order to be able to establish an encrypted channel, - we need to exchange keys with the other party. Now many protocols such as TLS facilitate this, so we don't have to manually perform the key exchange, but it remains a potential point of failure. And once we've exchanged keys, if we use the same key over and over, then at some point an adversary might be able to gain possession of the key, and then might be able to retroactively decrypt communications they collected earlier. To mitigate that risk, we - need to perform periodic key rotation, or establish and use individual session keys, which increases the complexity of the protocol and further introduces a new requirement: - we often want to be able to revoke a previously used key. This aspect of key management and key rotation leads to one of the most common mistakes encountered: people frequently end up - storing the key alongside or inside the code, which is why you can so often find passwords or private keys in your own or even public code repositories. To mitigate this risk of exposure, it's critical that you separate your code from your config, and your config from your secrets. 
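As a minimal sketch of that separation -- the script name and variable here are made up for illustration -- a program can refuse to run unless its secret is supplied externally, rather than having it baked into the code:

```shell
# Sketch: keep the secret out of the code by requiring it from the
# environment at runtime. "app.sh" and API_KEY are placeholder names.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

cat > app.sh <<'EOF'
#!/bin/sh
# Abort loudly if the secret was not provided externally:
: "${API_KEY:?set API_KEY in the environment, not in the code}"
echo "key received (length ${#API_KEY})"
EOF
chmod +x app.sh

# Providing the secret out of band works; hardcoding it never has to:
API_KEY='example-only' ./app.sh   # prints "key received (length 12)"
```

In practice that environment variable would itself be populated from a secrets store at deploy time, not typed in by hand or committed to the repository.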
The best way to ensure that secrets are handled appropriately is then - to use a proper key management system, such as for example AWS KMS, where you can ensure that access is granted on an as-needed basis and keys are automatically rotated. But confidentiality is only one aspect of the communication, and in our example we had a passive attacker in the middle. But what --- if we encounter an _active_ attacker? With our confidentiality assured via TLS, the attacker - would of course not be able to understand what the message says, but, having actually intercepted the connection, they'd be able to - modify the message nonetheless. That is, even if they can't _read_ the message, it's often possible that changing the contents of the encrypted message can lead to undesirable results on the client side. In the best case scenario, the message turns into garbage, but it may also be possible for an attacker to introduce select bitflips or other changes that change the meaning of the message. So how can we protect against that? That is, what sort of protections may provide us with assurance of data _integrity_? --- One method might be to use a separate checksum of the data. That is, we not only fetch the data, we also fetch a known checksum of the data, then compute the checksum of the data we received, compare it with the known checksum, and only execute the commands if the two match. But you may of course notice that we have another problem with this approach: --- First of all, we _still_ need authenticity. Just like before, if we have no authenticity for the data, then we don't know whether we can trust the checksum, either. An attacker able to MitM our connection when we fetch the data surely could equally MitM the connection when we are looking for the checksum, and simply provide us with a modified checksum that matches the modified data. 
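To make the checksum approach concrete, here's a minimal sketch; local files stand in for the downloads, and all filenames are placeholders:

```shell
# Sketch: only execute a fetched script if its SHA-256 checksum
# matches a separately obtained one. In reality both would be fetched
# over the network (ideally the checksum from an authenticated or
# out-of-band source).
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf 'echo hello from installer\n' > install.sh   # the "fetched" script
sha256sum install.sh > install.sh.sha256            # the "known" checksum

if sha256sum -c --quiet install.sh.sha256; then
    sh ./install.sh
else
    echo 'checksum mismatch -- refusing to run' >&2
    exit 1
fi
```

If even a single byte of install.sh changes, `sha256sum -c` fails and the script is never executed.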
In other words, the checksum we use to validate the data needs to come from an authenticated source or out of band -- otherwise, it's as untrustworthy as the original data. And of course we then really have a circular dependency here: - because the integrity of the checksum used to assert the integrity of the data is at risk itself. And there are multiple angles here: on the one hand, there might be a modification of the checksum, but on the other - there might be the risk of a collision in the checksum algorithm, allowing an attacker to provide data that matches the checksum, but isn't the original data. Furthermore, how are we verifying the data? - If the tools we use on our end to validate the checksum are not trustworthy, then the checksum cannot be trusted. This is a common mistake made when deploying and using, for example, host intrusion detection systems that perform analysis of the checksums of the system's binaries: if you validate the system's binaries using tools _on that system_, then you cannot have any assurance of validity: any attacker able to modify the binaries on the system would likely also have been able to modify the tool you use to verify the checksum on that system. So make sure to always perform such validations using known-good tools, usually on another, trusted system. --- Talking about using checksums, it's important to note the different use cases these functions provide in the context of "cryptography": - On the one hand, when we talk about integrity assurance, we usually want to be able to calculate the hash quickly, and we don't add a salt here, meaning that the input will always produce the exact same output on all systems -- otherwise, integrity checking wouldn't work to begin with. - But we also use hash functions in different contexts. For example, for true integrity _and authenticity_ assurance, we use a hash-based message authentication code, which utilizes a shared secret. 
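The difference can be sketched with openssl's `dgst` command; the message and key below are made up for illustration:

```shell
# Unkeyed hash vs. HMAC. Anyone can recompute a plain SHA-256, so it
# only detects accidental or unkeyed modification; an HMAC tag can only
# be produced (or verified) by someone holding the shared secret.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
printf 'transfer $10 to alice\n' > msg.txt

sha256sum msg.txt                             # integrity only
openssl dgst -sha256 -hmac 's3kr1t' msg.txt   # integrity + authenticity
```

An attacker who flips "$10" to "$10,000" can trivially recompute the plain SHA-256, but cannot produce a matching HMAC tag without the key.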
And of course - we use hashing to store passwords, but here we have a different requirement: we want to ensure that the algorithm is not only collision resistant and utilizes a salt, but we want it to be deliberately slow so as to make a brute-force attack infeasible. So for password storage, which really is rather different from integrity checking, do make sure that you - understand the difference between the two and - always _hash_, not _encrypt_ the passwords. That is, _you_ should not have the clear text passwords in your possession at all. For this purpose, you also - always want to ensure you salt the data, meaning you add some extra random data such that the same password will produce a different hash for every user and on every system. This mitigates the risk of so-called rainbow tables, where an attacker pre-computes all possible hashes so that they can then perform a lookup in constant time. So for the password context, you want to use - a key-derivation function that utilizes "key stretching", based on a salt and a number of iterations chosen to match your compute capabilities, striking the right balance between the number of iterations and usability when the user enters their password. --- But let us go back to the third property cryptography can provide, and which we already mentioned twice as being required for the other two: authenticity. As we just said, if we have an active attacker in the middle, they can impersonate the server here on the right, but of course through the authenticity provided by TLS - our client will notice that this is not the server it was trying to reach, and thus - will not continue to talk to the server. So that's great: as long as the attacker cannot impersonate the server, we get assurance of authenticity. But do note that the authenticity only extends to the _server_, not the _data_. 
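As a quick illustration of why the salt matters -- here using openssl's `passwd` with the SHA-512-crypt scheme (`-6`, available in OpenSSL 1.1.1 and later); for a new system you'd reach for a tunable KDF like bcrypt, scrypt, or Argon2 instead. Password and salts are placeholders:

```shell
# Same password, different salts -> entirely different hashes, which
# is exactly what defeats precomputed rainbow tables.
set -eu
openssl passwd -6 -salt saltone 'hunter2'
openssl passwd -6 -salt salttwo 'hunter2'

# Re-hashing with the stored salt reproduces the hash, which is how a
# login attempt is verified without ever storing the clear text:
openssl passwd -6 -salt saltone 'hunter2'
```

Note that verification works by re-hashing the candidate password with the salt stored alongside the hash and comparing the results; the clear text is never stored anywhere.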
That is, --- if our attacker is able to compromise the destination server and steal the private key for the certificate, then --- they can of course impersonate the server, and our client, none the wiser, would - make the connection, be presented the valid certificate, and then - happily accept the rootkit provided by the attacker. So even though our TLS connection promises _authenticity_, if the certificate key is compromised, we still lose. Now that may of course seem obvious, but we _can_ add another layer by further using cryptography to provide _additional_ protections: --- Suppose the data provider were to not only put the data on the web server for us to fetch, but also a cryptographic signature of the data. We can then fetch both, verify the signature, and only run the commands if the signature matches. Now this may _sound_ identical to the case we presented just a minute ago for integrity, but we have a distinct difference here: --- In this scenario, note the location of the - secret key: it remains on the developer's system, _not_ the web server. So with this private key, the developer can then - sign the data in question and then --- upload both the data and the signature to the web server. Now when our MitM impersonates the server, it can _still_ - feed us the rootkit, but if we then --- make a request for the signature, the attacker - can _not_ fake it, because they do not have possession of the private key used to sign the data, even though they were previously able - to compromise the web server since the private key remained - on a separate system the entire time. So cryptography provided complete authenticity of the data, even though the authenticity of the server was compromised. --- Now of course there are some threats to authenticity as well, and of course - one of them is integrity. Any assurance of authenticity is meaningless if the integrity of the data used to provide that assurance is not itself assured. 
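The signing flow can be sketched with openssl; all filenames here are made up, and note that the private key never needs to touch the web server:

```shell
# Detached signature sketch: the developer signs with the private key,
# the client verifies with the public key. A MitM who compromises the
# web server still cannot forge install.sh.sig.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Developer side: generate a keypair and sign the data.
openssl genpkey -algorithm RSA -out priv.pem 2>/dev/null
openssl pkey -in priv.pem -pubout -out pub.pem
printf 'echo installing...\n' > install.sh
openssl dgst -sha256 -sign priv.pem -out install.sh.sig install.sh

# Client side (has only install.sh, install.sh.sig, and pub.pem):
openssl dgst -sha256 -verify pub.pem -signature install.sh.sig install.sh
```

If the attacker swaps in their rootkit, the verification step fails, because producing a valid signature for the new data requires priv.pem, which never left the developer's system.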
Now fortunately, most algorithms and protocols do combine the two, and any changes in the data would invalidate the authenticity assurance, but it's still something to keep in mind. Now since we are using a PKI for our TLS connection, we - are also relying on a very complex infrastructure with - a very broad trust model -- remember how many root certificates we found on our laptop, and how every single one of those can issue a certificate for any site on the internet? Next, a major pain point with any and all cryptographic tools and protocols is - that of usability: if a system isn't easy to use, it rarely matters that it's secure, as users will either not use it or use it incorrectly. Lastly, it's worth stressing that - authentication should not be conflated with authorization. In the example we used here, the distinction is not meaningful, but if you were, for example, to use a client certificate to authenticate to the server, that alone should not entitle you to all access privileges; instead, you need to add authorization controls, as we discussed in our last video. But ok, having seen how cryptography can provide some protection in the areas of confidentiality, integrity, and authenticity, let's quickly summarize a few common --- classes of vulnerabilities and see which of these cryptography can help us with. So there are - those dealing with memory safety, where - you may use uninitialized data, - accidentally overflow a buffer and reach executable code, - or run into use-after-free issues. Some of these are programming language specific, and certain libraries as well as kernel protections can help mitigate them, but by and large these have nothing to do with cryptography, right? Next, we have - input validation errors, which are one of _the_ most common vulnerabilities. At the top of the list there is of course - Little Bobby Tables, as well as - format string attacks. 
In both cases the use of some libraries can help catch such errors -- we had mentioned perl's taint checking when we talked about programming tools as an example of a language security mechanism that can help you catch such errors -- but again, not something that cryptography can help you with. Another class of vulnerabilities is - race conditions, such as - time-of-check/time-of-use or - symlink attacks. The latter are of course a filesystem and permissions issue, while the former is more generic and can extend to many scenarios. In a nutshell, any time you have two system calls, such as a "does this file exist" and "open this file", you have a race condition whereby another process might be able to invalidate your check. For this reason it's important to understand which of your system calls are _atomic_ and which ones are not. --- Then we have your common privilege escalation vulnerabilities, which include - cross-site scripting or cross-site request forgery attacks, or - any way by which you can elevate your privileges on the same system. There are - social engineering attacks, including - phishing and - watering hole attacks, and of course for some threats you may be vulnerable to - simple brute-force attacks, - whereby the attacker simply tries out all possibilities, which of course can also - lead to a denial of service scenario. Finally, we have - information disclosure attacks, including - simple lack of proper protections, or - the various attacker in the middle scenarios, both passive and active, - and the lack of any of the required protections. And you'll note that of all of these different classes of vulnerabilities, cryptography really only - helps with these last few types here. 
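The check-then-act race can be sketched in shell; the lock path below is a throwaway placeholder. The noclobber option (`set -C`) turns "create only if absent" into a single atomic open(2) with O_EXCL, instead of a separate test followed by a write:

```shell
# TOCTOU sketch: testing for a file and then creating it are two
# system calls with a race window in between; "set -C" (noclobber)
# makes the create fail atomically if the file already exists.
set -eu
lock=$(mktemp -u)   # an unused placeholder path for the demo

# Racy (don't do this): [ ! -e "$lock" ] && echo $$ > "$lock"

# Atomic: check and create happen in one system call.
if (set -C; echo $$ > "$lock") 2>/dev/null; then
    echo 'acquired lock'
else
    echo 'lock already held'
fi
```

A second attempt against the same path fails cleanly instead of silently clobbering the first process's lock, no matter how the two processes interleave.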
Which isn't to say it's not valuable -- it is, and oftentimes the only way to guarantee integrity, confidentiality, and authenticity, but there are a lot of things we should keep in mind when employing cryptographic solutions: --- For starters, and I hope that this is generally understood, you should not try to invent your own cryptographic protocols nor attempt to implement them yourself. Using broadly tested, widely available and used solutions and libraries is always preferred. - Next, as mentioned, it's critical not to conflate authentication with authorization -- just because you could prove that you are who you say you are does not mean you are entitled to perform any and all actions. A bank robber can't simply show her passport to prove that she really is who she says she is and then expect to be led to the safe to fill her pockets with cash. And specifically, - cryptography does not handle authorization at all. Authorization is business logic, a decision made based on an understanding of the system, the actors, and their needs, so not something a mathematical algorithm or protocol can help you with. We also saw that - most of the time we want to have all three properties provided by cryptography: confidentiality without integrity or authenticity is not often useful, and authenticity assurances without integrity don't count for much. When correctly deployed and used, - cryptography can help protect us in many ways, but it cannot protect against incorrect use. Which is one of the biggest headaches we have with many security tools: they often are hard to use, unintuitive, and failure to use them correctly can not only lead to a loss of the cryptographic protections, but, worse, to a false sense of security. Finally, as we just saw, even though people oftentimes think that you can make anything "secure" by rubbing some crypto on it, - cryptography is not a remedy for all vulnerabilities. In fact, oftentimes cryptography cannot help you at all. 
So what do you do when you can't use cryptography to mitigate a given risk? Well, - it all depends. You need to understand your system, the threats, the possible defenses, and then decide. You need to understand your threat model. And just like there are caveats and pitfalls when using cryptography, so are there many additional fallacies in the realm of system security, but those shall be the topic of our next video. For today, thanks for watching, and I hope you remember: crypto means cryptography. Cheers!