Hello, and welcome back to CS615 System Administration! This is week 08, segment 3, and we're still talking about email. In our last video, we saw how TLS can wrap plain text SMTP to add transport layer encryption, which is all well and good and provides some security, but unfortunately does not solve _all_ the problems one might have with email. In this video, we'll take a closer look at the elephant in the room: Spam. As mentioned in our first video on the topic, Spam makes up 70% or more of all emails sent. So let's take a look at why it's so easy to send unsolicited email in bulk, and what defense mechanisms we have against it. --- So here, in the bottom window, we have our mail client, mutt, and up here, in the top, we are once again sending a simple email from our EC2 instance. After we submit the mail, we know that the sending MTA will connect to our receiving MTA and hand it the mail, which will then process it locally and eventually deliver it into our inbox. There it is. When we look at it, we see the various fields, "From: ", "To: ", "Subject: ", and the contents of the mail. Now let's compare that to sending the same email manually, using 'openssl s_client' just like before. We specify the "MAIL FROM" and "RCPT TO" fields and then add our text here. Then we wait for the mail to be delivered, but when it shows up in our inbox, it originates from the "Spamd User", and when we look at the contents of the mail, it looks rather different. Specifically, it tells us that the spam detection software running on the mail server here has determined that this email is, well, spam. As we scroll through the text here, we see that the spamassassin tool uses several heuristics to determine the probability of an email being Spam, including missing headers. Since these various properties add up to a score above the currently configured threshold, the system marked the message as Spam instead of just dropping it into our inbox.
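To make the scoring idea concrete, here is a minimal sketch of threshold-based header heuristics. The rule names and point values are made up for illustration and are not SpamAssassin's actual rules or scores; the threshold of 5.0 is assumed to match SpamAssassin's default `required_score`, but any configured value works the same way.

```python
from email import message_from_string

# Hypothetical rules: header that should be present -> (rule name, points if missing).
# Names and scores are illustrative only, not SpamAssassin's real rule set.
MISSING_HEADER_RULES = {
    "From": ("NO_FROM_HEADER", 1.5),
    "To": ("NO_TO_HEADER", 1.5),
    "Subject": ("NO_SUBJECT_HEADER", 1.0),
    "Date": ("NO_DATE_HEADER", 1.5),
    "Message-ID": ("NO_MESSAGE_ID", 1.0),
}
THRESHOLD = 5.0  # assumed; mirrors SpamAssassin's default required_score


def score(raw_message: str) -> tuple[float, list[str]]:
    """Add up the points of every rule whose header is missing."""
    msg = message_from_string(raw_message)
    hits = [
        (rule, points)
        for header, (rule, points) in MISSING_HEADER_RULES.items()
        if msg[header] is None
    ]
    return sum(points for _, points in hits), [rule for rule, _ in hits]


def is_spam(raw_message: str) -> bool:
    """Flag the message once its total score reaches the threshold."""
    total, _ = score(raw_message)
    return total >= THRESHOLD


# Like our hand-typed 'openssl s_client' message: body text, no headers at all.
bare = "Hi there, this is a test.\n"
# The same message with the headers a well-behaved mail client would add.
full = (
    "From: someone@example.com\n"
    "To: me@example.net\n"
    "Subject: Hello\n"
    "Date: Mon, 01 Jan 2024 12:00:00 +0000\n"
    "Message-ID: <1234@example.com>\n"
    "\n"
    "Hi there, this is a test.\n"
)
print(score(bare))    # (6.5, ['NO_FROM_HEADER', 'NO_TO_HEADER', 'NO_SUBJECT_HEADER', 'NO_DATE_HEADER', 'NO_MESSAGE_ID'])
print(is_spam(full))  # False
```

Adding the missing headers drops every rule hit, which is exactly the trick we're about to try by hand.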
So let's see if perhaps we can outwit the spam detection system here. Let's try again. We once more specify the same "MAIL FROM" and "RCPT TO" fields, but then, when it's time to provide the data, we _also_ fill in some additional headers: a "From: " header that looks more normal, a "To: " header, a "Subject: " header, as well as the other headers that the spam detection system had complained about being absent. So _now_ when we hand off this email to the mail server... ...it shows up in our inbox as we'd expect, not marked as Spam. There, that looks much more normal. Here's the "Date" header we had specified, the From, To, and Subject, but... where is the Message-Id header? To view that, we have to ask our mail client to show us _all_ the headers: most email clients only display the small subset of headers that the user is likely to be interested in, and hide the others. So now when we look at all the headers, we notice quite a few more details hidden in our email. Let's get rid of the second window up here for the time being. So now we see up here at the top a "From " header, some information that spamassassin added here, where it claims that "no, this message doesn't look like spam", as well as some headers telling us how we received the message, even including the TLS cipher and strength used. Down here we have another "From" header, but that's... different from the "From " header we have way at the top, and the mail client only showed us the second one. So we see that an email is a bit more than what we've come across so far. - In particular, an email consists of - a few mandatory headers. Without these headers, the email cannot be delivered. This includes - The "From " header, also known as the "SMTP Mail From" header up here, as well as a recipient, obviously, and - a Date header. In addition, the email may have any number of _optional_ headers.
They are truly optional in that the mail protocol doesn't require them, but as we've seen, their absence may lead other systems to classify the mail as Spam, because well-formed email sent by actual, friendly, cooperating email systems tends to include them. These headers include - the cosmetic "From:" header, which we noticed can be different from the SMTP Mail From header, - the "To: " header, which can also be different from the SMTP RCPT To header, - the Subject, and so on and so on. As we'll see in a little bit, some of these optional headers can be used to relay a lot of useful information about the email to the receiving system. Now of course we also - have the content of the message, which we generally consider to be plain text -- or, unfortunately, HTML nowadays, but - the content really is independent of the Simple Mail Transfer Protocol - and may even contain attachments and multipart messages, all defined in separate standards (MIME). But for the time being, let's keep our focus on the email headers, and how our spam detection system reacts to them or uses them. --- So let's go back to our EC2 instance and try something else: Let's specify a random "MAIL FROM" address here, and then let's try to send an email to my Stevens address. Now this mail server immediately says "not so fast, my friend! I'm not responsible for stevens.edu, so I won't let you send mail to that domain through me." So let's perhaps try another mail server. We all know that Yahoo runs a mail service that anybody can sign up for, and surely Yahoo can send an email to Stevens, right? So let's give that a try. Again, same "mail from" address, and same recipient, but nope, Yahoo doesn't allow relaying mail for domains it's not responsible for, either. [pause] This is known as _not relaying email_. - Back in the early days of the internet, there were only so many mail servers, and any mail server was happy to accept mail for any other mail server and hand it off to its destination, to _relay_ the mail.
- But that of course meant that you could abuse the other mail server and send mail through it to other systems. So with time, mail servers said "you know what? I only allow people to hand me emails that are for _my_ users.", which is what you're seeing here in action: the netmeister email server is not responsible for the stevens.edu domain, so it rejects our attempt to send an email to that address. - Now there _are_ still a few so-called open relays on the internet today, but they are oftentimes abused by spammers to deliver their mail for them without being detected, and thus end up on various internet blocklists. But anyway [continue] so if we want to send an email to our Stevens address, we have to talk to the mail server responsible for that domain, which... we can look up via the MX DNS record. So let's give that server a try. Hey, look, that one's not shutting us down right away! Let's create an email and see what that looks like when it shows up in our inbox. The mail will in fact be delivered to our local inbox in our window below, because I have my Stevens account set up such that all mail sent there is automatically forwarded to my private address. Ok, so our mail server has accepted the mail, and... wait for it... down here, it comes in. Ok, so that looks fairly normal. So we've shown that in order to send email _to_ a specific domain, you pretty much have to talk to the responsible mail server, and that open relays are, by and large, no longer used, which helps address one part of the Spam problem, but obviously we still have some work left. --- Let's look again at this email we had open a minute ago: We have a "From" address as "some-user at this ec2 instance", but specified a different user down here, so maybe there's no real relation between these two? Can we just send an email pretending to be somebody else? Let's give it a try. Let's again talk to our mail server here and...
let's pretend we're sending a mail from, oh, I don't know, let's say Barack Obama, who wants to invite me. So we specify all the headers... ...type our message and send it off. And here it shows up, and sure enough, it looks like it came from my good friend Barack. Let's look at the headers, and, yep, this email definitely came from Barack Obama, no two ways around that, is there? So in other words, - SMTP provides no authenticity guarantees. If you can send me an email, you can pretend that you are whoever you want to be, really. - The SMTP Mail From can be set to anything, and - doesn't even have to be in alignment with the "From" header that the recipient is shown, which Spammers sometimes exploit when a mail server only accepts SMTP Mail From addresses in, say, its own domain. That seems like a huge problem, right? So how do we fix this? I mean - the mail server can restrict what mails it accepts, which makes sense, since it knows what domains it's responsible for, but just how is it supposed to know whether the system connecting to it is responsible for the email that it claims to be sending "from"? But, wait a second, if we're connecting _from_ an IP address, maybe there's a way to determine whether that address should be allowed to send mail on behalf of a given domain? --- Let's give this another try, using a different example. Let's try to send a mail through Yahoo's servers again, this time pretending to be from Bill Gates, why not. Oh, right, we already knew that Yahoo doesn't allow relaying mail, so let's instead have Bill Gates send this mail to my Yahoo account. There, of course Yahoo accepts mail that goes to yahoo.com - makes sense. So here comes our spoofed email, since obviously all these people want to communicate with me via email. But hey, what's this? Yahoo tells me it won't accept my email.
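Before looking at why Yahoo refused, it helps to make the two points from the Barack Obama example concrete: a receiving server can easily check the _recipient_ against the domains it is responsible for, but nothing in SMTP ties the envelope sender to the "From:" header. A minimal sketch using Python's standard `email` package; the helper names (`accept_rcpt`, `build_message`), the `LOCAL_DOMAINS` set, and all addresses are hypothetical stand-ins, not the configuration of any real mail server.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid, parseaddr

# Hypothetical: the domains this MTA is responsible for (i.e., it is their MX).
LOCAL_DOMAINS = {"netmeister.org"}


def accept_rcpt(rcpt_to: str) -> bool:
    """No-open-relay rule: only accept RCPT TO addresses in our own domains."""
    domain = rcpt_to.rpartition("@")[2].lower()
    return domain in LOCAL_DOMAINS


def build_message(envelope_from: str, header_from: str, to: str,
                  subject: str, body: str) -> tuple[str, EmailMessage]:
    """Return the SMTP envelope sender and the message (headers + body) separately.

    Nothing here ties envelope_from to the From: header: SMTP only carries the
    former in MAIL FROM, while the latter is just text inside DATA.
    """
    msg = EmailMessage()
    msg["From"] = header_from
    msg["To"] = to
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)
    sender_domain = parseaddr(header_from)[1].rpartition("@")[2]
    msg["Message-ID"] = make_msgid(domain=sender_domain)
    msg.set_content(body)
    return envelope_from, msg


envelope, msg = build_message(
    envelope_from="some-user@ec2.example.com",       # what MAIL FROM says
    header_from="Barack Obama <barack@obama.org>",   # what the mail client shows
    to="me@netmeister.org",
    subject="Party invitation",
    body="You're invited!",
)
print(accept_rcpt("me@netmeister.org"))    # True: one of our domains
print(accept_rcpt("someone@stevens.edu"))  # False: we won't relay
```

The server can enforce the relay rule because the recipient domain is something it knows about; whether "some-user@ec2.example.com" has any business claiming to be Barack Obama is a question this code, like SMTP itself, has no way to answer.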
The website it references here will give you more details on why it rejected our mail, but let's see how it figured out that it shouldn't allow this random EC2 instance to send mail on behalf of microsoft.com. And what do we do when we want to look up some information? Why... to the DNS! Here, let's make a DNS query for a text record in the microsoft.com domain that matches "spf". Look at that! "SPF" stands for "Sender Policy Framework", and is a way for you, as the domain owner, to specify what systems are allowed to send email on your behalf. What this says here, in effect, is: "include the results of these other DNS lookups, but then deny anybody else." So let's look at one of these. _spf-a.microsoft.com expands to these netblocks, specifying that any SMTP connection from these IP addresses is allowed to send email for the "microsoft.com" domain, but nothing else, which is why Yahoo knew not to let us send mail on Microsoft's behalf. What does this look like for my own domain? This entry here says that the IP addresses of "netmeister.org" as well as the hosts in the MX record for this domain are allowed to send mail, but nobody else. Let's compare to what Google uses for gmail... ok, a redirect... looks like that also uses some netblocks. And now back to our original victim, poor Barack Obama: Looks like obama.org uses salesforce and sendgrid to send mails, but note that the last entry there is not "-all", but "~all", which indicates a so-called "soft failure" mode, while the Microsoft example had a so-called "hard failure". With a hard failure, our mail server up here can actually reject the email right away when it checks the MAIL FROM address, even before accepting any more data. But note that the SPF "hard" and "soft" fail modes do not necessarily imply that the mail is to be rejected by the mail server -- they are merely signals that the mail server should take into account. And if we then look at the headers from our email that we sent pretending to be Mr.
Obama - we note that the headers include this information: "obama.org discourages the use of this random EC2 IP address as a permitted sender", and the system processing that mail may then decide what to do with this information. A common example might be to place the mail into a quarantine inbox or to otherwise label it as possible spam. Now in the case of the "hard" SPF failure, we've seen both Yahoo and my mail server not even accepting the mail, but if we had tried to send the spoofed Microsoft mail through the Stevens mail server, it would have been delivered, but then - gotten a note in the headers like this: "SPF Failed - microsoft.com does not permit this random EC2 address to send email on its behalf". --- But wait, so the Stevens mail server accepted and passed on an email that failed SPF; that mail server seems to act differently from the others we've seen. As I mentioned before - if I send email to my @stevens.edu address, it gets automatically forwarded to my private mail server... ... there. So now let's take a look at the full headers here: Ok, that's a lot, let's see what might be of interest in this context. Over here, we see an "Authentication-Results" header, and... whoa, a whole bunch of Microsoft Exchange data here. Let's change our display to show us only the headers we care about right now. First, let's take a look at the path the email took. We know it was forwarded from Stevens to our mail server, but what did that look like exactly?
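Before following that forwarding path, the SPF outcomes we've seen (Yahoo's outright rejection for microsoft.com, the softfail note for obama.org) can be sketched as a small evaluator. This is a simplified take on the `check_host()` logic from RFC 7208, under loud assumptions: DNS is replaced by a dictionary of made-up TXT records using documentation address ranges, only the `ip4`, `ip6`, `include`, and `all` mechanisms and the `redirect=` modifier are modeled (`a`, `mx`, `exists`, and `ptr` would need live DNS and simply never match here), and the RFC's error handling is collapsed into a crude recursion guard. The domain names are placeholders, not microsoft.com's or obama.org's actual records.

```python
import ipaddress

# SPF qualifier prefix -> result returned when the mechanism matches (RFC 7208).
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, client_ip: str, dns: dict[str, str], depth: int = 0) -> str:
    """Evaluate an SPF TXT record against the connecting client's IP address.

    `dns` stands in for TXT lookups (domain -> SPF record). Only ip4, ip6,
    include, all, and redirect= are handled in this sketch.
    """
    if depth > 10:  # crude guard against include/redirect loops
        return "permerror"
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(client_ip)
    redirect = None
    for term in terms[1:]:
        if term.lower().startswith("redirect="):
            redirect = term.split("=", 1)[1]
            continue
        qualifier = "+"  # default qualifier when none is given
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            matched = True
        elif name in ("ip4", "ip6"):
            # An IPv4 address is never "in" an IPv6 network and vice versa.
            matched = ip in ipaddress.ip_network(value, strict=False)
        elif name == "include":
            # include matches only if the included policy yields "pass".
            matched = evaluate_spf(dns.get(value, ""), client_ip, dns, depth + 1) == "pass"
        else:
            matched = False  # a, mx, exists, ptr: would need live DNS; not modeled
        if matched:
            return QUALIFIERS[qualifier]
    if redirect is not None:
        return evaluate_spf(dns.get(redirect, ""), client_ip, dns, depth + 1)
    return "neutral"  # no mechanism matched and no redirect


# Made-up records using documentation prefixes (RFC 5737 / RFC 3849).
dns = {
    "example.com": "v=spf1 include:_spf.example.com -all",
    "_spf.example.com": "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 -all",
    "example.org": "v=spf1 ip4:198.51.100.0/24 ~all",
    "example.net": "v=spf1 redirect=example.com",
}
print(evaluate_spf(dns["example.com"], "192.0.2.10", dns))   # pass
print(evaluate_spf(dns["example.com"], "203.0.113.7", dns))  # fail: can reject at MAIL FROM
print(evaluate_spf(dns["example.org"], "203.0.113.7", dns))  # softfail: accept, but flag
print(evaluate_spf(dns["example.net"], "2001:db8::1", dns))  # pass, via redirect
```

What the receiving server then does with "fail" or "softfail" is its own policy decision, which is why Yahoo rejected the spoofed Microsoft mail while the obama.org mail merely got a warning header.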
We see several "Received" headers, which provide us with the exact path the email took: - it starts out being sent by our local user and accepted by the local MTA on our EC2 instance, which then talked to this MX server responsible for stevens.edu, which appears to be an outlook.com hosted domain, - which then talks to this other mail server in the office365 domain - internally using TLS and IPv6, which then connects to another internal system - before then being delivered to my mail server - which then feeds it into spamassassin, as we've seen before - So this email did go places in between being sent and being delivered into my inbox! But --- if we look at this path, how would we know that the email that entered the system is indeed the one that was delivered in the end? That is, how do I know that - the party invitation from my good friend Barack over here is actually authentic? It could have been the case that in fact - it was Michelle who wanted none of Barack's nonsense this weekend, - but Barack hacked into the intermediate mail servers and changed the mail! Now we already know that the origin IP may not have been trustworthy for the obama.org domain - which is why SPF told us to be careful, but also note that we have an additional header -- "Authentication-Results" -- that tells us something else: - It says "message not signed". Meaning a few things: It implies that messages _can_ be signed, that this message was _not_ signed, and that we have to learn about one more system -- DKIM. --- "DKIM" stands for DomainKeys Identified Mail, and is a way to detect email spoofing by providing a digital signature across parts of the message. It was - developed by combining two parallel efforts by Yahoo and Cisco, then - standardized in RFC4871 and now RFC6376, and it - adds yet another header -- the DKIM-Signature header, which we'll see in action in just a second. 
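To see what "a digital signature across parts of the message" means for the body, here is a sketch of how a signer computes the body hash that ends up in the DKIM-Signature header's `bh=` tag: canonicalize the body, hash it with SHA-256 (as used by `rsa-sha256`), and base64-encode the digest. It implements only the "simple" body canonicalization from RFC 6376, section 3.4.3, and assumes the body already uses CRLF line endings as it does on the wire; relaxed canonicalization, the `l=` length tag, and the header hash and signature step are not modeled. The function names are hypothetical.

```python
import base64
import hashlib


def simple_body_canon(body: bytes) -> bytes:
    """DKIM "simple" body canonicalization (RFC 6376, section 3.4.3).

    Assumes CRLF line endings, as the body has on the wire.
    """
    # Ignore all empty lines at the end of the body...
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    # ...and make sure it ends with exactly one CRLF (an empty body becomes CRLF).
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def body_hash(body: bytes) -> str:
    """Value for the bh= tag: base64(SHA-256(canonicalized body))."""
    digest = hashlib.sha256(simple_body_canon(body)).digest()
    return base64.b64encode(digest).decode("ascii")


invitation = b"Party at my place this weekend!\r\n"
tampered = b"No party this weekend.\r\n"
print(body_hash(invitation) == body_hash(tampered))          # False
print(body_hash(b"x\r\n") == body_hash(b"x\r\n\r\n\r\n"))    # True
```

A verifier recomputes this value from the body it received and compares it with `bh=`: any change to the text (say, Michelle's note replacing Barack's invitation) produces a different hash, while trailing blank lines added in transit do not.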
And if you're wondering what else we might need if we want to verify such a signature, and if you then guessed "oh, I don't know, but I bet it involves the DNS", well - you'd be right. We get even more DNS TXT records. Look, I told you that we depend heavily on the DNS, even though hardly anybody uses DNSSEC and the DNS really cannot be trusted. Welcome to the internet! --- But let's look at DKIM in action to understand just how exactly it's supposed to help us. Let's send an email from our own mail server to my Stevens address. My mail server is configured for use with DKIM, so it will sign the message when it sends it. The message will then bounce around the Stevens -- or rather: outlook / office365 -- mail servers as we've just seen, before being delivered back to this mail server. There it is. Let's take a look. Here are our headers. We see the DKIM header, but let's quickly weed out the headers we don't care about and have mutt display the DKIM-Signature and Authentication-Results headers in the default display. There. So here we see multiple DKIM signatures, and multiple Authentication-Results, since our email traversed multiple mail servers. Let's start at the bottom: Here we see the signature made by our outgoing mail server, with Authentication-Results showing that both the SPF check and the DKIM check passed -- "signature was verified". Likewise, we see that the Stevens mail server over at Microsoft also signed the mail when it forwarded it back to us. But wait... it says down here "signature was verified" -- how did the remote server know how to verify the signature? And what exactly does this blob of a signature here actually signify? [pause before host lookup] So, the - DKIM signature includes information about - which domain it is responsible for -- netmeister.org, in this case; - the so-called "selector", which allows for using multiple keypairs. In our example, the mail server uses a selector of "2021".
- It then includes the hash of the body of the email, thereby guaranteeing that the body cannot be modified without detection, so we know that it was indeed Barack's invitation to his party and not Michelle's note nixing the party. - But DKIM also extends to the headers, since, as we've seen in this video, they play a rather important role here. So we are informed which headers this signature covers -- "To", "Subject", "Date", and "From" in this case, and of course it then - includes the actual signature of all this data. But ok, so now... how do we verify this signature? Don't we need to know which public key to verify the signature _with_? Well, that's where the DNS comes into play again: [continue] in order to determine the right key, we can look in the DNS for the correct public key for the given selector. - We combine the selector with the string "_domainkey" and the domain in question, and we get back the public key that can verify the signature in the header. Pretty neat, huh? But hey, what's this down here? In the Authentication-Results header, it says something else: - "dmarc pass". Now what on earth is "DMARC"? I thought we were done here! --- Well, not so fast, I'm afraid! We've seen that we can use SPF and DKIM to provide some assurances, but we still don't quite know what to do when we find a mismatch. As a receiving mail server, we have to be quite careful about what mail we reject, since a falsely rejected email is something that people can get pretty upset about -- much more upset than about a falsely accepted Spam message. So how do we get the sending domain to tell us what we should do when we encounter bogus SPF or DKIM information? Well, for that, we have DMARC: "Domain-based Message Authentication, Reporting and Conformance". As noted, it combines the use of - SPF and DKIM, but also - checks that the domain in the "From:" header is in alignment with the domain authenticated by SPF (the SMTP Mail From) or by DKIM, as well as what to do when there are mismatches.
- It further defines how to report such problems and to whom, and, of course - it uses the DNS. I bet you saw that coming, though, didn't you? So now --- let's see _that_ in action, too. For example, we can look up Yahoo's DMARC policy in the DNS. This entry here tells the world that if you encounter email that passes neither an aligned SPF check nor an aligned DKIM check, then you should reject the mail. So now let's see what happens when we talk to one of gmail's MX servers and pretend to send an email from a Yahoo address. We know that Google should now perform an SPF check at a minimum. Let's construct a bogus email to some random gmail account. Because of Yahoo's DMARC policy, this email should get rejected... ...and it is. And note that it is explicitly the DMARC policy that triggered this rejection. Let's see what other companies' DMARC policies look like: Google, for example, is a bit more lenient -- rather than instructing mail servers to reject mismatched mails, it says "go ahead, accept them, but please mark them as bogus and put them into a quarantine inbox", while Facebook, for example, is strict like Yahoo here. So how did DMARC help us here? - We saw that SPF can tell you who is authorized to send mail, but you could conceivably use an authorized IP, but then set the "From:" header to something else, which users wouldn't notice -- DMARC can enforce alignment here. - We know that DKIM can sign parts of the message, but we didn't have a good mechanism to tell the receiving mail server what to do when it encounters a mismatch; DMARC provides this mechanism. - But DMARC also allows some finer control -- it's not just "reject or accept", but - you can also quarantine, for example. And finally, if you'd like to know what the different problems are that mail servers observed for your domains, you can - ask them to send you an aggregate report, which allows you to then fine-tune your antispam mechanisms accordingly. --- All right - let's take a break here. We covered a lot.
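Before the recap, the DKIM key lookup and the DMARC decision described above can be sketched together, since DMARC's alignment check compares the "From:" header's domain with the domain that SPF or DKIM (the `d=` tag) actually authenticated. This is a minimal sketch under stated assumptions: the SPF and DKIM results are passed in rather than computed, alignment is checked in strict mode only (exact domain match; DMARC's default relaxed mode compares organizational domains, which requires the Public Suffix List), and tags such as `pct=`, `sp=`, `adkim=`, and `aspf=` are ignored. The helper names and the policy strings are hypothetical; `d=netmeister.org` and `s=2021` mirror the signature shown above, but the `bh=` and `b=` values are placeholders.

```python
def parse_tags(value: str) -> dict[str, str]:
    """Parse a 'tag=value; tag=value' list, the syntax shared by
    DKIM-Signature headers and DMARC TXT records."""
    tags = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if not sep:
            continue  # skip empty segments, e.g. after a trailing ';'
        # Drop all whitespace inside values (e.g. folded base64 in b=).
        tags[name.strip()] = "".join(val.split())
    return tags


def dkim_key_name(signature: dict[str, str]) -> str:
    """DNS name of the TXT record holding the public key: <s>._domainkey.<d>."""
    return f"{signature['s']}._domainkey.{signature['d']}"


def dmarc_disposition(from_domain: str, spf_result: str, spf_domain: str,
                      dkim_result: str, dkim_domain: str, record: str) -> str:
    """Return 'accept', or the policy's p= value ('none', 'quarantine', 'reject')."""
    policy = parse_tags(record)
    if policy.get("v") != "DMARC1":
        return "no policy"
    from_domain = from_domain.lower()
    # Strict alignment only: the authenticated domain must equal the From: domain.
    spf_aligned = spf_result == "pass" and spf_domain.lower() == from_domain
    dkim_aligned = dkim_result == "pass" and dkim_domain.lower() == from_domain
    if spf_aligned or dkim_aligned:
        return "accept"
    return policy.get("p", "none")  # simplification: missing p= treated as 'none'


sig = parse_tags("v=1; a=rsa-sha256; d=netmeister.org; s=2021; "
                 "h=To:Subject:Date:From; bh=PLACEHOLDER; b=PLACE HOLDER")
print(dkim_key_name(sig))   # 2021._domainkey.netmeister.org
print(sig["h"].split(":"))  # ['To', 'Subject', 'Date', 'From']

# Spoofed From: domain; SPF passed, but only for the EC2 instance's own
# domain, and there is no DKIM signature at all.
print(dmarc_disposition("example.com", "pass", "ec2.example.net",
                        "none", "", "v=DMARC1; p=reject"))  # reject
```

The last call mirrors the gmail experiment: SPF "passing" for some other domain doesn't help, because it isn't aligned with the "From:" header, so the sending domain's `p=reject` policy applies.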
In our effort to understand how Spam protection mechanisms work, we've seen - that we need to pay attention to the email headers, - some of which are mandatory, some of which are optional, - and some of the optional ones aren't even quite so optional. We've also seen that each - hop along the path an email takes may add additional headers, which provides us with a pretty detailed view and a lot of metadata. Now since email is such a simple protocol and has no authentication built in, it's easy enough to spoof messages, so we - need a range of spam protections. For starters, - we no longer allow just anybody to send email through our mail servers, only to the domains that we're in charge of. - We can query public blocklists via DNS to ascertain the IP reputation of the sender and then decide whether we want to talk to them; we saw that in our earlier videos. - We can use the Sender Policy Framework to define who is allowed to send email on our behalf, and - we can use DKIM to sign parts of the email to assure the recipient of its authenticity and integrity, and - we can inform receiving mail servers how they should handle mismatches via the DMARC mechanism. Now all of this is probably a fair bit more complex than you initially assumed when we started talking about the Simple Mail Transfer Protocol, but it also isn't the full story just yet! Think, for example, about what happens when mailing lists handle and redistribute emails, and what impact that has on DKIM signatures. We won't be able to discuss the solutions to this issue in this video, but perhaps you can consider this as another jumping off point to extend your research. Finally, though, --- it's important to consider the implications of running a large scale email service. As we've discussed here, - you need to have solid Spam protections in place, many of which - overlap but aren't quite identical with phishing protections.
We'll discuss those a bit later in the semester. - Now if you are sending a lot of email, you need to understand the impact of all these lookups we're performing, for example, - as well as all the traffic we're logging for each connection. As we've seen in an earlier video, there are quite a few log lines for every individual message, so now multiply that by a few million... We've also talked about Spam a lot here, but Spam is... well, subjective. What's unsolicited bulk email to you may be an important business newsletter or customer campaign for me. - How do you send lots and lots of email without being marked as a spammer? The mechanisms we discussed here can also help you in this regard. Now in the end, you may decide that getting all this right isn't easy, and perhaps you should - outsource this to another company. We've seen that Stevens, for example, has made that decision by having Microsoft handle all your email -- just like Cloudflare handles most of the HTTP traffic for Stevens, but as always, this has a lot of implications, not the least of which - are privacy implications, because now all your emails are floating around the internet across systems owned by a third party. The list of difficult aspects and implications of managing email at scale goes on, and I hope I've been able to give you an impression of the depth of this topic. As always, we can't cover all nuances and oftentimes only hint at some of the larger topics. The best way for you to understand the material, though, remains playing around with the commands and examples from these videos yourself: try to find out how the different popular email services handle sending and receiving email, look at the headers of the mails you receive, try to spoof some mails and see what spam protections are in effect. I'm sure you'll discover some interesting angles and have a bit of fun in the process. And with that, thanks for watching - cheers!