Hello, and welcome back to CS615 System Administration! This is week 8, segment 1, and after we've spent some time talking about the Domain Name System, we'll now dive into another crucial service that every system administrator needs to understand as they inevitably end up using, maintaining, and troubleshooting it: E-mail. Just like the DNS and other simple protocols like HTTP, email utilizes a simple text-based protocol -- SMTP, the appropriately named Simple Mail Transfer Protocol -- and can trivially be observed and analyzed. But... wait a second? Weren't we told that email is dead and that Slack has replaced all email communications and freed us from the yoke of constant interruptions and sending attachments back and forth in endlessly growing threads? --- Well... about that. I have some bad news for you. Email is not dead yet. It remains a critical service. Here are some rough numbers pulled from various websites, so consider them ballpark numbers at best, but they should give you an idea as to why it's important for you to actually understand how email works: - There are currently estimated to be about 5.5 billion email accounts in use, with - Gmail making up a good third of them. That's right, there still are email accounts other than Gmail accounts, although of course Google runs many companies' email, which is another example of where one company uses their direct competitor's services, which is kind of wild to think about. And with these 5.5 billion accounts, there are - a lot of emails that get sent and received every day. The current estimate is around 300 billion emails per day, which, divided by the 86,400 seconds in a day, breaks down to about - 3.5 million emails per second worldwide. Just think about all the traffic this generates! And you can bet if this service goes down, your boss is going to notice you exist after all. - Poor office workers receive an average of 120 emails per day, which... is just painful. 
Of course not every email you receive is one you actually read or have to read, since - somewhere between 30 and 70 percent of all emails are Spam. This range is so large because it actually is not easy to correctly classify "spam" -- sure, you know what spam is to you, but if you're an email service provider, it actually is really difficult to correctly identify spam, and believe me, if you accidentally classify a single non-spam message as spam, your users are going to be upset. Now of course Spam is a topic that could fill a few weeks' worth of material all by itself, but we'll try to provide a bit of a summary of the problem space and some possible defense mechanisms in the next videos. And of course email has been such a success story that it's been used -- abused, if you will -- for all sorts of other purposes. As you well know, people use it as a general file transfer protocol by sending attachments to themselves and others, as a todo list, and, in our area, - as an alerting and monitoring system. Which of course is a terrible idea, because email really is not a reliable alerting mechanism at all, as we will see when we look at the details of the protocol. We'll discuss better approaches to monitoring systems in a future video, too, but let's go back down to the basics, as we've done with other protocols: --- So here's our naive view of how email works. One person - composes a message and hits "send", and then the email --- will be delivered to the recipient's inbox. Bam, just like that. But of course we know that there's bound to be a bit more to the story, so --- let's go back and start over. We said the user is going to compose their message, which actually involves the first component of the mail system: - the Mail User Agent, or MUA. This is the program that you use to read your email and to compose your emails. 
There is a large number of possible MUAs, ranging from command-line and terminal clients such as the mutt(1) or pine(1) programs common on many Unix systems to the Mail.app program on your macOS system, Outlook, Thunderbird, and so on. Now obviously nowadays many people use the browser as their Mail User Agent, as "webmail" is the dominant form of email, which brings with it all sorts of major headaches. Email used to be, well, simple text, but nowadays it is by and large interpreted as HTML. That facilitates phishing attacks via link obscuring and "official looking" emails through the use of images or logos, and it also allows for all sorts of privacy invading tracking methodologies, such as beacons and trackers: these cause the MUA to make a callback to the sender's server, allowing them to track just when and where you opened the message and so on and so on. By and large, HTML in emails was a terrible idea, but I digress... So anyway, the user uses this MUA to - write their email, and when they then send it on its way, it --- is not magically transported to the receiver, but rather it's handed off to the designated Mail Transfer Agent, or MTA. This is what we generally refer to as the "mail server", and in the Unix world the most common pieces of software you may have heard about here are postfix, sendmail, and qmail, perhaps. The system we see here is the outgoing MTA, which takes the mail you handed it, looks at where it needs to go, and --- then sends it on its way to the receiving MTA. That system then may do a number of things, as we'll see later, but if it eventually decides to accept and deliver the mail, it may then --- hand it off to other mail processing systems such as an engine to assess its spamminess before it eventually is passed on to a so-called Mail Delivery Agent, or MDA, which may sort the mail, duplicate it, forward it, or otherwise process it in some way. 
'procmail' is a commonly used MDA on Unix systems, and it can be used system wide, or it can be invoked by each user locally. So now the email has been accepted and filtered and sits there in the user's inbox, and if the user accesses the system directly -- say, they are logged in on the mail server itself and use a local MUA such as mutt(1) or pine(1) -- then they can now read the mail. But oftentimes the mail server is not directly accessible to the user, and they are more likely to --- use a separate service, an Access Agent that allows remote access via a separate protocol, such as POP or IMAP. This is what you configure as your "incoming mail server" in your --- Mail User Agent, which then can pull the emails down from the server. Now if you _are_ using such a service, then it's quite likely that when sending an email you do not hand it directly to the outgoing MTA, but --- instead interact with the Access Agent here as well, which kind of completes our high-level overview of the mail system. As you can tell, there are a number of moving parts here, with plenty of opportunity for things to go wrong, but for our immediate discussion, we'll only - focus on the work the MTAs are doing. That is, we're going to look at what happens between these two parties, as this is where the - Simple Mail Transfer Protocol is spoken. So let's just go ahead and observe what happens when we send a simple email: --- Here we are again on our EC2 instance, and we start a packet capture for traffic to ports 53 and 25. Then we use the mail(1) command to send an email from this system with a subject of "SMTP Test" to my personal email address here. The mail(1) program is simple enough -- we just type our message in here, then end it with a period on a line by itself. And that's all there is to it, so we can stop our packet capture again. [pause] Now before we look at the packets, let's first see what our mail server says about what it did with the message. 
Since we are on a Unix system, we have a local mail transfer agent or MTA running -- postfix, in this case. This allows for local delivery without involving any network traffic, but since we specified a remote address, we expect the server to hand it off to a remote system somewhere. The mail server logs informational messages into the file [continue] /var/log/maillog, so let's take a look at that file. Here we see that our postfix mail server picked up the message we composed from our userid and assigned it a unique identifier -- the string EB4851CEAD over here. [pause] It then also generated a unique "message-id" for this email, to facilitate tracking of the message across systems. So we have two identifiers here -- the local string EB4851CEAD, and the message-id, which includes this string, thereby making it again easy for us to correlate and track messages across multiple systems. [continue] We then see that this mail server tried to talk to the host panix.netmeister.org on its IPv6 address, but that system didn't like that our connecting server here doesn't seem to have a reverse DNS PTR record for its IPv6 address, and so rejected the mail. This MTA then tried again using IPv4, for which there exists a valid PTR record, and the remote side then happily accepted the mail, informing us that it queued the mail as identifier E843D855AB. Having delivered the mail, our local MTA can then remove the message from its queue. [pause] So with these few lines, we can tell that our mail server itself consists of several components, and that it tries its best to ensure that we can track the message easily. We also saw a first anti-spam mechanism on the receiving end: that server appears to reject messages from systems that don't have a reverse DNS record. We'll talk more about this and other defense mechanisms in one of our next videos, so for now let's take a look at what network traffic we captured. 
[continue] Up here, in the first packets, we see that our server is looking up a DNS MX record for the domain "netmeister.org". [pause] This is how we figure out what our mail server is, similar to how we saw earlier how a nameserver determines what the authoritative nameservers are for a given domain. That is, when we send an email to foo@example.com, then the sending MTA needs to perform an MX lookup for "example.com" to find out what mail servers it should talk to. The result returned includes a hostname -- panix.netmeister.org in this case, so unsurprisingly [continue] we then see our host here look up the A and AAAA records for that name. With the information about the IP addresses in hand, we then see our server making a TCP connection to port 25 on the receiving server, TCP handshake SYN, SYN-ACK, ACK and then here we see the SMTP protocol packets where we identify ourselves and try to send a mail but then get rejected. So our trusty mail server doesn't give up quite so easily and will then attempt again over IPv4, where it's successful and receives confirmation that the mail has been queued. Ok, so far, so good. Now let's take a look at what this looks like on the receiving end: --- Here we are on the receiving mail server. Remember the identifier the mail server told our sending MTA it had assigned to the mail when it accepted it? We can check for that identifier in the receiving mail server's log files. And then here we see that we accepted this mail from our AWS EC2 instance with the message-id our sending MTA had generated. We then see that our MTA takes this message and pipes it into a separate service -- spamassassin, which is an independent program that processes mail and attempts to determine whether it's spam or not. 
Since this is a separate process, the mail server here considers this message to be delivered and removes it from its queue, but then the spamassassin process, after it has completed whatever it was doing, _reinserts_ the message, which is why we're seeing a new identifier here -- 13D7B85A22. But notice that that message still has our earlier message-id, and is then accepted by the mail server for local delivery, which it does by feeding it into the Mail Delivery Agent "procmail". Now procmail will process the mail based on the user's preferences, and, according to _its_ log as shown here, has delivered this message based on a subject match into the mailbox /tmp/smtp-test. We can then use our preferred Mail User Agent mutt to open this mailbox, and here we go, there's our mail! If we view it, it looks just like we'd expect, a real, regular, normal email, just as we had sent it from our originating system, showing the "From", "To", and "Subject" headers as we'd expect. Alright, so we've now followed an email across the entire mail system: from composing the email using the mail(1) command as our admittedly simple MUA; to it being handed to the local MTA, postfix(8) in this case, which found the responsible mail server and handed off the mail to it, even if only on the second try; to that server then processing the mail locally via spamassassin and procmail; and finally to us using mutt(1) to read the mail. --- In the packet capture, we saw the DNS lookup and the TCP packets, and since those were all clear text, we saw the protocol in action. So we should be able to re-enact the exact exchange manually -- let's give that a try! - So first we said we needed to find out which mail server is responsible for the domain in question, so we start with an MX lookup. Once we have that, we need an IP address. Then we can use the telnet(1) command again to connect to port 25 on that IP address. The server greets us with a 220 banner, and we say EHLO, followed by our hostname. 
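As an aside, the lookup steps that got us to this connection can be sketched in a few lines of Python. This is only a rough illustration, not how postfix does it: it assumes the MX records have already been fetched by some resolver and are handed in as (preference, hostname) pairs, and the function names, hostnames, and addresses below are all made up for the example. The one rule it relies on is that MX records with a lower preference value are tried first.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def recipient_domain(address: str) -> str:
    """Return the domain part of a mailbox address; this is the name used for the MX lookup."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mailbox address: {address!r}")
    return domain.lower()


def order_mx(records: Iterable[Tuple[int, str]]) -> List[str]:
    """Sort (preference, hostname) pairs so that the lowest preference value comes first."""
    return [host for _, host in sorted(records)]


def first_accepting(addresses: Iterable[str],
                    attempt: Callable[[str], bool]) -> Optional[str]:
    """Try each address in turn and return the first one for which attempt() succeeds."""
    for addr in addresses:
        if attempt(addr):
            return addr
    return None


if __name__ == "__main__":
    print(recipient_domain("foo@example.com"))
    print(order_mx([(20, "backup.example.com"), (10, "mail.example.com")]))
    # Hypothetical stand-in for "the IPv6 attempt is rejected, the IPv4 attempt is accepted".
    print(first_accepting(["2001:db8::25", "192.0.2.25"], lambda a: ":" not in a))
```

The attempt callback stands in for the actual SMTP conversation, so the same loop describes what we saw in the logs: one address is refused, and the MTA simply moves on to the next one.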
The server replies with a list of SMTP options it supports, but at this point, we don't really care, so we start sending our mail by specifying the MAIL FROM command and providing our sender's address. The server seems to be ok with that, so we then specify the recipient of the email, which also meets no objection. Next, we tell the server "ok, watch out, here's the data", upon which we can then provide the full email. Now note that at this point we are able to specify another "From" and "To" header -- the difference between these and the MAIL FROM and RCPT TO fields will be discussed in our next video. Next, we provide the subject header, an empty line to signal the start of the mail body, and then we include our text message here. When we're done, we signal the end of the message via a single dot on a line by itself, and we see that the mail server then accepts our message and queues it for delivery, providing us with the identifier shown here. The connection to the server is kept open, so we could send another message here, if we wanted to, but we're ok for now, so we say goodbye. And there you have it, sending an email using telnet. See, I told you that SMTP was a _simple_ mail transfer protocol, didn't I? --- Ok, time for a break. Let's summarize what we've observed so far. - For starters, we noted that the default protocol and port is TCP 25, and that we can easily observe all the data in clear text, which makes for rather convenient troubleshooting and debugging, but obviously isn't quite so great from a security and privacy perspective. - We saw that the service lookup uses the DNS for discovery of which mail server is responsible for the given domain, and we then observed - that SMTP really is a very simple protocol following a trivial dialog structure with the most basic steps - being shown here. 
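To make that dialog structure a bit more concrete, here is a minimal sketch in Python of the lines a client sends, in order, plus a helper that sorts server replies by their first digit. It is a simplification under a few assumptions: one recipient, no TLS, no authentication, and no waiting for the server's reply between commands. The function names and example addresses are invented for illustration. The one detail not shown in the telnet session is that a body line starting with a dot gets an extra dot prepended, so that it isn't mistaken for the end-of-message marker.

```python
from typing import Iterable, List, Tuple


def smtp_client_lines(helo_name: str, sender: str, recipient: str,
                      headers: Iterable[Tuple[str, str]], body: str) -> List[str]:
    """Return the lines a client sends for one message, in dialog order."""
    lines = [
        f"EHLO {helo_name}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
    ]
    lines += [f"{name}: {value}" for name, value in headers]
    lines.append("")  # an empty line separates the headers from the body
    for line in body.splitlines():
        # Escape a leading dot so it can't be read as the end-of-message marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # a single dot on a line by itself ends the message
    lines.append("QUIT")
    return lines


def reply_class(reply: str) -> str:
    """Classify a server reply line by the first digit of its three-digit code."""
    code = int(reply[:3])
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "send more"
    if 400 <= code < 500:
        return "retry later"
    if 500 <= code < 600:
        return "rejected"
    raise ValueError(f"unexpected reply code: {code}")


if __name__ == "__main__":
    for line in smtp_client_lines(
            "client.example.com", "alice@example.com", "bob@example.net",
            [("From", "alice@example.com"), ("To", "bob@example.net"),
             ("Subject", "SMTP Test")],
            "Hello!"):
        print(line)
    print(reply_class("220 mail.example.net ESMTP"))
```

A real client would read and check the server's reply after each command before sending the next one; the sketch leaves that out to keep the order of the steps easy to see.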
The return codes in SMTP are once again simple numeric values that indicate what action the client should take next, which is rather similar to HTTP status codes, as you will notice. Now with this first look at SMTP, we're far from done covering the email topic, so in --- our next video, we'll dig a bit deeper, in particular into the receiving end, where we'll look at packet captures on the mail server as well as observe multiple relays. We'll also - see what we can do about protecting the contents of our messages in transit by wrapping the mail communications in TLS. We'll further dissect - the anatomy of an email message and look at the various headers that make up the message and what those headers tell us about the message and the mail system by which the message was delivered, and - of course we'll have to spend some time talking about how the mail system can be -- and is -- abused. As you've seen, we can connect to a mail server and just hand it an email, claiming to be from anybody, really, so we have to think about how to add at least some layers of authentication or other assurances on top. In the meantime, I recommend that you once again follow the commands and examples from this video, and then perhaps - take your investigation a step further by running through this practical exercise from this URL. You'll find that much of the discussion here makes a bit more sense if you've manually observed the packets and messages yourself. Until the next time - thanks for watching. Cheers!