Hello, and welcome back to CS615 System Administration! This is week 6, segment 2, and we're continuing to trace a simple HTTP HEAD request from one system to another. In our last video, we used the ktrace(1) utility to inspect the executable and see what files it opens and what system calls it makes. In the process, we learned how the system determines how it should resolve a hostname to an IP address, and we saw it connect to the DNS server before sending the HTTP request to the destination via TCP. But we didn't look at the network packets we had captured at all, so in this video we'll pick up right there: Remember, this is how we started our invocation of telnet(1): we told tcpdump(1) to capture all packets except for SSH traffic, flushed our ARP cache, and then began the execution of our command. So now we have all our network packets in the file 'simple.pcap' and can begin our analysis. If we look at what we captured, we see a whole bunch of packets logged between different IP addresses, so let's first verify what _our_ IP addresses are on this instance: 'ifconfig' tells us that our IPv4 address is 10.10.0.47 and that we have two IPv6 addresses, a link-local address starting with fe80:: and a global scope address starting with 2600:. Now, the first two packets in our tcpdump look like this: We see a TCP SYN packet being sent from this IP address -- 162.142.125.150 -- to our 10.10 IP address on a seemingly random port. We don't have anything listening on that port, so of course we send back a RST (reset). But... what's up with that? We didn't ask for this traffic, so why are we seeing it? It has nothing to do with our telnet command. Let's look at who might be behind the sending IP address. 
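To see the kernel's side of that SYN/RST exchange for yourself, here is a minimal Python sketch (an illustration, not part of the original demo, assuming a typical Linux or BSD TCP stack): it sends a SYN to a localhost port that has no listener, and the RST that comes back surfaces in Python as ConnectionRefusedError. Running tcpdump on the loopback interface at the same time should show the same SYN followed by a RST.

```python
import socket


def probe_closed_port() -> str:
    """Send a SYN to a localhost port with no listener and report the outcome.

    Assumes a typical Linux/BSD TCP stack, which answers such a SYN with a RST;
    Python reports that RST as ConnectionRefusedError.
    """
    # Reserve a free port by binding to it, but never call listen(): the port is
    # taken (so nothing else grabs it), yet nobody accepts connections on it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as reserved:
        reserved.bind(("127.0.0.1", 0))
        port = reserved.getsockname()[1]

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.settimeout(2)
            try:
                client.connect(("127.0.0.1", port))  # this sends the SYN
            except ConnectionRefusedError:
                return f"port {port}: SYN answered with RST (connection refused)"
            return f"port {port}: unexpectedly accepted the connection"


if __name__ == "__main__":
    print(probe_closed_port())
```

Other operating systems may handle a refused connection somewhat differently (for example by retrying the SYN before giving up), so treat this as a Linux/BSD-oriented illustration.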
Well, the address reverses to a name in the "censys-scanner.com" domain, suggesting that what we're seeing here is a generic port scan, the kind of thing that just happens to every system that's on the public internet: random outside systems begin to scan you and try to identify just what exactly you're exposing to the internet. Some of these scans are malicious -- somebody trying to actively or at least opportunistically break into your host -- and some are just general internet reconnaissance. In this case, the IP address appears to belong to the "censys.io" service, which provides attack surface detection and analysis as a service. We can quickly check what this service knows about our IP address here, but... ...of course that's not going to be very productive, since the IP address we entered here is an RFC1918 address, a private address. As you are probably aware, our AWS instances are provisioned with RFC1918 IPv4 addresses, but we _are_ able to reach them from the internet via a network address translation, or NAT, mechanism. Our _public_ IPv4 address for this instance is 54.80.35.155, so let's put that in here instead: And here we go. Look at that, the system knows quite a bit about us: It knows that we are in Amazon -- which we recently learned is easy enough to figure out via a simple WHOIS lookup. It knows the netblock and AS numbers, of course, and after a port scan it was able to determine that SSH is running on port 22. From there, it was then able to fingerprint the SSH version and even determine the operating system version. Since we don't have anything else exposed here, that's all the system knows, but this goes to show you that yes, anything you put on the internet will be scanned rather quickly, and what we saw in our tcpdump was just such a routine port scan. But ok, let's now focus on the packets that _do_ have something to do with our simple request, shall we? 
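Python's standard ipaddress module makes it easy to check which of these addresses are private, link-local, or globally routable. The sketch below classifies the addresses from this video. Since the full IPv6 addresses weren't shown, fe80::1 and 2600::1 are hypothetical stand-ins for our link-local and global addresses. The three RFC 1918 blocks are listed explicitly, because the module's is_private flag also covers other special-purpose ranges such as loopback.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify(addr: str) -> str:
    """Roughly classify an address the way we reasoned about it in the video."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4 and any(ip in net for net in RFC1918):
        return "RFC1918 private"
    if ip.is_link_local:      # e.g. fe80::/10 for IPv6
        return "link-local"
    if ip.is_global:          # publicly routable
        return "global"
    return "other special-purpose"


# fe80::1 and 2600::1 are hypothetical stand-ins for the instance's IPv6 addresses.
for a in ("10.10.0.47", "54.80.35.155", "fe80::1", "2600::1"):
    print(f"{a:15} {classify(a)}")
```

This is why entering 10.10.0.47 into an internet-wide scanning service can't work: that address is only meaningful inside our private network, while 54.80.35.155 is the globally routable one.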
After the first two packets that we just discussed, we then see a few ARP packets, numbers 3 through 6 and packets 8 and 9 over here. Let's extract only the ARP traffic. There. What we see here are the link-layer lookups of the MAC addresses associated with some of the IP addresses on our layer 2 segment. Let's see who we're talking with here. Remember that if we want to talk to any host that we can't reach directly on layer 2, we send the packet to our default router, which of course then _must_ be on our layer 2 segment. So our default router for IPv4 appears to be 10.10.0.1, and our address is 10.10.0.47, which is why we see these ARP broadcast queries and the respective responses up here. The default router is asking what the MAC address for our IP address is, and we're asking if anybody out there knows what the MAC address for the default router might be. Then we each reply with the answer, update our ARP caches accordingly, and are then able to talk to one another by creating an Ethernet frame addressed to the correct MAC address of the other side. Note that we also see an ARP request from us for the MAC address of 10.10.0.2, which you may remember from the last video was the IP address of our DNS server. That is, at this point our tool has already gone through /etc/nsswitch.conf, /etc/hosts, and /etc/resolv.conf to determine that it needs to query the DNS resolver, found its address to be 10.10.0.2, and is now looking to send a packet to that address. We know that DNS uses UDP port 53, so let's see what relevant packets we find in our tcpdump for that protocol: Here we go. We see our system sending a query to the DNS server 10.10.0.2 on port 53 -- the first packet for which the previous ARP lookup was necessary. We're asking the DNS server for a quad-A lookup of www.yahoo.com and receive an answer, including a CNAME and several quad-A records. 
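The decision described here (ARP for the destination itself if it's on our segment, otherwise ARP for the default router) can be sketched in Python together with the "who-has" request it produces, following the ARP packet layout from RFC 826. This is an illustration under stated assumptions: the /24 prefix length and our MAC address are hypothetical, since neither appears in the video; the IP addresses are the ones from the capture.

```python
import ipaddress
import struct

# The IP addresses come from the capture; the prefix length and MAC are hypothetical.
MY_IP = ipaddress.ip_address("10.10.0.47")
MY_NET = ipaddress.ip_network("10.10.0.0/24")      # hypothetical prefix length
DEFAULT_ROUTER = ipaddress.ip_address("10.10.0.1")
MY_MAC = bytes.fromhex("020000000001")             # hypothetical MAC address


def next_hop(dst: str) -> ipaddress.IPv4Address:
    """Return the IPv4 address whose MAC address we must resolve via ARP."""
    ip = ipaddress.ip_address(dst)
    # On-link destinations are reached directly; everything else goes to the router.
    return ip if ip in MY_NET else DEFAULT_ROUTER


def arp_request(target: ipaddress.IPv4Address) -> bytes:
    """Build an Ethernet frame carrying an ARP "who-has" request."""
    broadcast = b"\xff" * 6
    eth = broadcast + MY_MAC + struct.pack("!H", 0x0806)  # EtherType 0x0806 = ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6, 4,          # hardware / protocol address lengths
        1,             # opcode 1 = request
        MY_MAC, MY_IP.packed,
        b"\x00" * 6,   # target MAC unknown: that's what we're asking for
        target.packed,
    )
    return eth + arp


print("ARP target for the DNS server:", next_hop("10.10.0.2"))
print("ARP target for an off-link host:", next_hop("162.142.125.150"))
print("frame length:", len(arp_request(next_hop("10.10.0.2"))))
```

The frame built here is 42 bytes: a 14-byte Ethernet header plus the 28-byte ARP payload. Since Ethernet's minimum frame size is 64 bytes including the 4-byte frame check sequence, it gets padded before it actually goes out on the wire.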
We'll go into the details of the DNS system in a future video, but here we then also see a second query for an A record, an IPv4 address, also with a similar response from the DNS server. [pause] Alright, so now we know what IP address we want to connect to, and since the reply we got from the DNS server included an IPv6 address, and since that address is not on our local broadcast domain, we are now creating our TCP packets to hand to the default router, 10.10.0.1, for which we have the MAC address in our ARP cache. [continue] As you all know, a TCP connection is initiated via the three-way handshake, so let's just look at those three packets. Here we go. Our global scope IPv6 address sends a packet to the destination address on port 80 with the SYN flag set; the remote side acknowledges our packet and sends a SYN of its own -- the SYN-ACK -- and finally we acknowledge that packet, thereby completing the three-way handshake "SYN, SYN-ACK, ACK". We're now in business and have an open TCP connection. Now let's look at the actual traffic once the connection has been established: Here we see our system sending data to the remote side -- the PUSH flag is set, and tcpdump helpfully shows the ASCII data we are sending: "HEAD / HTTP/1.0", which, with the carriage-return line-feed, is 17 bytes long, hence tcpdump shows the relative sequence range 1:18. The remote side acknowledges with ack 18, meaning it has received everything up to and including byte 17 and expects byte 18 next, and we send an empty line -- another carriage-return and line-feed, i.e., two bytes. Yahoo's server acks these two bytes, then sends us a 234-byte response that starts with "HTTP/1.0 200 OK". After that, and because we are speaking HTTP/1.0, the remote server closes the connection by sending a packet with the FIN flag set. Our system acks the data the remote server sent, then acks the FIN with a packet that has its own FIN flag set, upon which the remote server acks _that_ FIN, and our TCP connection is now terminated. 
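The sequence-number arithmetic above can be double-checked with a few lines of Python. This is a sketch of how tcpdump's relative sequence ranges come about, not tcpdump's actual code: relative numbering starts at 1 because the SYN itself consumes one sequence number, and the end of each range is exclusive, which is exactly the ACK number the peer sends back.

```python
def seq_ranges(payload_sizes, first_seq=1):
    """Return tcpdump-style relative (start, end) ranges for consecutive segments.

    'end' is exclusive: it is the next sequence number expected, and thus the
    ACK number the receiver sends once it has the whole segment.
    """
    ranges = []
    seq = first_seq
    for size in payload_sizes:
        ranges.append((seq, seq + size))
        seq += size
    return ranges


request_line = b"HEAD / HTTP/1.0\r\n"  # 15 characters + CR LF = 17 bytes
blank_line = b"\r\n"                   # end of the request headers: 2 bytes

ranges = seq_ranges([len(request_line), len(blank_line)])
for start, end in ranges:
    print(f"we send seq {start}:{end}, server replies with ack {end}")

# Yahoo's 234-byte response, numbered in the server's own sequence space.
resp_start, resp_end = seq_ranges([234])[0]
# Like a SYN, a FIN consumes one sequence number: the server's FIN sits at
# resp_end, so the ACK for it is one higher.
fin_ack = resp_end + 1
print(f"server sends seq {resp_start}:{resp_end}, our ACK of its FIN is {fin_ack}")
```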
And there you have it -- that's all the traffic in our tcpdump file for our simple HTTP request. Now, having seen the packets in the tcpdump, we can visualize exactly how the request was made: We illustrate our instance up here in the upper left in Amazon's AS14618 network and Yahoo's web server on the right in its AS36646 network. As we saw in our tcpdump, we started out by sending an ARP request looking for the DNS server, which whatever network switch we're connected to will broadcast to all connected systems. The DNS server dutifully replies with its MAC address, allowing us to send our UDP packet to port 53 on the DNS server, asking for the AAAA records of www.yahoo.com, and the DNS server then replies with the answer. Now at this point, we're ready to send a packet to the Yahoo web server, but that system of course doesn't live on our local network, so we need to hand the packet to the default router. So next we send out an ARP broadcast request for the default router, which replies with its MAC address, and we're then ready to wrap our TCP packet for the Yahoo web server into an Ethernet frame addressed to the default router and send it out. The default router then forwards the packet to the next hop, from where it crosses the internet until it reaches a router on Yahoo's network, which in turn forwards the packet to the right web server. The web server processes it and sends back the reply. And so we note that our simple request, which we had already identified as not being quite so simple, ends up going across a few layers and systems, all together illustrated over here like this, with dotted arrows representing ARP traffic, blue arrows representing DNS traffic, and black arrows representing HTTP traffic. It's also worth noting that we are still quite far from a complete view of this "simple" request, as we glossed over all of this here. But don't worry, we'll cover at least _some_ of that in a future video, too. 
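The DNS question sent to port 53 is a small binary message. As a sketch, here is how the AAAA query (and the follow-up A query) for www.yahoo.com could be built by hand following the RFC 1035 message format. The query ID 0x1234 is a hypothetical fixed value; real resolvers pick a random ID for each query.

```python
import struct


def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


QTYPE = {"A": 1, "AAAA": 28}
QCLASS_IN = 1  # the "Internet" class


def dns_query(name: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message with a single question."""
    flags = 0x0100  # standard query with the RD (recursion desired) bit set
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_qname(name) + struct.pack("!HH", QTYPE[qtype], QCLASS_IN)
    return header + question


aaaa = dns_query("www.yahoo.com", "AAAA")
a = dns_query("www.yahoo.com", "A")
print(len(aaaa), aaaa.hex())
```

The AAAA query comes out at 31 bytes: a 12-byte header, 15 bytes for the encoded name, and 4 bytes for type and class. On the instance, a UDP socket's sendto(aaaa, ("10.10.0.2", 53)) would hand it to the resolver we saw in the capture.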
For now, let's just observe that looking at our TCP/IP stack, even a simple request crosses multiple layers and protocols. In our example here, we've used the HTTP protocol on the application layer, because it's a simple request-response protocol that lends itself to trivial analysis. We'll discuss HTTP in more detail and with an eye towards the serving aspects in a future video. The other application layer protocol we've used here is, of course, the DNS protocol. That, too, will be covered in more detail in a future video. Below the application layer, on the transport layer, we've seen both TCP, used for HTTP, and UDP, used for DNS. These two transport layer protocols were encapsulated by a network layer protocol: version 4 of the Internet Protocol -- IPv4 -- in the case of our communications with the DNS server, and IPv6, which we used to talk to Yahoo's web server. Finally, at the bottom of the stack, we saw how those packets were encapsulated in Ethernet frames on the link layer, with ARP providing the MAC addresses to send those frames to. Up and down the stack we go... But before I let you go, take another look at the tcpdump that we collected and review the packets we presented in this video. There were at least two things we didn't explicitly address: First, we observe ARP requests from and to the default router before we talk to our DNS server, but when we walked through our example, we claimed that the ARP request for the default router happens only after we've received the IP addresses from the DNS server. The tcpdump clearly shows that this ARP exchange happened earlier -- so why is that? Secondly, look again at the ARP packets and the responses from both the DNS server and the default router -- what does that tell you about the network layout of this particular segment? Together, these two thought exercises will hopefully help you realize just how much information we can extract from observing the packets in even such a simple request as the one we ran through here. 
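To make the encapsulation concrete, the sketch below wraps a DNS payload in a UDP header, then an IPv4 header, then an Ethernet header, the same nesting our DNS query went through on its way to 10.10.0.2. It's illustrative only: the MAC addresses and the UDP source port are hypothetical, the payload is a zero-filled stand-in the size of a 31-byte AAAA query for www.yahoo.com (12-byte header, 15-byte encoded name, 4 bytes of type and class), and the UDP checksum is left at 0, which IPv4 permits but IPv6 does not.

```python
import struct


def checksum16(data: bytes) -> int:
    """Internet checksum (RFC 1071): ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp(src_port: int, dst_port: int, payload: bytes) -> bytes:
    # Transport layer: 8-byte UDP header; checksum 0 means "not computed" over IPv4.
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload


def ipv4(src: bytes, dst: bytes, proto: int, payload: bytes) -> bytes:
    # Network layer: 20-byte IPv4 header without options.
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0,              # version 4, IHL 5 (5 x 4 = 20 bytes); DSCP/ECN 0
        20 + len(payload),    # total length
        0, 0,                 # identification, flags/fragment offset
        64, proto, 0,         # TTL, protocol (17 = UDP), checksum placeholder
        src, dst,
    )
    csum = checksum16(header)
    return header[:10] + struct.pack("!H", csum) + header[12:] + payload


def ethernet(dst_mac: bytes, src_mac: bytes, ethertype: int, payload: bytes) -> bytes:
    # Link layer: 14-byte Ethernet II header (EtherType 0x0800 = IPv4).
    return dst_mac + src_mac + struct.pack("!H", ethertype) + payload


dns_payload = b"\x00" * 31  # zero-filled stand-in for the 31-byte AAAA query
frame = ethernet(
    bytes.fromhex("020000000002"),  # hypothetical MAC of the DNS server
    bytes.fromhex("020000000001"),  # hypothetical MAC of our instance
    0x0800,
    ipv4(bytes([10, 10, 0, 47]), bytes([10, 10, 0, 2]), 17,
         udp(50000, 53, dns_payload)),  # source port 50000 is hypothetical
)
print(len(frame))  # 73 = 14 (Ethernet) + 20 (IPv4) + 8 (UDP) + 31 (DNS)
```

Each layer only prepends its own header and treats everything inside as an opaque payload, which is why tcpdump can peel the frame apart again one layer at a time.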
In our next video, we'll reinforce some of these lessons by walking through a few more protocols and layers by extracting information from packet captures. Until then, thanks for watching - cheers!