Hello, and welcome back to CS615 System Administration! This is week 7, segment 2, and we continue our discussion of the Domain Name System. In our last video, we provided a bit of historical context as well as presented the hierarchical structure of the domain name space, so now it's time to go back into the trenches, dig out our trusty tcpdump, and look at some packets to really understand how the name resolution process works. So let's not waste any time and jump right in... --- We know that DNS uses port 53 by default, so let's start capturing those packets here, and then perform a simple DNS lookup. There we go. We can stop our packet capture and then take a look at the result. The nslookup(1) tool tells us which nameserver it used -- 10.10.0.2, which no surprise, is the one configured in /etc/resolv.conf, as we recall from our earlier video. Next, we'll note that the result presented by nslookup is marked as being "non-authoritative". This is because the nameserver that we asked for the results is not the nameserver that's in charge of the zone in question. We'll go into the details of this distinction in a little bit, but for now we note that authoritative or not, we did get back the results here. So now let's look at what packets we captured. Now that doesn't look very complicated at all! We see our query to the nameserver asking for an A record together with the result it returned, then a query for a AAAA record, again with the result. That's all. But let's look at these packets in more detail, --- this time using Wireshark. Let's select this packet here, and we see that Wireshark correctly identifies this as a DNS query packet, which contains a single question for an A record. The response is in the second packet, which contains 5 answer resource records, namely the canonical name for www.yahoo.com as well as the different IP addresses that canonical name resolves to. 
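To make the query packet a little less abstract, here is a minimal sketch of how a question for an A record is laid out on the wire, using only Python's standard library. The helper names (`encode_name`, `build_query`) and the query ID `0x1234` are made up for illustration; the packet nslookup actually sent in our capture may carry a different ID and possibly additional options.

```python
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record
QCLASS_IN = 1    # the "Internet" class

def encode_name(name):
    """Encode a domain name as length-prefixed labels, ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name, qtype, query_id=0x1234):
    """Build a DNS query with a single question (query_id is an arbitrary example value)."""
    # Header fields: ID, flags (only "recursion desired" set), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT and ARCOUNT all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question

packet = build_query("www.yahoo.com", QTYPE_A)
print(len(packet), packet.hex())
```

Swapping `QTYPE_A` for `QTYPE_AAAA` gives the shape of the second query we saw; everything else in the packet stays the same.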
Our second query here, the request for AAAA records, yields a result that looks similar, but note that here we did not ask for the AAAA records of www.yahoo.com, but for those of the CNAME. When we drill down into the flags here, we note that this is where we are informed that the response was non-authoritative. --- So our little example all by itself already illustrates some important aspects: One, there is a difference between an _authoritative_ name server and a server that simply _resolves_ things. That is, - an authoritative name server provides (authoritative) answers; a resolver relays answers it determined by asking the right authoritative name servers. These resolvers typically cache these results for some time to avoid having to go and ask the authoritative servers again and again. For this reason, they are also often called "caching resolvers". We also saw that this simple request - involved several independent queries and multiple resource record types: - We asked for 'www.yahoo.com', which had a CNAME record, indicating that the canonical name for www.yahoo.com is new-fp-shed.wg1.b.yahoo.com. The tools we used as well as the DNS itself are smart enough to know that users are most likely interested in the IP addresses of the names they ask for, so unless specified otherwise, they will then look for both the - IPv4 addresses -- A records -- as well as - IPv6 addresses -- AAAA records. --- Taking these packets and visualizing them gives us this image: We ask our resolver to get us the right IP addresses, and then - well, basically, as best as we can tell from looking at our packets, some magic happens over here, and the resolver can then - give us the answer. --- We repeat this by asking for the AAAA records, - again some magic happens, - and the resolver hands us the results. But come on, this is a CS class -- we don't do "magic", we want to actually _explain_ and _understand_ what happens over here! So let's try to break this down. 
--- We'll start another tcpdump, and this time we're trying to be a bit smarter. We want to get an authoritative response, so we want to find out who's in charge of 'www.yahoo.com', so we ask for the nameserver responsible for this name. Again we get back the response that www.yahoo.com is a CNAME, but we also get information about where we can find an authoritative answer: by asking yf1.yahoo.com. So let's go ahead and ask _that_ server. There. Here we see that 'nslookup' didn't ask our local resolver, but asked yf1.yahoo.com directly, and the answer is missing the note about this not being an authoritative response, since, well, this _is_ an authoritative response, directly from the horse's mouth. So let's see what the packets look like now: Here we go. The first few lines show the NS lookup against our resolver, and then down here we see that we're asking yf1.yahoo.com directly. So let's again visualize what we did: --- Step 1: we want to determine the name server responsible for www.yahoo.com, so we ask our local resolver for the NS record. Then... - some magic happens, and the resolver - tells us the answer. But now we only know that we need to ask "yf1.yahoo.com", but of course we need an IP address for that name, so --- we again ask our local resolver for the IP address of yf1.yahoo.com, - the same magic happens, and the resolver - gives us the IP address of yf1.yahoo.com, so that --- we can _now_ ask - yf1.yahoo.com for the authoritative answer to the question of life, the universe, and everything, or, perhaps, more simply, what the IP address is for new-fp-shed.wg1.b.yahoo.com. Which this nameserver - then can provide to us, thank you very much. But we've still had to involve magic over here in steps one and two, so that's no good. Suppose we didn't have - this magic resolver down here. How would we - get these answers? That is, instead of _relying_ on a resolver, what if we _were_ the resolver? How do we find all the answers? For that... 
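The "non-authoritative" note comes from a single bit in the flags field of the DNS header, the AA ("authoritative answer") bit. As a rough sketch, assuming we already have the first 12 bytes of a response in hand, the flags can be decoded like this. The function name `decode_flags` and the sample header values are hypothetical, not taken from our capture.

```python
import struct

def decode_flags(header):
    """Decode the flags and answer count from the first 12 bytes of a DNS message."""
    _id, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", header[:12])
    return {
        "is_response": bool(flags & 0x8000),          # QR bit
        "authoritative": bool(flags & 0x0400),        # AA bit
        "truncated": bool(flags & 0x0200),            # TC bit
        "recursion_desired": bool(flags & 0x0100),    # RD bit
        "recursion_available": bool(flags & 0x0080),  # RA bit
        "rcode": flags & 0x000F,                      # 0 means "no error"
        "answers": an,
    }

# Made-up header resembling a resolver's reply: QR, RD and RA set, AA clear, 5 answers.
from_resolver = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 5, 0, 0)
# Made-up header resembling an authoritative server's reply: QR, AA and RD set.
from_authority = struct.pack("!HHHHHH", 0x1234, 0x8500, 1, 4, 0, 0)

print(decode_flags(from_resolver)["authoritative"])
print(decode_flags(from_authority)["authoritative"])
```

A tool like nslookup looks at this same bit to decide whether to print the "Non-authoritative answer" note.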
--- ...let's remember what we learned in our last video about the tree structure of the domain name space. Looking at this, we know that --- in order to find out where 'www.yahoo.com' is, we need to first --- find out where yahoo.com is. To find out where yahoo.com is, --- we need to find out where "com" is. And how do we figure out where "com" is? I have an idea: --- Let's ask the root! --- So here we go: we send our query for www.yahoo.com to the root nameserver. Well, at least that's what we _used_ to do for decades. We'd send the full name we were looking up to every nameserver in the entire process - but the root nameserver really wouldn't care about your whole query, and telling everybody on the internet all the names you're looking up has a number of privacy implications, so nowadays many name servers implement "DNS Query Name Minimisation", whereby they - strip off all the labels that the nameserver being asked doesn't need and thus send a more private query. So we're going to ask the root nameserver about the "com" TLD, and the root DNS nameserver is going - to reply with the nameservers that are responsible for that TLD. But again, since the nameserver knows that you're likely going to want to contact the "com" TLD nameservers, and that you'll need their IP addresses for that, it - will helpfully supply the IP addresses right away _in the same response packet_, in the so-called "additional" section. Alright, cool! Now we have the IP addresses of the gTLD nameservers responsible for "com". Let's --- ask one of those. Note that now the name we're asking about has been extended from "_.com" to "_.yahoo.com", in effect asking "hey, do you know where I can find answers in the yahoo.com zone?" Since that server is authoritative for the entire "com" zone, it doesn't need to invoke any magic and - can tell us that ns1.yahoo.com and friends are responsible for yahoo.com. It then - also tells us the IP addresses, how convenient! 
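The core idea of query name minimisation, revealing one more label at each step down the tree, can be sketched in a few lines. This is a simplification with a made-up function name: real resolvers are more careful about which record type they ask for, and as we saw, our resolver puts a "_" label in front of the name, which this sketch ignores.

```python
def minimised_qnames(name):
    """Return the names a minimising resolver would reveal, from the TLD downwards."""
    labels = name.rstrip(".").split(".")
    # Start with just the rightmost label and add one label per step.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

for server, qname in zip(["root", "com TLD", "yahoo.com"],
                         minimised_qnames("www.yahoo.com")):
    print(f"ask the {server} nameserver about: {qname}")
```

So the root only ever learns that we're interested in something under "com", and the gTLD nameserver only learns about "yahoo.com".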
--- So then we can ask ns1.yahoo.com, which will - tell us that 'www.yahoo.com' is a CNAME and that we really should be talking to yf1.yahoo.com. This illustrates what we talked about in our previous video regarding the ability to further delegate any zone: just because a company controls one zone does not mean that there's a single authoritative server for all names under that zone. So in this example, yf1.yahoo.com is responsible -- or authoritative -- for the wg1.b.yahoo.com domain, and --- we then ask _that_ server for the IP addresses of the canonical name for www.yahoo.com, which - it then returns to us. And this, then, is the rough outline of what lookups a caching DNS resolver performs whenever you ask it for an IP address. Now note that of course the caching resolver will, well, _cache_ the results, so the next time you ask it for a record in the yahoo.com domain, it won't have to go all the way back to the root or the gTLD nameserver, but it can directly ask ns1.yahoo.com, and if you were to look up, say, www.google.com, then this resolver would only need to ask the gTLD server where to find Google's authoritative server and so on. As you can tell, that's quite a few packets floating around the internets here, but we can still capture them: --- But to fully observe these lookups we just illustrated, we'll need to turn our EC2 instance here into a caching resolver. Fortunately, this is really easy to do on NetBSD: we just enable "named" in /etc/rc.conf, update our /etc/resolv.conf to use localhost as the resolver, so we don't talk to the default resolver at 10.10.0.2 any longer, start our tcpdump as before, and start the BIND name server. Then we flush any outstanding caches real quick to ensure that our tcpdump captures _all_ queries when we issue our lookup request, run nslookup again, and there we go. We now see that 'nslookup' asked our caching resolver on localhost, which provided us with the answer. 
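To tie the walk from the root down to yf1.yahoo.com together with the caching behavior, here is a toy simulation that never touches the network. Everything in it is an assumption made for illustration: the `SERVERS` table is a hand-written stand-in for the real nameservers, "gtld" is just a label for one of the "com" nameservers, 192.0.2.1 is a documentation placeholder address, and the sketch only caches delegations, whereas a real caching resolver also caches the answers themselves.

```python
# Hypothetical view of what each server knows: name -> (kind of reply, value).
SERVERS = {
    "root": {"com": ("referral", "gtld")},
    "gtld": {"yahoo.com": ("referral", "ns1.yahoo.com")},
    "ns1.yahoo.com": {
        "www.yahoo.com": ("cname", "new-fp-shed.wg1.b.yahoo.com"),
        "wg1.b.yahoo.com": ("referral", "yf1.yahoo.com"),
    },
    "yf1.yahoo.com": {"new-fp-shed.wg1.b.yahoo.com": ("a", ["192.0.2.1"])},
}

def suffixes(name):
    """Yield the name itself, then each parent domain, most specific first."""
    labels = name.split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])

def ask(server, name):
    """Return the most specific entry the server has for the name or one of its parents."""
    for candidate in suffixes(name):
        if candidate in SERVERS[server]:
            return candidate, SERVERS[server][candidate]
    raise LookupError(f"{server} knows nothing about {name}")

def resolve(name, cache, log):
    """Follow referrals and CNAMEs until we get addresses, remembering delegations."""
    # Start at the closest zone we already know a server for, else at the root.
    server = next((cache[z] for z in suffixes(name) if z in cache), "root")
    while True:
        log.append(server)
        zone, (kind, value) = ask(server, name)
        if kind == "referral":
            cache[zone] = value   # remember who is responsible for this zone
            server = value
        elif kind == "cname":
            return resolve(value, cache, log)
        else:
            return value

cache, first_log, second_log = {}, [], []
first = resolve("www.yahoo.com", cache, first_log)
second = resolve("www.yahoo.com", cache, second_log)
print(first_log)   # servers contacted with an empty cache
print(second_log)  # servers contacted once the delegations are cached
```

With an empty cache the lookup has to visit the root, the gTLD server, ns1.yahoo.com, and yf1.yahoo.com; the second lookup skips the root and the gTLD server entirely.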
So now let's take a look at all the packets we captured: You'll note that there's quite a few packets here, including TCP and UDP traffic, and including various RR lookups we haven't talked about. But as you scroll through the output here, you should be able to identify the queries we discussed: the hit to the root nameserver, the query to the gTLD nameserver, as well as to yahoo's nameservers. Since teasing out these packets will be part of your next homework assignment, I'm not going to individually highlight each query here, but if you've been playing along with the videos here, you should have no problem identifying the right packets. Do pay some attention to whether we're using UDP or TCP for which lookups, though: we generally talk about DNS traffic using UDP port 53, so why are we seeing TCP traffic here? Another question that you should ask yourself is that while we avoided relying on magic for most of the lookups, we did simply declare that we were going to ask the root nameservers for our first query, without discussing just how we know what the root nameservers are. How we find out, as well as some other considerations of these rather important, internet infrastructure load-bearing systems, will be the topic of our next video. Until then, thanks for watching! Cheers!