Hello, and welcome back to CS631 "Advanced Programming in the UNIX Environment". In our last video, we started our discussion of the socket(2) system call and observed interprocess communications using datagrams in the UNIX or LOCAL domain. This allowed unrelated processes on the same system to communicate with one another; in this video, we'll see our first examples of communications between two hosts over the internet.

As a reminder, this is what the socket(2) syscall looks like: we specify a domain, a type, and a protocol, which then creates a socket suitable for use with the conventions associated with those properties. So without further ado, let's dive right into our code example: Once again, we have based our example on the BSD IPC tutorial found on your system under /usr/share/doc, and once again it is split into two programs: a reader and a sender.

The reader, dgramread.c, looks like this: Here, we call socket with a domain of PF_INET and type SOCK_DGRAM, meaning we will be using datagrams in the internet domain: UDP. We fill in our sockaddr struct, but this time, instead of providing a pathname, we need to specify an IP address. We can provide any IP address on our host, or, as in this example, we can simply say "allow connections to come in on _any_ of our addresses" by passing INADDR_ANY. Similarly, if we don't care which port we listen on, we can let the kernel pick one for us by passing '0'.

After that, we call bind(2) just like before. But since we let the kernel pick a port for us, we don't know which one it is. So to find out the port number, we call getsockname(2). As explained in the manual page, a common use case for this function is to retrieve the kernel-assigned port number. Next, we want to print this port number, but we have to be careful to first convert it from network byte order to host byte order.
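The code shown on screen isn't reproduced here, so as a rough sketch of the steps just described (socket, bind to INADDR_ANY with port 0, then getsockname and ntohs), something along these lines would do. The helper name udp_bind_any() is made up for illustration and is not taken from dgramread.c; error handling is kept to a minimum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not from dgramread.c): create a UDP socket bound
 * to any local address and a kernel-chosen port.  Returns the socket
 * descriptor and stores the port, in host byte order, in *port_out;
 * returns -1 on error.
 */
int
udp_bind_any(uint16_t *port_out)
{
	struct sockaddr_in name;
	socklen_t length = sizeof(name);
	int sock;

	assert(port_out != NULL);

	/* Datagrams in the internet domain: UDP. */
	if ((sock = socket(PF_INET, SOCK_DGRAM, 0)) < 0)
		return -1;

	memset(&name, 0, sizeof(name));
	name.sin_family = AF_INET;
	name.sin_addr.s_addr = htonl(INADDR_ANY);	/* any of our addresses */
	name.sin_port = htons(0);			/* let the kernel pick */

	/* bind(2) assigns the port; getsockname(2) tells us which one. */
	if (bind(sock, (struct sockaddr *)&name, sizeof(name)) < 0 ||
	    getsockname(sock, (struct sockaddr *)&name, &length) < 0) {
		(void)close(sock);
		return -1;
	}

	/* sin_port is in network byte order; convert before printing. */
	*port_out = ntohs(name.sin_port);
	return sock;
}
```

A caller would print the returned port and then read from the descriptor, for example with recv(2) or read(2), before closing it.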
As you may recall, different hardware architectures may use different ways to represent numbers in memory: least-significant byte first (or little-endian) or most-significant byte first (or big-endian). TCP/IP networks are declared to use big-endian byte order, so if you're on a little-endian system, you'd need to convert. Since it's annoying to have to know whether you need to convert or not, we outsource this to the htons/ntohs calls, which may simply be no-ops if you happen to be on a big-endian system.

Oh, and by the way, the term "endianness" does indeed derive from Jonathan Swift's "Gulliver's Travels": Lilliputians who broke their eggs at the bigger end were called "Big-Endians", those who broke them at the smaller end, "Little-Endians". So, you know, you learned something today!

Alright, so anyway, after we print the port number, we then read from the socket, print out what was sent to us, and exit.

Our UDP sender looks like this: We're asking the user to provide us with a port, which we dutifully check for validity, then mirror our socket call from the reader, and then try to turn the hostname provided by the user into an IP address. We then populate the struct sockaddr, again using htons(3) to convert the port number, and then happily send our message on its way.

So let's build our sender. But now having the reader on the same host is a bit boring -- we wanted to illustrate communications over the internet, so I'll run the reader on a different system. For that we create a new shell and ssh to the remote system. There, we build the same code we just looked at and then spawn a second shell on that remote system. Ok, so now we are running our UDP reader. It tells us that it is now listening on port 54670, which we can confirm by running 'netstat' in the other shell.
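Stepping back to the sender for a moment: its code isn't reproduced here either, so the following is only a sketch of the steps described, under a few assumptions. The helper name udp_send_msg() is hypothetical, and the name lookup uses getaddrinfo(3), which may well differ from what the actual sample program does. Note that getaddrinfo(3) fills in the port in network byte order for us, so there is no explicit htons(3) call in this version.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper (not from the course's sender): resolve host and
 * port, then send msg as a single UDP datagram.  Returns the number of
 * bytes handed to the kernel, or -1 on error.
 */
ssize_t
udp_send_msg(const char *host, const char *port, const char *msg)
{
	struct addrinfo hints, *res;
	ssize_t n;
	int sock;

	assert(host != NULL && port != NULL && msg != NULL);

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;	/* IPv4 only, like the reader */
	hints.ai_socktype = SOCK_DGRAM;	/* datagrams: UDP */

	/* Turn the hostname and port string into a sockaddr. */
	if (getaddrinfo(host, port, &hints, &res) != 0)
		return -1;

	sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
	if (sock < 0) {
		freeaddrinfo(res);
		return -1;
	}

	/* No connection setup: we just fire the datagram off. */
	n = sendto(sock, msg, strlen(msg), 0, res->ai_addr, res->ai_addrlen);

	(void)close(sock);
	freeaddrinfo(res);
	return n;
}
```

A successful return here only means the kernel accepted the datagram for sending; it says nothing about whether anybody is listening on the other side.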
'netstat' shows us all the open ports on the system, and here you see a number of TCP connections and open ports, as well as several UDP ports, showing us that port 54670 is currently listed as being open. So now back on our local VM, we can run the sender and give it the remote hostname and the port in question... et voila, our message was delivered across the internet from our local VM to the remote system.

Ok, but what happens if we try to do this again? Our sender quietly sends the message, but since there is no reader listening on the remote side, we do not receive the message. But we also didn't get an error. Why is that? Well, remember: we are using UDP, which is connectionless and unreliable _by design_. That is, you can send a packet, but you won't know whether it will arrive.

Or maybe our program didn't send the message at all? Let's observe our packets being sent. To do that, we open yet another shell, this one on our VM, and now we're going to run tcpdump(8) to capture and display all packets between this VM and the remote host. With that tcpdump running, we now start the reader again, which now will listen on a new port, port number 54653. So let's send a message to that port. There we go. We see the message having been delivered in the upper right, and the tcpdump packet capture in the bottom left. Now we repeat sending our message, and again in the bottom left we see that the message was indeed put out on the wire.

Ok, let's quickly rearrange our windows so we can take a closer look at the network packets. There, so now we have the packet capture on the right, our sender on the left. The first packet shown here was the successfully delivered UDP packet. Looking at the hexdump we can see all the information about the packets: The first byte, 45, indicates IPv4 (the 4) with a header length of five 32-bit words, or 20 bytes (the 5); the hex 40 here is our TTL (64 decimal), and the hex 11 tells us this is a UDP packet (protocol 17).
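The same fields can also be picked out programmatically. As a minimal sketch, assuming the hexdump starts at the IP header (no link-layer header in front of it), the fixed byte offsets of the IPv4 and UDP headers give us version, header length, TTL, protocol, addresses, and ports. The struct and function names below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields read off the hexdump. */
struct udp4_info {
	unsigned version, header_len, ttl, protocol;
	char src[16], dst[16];		/* dotted-quad strings */
	unsigned sport, dport;		/* host byte order */
};

/* Format four address bytes as a dotted quad, hex to decimal. */
static void
fmt_ipv4(const uint8_t *a, char *buf, size_t len)
{
	(void)snprintf(buf, len, "%u.%u.%u.%u", (unsigned)a[0],
	    (unsigned)a[1], (unsigned)a[2], (unsigned)a[3]);
}

/*
 * Decode an IPv4 packet carrying UDP.  Returns 0 on success, -1 if the
 * buffer is too short or is not IPv4/UDP.
 */
int
decode_udp4(const uint8_t *pkt, size_t len, struct udp4_info *out)
{
	const uint8_t *udp;

	assert(pkt != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if (len < 20)
		return -1;

	out->version = pkt[0] >> 4;		/* "45": the 4 */
	out->header_len = (pkt[0] & 0x0f) * 4;	/* "45": 5 words = 20 bytes */
	out->ttl = pkt[8];			/* "40" = 64 */
	out->protocol = pkt[9];			/* "11" = 17 = UDP */

	if (out->version != 4 || out->header_len < 20 ||
	    out->protocol != 17 || len < out->header_len + 8)
		return -1;

	fmt_ipv4(pkt + 12, out->src, sizeof(out->src));
	fmt_ipv4(pkt + 16, out->dst, sizeof(out->dst));

	/* The UDP header follows the IP header; ports are big-endian. */
	udp = pkt + out->header_len;
	out->sport = ((unsigned)udp[0] << 8) | udp[1];
	out->dport = ((unsigned)udp[2] << 8) | udp[3];
	return 0;
}
```

Feeding this the raw bytes of a captured packet yields the same values we are reading off the hexdump by hand.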
The source and destination IP addresses can be decoded by converting the hex to decimal, for example using printf. First the source address, using these four bytes here. Next, the destination IP address, using the next four bytes, which is the IP address of the system on which we're running our reader. Next, both ends of our connection have a port, so our source port here is... 65490. And our destination port... 54653, as we had specified. We also can decode the actual message, which is simply turning hex into ascii. Here are the bits in the tcpdump(8) capture. Or we can look it up in the ascii(7) manual page. So as you can see, all the information is included in the packet capture.

Now after we sent the message once, we repeated it, and so we do see it having been sent out over here, with almost identical bits. But then we did receive a message from the remote end -- an ICMP packet, informing us that port 54653 on the remote system is unreachable, which is encoded here in the hex 0303 of the ICMP packet (type 3, code 3). Likewise, if we run this again we'll again observe the UDP packet being sent, and ICMP telling us that the port is unreachable on the remote side, as the reader has terminated. So while our UDP sender did not get an error when it sent the message, the system _may_ receive, for example, an ICMP message letting us know, but that happens outside of the UDP context.

Ok, so let's summarize what we just observed and how it stands in contrast to the use of sockets in the LOCAL domain: Since sockets in the INET domain exist entirely in kernel space and are identified by their IP and port tuple, we don't have to do any cleanup when we exit -- the kernel cleans up after we are terminated. We can either specify a specific IP address, or we can listen on all IP addresses that are available by passing INADDR_ANY. Similarly, we can request the kernel pick an ephemeral port for us by passing in '0'.
If we want to have a specific port, we can provide that, but the so-called "well-known" ports -- ports under 1024 -- can only be bound if you have an effective UID of 0. We may need to convert the port numbers to network byte order, but fortunately we have the convenience functions htons/ntohs to do that for us. And finally, as we saw, we can fire off messages without any regard or knowledge of whether the remote site is currently listening or would receive them. This is by design. In our next video, we'll see how this is different from stream-oriented protocols, such as TCP.

Ok, before we close, here's again a list of questions or suggestions for you to work through to get a better understanding of UDP sockets in the INET domain: First, try to make the reader tool more user friendly by letting the user specify the IP address and port number. Next, play around with ntohs/htons and see what happens when you do not use these functions. Think about how you might want to handle a host having multiple IP addresses... Think about what happens when your system is in a dual-stack environment or is only IPv6. Can you change the sample program to work there? And finally, get some practice capturing and analyzing network packets if you've not done this before. There are many other tools, such as the wonderful Wireshark graphical user interface, which can help you drill down into the details of the packets, but it's a good idea to develop some comfort level analyzing packets via tcpdump(8) alone.

Good luck and have fun with these exercises -- next time, we'll talk stream sockets in the internet domain. Until then, thanks for watching. Cheers!