Hello, and welcome back to CS631 "Advanced Programming in the UNIX Environment". After we discussed socketpairs in our last video, we'll move on to regular sockets in the UNIX or "local" domain. Once more relying on the system-provided BSD IPC tutorials, we'll further deepen our understanding of the socket API in this way. So let's get started...
To create a socket, we use the aptly named socket(2) system call. This creates a communications endpoint and returns an identifier to us in the form of a file descriptor. Sockets can be created in different domains: an address or name space from which a suitable socket name is drawn. The domain defines certain communication properties and presents a useful layer of abstraction over the specific implementation of the communication's internal aspects. As we will see, there are several different domains defined; one important feature of the sockets API is that it allows you to implement interprocess communications logic that is largely identical for processes communicating on the same system and for those communicating across the network. In either case, you'd start by creating a socket using this system call. In addition, sockets are typed based on how the user process interacts with the socket in the given domain. Finally, the user may choose a particular protocol, a set of rules that further governs the details of the communication. Usually, there is one protocol for each socket type, and in most cases it's sufficient to simply let the kernel pick the appropriate default protocol for the selected socket domain and type -- this is done by specifying '0' for the protocol. The available domains vary depending on the operating system and version, but at a minimum, you should find the following to be supported: PF_LOCAL -- the local domain.
This was previously referred to as the "UNIX" domain, and the different domains used to use the AF_ prefix -- for "Address Family"; the PF_ prefix used today stands for "Protocol Family". This domain is for communications on the same system, and sockets are named with a standard pathname. As we'll see in a minute, a socket of this type does indeed appear in the filesystem as a file of type "socket", which is then used as the rendezvous point by the communicating processes. If we wish to communicate over the network, we can create a socket in the PF_INET domain, and to communicate via IPv6, in the PF_INET6 domain. There are several other domains that may be supported on your system - check the manual page as well as the socket header file for those.
Next, just like there are multiple domains, there are also multiple types. For starters, there are stream sockets (SOCK_STREAM), which provide sequenced, reliable, two-way, connection-based byte streams. For network communications, the prime example here would be TCP. Second, there are datagram sockets (SOCK_DGRAM), which support bidirectional, connectionless, unreliable messages. For network communications, the obvious example here would be UDP. Next, there are raw sockets (SOCK_RAW), which give the caller access to the underlying communication protocols. For example, if you want to send ICMP packets, you need to use a raw socket. Raw sockets are only available to the superuser. Other socket types may be defined on your system, such as one for sequenced packet streams or connection-oriented datagrams. As before, check your manual page and system header for which types are supported.
But enough talk, let's take a look at what using sockets looks like in practice. Here, we'll again follow the system-provided BSD IPC tutorials. And similar to how we handled message queues in week 08, we now have separate programs for the sender and receiver, thereby illustrating that we can communicate between unrelated processes. Ok, so here we have our reader.
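A reader along these lines can be sketched roughly as follows. This is a minimal sketch, not the tutorial's actual source: the helper names make_reader_socket() and read_once_and_cleanup() are hypothetical, and the socket pathname is left to the caller. It creates a PF_LOCAL datagram socket with protocol 0, binds a pathname to it (which creates the socket file), reads a single datagram, and then closes the descriptor and unlinks the socket file.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a local-domain datagram socket and
 * bind it to path. Returns the socket descriptor, or -1 on error. */
int
make_reader_socket(const char *path)
{
	struct sockaddr_un name;
	size_t len = strlen(path);
	int sock;

	/* the pathname (plus NUL) must fit into sun_path */
	if (len >= sizeof(name.sun_path))
		return -1;

	/* local domain, datagram type, 0 = default protocol */
	if ((sock = socket(PF_LOCAL, SOCK_DGRAM, 0)) < 0) {
		perror("socket");
		return -1;
	}

	memset(&name, 0, sizeof(name));
	name.sun_family = AF_LOCAL;
	memcpy(name.sun_path, path, len + 1);

	/* bind(2) assigns the name and creates the socket file */
	if (bind(sock, (struct sockaddr *)&name, sizeof(name)) < 0) {
		perror("bind");
		(void)close(sock);
		return -1;
	}
	return sock;
}

/* Hypothetical helper: read one datagram into buf (NUL-terminated),
 * then close the socket and remove the socket file.
 * Returns the number of bytes read, or -1 on error. */
ssize_t
read_once_and_cleanup(int sock, const char *path, char *buf, size_t len)
{
	ssize_t n = -1;

	/* read(2) blocks until a datagram arrives; leave room for a NUL */
	if (len > 0) {
		if ((n = read(sock, buf, len - 1)) < 0)
			perror("read");
		else
			buf[n] = '\0';
	}

	(void)close(sock);
	/* the socket file is not removed for us; we must unlink(2) it */
	if (unlink(path) < 0)
		perror("unlink");
	return n;
}
```

The final unlink(2) is what allows a later bind(2) on the same pathname to succeed; without it, the next run fails with "address already in use".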
We create a datagram socket in the local domain, then populate the family and path in our struct sockaddr_un before we call bind(2), which assigns the name to the given socket. After that, we can use the read(2) syscall on the socket's file descriptor to read any data sent to us by a client. Afterwards, we close the socket -- again, just like any other file descriptor -- but here, before we exit, we also need to unlink the socket file we created. We'll compile this program into an executable called "read", and then take a look at the sender. The setup is rather similar to the reader: we create a datagram socket in the "local" domain and again fill in the struct sockaddr_un before we send data to our reader using the sendto(2) syscall. When we're done, we close the socket and exit. Let's compile this program into an executable named 'send'.
Ok, so now we start our reader. It creates the socket, prints its name, and then blocks, waiting for data to appear. We open a second shell and list the socket, which, as expected, we find to be of type 's' for socket and of size 0. This looks similar to the fifo we used previously, so perhaps we can just write data into the socket? Let's give it a try. Nope, not gonna work. Unfortunately, the shell doesn't use strerror(3), so all we get is the numeric code here. Let's look up what error 45 is. Aha, "operation not supported" for the type of object. Ok, so we can't just write to the socket, but let's use our 'send' program: here, we see the data sent by our program immediately being read by the reader on the left. What happens if we try to send data again? We get an error that our socket no longer exists. Remember, we had unlinked the file in the reader after reading the data. Let's try keeping the socket (that is, not unlinking it in the reader) and see what happens: ok, we read again, and send data again. Now the socket remains in the filesystem, still with a size of 0 bytes. Let's try to run 'send' again. Ah, now we get a different error: "connection refused".
Our reader is no longer paying attention to our socket, so we can't connect to it. Let's try to run the reader again. Oh, another error: "address already in use". This happens because our reader's call to bind(2) tries to create the socket file, but a file by that name already exists. That is, we get the same error we'd get if we tried to listen on a network socket while another application was already listening on it. This is why we need to remove the socket after our program completes. If we remove the socket file and then run the reader again, it will be able to create the socket and use it again, and our sender will be able to send data again as well.
Ok, so we've seen how communications over a socket work. Specifically, we noticed that after we created our socket, we had to "bind" it. When you first call socket(2), the new socket exists in the given name space, but it doesn't have a name yet. By calling bind(2), we assign a name to it. If our socket is in the UNIX or local domain, then calling bind(2) causes the socket to be manifested in the filesystem. Since this creates a new file, and since a socket is used to allow another process to communicate with us, we have to consider the permissions on the file. The new socket file is created subject to our umask, which shouldn't come as a surprise; note, however, that not all systems enforce these permissions when another process tries to connect or send to the socket, so relying on them is not portable. Instead, we should ensure that we create the socket under a properly restricted directory. So after we've called bind(2), our socket exists in the filesystem and is represented in our process by a file descriptor, which we can read(2) from -- as our reader does. But we saw that our sender, instead of calling write(2), used another system call to send the data: the sendto(2) system call. The send(2) and sendto(2) system calls can be used to transmit a message to another socket.
They have the advantage over the write(2) syscall that they are designed for use with sockets and as such allow, for example, additional flags to be set, some of which we'll see in a future lecture. In addition, in order to transmit data reliably -- that is, using a stream -- a socket must be in a connected state; since we're using datagrams in our example, we can use sendto(2), which takes the destination address as an argument, to submit our data without calling connect(2). Now, in our reader program we did use read(2), but there, too, we have specific socket API calls to receive the data: recv(2) and recvfrom(2) are the equivalent calls to send(2) and sendto(2). All of them return the number of bytes sent or received on success and -1 on failure. We'll see their use again and in more detail in our next video segment when we perform network communications.
Alright, so our short example of using datagrams in the local domain serves as a good introduction to the sockets API. We've seen that we first have to create a socket by calling the socket(2) syscall, specifying the domain, type, and protocol; then we have to bind(2) the socket to assign a name to it. In order for communications to be possible, both parties have to agree on the name -- that is, use the same pathname -- and have access permissions for the file in question. The file of type socket that is created when we call bind(2) is, however, like a fifo, only used as a rendezvous point between the two programs. Since we get back a file descriptor, we are able to use the standard I/O syscalls like read(2) and write(2) on it, but dedicated system calls like recv(2) and send(2) exist and offer additional, socket-specific functionality. We've also seen that after we are done with our communications, it's up to us to remove the file.
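For comparison, the sending side can be sketched as a single function. Again, this is a minimal sketch under assumptions rather than the tutorial's code: send_message() is a hypothetical helper name, and it transmits msg as one datagram without its terminating NUL. Because this is a datagram socket, the sender neither binds nor connects; sendto(2) names the receiver directly.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: send msg as a single datagram to the
 * local-domain socket bound at path.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t
send_message(const char *path, const char *msg)
{
	struct sockaddr_un name;
	size_t len = strlen(path);
	ssize_t n;
	int sock;

	if (len >= sizeof(name.sun_path))
		return -1;

	/* same domain and type as the reader; 0 = default protocol */
	if ((sock = socket(PF_LOCAL, SOCK_DGRAM, 0)) < 0) {
		perror("socket");
		return -1;
	}

	memset(&name, 0, sizeof(name));
	name.sun_family = AF_LOCAL;
	memcpy(name.sun_path, path, len + 1);

	/* no bind(2) or connect(2): the destination is passed to sendto(2) */
	n = sendto(sock, msg, strlen(msg), 0,
	    (struct sockaddr *)&name, sizeof(name));
	if (n < 0)
		perror("sendto");	/* e.g. ENOENT or ECONNREFUSED */

	(void)close(sock);
	return n;
}
```

If the reader has already unlinked its socket, sendto(2) fails with ENOENT; if the socket file is still there but no process has it bound, it fails with ECONNREFUSED. These correspond to the two errors we saw in the demo.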
In our next video segment, we'll see how we can use sockets for network communications, and we'll discuss the parallels to interprocess communications in the local domain, but before I let you go, here are a few questions and exercises for you: Our sender program currently always sends a fixed message. This is ok to illustrate some basic functionality, but not terribly useful. Can you change the program to write data read from stdin into the socket instead, one line at a time? Play around with the permissions on the socket after the server called bind(2) -- what happens if you restrict them, what other processes can use it etc. Change the two programs to use the respective other system calls to perform the I/O: change the reader to use recv(2), the sender to use write(2). What works better? What's easier? Can you have multiple processes using the same socket to send data to a single reader? For this, you'll have to change the reader to loop and read repeatedly; see some of our earlier IPC examples and update our code here. And finally, what happens if you change protocols or socket types - can we mix and match? Ok, I think that might be enough to keep you busy until the next video. Good luck, and thanks for watching - cheers!