Hello, and welcome back to CS631 "Advanced Programming in the UNIX Environment". With this video, we're continuing our discussion of how to restrict processes, picking up where we left off with segment 3 and process limitations, in particular with respect to CPU time. In that video, we noted that while we can adjust the priority of a process and thereby ensure that a process cannot, for example, hog the CPU and starve other processes, this priority -- or "niceness" -- does not have any impact on the placement of the process on a given CPU. In this video, we'll discuss how we can accomplish that. But first, let's look at a simplified illustration of CPU placement for your common processes. --- Let's suppose that we have a system with four CPUs, and then a bunch of processes running on our system. As you can tell, this is a fairly typical representation of what you normally find on your NetBSD system: you have a bunch of commands run interactively from a shell, you have a few system daemons, and then, at the bottom there, we picked a few generic job names that do some CPU-intensive work. Now with your usual time-sharing, priority-based scheduling algorithm, any of these processes may be placed on any --- of the available CPUs. As work is completed and as jobs are preempted and rescheduled, as we saw in our last video, they --- may be moved from one CPU to another, or new jobs placed on the CPUs as the scheduler sees fit. But now let's assume that our 'worker' jobs here are all very CPU intensive. By having them get placed on any of the CPUs, you might end up with a fully loaded system, and, depending on their priority, some of your system jobs might not complete as quickly as you'd like. --- So let's pick these 'worker' jobs and ensure that they don't get placed on just _any_ CPU, but --- only on CPUs 1 and 2. Doing that is called "CPU pinning", or assigning a processor _affinity_. When we do that... --- ...the workers are correctly placed onto just these CPUs. But note that we still have other jobs on CPUs 1 and 2: the shell and the find command are not evicted from the CPU, and in fact --- new processes may be placed on CPUs 1 and 2 as needed. It is only the 'worker' processes that have been bound to the specified CPUs; all other processes can still be placed any way the scheduler sees fit. Let's demonstrate this in practice: --- Here's a silly little program to peg the CPU and keep it busy. Before we run it, let's split the screen and run top(1) in the bottom half. Note that on this system we now have four CPUs available, and we see some of the standard system processes. Now when we start our worker process, we note that after it starts out on CPU 0, it is then placed on CPU 2, where it then continues to run and use up cycles. Let's place it in the background. We note that it has been moved to CPU 3 now, and we start a second worker in the background. That one ends up on CPU 2, and we kick off another one... and another one. Now we have four worker processes, and the scheduler has distributed them across all four CPUs. If we then run another job -- dd in this case -- it has to share the CPU with one of the other processes, and in this case it ends up on CPU 2. Our worker jobs all execute entirely in user space, but the dd(1) job also executes in kernel space, as it performs I/O, which we see reflected here in the display updated by top(1).
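By the way, there's nothing magical about that CPU-pegging program: all you need is an infinite loop. A minimal sketch -- hypothetical, the actual demo program may differ slightly -- might look like this:

```c
/*
 * busy.c -- a minimal CPU-pegging program: spin forever, burning
 * cycles entirely in user space.  (A hypothetical sketch; the
 * program used in the video demo may differ.)
 */
int
main(void)
{
	for (;;)
		continue;
	/* NOTREACHED */
}
```

Compile that and run it, and top(1) will show one CPU pegged at 100% user time.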
Continuing our demo: we run yet another job, one that itself executes another program over and over, and we note that we can observe the placement of the wc(1) commands on different CPUs for subsequent invocations, while our workers are still hogging all four CPUs. Ok, let's kill all these jobs, and look at ways that we can assign processor affinity. For that, we can use the schedctl(8) command. Note that in addition to the affinity, we can also tune the scheduling algorithm and the priority. Let's give it a try. First, let's take a look at the processor affinity of our shell here. Right now, we have no affinity, meaning the scheduler is free to place this shell on any CPU it likes. Let's run our worker here... it ends up on CPU 2. Now let's try to move it to CPU 3. Ah, we can't do that -- changing CPU affinity requires superuser privileges, which makes sense: we don't want to allow regular users to move around their jobs and possibly interfere with others'. Ok, let's try as root. There, now our worker has an affinity for CPU 3, and we see down here that it was moved from CPU 2 to CPU 3. We can now move it back to CPU 2... CPU 1... or CPU 0, and see it get updated below in the top(1) display. Note that we can choose to allow users to change the CPU affinity. For that, we have to change the user_set_cpu_affinity sysctl here, and now we can move our current shell from having no affinity to CPU 2. Since CPU affinity is inherited by a child process from its parent, when we kick off a new worker here, it, too, will be running on CPU 2. So even if we run multiple worker jobs, they will all remain bound to CPU 2, with the other CPUs remaining idle. Now note that while we have bound the worker jobs to CPU 2, we can still move other jobs to that CPU as well. Here, we're moving the top(1) process itself. Note also that even though a child process inherits its parent process's affinity -- dd here, as a child of our shell with an affinity for CPU 2, also gets placed on that CPU -- we can still move it off explicitly. And likewise, we can move the worker jobs to a different CPU. --- Ok, so we've seen how to move individual jobs around to certain CPUs by assigning a processor affinity. But, as we've seen, this allows other processes to still be placed on those CPUs, possibly competing with our jobs. Can we _reserve_ one or more CPUs for specific tasks such that no other job can be run on them? Let's say we want to take our four CPUs and reserve two of them for our worker jobs, and one of them for our shell; we can do so using "CPU sets". When you create CPU sets, you will always keep one default set available for any of the leftover processes. So in our example here, all our system processes would end up --- on CPU 0. We can then explicitly bind our shell to --- CPU 3, which means that any processes subsequently started by this shell would _also_ be bound to this CPU set. If we then bind our worker jobs to cpuset 1, --- then we are looking at a distribution like this. Now note that when we start other jobs from our shell, they will continue to be placed on CPU 3 only, while any other scheduling that has to occur on the jobs that are not explicitly bound to a cpuset will --- end up on CPU 0, but no job other than our worker jobs will ever be assigned to CPUs 1 and 2. Let's run through a practical example to illustrate: --- We've extended our little 'busy' program slightly to make it easier to differentiate the worker jobs: it now takes an argument of how many jobs to kick off, then runs them with an easy-to-differentiate argv[0].
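Here's a sketch of what that extended version might look like -- again hypothetical; in particular, the path of the spinning 'busy' binary is an assumption for illustration:

```c
/*
 * runworkers.c -- a hypothetical sketch of the extended demo
 * program: fork N children and exec the spinning 'busy' binary
 * with a distinct argv[0] for each, so the individual workers
 * are easy to tell apart in the process listing.
 */
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUSY_PATH "./busy"	/* assumed location of the spinner */

int
main(int argc, char **argv)
{
	char name[32];
	char *args[2];
	int i, n;

	if (argc != 2)
		errx(EXIT_FAILURE, "usage: %s num-workers", getprogname());

	n = atoi(argv[1]);
	for (i = 0; i < n; i++) {
		switch (fork()) {
		case -1:
			err(EXIT_FAILURE, "fork");
		case 0:
			/* Child: exec the spinner under a new name. */
			(void)snprintf(name, sizeof(name), "worker%d", i);
			args[0] = name;
			args[1] = NULL;
			(void)execv(BUSY_PATH, args);
			err(EXIT_FAILURE, "execv");
		default:
			/* Parent: keep kicking off children. */
			break;
		}
	}
	return EXIT_SUCCESS;
}
```

Each child execs the same spinner under a different name; note that whether you see argv[0] or the executable's name depends on your top(1) and ps(1) settings.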
As before, we split our screen to run top(1) completely unironically in the bottom half. I know, I know. Anyway, let's start out with 6 worker jobs, which now get distributed across all four CPUs, as expected. But we said we wanted to use CPU sets, so let's do that. The psrset(1) command shows us the current CPU sets -- one set, the default set, comprising all four CPUs. Let's replicate the setup from our illustration: like before, creating CPU sets is not something normal users are allowed to do, so we need sudo(8). There. Now we have three CPU sets: the default CPU set, with CPU 0 only; cpuset 1, comprising CPUs 1 and 2; and cpuset 2, with CPU 3. Now when we run our 6 worker jobs... they all end up in the default CPU set, on CPU 0. This is because any job that is not explicitly bound to a cpuset can only be placed in the default CPU set. That's not what we wanted, so let's place them on CPU set 1. There, that's better -- we now see them distributed across CPUs 1 and 2, as we had planned. But note that despite binding a worker to a given CPU set, we can _still_ explicitly move it to a CPU in the default CPU set. But if we try to move the top(1) process from the default CPU set -- CPU 0 -- to CPU 2, then we fail: CPU 2 is part of a non-default CPU set, and so does not allow any jobs that are not explicitly bound to it via the psrset(1) command. Same for CPU 3, which is part of the non-default CPU set 2. We can't even move _back_ the process we had removed from the CPU set. To make the CPUs in the CPU set available for any other jobs, we have to delete the CPU sets again, at which point all jobs can be scheduled on any CPU once more. --- Ok, let's summarize what we've learned: - We noted that it can be beneficial to restrict jobs to certain CPUs. This can be done for performance reasons -- scheduling processes and threads on the same CPU can reduce the number of CPU cache misses -- or to ensure resources are used fairly, or to prevent a group of processes from interfering with other jobs. - There are two ways we can accomplish this: the first is "CPU pinning", where we assign a CPU affinity to a process or process group. The specified process will be bound to the specified CPU, but other jobs can still be placed onto the same CPU. - As we've seen, a child process will inherit its CPU affinity from its parent, but changing a parent's CPU affinity will not also change that of all its children. - In contrast to processor affinity, CPU sets allow you to really _reserve_ one or more CPUs for specific jobs: the scheduler will not be able to move any jobs onto those CPUs other than those you have explicitly bound to them. As before, children inherit CPU set placement from their parents, and you can explicitly remove a process from a CPU set by changing its affinity, but to move it back into the CPU set, you need to explicitly call psrset(1). - Finally, none of this is standardized; different operating systems implement CPU pinning and CPU sets differently and use different command-line tools and library functions, so make sure to check your specific operating system's manual pages.
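To give you an idea of what such library functions look like, here's a rough NetBSD sketch using the affinity(3), cpuset(3), and pset(3) interfaces that sit behind schedctl(8) and psrset(1). Treat it as an illustration, not gospel: check the manual pages for the exact headers, linking (the affinity functions may require -lrt), and required privileges on your system.

```c
/*
 * A sketch of setting a processor affinity and creating a processor
 * set programmatically on NetBSD.  The two halves below are
 * independent illustrations; as with schedctl(8) and psrset(1),
 * both operations normally require superuser privileges (unless
 * the sysctl we saw earlier is set).
 */
#include <sys/pset.h>

#include <err.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	cpuset_t *cset;
	psetid_t psid;

	/* Pin ourselves to CPU 2 -- roughly "schedctl -A 2 -p $$". */
	if ((cset = cpuset_create()) == NULL)
		err(EXIT_FAILURE, "cpuset_create");
	cpuset_zero(cset);
	if (cpuset_set(2, cset) == -1)
		err(EXIT_FAILURE, "cpuset_set");
	if (sched_setaffinity_np(getpid(), cpuset_size(cset), cset) == -1)
		err(EXIT_FAILURE, "sched_setaffinity_np");
	cpuset_destroy(cset);

	/*
	 * Create a new processor set, assign CPU 3 to it, and bind
	 * ourselves to it -- roughly "psrset -c 3" followed by
	 * "psrset -b <setid> $$".
	 */
	if (pset_create(&psid) == -1)
		err(EXIT_FAILURE, "pset_create");
	if (pset_assign(psid, 3, NULL) == -1)
		err(EXIT_FAILURE, "pset_assign");
	if (pset_bind(psid, P_PID, getpid(), NULL) == -1)
		err(EXIT_FAILURE, "pset_bind");

	return EXIT_SUCCESS;
}
```

Other systems differ: Linux, for example, offers sched_setaffinity(2) and implements cpusets via cgroups, while FreeBSD has its own cpuset(2) interface.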
Alright, this gets us to the end of this topic. Looking back at all the different ways in which we can restrict processes, we should now be able to combine them to build really specific environments. But before we do that, we'll have to cover just one more topic: control groups, namespaces, and capabilities. We'll do that in our next video. Thanks for watching - cheers!