Hello, and welcome back to CS 631 "Advanced Programming in the UNIX Environment". This is our fourth video segment on the larger topic of how to restrict processes. In our last video, we looked at restricted shells, chroots, and jails; in this video, we'll go back to a more process-centric view.

Even if we are able to restrict with fine granularity which users may execute which commands and, by way of file system permissions (extended or otherwise) or even through the use of a chroot or jail, which files may be accessed or which processes may be viewed, we still face a number of problems: we are still using the same resources, all software contains bugs, and humans tend to misconfigure things, thereby oftentimes allowing a process to do things that it shouldn't be able to, or to interfere with the normal operations of the system.

---

All processes, even those running in jails, effectively compete for these same resources and may continue to run forever, or consume certain resources to a degree that we'd rather they didn't. We've discussed some ways to restrict a given process's resource utilization in Week 06, Segment 5, where we looked at the getrlimit(2) / setrlimit(2) syscalls and the ulimit shell builtin:

- The use of resource limits brings us back to a significant and important concept: self-restriction. That is, a process can voluntarily restrict its own usage such that it itself cannot later regain the privileges it previously had. This applies equally to any children this process may create, thereby allowing you to create more confined processes or process groups.

---

One of the limitations we had observed here was the total CPU time in seconds. That is, you may not want to allow certain jobs to keep hogging the CPU. But... if we didn't have this ulimit, would the process just sit there and not let any other processes get the CPU? To better understand how processes may compete for CPU cycles, let's briefly take a look at how the scheduler works when placing processes on the CPU:

---

You've probably seen similar illustrations in your basic operating systems class, but to clarify our understanding of the resource limitations we discuss here, it's still useful to rehash this topic real quick: in a naive round-robin scheduler, you have a run queue of processes that are waiting for CPU time.

---

And the scheduler picks off the next job from the run queue and places it on the CPU. But a job may not necessarily run to completion and may instead request some I/O. As we mentioned before, this is very slow -- in computer terms -- and so while we're blocked on I/O,

---

the scheduler may move our job into the wait queue. This frees up the CPU to handle another job, and so

---

the scheduler picks the next job off the run queue, PID 2234 in this case, and lets that job have the CPU for a bit. But even if PID 2234 does not request I/O or otherwise end up being blocked, it might still not run to completion. If we simply let any job get the CPU and keep it until it's done, that'd block all other jobs.

---

So instead, the scheduler preempts the job after some period of time and places it back into the run queue, giving the next job a chance to run. Now with process 8723 on the CPU,

---

process 1234 may have finally completed the I/O it was blocked on, so it gets back into the run queue. But now it's all the way at the end of the queue and will have to wait until all the other jobs ahead of it have gotten their slice of CPU time before it gets a chance to do some work again.
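To make this a bit more concrete, here's a toy simulation of such a naive round-robin scheduler -- purely a sketch with made-up PIDs, workloads, and time units, not how any actual kernel implements scheduling:

    #include <stdio.h>

    #define QUANTUM 2           /* time slice per turn on the CPU */
    #define QSIZE 16            /* capacity of our run queue */

    struct job {
            int pid;
            int remaining;      /* units of CPU work left */
    };

    int
    main(void)
    {
            /* our run queue, a simple ring buffer */
            struct job queue[QSIZE] = {
                    { 1234, 5 }, { 2234, 3 }, { 8723, 4 },
            };
            int head = 0, tail = 3, njobs = 3;

            while (njobs > 0) {
                    /* pick the next job off the run queue... */
                    struct job j = queue[head];
                    head = (head + 1) % QSIZE;

                    /* ...and let it have the CPU for one time slice */
                    int slice = j.remaining < QUANTUM ? j.remaining : QUANTUM;
                    j.remaining -= slice;
                    printf("pid %d ran for %d unit(s), %d left\n",
                        j.pid, slice, j.remaining);

                    if (j.remaining > 0) {
                            /* preempted: back to the end of the run queue */
                            queue[tail] = j;
                            tail = (tail + 1) % QSIZE;
                    } else {
                            printf("pid %d completed\n", j.pid);
                            njobs--;
                    }
            }
            return 0;
    }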
This trivial round-robin approach also assumes that all jobs have equal importance and should always get the same amount of CPU time, but in reality we sometimes have some jobs that we'd like to be preferred over others. Certain system jobs or kernel tasks, critical for the OS to run smoothly, should probably not be preempted for unimportant, long-running processes initiated by normal users.

---

So let's try again. This time, in addition to our run queue, we let each process have a _priority_. If all jobs have equal priority, then the scheduler can pick the first job off the run queue, just like before.

---

But let's suppose different jobs have different priorities. In that case, the scheduler will pick the job with the highest priority off the run queue and place it on the CPU. In this case...

---

that'd be process 8723, with a priority of 15,

---

with process 8833 being up next. Note that as we place process 8723 onto the CPU, we increase the priority of all waiting processes. This ensures that any process that waits for a long time will eventually have a high enough priority to get picked up. Now if process 8723 gets blocked waiting on I/O, for example, then

---

it gets placed into the wait queue, while the next job -- process 8833 -- gets the CPU, and the other jobs in the run queue have their priority incremented again.

---

After some time, process 8833 gets preempted, but note that when it gets placed back into the run queue, it has its priority slightly lowered, since it just _had_ the CPU,

---

and process 1234 now gets the CPU. If process 8723 now completes its I/O, it will then

---

get placed back into the run queue, but it has kept its priority the entire time, since it never got a full cycle on the CPU. So this way, the job that was blocked on I/O now has the highest priority, and when process 1234 gets preempted,

---

_it_ gets the CPU. Now this is a very simplified view of dynamic priority scheduling, but I think you get the idea: by allowing for priorities, we can ensure jobs are scheduled according to their needs, and by dynamically adjusting the priority based on CPU cycles used, for example, we can avoid starvation of even a low-priority job. Now there are many other scheduling algorithms, but for us, this is a good enough approximation to now take a look at how process priorities are adjusted. For that,

---

we have the getpriority(2) and setpriority(2) system calls.

- By default, every process gets a priority of zero - neutral.

- 'which' specifies whether you're interested in the process priority, the priority of a process group, or that of all of a user's processes. For multiple processes, getpriority(2) will then return the highest priority of any of those processes.

- The default priority of 0 is neutral; numerically lower priority values cause more favorable scheduling, while higher values imply less favorable scheduling. This is a bit confusing, since normally we might consider a "higher priority" to lead to more favorable scheduling, which is why we differentiate between the actual kernel priority and the process "niceness". As we will see in a few minutes, we can adjust the priority of a process using the nice(1) utility, and the logical mapping here makes a bit more sense: the nicer a process is, the more it is willing to let other processes have the CPU. So a higher number means you're being nicer, and thus you receive a lower kernel scheduling priority. The value you can set here ranges from -20 to +20, although on some Unix versions you can only be nice up to a level of 19. A value of 19 or 20 will schedule a process only when nothing at priority <= 0 is runnable.

- Similarly to how we handle ulimits, our process is only able to raise its niceness and can never lower it, unless it's running with super-user privileges. We'll see examples of this in a second.

- Finally, note that if our nice value can range from -20 to +20, then -1 is a valid return value for getpriority(2), meaning we can't rely on the return value alone to identify an error. For this reason, we need to explicitly clear errno before and then check errno after we call getpriority(2), as in the sketch below.
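A minimal sketch of that errno dance around getpriority(2), here asking for our own process's priority:

    #include <sys/resource.h>

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            int prio;

            /* -1 is a valid priority, so we must clear errno first... */
            errno = 0;
            prio = getpriority(PRIO_PROCESS, 0);    /* 0 means "this process" */

            /* ...and only then check it to detect an actual error */
            if (prio == -1 && errno != 0) {
                    fprintf(stderr, "getpriority: %s\n", strerror(errno));
                    return 1;
            }

            printf("current nice value: %d\n", prio);
            return 0;
    }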
Let's see all this in action:

---

When we run 'uptime', we see not only how long the host has been up, but also the load averages, given as three numbers. These three numbers show the load average across the last 1, 5, and 15 minutes, with the load average being defined as the number of processes in the run queue averaged over that interval. Here's a trivial program to print just the load averages. The three numbers printed by our program roughly match what uptime(1) reported, as the averages are calculated in real time.

Ok, so now let's try to generate some busy work, a silly little script that will keep our CPU busy for a bit. Here, we're simply looping around, subtracting numbers, round and round. If we run this and then take a look at the output of top(1) in another window, we see our busy job up here as process ID 1361, with a kernel priority of 28 and a niceness of zero. When the number of runnable jobs drops, we know that this script has completed.

Ok, now let's run a few instances of this script in parallel to observe how our CPU priorities are assigned. I'll just run four instances in succession, waiting a bit in between. We then expect these four jobs to complete in the same order they were started: A, B, C, and then D. Now in our other window, we should see these jobs show up one by one, each having the same nice level and approximately the same kernel priority, adjusted on each update as they get on the CPU and are preempted. As they complete, we see the order we had expected: A, B, C, and then D.

So now, let's try to alter their priorities after we've kicked off the jobs. We can use the nice(1) command to specify an initial priority...

...or we can use the renice(1) command to adjust the priority of an already running process. So let's use renice(1) - but for that, we need the process IDs of our jobs, so let's print those out when we start them. Ok, now we have job A with process ID 2197, B with 1869, and so on. Let's adjust the priorities such that we give process A the highest nice level -- 20, process D a nice level of 10, process B a nice level of 5, and leave process C at the default.

Now we can observe our jobs over here again. We see the nice levels reflected here, and the kernel priorities adjusted accordingly. As the jobs complete, we should now find a different order of termination: C was first to complete, then B, then D, and A was the last. This reflects the nice levels we assigned to each process, showing that the higher-priority jobs did indeed get the CPU more often and were thus able to complete faster.

Alright, now let's take a quick look at how nice(1) might set the priority: here we have a program that prints its current priority, then tries to set its own priority to the value provided on the command-line, then prints out its new priority, and then tries to set its priority back to that which it had at program startup. Finally, it again reports its current priority and exits.
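The exact code shown in the video isn't reproduced here, but a minimal sketch of such a program might look like this (using the BSD err(3) / warn(3) functions for brevity):

    #include <sys/resource.h>

    #include <err.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Report our current priority, carefully checking errno. */
    static int
    getprio(void)
    {
            int p;

            errno = 0;
            if ((p = getpriority(PRIO_PROCESS, 0)) == -1 && errno != 0) {
                    err(EXIT_FAILURE, "getpriority");
            }
            return p;
    }

    int
    main(int argc, char **argv)
    {
            int initial, wanted;

            if (argc != 2) {
                    fprintf(stderr, "Usage: %s nice-value\n", argv[0]);
                    return EXIT_FAILURE;
            }
            wanted = atoi(argv[1]);

            initial = getprio();
            printf("initial priority: %d\n", initial);

            /* Try to adjust our own priority to the requested value. */
            if (setpriority(PRIO_PROCESS, 0, wanted) == -1) {
                    warn("unable to set priority to %d", wanted);
            }
            printf("new priority: %d\n", getprio());

            /* Now try to go back to the priority we started out with. */
            if (setpriority(PRIO_PROCESS, 0, initial) == -1) {
                    warn("unable to reset priority to %d", initial);
            }
            printf("final priority: %d\n", getprio());

            return EXIT_SUCCESS;
    }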
Let's run it and request a nice level of '5'. We start out with the default nice level: zero. We are able to set the new nice level, since we're _raising_ our niceness, i.e., _lowering_ our priority. We're being nice. After that, we try to set our priority back to zero, but that fails, and so our priority remains 5 throughout. If we run the same command with super-user privileges, then we _are_ able to lower our niceness again from 5 back to our initial zero.

What happens if we start out being really nice? Let's try. Ah, our initial nice level is now 10, so trying to set it to 5 will fail, since that would be _lowering_ our niceness. But we can of course still be even nicer and lower our priority further.

Can we start out not being nice at all? Our default value is zero, so let's try to start out with -5. That fails -- we can't lower our nice level below zero -- but note that nice(1) still lets us run the program, now with the default priority, while of course the super-user can start out with a lowered nice value and then adjust it as they like.

---

Ok, let's take a break here and summarize.

- As mentioned before, a process is able to voluntarily restrict its own resource utilization. This applies to ulimit resource limitations just as it does to CPU scheduling priority.

- We can adjust our CPU priority -- our niceness -- using the setpriority(2) system call,

- or using the command-line utilities nice(1) and renice(1). This can be done for individual processes or process groups.

- Unlike in the human world, once you're nice, you can't decide to be naughty again. This follows the principle that we can lower our privileges, but not raise them again once lowered, with the intent to allow a process to intentionally shed some privileges so that it or its children cannot abuse them at a later point.

- Finally, while we can control to some degree how often we get CPU cycles by being nice, this has no direct effect on _which_ CPU we are scheduled on, if the system has multiple CPUs. It stands to reason that, on a multi-CPU system, you might want to keep one CPU for system processes, for example, and farm out all the other jobs to the other CPUs. We'll talk about how to accomplish that in our next video.

Until then, thanks for watching! Cheers!