Hello, and welcome back to CS631 "Advanced Programming in the UNIX Environment". So far we've talked a lot about individual programs such as the common Unix utilities found under /bin, but we really haven't talked much about processes. Well, we did mention in our introductory lecture that a process has certain properties, such as a process ID; we've seen that a process has an effective UID and Group ID, and we've noted that certain properties of a process, such as the current working directory, cannot be changed by a child process. But we really have not looked at all at what a process really _is_, what it looks like, or how processes are brought to life and how they die. In these coming videos covering the materials for Week 6, we will do all that. But first, let us get a visual understanding of what a process looks like in memory. This can be quite useful to understand many aspects of process execution later on. --- I'm sure you've seen an illustration of a process in memory like this before, where we show a high address, a low address, the stack, the heap etc. But we'd generally like to really _see_ what things look like, by writing some code and inspecting the outcome. How can we do this? --- Well, let's take a look. For starters, let's note that in C, we are sufficiently low-level that we can inspect the address of any program element even without using a debugger. Every variable really is just whatever is stored at a given memory location, with the amount of memory used being defined by the type of the variable. A pointer, in turn, is nothing but a variable whose value is an address in memory. By using the ampersand operator, we can _take the address of a variable_ -- that is, instead of the value stored at the given address, we get a pointer to the variable, holding its address. We can then take this memory address, cast it to a number, and print that value in hex. Here we go. So apparently the variable 'var' inside of 'main' can be found at this hex address.
- Now that we've seen how to do that, let's try to use that to explore a little bit more about our program. We have two arguments passed to our 'main' function, argc and argv. We declare a local variable visible only within 'main', but we also have a global variable, accessible to any function in our program (even if we only have one function here: main). But since this is C, we are not limited to taking the address of variables; we can also take the address of a function, thereby obtaining a function pointer, and print _that_ address. So when we run this program, the output looks like so, and this is now getting a bit closer to what we imagined our process to look like, with some things being found at a high address (argc, argv, and the local variable in this case), and some things found at a notably lower memory address - our global variable and the function 'main' itself. --- But that's not enough. We want to further see if we can identify the location of the stack and heap and some of the other segments of the program in memory, so here we have a more comprehensive program that declares a number of different things and prints their respective memory addresses. Here we have a few variables declared. Note that some of them are initialized -- meaning they have been assigned an explicit value -- and some of them are uninitialized. Our 'main' function has some variables visible only to 'main', and we then begin printing out the various addresses. From within 'main', we then call two other functions... ...dynamically allocate a bit of memory via malloc(3)... and continue printing other addresses. Don't worry too much about reading the code here -- you can download it from the course website to play around with yourself. Let's run it... --- Here's the output this program produces, nicely formatted to help us better understand where the different memory addresses are.
This looks a little bit more like what we had previously visualized: - Notice how the output is (mostly) sorted by memory address, starting with the highest address at the environment - and down to the lowest address of the main function itself. - Since we invoked the command without any additional command-line parameters, argv will contain two elements -- argv[0], the program name, and NULL. Each member of argv is a 'char *', a pointer, and thus takes 8 bytes, which we can see here. So our findings match our illustration: command-line arguments and environment variables are found at the high end of the process's address space. --- At the bottom, the low addresses, we find the text segment. This is the part of the virtual address space that contains executable instructions. This is usually marked read-only. - This, by the way, is where the original use of the sticky bit comes in. If that bit is set in the st_mode, then the OS can keep the text segment in swap space after the process terminates. Upon a subsequent execution of the same command, the kernel can then simply move the segment from swap into real memory, which is much faster than fetching it from disk. That is, the text segment remains sticky in memory, and this is why the 'sticky bit' is also known as the 'save-text' bit. This used to be set on certain large, frequently invoked binaries -- the compiler, say, or emacs. Most modern Unix versions do not use the sticky bit on files any longer. --- Next, going bottom up, we have our initialized data. That is, variables that we have defined and explicitly initialized in global scope, or static variables that were initialized even if defined within a function; being static, they keep their address and value across function invocations. This is the "initialized data", or simply "data" segment. - Both the text segment and the data segment are read from the program file by exec(3) at program startup time. --- After that, we see the part where we store uninitialized data.
Here, we can see our globally declared, but uninitialized 'array' of size ARRAY_SIZE -- which we had declared to be 16 bytes, which is nicely reflected in the hex address showing the start of the array at 601A A0 and ending 16 bytes higher at 601A B0. All of the uninitialized global variables end up in this segment, the "uninitialized data segment", also known as the "BSS" segment, named after an old assembler directive, "block started by symbol". This also includes static variables that have been initialized to 0; all variables in this segment are set to all bits zero by exec(3) before the program starts. --- Next, on top of the BSS segment sits the heap. This is where memory is dynamically allocated from at runtime on demand via malloc(3) or similar functions. This area is shared by all threads, shared libraries, and dynamically loaded modules in the given process. We allocated 32 bytes, which we can again see as the starting address ending in 00 increased to 20 hex. - As additional memory is allocated, the heap grows upwards towards the higher addresses. --- Now back up towards the higher addresses, right below where we found our command-line arguments, the stack begins. The stack is a LIFO, a last-in first out... well, stack, on which the function frames are pushed as they are executed. - The stack pointer points to the top of the stack, which, somewhat confusingly, is the lower address in this visualization. Sometimes you may encounter this illustration flipped upside down to allow the stack pointer to actually point to the visual "top" of the stack, but for this architecture that then places the "high" address at the bottom. I find that even more confusing and prefer this image here instead. You get used to saying "top of the stack" when pointing at something that grows downwards, don't worry. - Anyway, so the stack starts at the higher address and so all the addresses on the stack decrease in order.
So we find our first variable on the stack, the 'int var' at ...A82 54; as an int, it only requires 4 bytes, so the next variable, the array we declared inside of main, occupies the 16 bytes below it, from A82 50 down to A82 40. Note also that argc and argv, the arguments to 'main', are also found here, and everything belonging to this function goes into this frame on the stack here. - From within 'main', we call 'func2', which gets pushed on the stack, - and our stack pointer is updated. After 'func2' completes, we pop the frame off the stack and return back to 'main'... --- ...with the stack pointer back here. Now we call 'func'... --- ...which is pushed onto the stack. 'func' calls 'func2', which gets pushed onto the stack. --- So we saw how, while the heap grows up from a low address, the stack grows down from a high address, as function frames are pushed onto the stack. But notice that we have one outlier here on the left. - Within our function 'func', we declared a 'static int n = 1;'. As a static variable that was initialized, we should find this address... - ...in the data segment down here. If you were to flip the initialization of this variable to '0' in the code, you should see that this flips the address to end up in the BSS segment. Give it a try! --- Alright, so now... let's take a look at our process image in memory. As we said, the heap grows up and the stack grows down. But that means that the two are growing towards one another. So what happens if you keep pushing more and more frames onto the stack? --- Let's give it a try. We've modified our program to allow a compiler flag to have 'func2' call 'func', thereby causing infinite recursion. If we compile it normally, everything behaves as before, ... - ...but if we define STACKOVERFLOW, then... - we eventually get a segfault. That's right, a _segmentation violation_, because we are accessing a segment we have no business accessing.
Let's take a quick look at how this happened: --- We start out and call 'func' - which calls 'func2'... - --- which then calls 'func', which calls 'func2' and so on and so on --- and so on and so on, over 43 thousand times, until, eventually, we overflow the stack into an area that it has no business reaching into. We segfault due to a stack overflow. --- And now that you know why that website from which you copy all your code examples has that name, I think we can take a break. I hope that the illustration of the process layout in memory was useful to you; we'll get back to this every so often throughout the remainder of the semester. Make sure to download the source file for the program and play around with the results. Next time, we'll talk a bit more about how the process starts, and how we enter the 'main' function. Until then, thanks for watching! Cheers!