Hello, and welcome back to CS631 Advanced Programming in the UNIX Environment. This is week 5, segment 4. After covering the general functionality of a compiler in our last video, we'll now take a look at each of the different stages in practice.

---

Remember, a compiler is a tool -- or rather, as we'll see in this video, a set of tools -- that turns programming language source code into an executable for a given machine architecture. Isn't it nice to have a mustachioed fella like that to help you out? So let's take a look and see what's inside our little friend here.

---

As we mentioned in our last video, a compiler actually performs a series of steps, which, in the case of the GNU Compiler Collection (or "gcc"), are broken up across multiple stand-alone tools:

- preprocessing,
- compilation proper (with all its lexical, syntactic, and semantic analysis),
- assembly, and
- linking.

Preprocessing, as we briefly illustrated, takes our source code and expands it according to certain rules. Let's now illustrate this in detail and by example:

---

Here we have a file "hello.c" that looks like so. We're calling "func1" from "main", "func1" calls "func2", and "func2" simply calls "printf", which then tells us which particular food item is especially delicious. We're using the preprocessor directives "#ifndef" and "#define" to illustrate how cpp(1) may expand this code.

Ok, so first, let's just compile it using cc(1). There. Now we know: avocado is great on anything. Good! This is because we had not provided a definition of "FOOD", so the default definition was used instead.

How did cc(1) do that? Let's take a look at the manual page. Here we see confirmation of what we had previously mentioned: the compiler performs the different steps in order, using different input files at each stage. But we had also mentioned that preprocessing is actually performed not by cc(1) itself, but by the "cpp" command, the C Preprocessor.
cpp(1) is automatically invoked by cc(1), as noted here. So what does that look like?

Let's first display our source code with line numbers. We see preprocessor directives on lines 21 to 23, as well as on line 19. Now let's run cpp(1). There. Ok, that's a lot of output all of a sudden, but our basic C code remains in place. If we scroll up a bit, we see a whole lot of code that we never wrote. These lines are, as you probably recognize, the forward declarations of the various functions from the "stdio" header file. See, over here we have "printf", for example.

Let's run this command again and start at the top. The C preprocessor tells us which files it's reading: it shows that on line 19 of "hello.c" it pulled in the file /usr/include/stdio.h, which in turn appears to have an "#include" statement on line 40 for "cdefs.h", and so on. In this way, the preprocessor iterates through all the files and pulls in whatever is found in those headers, and so we get the various type and function declarations here.

We can instruct cpp(1) to leave out these annotations using the "-P" flag, and the output then looks like this: typedefs and structs and function prototypes... and finally our original code.

hello.c is, obviously, a C source code file. The output of the preprocessor, stored by convention in a file ending in ".i", is... also C source code. The difference is that hello.i contains a lot _more_ C code: it contains all the code _included_ from the header files, and so is easily ten times as large. So can we actually compile that ".i" file directly? Let's give it a try. Yep, that works.

Ok, so far, so good. Now we know how the "#include" statement works. What about the "#define" statement? Note that our output of cpp(1) no longer contains the "#ifndef" block, and instead we see the bare string "avocado" over here.
What happens if we define a different "FOOD", let's say "tomato", using cpp(1)'s "-D" flag? Look, there it is. But... note that this is a bare word, not a string. We need quotes here! Let's try again. We need to escape the quotes, since our shell would otherwise eat them when it processes the command prior to execution. There. So let's see what the difference is between our original output and this one. There, just that one word.

But do note that we did indeed replace "FOOD" _in place_ at preprocessing time, prior to compilation. That is, we are changing the actual source code, replacing any occurrence of "FOOD" with "tomato". This simple textual replacement means that whatever we define needs to be syntactically correct anywhere it appears, which is why we needed to include the quotes in the definition. Ok, so this will now produce a program that says... tomatoes are great on anything.

But we never invoked cpp(1) directly before; we only ever ran cc(1). Can we see what cc(1) does at each stage? There: if we pass the "-E" flag, then cc(1) will stop after invoking cpp(1). Yep, that looks just like before.

Now let's add the "-v" flag so that the compiler shows us exactly what it's doing. We'll ignore the actual output and just look at the compiler's messages. With "-v", the compiler shows us all the details about the execution: what version, how it was built, etc. It also shows us the flags it was run with, and here we see that it will use "-E", as we had asked it to. But note that it also uses "-Wall", "-Werror", etc.! We hadn't specified those, so where do they come from? Well, remember that when we set up our virtual machine, we configured our shell to use an alias for "cc"? And we also added some default CFLAGS to our environment to ensure that whenever we run "cc", it uses those flags. So that's where those come from. Let's just unset them for the moment. Yup, here we are now without any of those flags. Except...
there are some flags we hadn't specified, which appear to be defaults built into this compiler. These allow the compiler to use machine-specific options to optimize the code by default.

But ok, so we run with "-E", and our preprocessor now has to find the included headers. For that, it uses the search path shown here: when it sees an "#include <...>" directive, it will first look in the first directory listed, and then under "/usr/include".

Ok, so... avocado: great on anything. We know. How do we get that to change again? Just like cpp(1), the compiler _also_ takes a "-D" flag, so the same definition works here, too. Now once more in verbose mode: we see the compiler performing all steps, invoking the preprocessor implicitly and then performing compilation and code generation into a temporary ".s" file under "/tmp". It then invokes as(1) on that file to create a temporary ".o" file, and finally invokes the linker, ld(1), to create the executable.

---

Ok, so we observed the first stage of the compilation process, preprocessing via cpp(1):

- We can invoke the preprocessor manually, if we like. It produces C source code without any preprocessor directives, suitable for compilation and by convention written to a file ending in ".i".
- We can _define_ macros or values using the "-D" flag, passed either to the preprocessor or to the compiler driver, cc(1). That means that some flags passed to cc(1) are passed through to the different tools it invokes. We'll see other examples of that in future videos.
- Finally, if we want to observe _exactly_ what the compiler does, we can pass the "-v" flag. Doing that, we saw the different stages and the different tools invoked, as noted above.

Now that we've covered the preprocessing stage of the compilation process, we'll move on to compilation and code generation, so stay tuned for our next video in this series. Until then, thanks for watching. Cheers!