GCC: Understanding compilers

Pretty much anyone who has used a computer knows what an executable is. Executables come in many forms, perform many different tasks, and sometimes end in “.exe”, but have you ever wondered how an executable is made? In the simplest terms, a programmer writes some code, puts it through a compiler, and, assuming everything goes right, an executable is made. Today we are going to take a look at that middle step, the compiler, and walk through it step by step.

Depending on how far you want to push the logic, any number of things can be the first step to writing a program, from someone realizing there is a problem to deciding which language to code the solution in. In this case we are going to say that the first step is writing the actual code. So now you have your code written, but it isn’t really ready to go. This is because machines speak a different language than humans do. Currently we are speaking English, but computers speak binary (1s and 0s). So now we feed our code into a compiler which will, using a series of steps, take our code and turn it into machine language so that the computer understands what it is supposed to do. While there are several compilers out there, a common choice when writing C code on a Linux system is GCC (the GNU Compiler Collection). An example of running this compiler goes as follows:

examplepath/commandline/prompt$ gcc filename

The above command line is a simple call to the compiler that we will now take apart. The compiler follows a process that can be separated into 4 major parts: Preprocessor, Compiler (to avoid confusion with the overall compiler, this one will be referred to as the (step) compiler for the rest of the article), Assembler, and Linker. GCC has options that allow us to stop the process at any given step, and as I cover each step I will include the prompt commands to do so.

1. Preprocessor- The preprocessor goes through three tasks to prepare the code for the (step) compiler.
a) First it will remove any comments from the code. For those who are not familiar with programming practices, comments are sections in a code file that are not code themselves but notes a programmer leaves to document what they are doing, so that other programmers who have the code file can understand what is supposed to happen.
b) Once the comments are removed, the preprocessor handles any included header files. These hold pre-written code meant to be used over and over again and are commonly stored as .h files; their contents are copied into the program’s code so that the (step) compiler sees them.
c) Finally the preprocessor takes all macros and expands them. A macro is a piece of code that the programmer defines once and can then call with a simple command; the preprocessor replaces each use of the simple command with the code behind it. For example: a programmer defines square(x) to equal x * x. For the rest of the program, rather than typing x * x, they can just write square(x), and when the preprocessor comes across this in its last task it will replace square(x) with x * x. Once this is done it passes the code to the (step) compiler. Since we are using gcc, we have the option to stop here instead and have the preprocessed code sent to standard output (in other words, printed to our terminal). The command to do so is:

examplepath/commandline/prompt$ gcc -E filename

2. (Step) Compiler- While the (step) compiler’s job is a bit more complex, it is a bit easier to explain. In simplest terms it takes our newly altered code from the preprocessor and converts it to Assembly code. For those that do not know, Assembly code is a low-level language that still uses short English-like commands, but is much closer to machine language than higher level languages such as C, C++, JavaScript, etc. While normally the (step) compiler will pass this on to the assembler, we can request that gcc instead create a file for us that contains the assembly code. While we can control the name of the file using an additional option, it will default to filename.s, replacing any previously existing extension. The command for this is:

examplepath/commandline/prompt$ gcc -S filename

3. Assembler- The assembler’s job is equally straightforward: it takes the assembly code produced by the (step) compiler and converts it into machine language, or binary. This output is also known as object code. Like before, we can tell gcc to stop here and produce a file with the object code instead of sending it on to the next step, this time defaulting to calling the new file filename.o. The command for this is:

examplepath/commandline/prompt$ gcc -c filename
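
One practical consequence of stopping after the assembler is that the file does not have to be a complete program. In the sketch below (the file name mathutil.c and both function names are made up), there is no main at all. Running plain gcc on this file by itself would fail at the linker step, but gcc -c should still produce an object file, because nothing has been linked yet.

```c
/* mathutil.c: hypothetical file name; note that it has no main(). */

/* Returns the larger of two integers. */
int max_of(int a, int b)
{
    return (a > b) ? a : b;
}

/* Clamps value into the inclusive range [low, high]. */
int clamp(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}
```

Object files like this are the pieces that get combined with the rest of a program later on.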

4. Linker- The linker’s job is a bit more complex both in function and description. To start, programmers don’t always program alone. In fact, most real world programs are made by multiple programmers working on individual pieces of the code. In the game industry, for example, one team will work on the math behind the game while another creates the graphics, and yet another will write the story and either convert it into dialogue themselves or have another team take that task on. The linker’s first job is to take all of the object code from all of these different teams and combine it into one. The other part of its job is bringing in the code from libraries. Much like header files, which are taken care of by the preprocessor, libraries contain functions that programmers can use in their own programs, and the code for those functions is brought in here. Since this is the final step that produces the executable file, there is no special option to stop the compiler here. Instead I will teach you the option to change the name of the output file produced. It works at any step of this process: it writes the preprocessed code into a file if used with -E, and it overrides the default names given by -S and -c, as well as the default name given by the linker, which depends on the system used (Linux gives a file called a.out, while on Windows the default is a.exe). This command is:

examplepath/commandline/prompt$ gcc [X] filename -o newfilename

where X is any of the previous options we have explored, with -E producing the altered code text, -S producing our assembly code, and -c producing our object code.

Depending on the system and defaults you might have to give the compiler’s output file permission to be executed, but otherwise this is the end of our compiler’s journey on this program.