Understanding Compilers, Assemblers, and Loaders: Key Concepts
Lexical Analysis and Syntax Analysis
Lexical Analysis: This phase takes the original program as input. If the elements of the program are correct, it generates meaningful units called “tokens.”
Syntax Analysis: This phase takes the tokens generated by the lexical phase as input. If the syntax of the statement is correct, it generates a parse tree representation.
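The two phases above can be sketched for a tiny expression language. This is a minimal illustration, not any particular compiler's design: the token classes (`NUMBER`, `IDENT`, `OP`), the grammar (sums of products, no parentheses), and all function names are assumptions made for the example.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+*]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Lexical analysis: turn characters into (kind, text) tokens."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"illegal character {source[position]!r}")
        if match.lastgroup != "SKIP":  # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


def parse(tokens):
    """Syntax analysis: build a parse tree or raise SyntaxError."""
    tree, position = parse_sum(tokens, 0)
    if position != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[position]!r}")
    return tree


def parse_sum(tokens, position):
    # sum := product ('+' product)*
    node, position = parse_product(tokens, position)
    while position < len(tokens) and tokens[position] == ("OP", "+"):
        right, position = parse_product(tokens, position + 1)
        node = ("+", node, right)
    return node, position


def parse_product(tokens, position):
    # product := operand ('*' operand)*
    node, position = parse_operand(tokens, position)
    while position < len(tokens) and tokens[position] == ("OP", "*"):
        right, position = parse_operand(tokens, position + 1)
        node = ("*", node, right)
    return node, position


def parse_operand(tokens, position):
    if position >= len(tokens) or tokens[position][0] == "OP":
        raise SyntaxError("operand expected")
    return tokens[position], position + 1


tokens = tokenize("a + 2 * b")
tree = parse(tokens)
```

Here `tokens` is the flat token sequence and `tree` is a nested tuple in which the multiplication sits below the addition, reflecting operator precedence. An input such as `a + + b` passes the lexical phase but is rejected by the syntax phase.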
Functions of Compilers, Cross Compilers, and Interpreters
Compiler
A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for converting source code is to create an executable program.
Cross Compiler
A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a Windows 7 PC but generates code that runs on an Android smartphone is a cross compiler. A cross compiler is necessary to compile for multiple platforms from one machine. It is also needed when the target platform cannot feasibly host a compiler, such as the microcontroller of an embedded system, which typically has very limited resources and often no operating system. In paravirtualization, one machine runs many operating systems, and a cross compiler could generate an executable for each of them from one main source.
Interpreter
In computer science, an interpreter is a computer program that directly executes, i.e., performs, instructions written in a programming or scripting language, without previously compiling them into a machine language program. An interpreter generally uses one of the following strategies for program execution:
- Parse the source code and perform its behavior directly.
- Translate source code into some efficient intermediate representation and immediately execute this.
- Explicitly execute stored precompiled code made by a compiler which is part of the interpreter system.
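The first two strategies can be illustrated with Python's own standard library. This is only a sketch: the `interpret` function and the `OPS` table are hypothetical helpers written for this example, and the "intermediate representation" in the second half is CPython bytecode, used here as one convenient instance of the idea.

```python
import ast
import operator

# Hypothetical mapping from syntax-tree operator nodes to their behavior.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def interpret(node):
    """Strategy 1: walk the parsed source and perform its behavior directly."""
    if isinstance(node, ast.Expression):
        return interpret(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](interpret(node.left), interpret(node.right))
    raise ValueError(f"unsupported construct: {ast.dump(node)}")


source = "2 + 3 * 4"

# Strategy 1: parse, then evaluate the tree node by node.
direct_result = interpret(ast.parse(source, mode="eval"))

# Strategy 2: translate to an intermediate representation (a bytecode
# object) and execute it immediately.
code_object = compile(source, "<example>", "eval")
bytecode_result = eval(code_object)
```

Both paths produce the same value; they differ only in whether the program is executed from its tree form or from a compact intermediate form.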
Bootstrap Loader
Also referred to as a bootloader or boot program (the process itself is called bootstrapping), a bootstrap loader is a program whose first stage resides in the computer’s EPROM, ROM, or other non-volatile memory and is executed automatically by the processor when the computer is turned on. When the computer is turned on or restarted, it first performs the power-on self-test, also known as POST. If the POST is successful and no issues are found, the first stage reads the boot sector of the hard drive. On drives that use the traditional PC layout, this sector is the master boot record (MBR), which holds the next stage of the bootstrap loader. That stage continues the process by loading the computer’s operating system into memory, after which the operating system can run.
Loader
In computing, a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves reading the contents of the executable file containing the program instructions into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.
Assembler
An assembler is a program that takes basic computer instructions written in symbolic form and converts them into a pattern of bits that the computer’s processor can use to perform its basic operations. Some people call these instructions assembler language and others use the term assembly language. Assembly language gives software and application developers direct access to a computer’s hardware architecture and components. An assembler is sometimes referred to as the compiler of assembly language, although unlike an interpreter it translates the whole program before execution rather than executing it statement by statement.
Mnemonic Table and Database in Assembler and Loader
Mnemonic Table in Assembler
Machine Operation Table (MOT), also called the Mnemonic Opcode Table: For each instruction it indicates a) the symbolic mnemonic, b) the instruction length, c) the binary machine opcode, and d) the instruction format. In assembler (or assembly) language, a mnemonic is an abbreviation for an operation. It is entered in the operation code field of each assembler program instruction. For example, on an Intel microprocessor, inc (“increase by one”) is a mnemonic. On an IBM System/370 series computer, BAL is a mnemonic for “branch and link.”
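A machine operation table can be pictured as a lookup keyed by mnemonic. The sketch below is an illustration only: the `MotEntry` type and `lookup` function are hypothetical, and the four entries are a small sample whose opcode values are intended to follow System/370 encodings but should be checked against the architecture manual before being relied on.

```python
from typing import NamedTuple


class MotEntry(NamedTuple):
    mnemonic: str  # a) symbolic mnemonic
    length: int    # b) instruction length in bytes
    opcode: int    # c) binary machine opcode
    fmt: str       # d) instruction format


# Sample entries; treat the opcode values as illustrative.
MOT = {
    entry.mnemonic: entry
    for entry in [
        MotEntry("A", 4, 0x5A, "RX"),
        MotEntry("AR", 2, 0x1A, "RR"),
        MotEntry("BAL", 4, 0x45, "RX"),
        MotEntry("BALR", 2, 0x05, "RR"),
    ]
}


def lookup(mnemonic):
    """Return the table entry for a mnemonic, or fail on an unknown operation."""
    try:
        return MOT[mnemonic.upper()]
    except KeyError:
        raise ValueError(f"unknown operation: {mnemonic}") from None


bal = lookup("bal")
```

During pass 1 an assembler needs only the length field to advance its location counter; the opcode and format are needed in pass 2 when the instruction is actually assembled.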
Database Use in Each Pass of Loader
Pass 1 Databases
- Input object decks
- Initial program load address (IPLA), supplied by the programmer or the operating system, which specifies the address at which to load the first segment.
- A program load address (PLA) counter, used to keep track of each segment’s assigned location.
- A table, the Global External Symbol Table (GEST), that is used to store each external symbol and its corresponding assigned core address.
- A copy of the input to be used by pass 2. This may be stored on an auxiliary storage device, such as magnetic tape, disk, or drum, or the original object decks may be reread by the loader a second time for pass 2.
Pass 2 Databases
- Copy of object programs inputted to pass 1
- Initial Program Load address parameter (IPLA)
- The Program load address counter (PLA)
- The Global External symbol table (GEST) prepared by pass 1, containing each symbol and its corresponding absolute address value.
- An array, the Local External Symbol Array (LESA), which is used to establish a correspondence between the ESD ID numbers used on ESD and RLD cards and the corresponding external symbol’s absolute address value.
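How these databases cooperate can be sketched in a few lines. This is a simplified model under stated assumptions, not a real object-deck format: each module is a hypothetical dictionary with a name, a length in words, its entry points, an `esd` map from ESD ID to symbol name, its `text`, and an `rld` list of (offset, ESD ID) pairs naming words that need an external address added.

```python
def pass1(modules, ipla):
    """Assign each segment a load address and build the GEST."""
    gest = {}
    pla = ipla  # program load address counter starts at the IPLA
    for module in modules:
        definitions = {module["name"]: 0, **module["entries"]}
        for symbol, offset in definitions.items():
            if symbol in gest:
                raise ValueError(f"duplicate external symbol {symbol}")
            gest[symbol] = pla + offset
        pla += module["length"]
    return gest


def pass2(modules, ipla, gest):
    """Load the text and fix up address constants using a per-module LESA."""
    memory = {}
    pla = ipla
    for module in modules:
        # LESA: ESD ID number -> absolute address of that external symbol.
        lesa = {esd_id: gest[symbol] for esd_id, symbol in module["esd"].items()}
        for offset, word in enumerate(module["text"]):
            memory[pla + offset] = word
        for offset, esd_id in module["rld"]:
            memory[pla + offset] += lesa[esd_id]
        pla += module["length"]
    return memory


# Two hypothetical object modules; addresses are counted in words.
modules = [
    {"name": "MAIN", "length": 3, "entries": {},
     "esd": {1: "MAIN", 2: "SUB"}, "text": [10, 0, 20], "rld": [(1, 2)]},
    {"name": "SUB", "length": 2, "entries": {"SUBENT": 1},
     "esd": {1: "SUB"}, "text": [30, 0], "rld": [(1, 1)]},
]
gest = pass1(modules, ipla=100)
memory = pass2(modules, ipla=100, gest=gest)
```

With an IPLA of 100, MAIN occupies addresses 100 to 102 and SUB starts at 103, so the address constant at MAIN's offset 1 ends up holding 103.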
Linker vs. Loader
Linker | Loader |
---|---|
1) Linker is a system program which links different object modules to form a program. | 1) Loader is a system program that places the object program in memory and prepares it for execution and starts the execution. |
2) Many object modules are combined and given to loader | 2) Loader accepts the combined version of object modules. |
3) Linker accepts the input (.obj file) from compiler or assembler. | 3) Loader accepts the input from linker |
4) Source code is given to compiler which generates object code, which is given to linker, which forms machine code and is given to loader | 4) Source code is given to compiler which generates object code, which is given to linker, which forms machine code and is given to loader which then makes it executable in processor memory and is executed |
Execution Time vs. Compile Time
Execution Time | Compile Time |
---|---|
1. Execution time is when a program is executing or running. The instructions are in memory and are being processed by the CPU. Additional memory may be allocated and/or deallocated at this time. | 1. Compile time is when your code is being processed by a compiler, here meaning a compiler that transforms your code into an executable binary. |
2. If the process can be moved during its execution from one memory segment to another, then binding must be delayed until run time. The absolute addresses are generated by hardware. Most general-purpose OSs use this method (Dynamic). | 2. The compiler translates symbolic addresses to absolute addresses. If you know at compile time where the process will reside in memory, then absolute code can be generated (Static). |
3. Executable takes in input (from keyboard, mouse, network, etc.) and generates output. | 3. Take source code, make an executable. |
Symbol Table and Base Table in Assembler
Symbol Table (ST): It is used to record each symbol (label) defined in the program together with its assigned address, so that references to the symbol can be translated into addresses.
Base table (BT): It indicates which registers are currently specified as base registers by USING Pseudo-ops.
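The roles of the two tables can be sketched as follows. This is a simplified model: the instruction lengths, the sample program, and the function names are assumptions for illustration, alignment is ignored, and the base table holds a single entry as if a `USING *,15` had been issued at address 0.

```python
# Hypothetical instruction lengths in bytes, as a real assembler would take
# them from its machine operation table.
INSTRUCTION_LENGTHS = {"L": 4, "A": 4, "ST": 4, "DC": 4, "DS": 4}


def build_symbol_table(statements):
    """Pass 1: record the location counter value for every label."""
    symbol_table = {}
    location_counter = 0
    for label, mnemonic in statements:
        if label is not None:
            if label in symbol_table:
                raise ValueError(f"duplicate symbol {label}")
            symbol_table[label] = location_counter
        location_counter += INSTRUCTION_LENGTHS[mnemonic]
    return symbol_table


def to_base_displacement(address, base_table):
    """Pass 2: express an address as (base register, displacement)."""
    candidates = [
        (address - contents, register)
        for register, contents in base_table.items()
        if 0 <= address - contents < 4096  # 12-bit displacement field
    ]
    if not candidates:
        raise ValueError(f"address {address} is not addressable")
    displacement, register = min(candidates)
    return register, displacement


program = [
    ("PROG", "L"),
    (None, "A"),
    (None, "ST"),
    ("FOUR", "DC"),
    ("TEMP", "DS"),
]
symbol_table = build_symbol_table(program)
base_table = {15: 0}  # register 15 is assumed to contain address 0
operand = to_base_displacement(symbol_table["FOUR"], base_table)
```

The symbol table answers "where is FOUR?", and the base table answers "which register can reach it, and at what displacement?".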
Punch Card Workspace and Print Line Workspace
PUNCH CARD workspace: It is used for punching (outputting) the assembled instruction on to cards.
PRINT LINE workspace: It is used for generating a printed assembly listing for the programmer’s reference.
Function of Lexical Phase in Compiler
Lexical Phase
i) Its main task is to read the source program and, if the elements of the program are correct, it generates as output a sequence of tokens that the parser uses for syntax analysis.
ii) The reading or parsing of the source program is called scanning of the source program.
iii) It recognizes keywords, operators and identifiers, integers, floating-point numbers, character strings, and other similar items that form the source program.
iv) The lexical analyzer collects information about tokens into their associated attributes.
Methods to Reduce Different Processes in Compiler
1) Elimination of common subexpression: The elimination of duplicate matrix entries can result in more concise and efficient object programs. The common subexpression must be identical and must be in the same statement.
2) Compile time compute: Doing computation involving constants at compile time saves both space and execution time for the object program.
3) Boolean expression optimization: We may use the properties of Boolean expressions to shorten their computation.
4) Move invariant computation outside of the loops: If computation within a loop depends on a variable that does not change within that loop, then computation may be moved outside the loop.
a) Recognition of invariant computation
b) Discovering where to move the invariant computation.
c) Moving the invariant computation.
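The first two methods, common subexpression elimination and compile-time computation, can be sketched on a matrix of (operator, operand, operand) entries. The representation is an assumption made for this example: an operand is a variable name, an integer constant, or a pair `("M", k)` referring to the result of entry `k`. The sketch assumes all entries belong to one statement with no intervening assignments, matching the restriction stated above.

```python
import operator

# Operators that can be computed at compile time when both operands are constants.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def resolve(operand, replacement):
    """Follow a reference to an earlier entry that was folded or merged."""
    if isinstance(operand, tuple):
        return replacement[operand[1]]
    return operand


def optimize(matrix):
    optimized = []    # surviving matrix entries
    seen = {}         # entry -> its index in the optimized matrix
    replacement = {}  # original index -> operand that now stands for it
    for index, (op, left, right) in enumerate(matrix):
        left = resolve(left, replacement)
        right = resolve(right, replacement)
        if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
            # Compile-time compute: replace the entry by its constant value.
            replacement[index] = FOLDABLE[op](left, right)
            continue
        entry = (op, left, right)
        if entry in seen:
            # Common subexpression: reuse the earlier identical entry.
            replacement[index] = ("M", seen[entry])
            continue
        seen[entry] = len(optimized)
        replacement[index] = ("M", len(optimized))
        optimized.append(entry)
    return optimized


# Matrix for the hypothetical statement X = (A * B) + (A * B) + 2 * 3
matrix = [
    ("*", "A", "B"),
    ("*", "A", "B"),
    ("+", ("M", 0), ("M", 1)),
    ("*", 2, 3),
    ("+", ("M", 2), ("M", 3)),
    ("=", "X", ("M", 4)),
]
result = optimize(matrix)
```

Six entries shrink to four: the duplicate `A * B` is dropped and `2 * 3` becomes the constant 6. Moving invariant computations out of loops needs control-flow information that this flat matrix does not carry, so it is not shown.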
Loader Explained with Diagram
The loader is a program which accepts the object program decks, prepares these programs for execution by the computer, and initiates the execution. The loader must perform four functions:
1) Allocate space in memory for the program (allocation)
2) Resolve symbolic references between object decks (linking)
3) Adjust all address-dependent locations, such as address constants, to correspond to the allocated space (relocation)
4) Physically place the machine instructions and data into memory (loading).
Function of Relocating Loader
A relocating loader avoids possible reassembling of all subroutines when a single subroutine is changed and performs the tasks of allocation and linking for the programmer. An example is the Binary Symbolic Subroutine (BSS) loader. The BSS loader allows many procedure segments yet only one data segment. The assembler assembles each procedure segment independently and passes on to the loader the text and information as to relocation and intersegment references. The output of a relocating assembler using a BSS scheme is the object program and information about all other programs it references. For each source program, the assembler outputs a text prefixed by a transfer vector that consists of addresses containing names of the subroutines referenced by the source program. The assembler would also provide the loader with additional information such as the length of the entire program and the length of the transfer vector portion. After loading the text and the transfer vector into core, the loader would load each subroutine identified in the transfer vector. It would then place a transfer instruction to the corresponding subroutine in each entry in the transfer vector. The BSS loader scheme is often used on computers with a fixed-length direct address instruction format. The relocation bit solves the problem of relocation, the transfer vector is used to solve the problem of linking, and the program length information solves the problem of allocation.
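The interplay of relocation bits and the transfer vector can be sketched as below. The data layout is an assumption for illustration only: a program is a hypothetical dictionary holding a `transfer_vector` of subroutine names and a `text` list of (word, relocation bit) pairs, addresses are counted in words, and the subroutines are assumed to be loaded already at known addresses.

```python
def bss_load(program, load_address, subroutine_addresses):
    """Place a program in memory, relocating and linking as it is loaded."""
    memory = {}
    vector = program["transfer_vector"]

    # Linking: each transfer vector entry becomes a transfer (branch)
    # to the subroutine it names.
    for index, name in enumerate(vector):
        memory[load_address + index] = ("BRANCH", subroutine_addresses[name])

    # Relocation and loading: words flagged with a relocation bit hold
    # addresses relative to the program start, so the load address is added.
    text_start = load_address + len(vector)
    for index, (word, relocation_bit) in enumerate(program["text"]):
        memory[text_start + index] = word + load_address if relocation_bit else word
    return memory


program = {
    "transfer_vector": ["SQRT"],
    # Second word refers to relative address 0, the SQRT transfer vector entry.
    "text": [(5, 0), (0, 1)],
}
memory = bss_load(program, load_address=400, subroutine_addresses={"SQRT": 600})
```

After loading at address 400, the relocated word holds 400, the address of the transfer vector entry, which in turn transfers to SQRT at 600. The calling code never needs to know where SQRT was placed.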
Use of Macros with Example
Macros are single-line abbreviations for groups of instructions. For every occurrence of this one-line macro instruction, the macro processing assembler will substitute the entire block. By defining the appropriate macro instruction, an assembly language programmer can tailor his own higher-level facility in a convenient manner, at no cost in control over the structure of his program. He can achieve the conciseness and ease in coding of a high-level language without losing the basic advantage of assembly language programming. Integral macro operations simplify debugging and program modification, and they facilitate standardization. Many computer manufacturers use macro instructions to automate the writing of “tailored” operating systems in a process called system generation.
Macro-expanded source:
A 1,DATA
A 2,DATA
A 3,DATA
.
.
DATA DC F'5'
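The substitution performed by a macro processor can be sketched as a table lookup. This is a deliberately minimal model without parameters or nested macros; the macro name `INCR`, the `expand` function, and the constant value in the `DC` statement are assumptions chosen for the example.

```python
# Hypothetical macro definition table: macro name -> block of instructions.
MACROS = {
    "INCR": ["A 1,DATA", "A 2,DATA", "A 3,DATA"],
}


def expand(source_lines, macros):
    """Replace every one-line macro call with the block it abbreviates."""
    expanded = []
    for line in source_lines:
        name = line.strip()
        if name in macros:
            expanded.extend(macros[name])
        else:
            expanded.append(line)
    return expanded


source = [
    "INCR",
    "INCR",
    "DATA DC F'5'",  # constant value is illustrative
]
expanded_source = expand(source, MACROS)
```

Each call to the macro contributes its own copy of the three add instructions, so the expanded source grows while the programmer's source stays short.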
Boolean Expression Optimization
Boolean expression optimization: We may use the properties of Boolean expressions to shorten their computation. For example, in a statement
IF a OR b OR c THEN …
where a, b, and c are expressions, rather than generating code that always tests each of a, b, and c, we generate code so that if a is computed as true, then b OR c is not computed, and similarly for b.
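The effect of this optimization can be observed directly. In the sketch below, the `evaluate_or` helper and the `make_test` logging wrapper are hypothetical names for this example; each sub-expression is wrapped in a function so that we can record which ones were actually computed.

```python
def evaluate_or(*tests):
    """Evaluate zero-argument tests left to right, stopping at the first true one."""
    for test in tests:
        if test():
            return True
    return False


calls = []  # records which sub-expressions were computed


def make_test(name, value):
    def test():
        calls.append(name)
        return value
    return test


result = evaluate_or(
    make_test("a", True),
    make_test("b", False),
    make_test("c", True),
)
```

Because `a` is true, neither `b` nor `c` is ever computed, so `calls` contains only `"a"`. Python's own `or` operator behaves the same way.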
Types of Registers
MAR stands for Memory Address Register
This register holds the memory addresses of data and instructions. This register is used to access data and instructions from memory during the execution phase of an instruction.
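Program Counter (PC)
The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its program sequence, typically by holding the address of the next instruction to be fetched.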
Accumulator Register
This register is used for storing the results that are produced by the system. When the CPU generates results during processing, they are stored in the accumulator (AC) register.
Memory Data Register (MDR)
MDR is the register of a computer’s control unit that contains the data to be stored in the computer storage (e.g. RAM), or the data after a fetch from the computer storage. It acts like a buffer and holds anything that is copied from memory, ready for the processor to use. The MDR holds the information before it goes to the decoder.
Index Register
A hardware element which holds a number that can be added to (or, in some cases, subtracted from) the address portion of a computer instruction to form an effective address. It is sometimes loosely called a base register, although many architectures treat base and index registers as distinct. An index register in a computer’s CPU is a processor register used for modifying operand addresses during the run of a program.
Memory Buffer Register
MBR stands for Memory Buffer Register, another name for the memory data register (MDR). This register holds the contents of data or instructions read from, or written to, memory. It means that this register is used to store data or instructions coming from the memory or going to the memory.
Data Register
A register used in microcomputers to temporarily store data being transmitted to or from a peripheral device.
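How the PC, MAR, MDR, and accumulator work together can be sketched with a toy fetch-execute loop. The four-instruction machine (`LOAD`, `ADD`, `STORE`, `HALT`) and the memory layout are hypothetical and do not correspond to any real processor.

```python
def run(memory):
    """Execute a toy program and return the final accumulator value."""
    pc = 0  # program counter: address of the next instruction
    ac = 0  # accumulator: holds results
    while True:
        # Fetch: the instruction address goes to the MAR and the
        # fetched word arrives in the MDR.
        mar = pc
        mdr = memory[mar]
        pc += 1
        opcode, operand = mdr

        # Execute: operand accesses also go through the MAR and MDR.
        if opcode == "LOAD":
            mar = operand
            mdr = memory[mar]
            ac = mdr
        elif opcode == "ADD":
            mar = operand
            mdr = memory[mar]
            ac += mdr
        elif opcode == "STORE":
            mar = operand
            mdr = ac
            memory[mar] = mdr
        elif opcode == "HALT":
            return ac
        else:
            raise ValueError(f"unknown opcode {opcode}")


# Addresses 0 to 3 hold instructions, 4 to 6 hold data.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
final_ac = run(memory)
```

The program adds the values at addresses 4 and 5 and stores the sum at address 6. Every memory access, whether for an instruction or for data, passes its address through the MAR and its contents through the MDR.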
Code Optimization Phase
Code optimization phase: Two types of optimization are performed by the compiler: machine-dependent and machine-independent. Machine-dependent optimization is so intimately related to the instructions that are generated that it is incorporated into the code generation phase, whereas machine-independent optimization is done in a separate optimization phase.
Purpose of Various Phases of a Compiler
The different phases of the compiler are as follows:
1) Lexical Phase:
i) Its main task is to read the source program and if the elements of the program are correct it generates as output a sequence of tokens that the parser uses for syntax analysis.
ii) The reading or parsing of the source program is called scanning of the source program.
iii) It recognizes keywords, operators and identifiers, integers, floating-point numbers, character strings, and other similar items that form the source program.
iv) The lexical analyzer collects information about tokens into their associated attributes.
2) Syntax Phase:
i) In this phase, the compiler must first recognize the phrases (syntactic constructions); each phrase is a semantic entity and is a string of tokens that has meaning. Second, it must interpret the meaning of the constructions.
ii) Syntactic analysis also notes syntax errors and assures some sort of recovery. Once the syntax of a statement is correct, the second step is to interpret the meaning (semantics). There are many ways of recognizing the basic constructs and interpreting the meaning.
iii) Syntax analysis uses rules (reductions) which specify the syntactic form of the source language.
iv) These reductions define the basic syntactic constructions and the appropriate compiler routine (action routine) to be executed when a construction is recognized.
v) The action routine interprets the meaning and generates either code or an intermediate form of the construction.
3) Interpretation Phase: This phase is typically a set of routines that are called when a construct is recognized. The purpose of these routines is to create an intermediate form of the source program and add information to the identifier table.
4) Code optimization Phase: Two types of optimization are performed by the compiler: machine-dependent and machine-independent. Machine-dependent optimization is so intimately related to the instructions that are generated that it is incorporated into the code generation phase, whereas machine-independent optimization is done in a separate optimization phase.
5) Storage Assignment:
The purpose of this phase is as follows:
i) Assign storage to all variables referenced in the source program.
ii) Assign storage to all temporary locations that are necessary for intermediate results.
iii) Assign storage to literals.
iv) Ensure that storage is allocated and appropriate locations are initialized.
6) Code generation:
i) This phase produces a program which can be in Assembly or machine language.
ii) This phase has a matrix as input.
iii) It uses the code productions in the matrix to produce code.
iv) It also references the identifier table in order to generate addresses and code conversions.
7) Assembly phase:
The compiler has to generate the machine language code for the computer to understand. The task to be done is as follows:
i) Generating code
ii) Resolving all references.
iii) Defining all labels.
iv) Resolving literals and symbol table entries.
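The code generation phase described in item 6 can be sketched as template substitution over the matrix. This is an illustration under assumptions: the `CODE_PRODUCTIONS` table, the use of register 1 throughout, and the naming of temporaries as `M1`, `M2`, and so on are choices made for this example, and the output is assembly text that an assembly phase would still have to translate.

```python
# Hypothetical code productions: one instruction template list per operator.
CODE_PRODUCTIONS = {
    "+": ["L 1,{left}", "A 1,{right}", "ST 1,{result}"],
    "-": ["L 1,{left}", "S 1,{right}", "ST 1,{result}"],
    "=": ["L 1,{right}", "ST 1,{left}"],
}


def generate(matrix):
    """Emit assembly text for each matrix entry using its code production."""
    code = []
    for index, (op, left, right) in enumerate(matrix, start=1):
        if op not in CODE_PRODUCTIONS:
            raise ValueError(f"no code production for operator {op}")
        for template in CODE_PRODUCTIONS[op]:
            # Entry number N leaves its result in the temporary named MN.
            code.append(template.format(left=left, right=right, result=f"M{index}"))
    return code


# Matrix for the hypothetical statement X = A + B - C
matrix = [
    ("+", "A", "B"),
    ("-", "M1", "C"),
    ("=", "X", "M2"),
]
assembly = generate(matrix)
```

The generated code stores and immediately reloads `M1` and `M2`. Removing such redundant store and load pairs is an example of the machine-dependent optimization that belongs with code generation.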