Compiler Construction: Phases and Parsing Techniques
Compiler Design and Phases: Lexical Analysis to Code Generation
1. Phases of Compilation
The compilation process is divided into six key phases:
Lexical Analysis
- Converts the source code into tokens such as keywords, identifiers, operators, and delimiters.
- Example: For the code
int x = 5;
, the tokens generated areint
,x
,=
,5
,;
.
Syntax Analysis
- Checks the structure of code against the grammar rules of the programming language and builds a parse tree.
- Example: Parsing
E → E + T | T
for the expressiona + b
produces a parse tree with+
as the root.
Semantic Analysis
- Ensures the meaning of the code is valid, such as type compatibility, variable declarations, and scope.
- Example: In
int a = "hello";
, the type mismatch is detected.
Intermediate Code Generation
- Converts the code into an intermediate representation that is machine-independent.
- Example: For
a = b + c;
, the intermediate code may be:t1 = b + c
,a = t1
Code Optimization
- Refines the intermediate code to improve efficiency, reducing runtime or memory usage.
- Example: Removing redundant instructions or loops.
Code Generation
- Converts the optimized intermediate code into target machine code or assembly code.
- Example: Converts
t1 = b + c
intoADD R1, B, C
for a specific processor.
Complete Diagram of Compiler Phases
Source Code
↓
Lexical Analysis → Tokens
↓
Syntax Analysis → Parse Tree
↓
Semantic Analysis → Annotated Syntax Tree
↓
Intermediate Code Generation → Intermediate Code
↓
Code Optimization → Optimized Intermediate Code
↓
Code Generation → Target Machine Code
2. Parsing Techniques: LL & LR Parsers
LL Parsers (Top-Down Parsing)
- Reads input from left to right and constructs a leftmost derivation.
- Suitable for simple grammars without left recursion.
- Example: Predictive parsing for the grammar
E → T + E | T
. Parsingid + id
involves constructing the parse tree step by step.
LR Parsers (Bottom-Up Parsing)
- Reads input from left to right and constructs a rightmost derivation in reverse.
- Can handle a larger set of grammars, including left-recursive grammars.
- Example: Shift-reduce parsing for
E → E + T | T
.
Differences:
Aspect | LL Parser | LR Parser |
---|---|---|
Parsing Direction | Top-down | Bottom-up |
Grammar Restrictions | No left recursion, simple CFG | All CFGs |
Error Detection | Limited | Strong |
Examples:
- LL Parser: Parsing
E → T + E | T
involves expanding the leftmost symbol first. - LR Parser: Shift-reduce parsing handles
E → T + E | T
using a stack-based approach.
3. Optimization Techniques
Basic Blocks and Flow Graphs
- Basic Block: A sequence of instructions without branches, except at the start and end.
- Example:
t1 = a + b
,t2 = c * d
,t3 = t1 + t2
- Flow Graph: Represents control flow among basic blocks, with nodes as blocks and edges showing the control flow.
Loop Optimization
- Enhances loop performance by reducing redundancy.
- Techniques:
- Code Motion: Moves invariant code outside the loop.
- Example:
Before:for (i = 0; i < n; i++) { x = 10; sum += arr[i] * x; }
After:x = 10; for (i = 0; i < n; i++) { sum += arr[i] * x; }
Peephole Optimization
- Local optimization of small code segments.
- Techniques:
- Redundant Instruction Removal: Eliminates unnecessary instructions.
- Strength Reduction: Replaces expensive operations with simpler ones.
- Example: Replace
x * 2
withx << 1
.
By using these optimization techniques, compilers improve the efficiency of the generated code.