Compiler Construction: Phases and Parsing Techniques

Compiler Design and Phases: Lexical Analysis to Code Generation

1. Phases of Compilation

The compilation process is divided into six key phases:

Lexical Analysis

  • Converts the source code into tokens such as keywords, identifiers, operators, and delimiters.
  • Example: For the code int x = 5;, the tokens generated are int, x, =, 5, ;.

Syntax Analysis

  • Checks the structure of code against the grammar rules of the programming language and builds a parse tree.
  • Example: Parsing E → E + T | T for the expression a + b produces a parse tree with + as the root.

Semantic Analysis

  • Ensures the meaning of the code is valid, such as type compatibility, variable declarations, and scope.
  • Example: In int a = "hello";, the type mismatch is detected.

Intermediate Code Generation

  • Converts the code into an intermediate representation that is machine-independent.
  • Example: For a = b + c;, the intermediate code may be: t1 = b + c, a = t1

Code Optimization

  • Refines the intermediate code to improve efficiency, reducing runtime or memory usage.
  • Example: Removing redundant instructions or loops.

Code Generation

  • Converts the optimized intermediate code into target machine code or assembly code.
  • Example: Converts t1 = b + c into ADD R1, B, C for a specific processor.

Complete Diagram of Compiler Phases

Source Code

Lexical Analysis → Tokens

Syntax Analysis → Parse Tree

Semantic Analysis → Annotated Syntax Tree

Intermediate Code Generation → Intermediate Code

Code Optimization → Optimized Intermediate Code

Code Generation → Target Machine Code

2. Parsing Techniques: LL & LR Parsers

LL Parsers (Top-Down Parsing)

  • Reads input from left to right and constructs a leftmost derivation.
  • Suitable for simple grammars without left recursion.
  • Example: Predictive parsing for the grammar E → T + E | T. Parsing id + id involves constructing the parse tree step by step.

LR Parsers (Bottom-Up Parsing)

  • Reads input from left to right and constructs a rightmost derivation in reverse.
  • Can handle a larger set of grammars, including left-recursive grammars.
  • Example: Shift-reduce parsing for E → E + T | T.

Differences:

AspectLL ParserLR Parser
Parsing DirectionTop-downBottom-up
Grammar RestrictionsNo left recursion, simple CFGAll CFGs
Error DetectionLimitedStrong

Examples:

  • LL Parser: Parsing E → T + E | T involves expanding the leftmost symbol first.
  • LR Parser: Shift-reduce parsing handles E → T + E | T using a stack-based approach.

3. Optimization Techniques

Basic Blocks and Flow Graphs

  • Basic Block: A sequence of instructions without branches, except at the start and end.
  • Example: t1 = a + b, t2 = c * d, t3 = t1 + t2
  • Flow Graph: Represents control flow among basic blocks, with nodes as blocks and edges showing the control flow.

Loop Optimization

  • Enhances loop performance by reducing redundancy.
  • Techniques:
    • Code Motion: Moves invariant code outside the loop.
    • Example:
      Before: for (i = 0; i < n; i++) { x = 10; sum += arr[i] * x; }
      After: x = 10; for (i = 0; i < n; i++) { sum += arr[i] * x; }

Peephole Optimization

  • Local optimization of small code segments.
  • Techniques:
    • Redundant Instruction Removal: Eliminates unnecessary instructions.
    • Strength Reduction: Replaces expensive operations with simpler ones.
    • Example: Replace x * 2 with x << 1.

By using these optimization techniques, compilers improve the efficiency of the generated code.