Regular and Context-Free Grammars
ITEM 3 – Context-Independent Language
3.1 Regular Grammars
Regular features include a special case of length independent of context, and therefore part defined by the generated AF may be for grammars (LR -> expr. regular AF; LIC -> gram. Indep. automata to context and battery).
A regular grammar G is a 4-tuple G = (N, Σ, P, S) where Σ is an alphabet, N is a collection of nonterminals, S is a nonterminal called the start symbol and P is a collection of substitution rules, called productions (production rules or rewriting rules) that are of the form A -> w, where A ∈ N and w ∈ (Σ U N)* that satisfies the following:
- w contains at most one non-terminal.
- If w contains a nonterminal, then this is the symbol that is in the far right of w.
The language generated by the regular grammar G is denoted by L(G). L(G) = {w ∈ Σ* | S =*> w}
- “->” is interpreted as “is replaced by”.
- “|” Is interpreted as “or”.
- “=>” Is interpreted as “drift”, “occur” or “generates”.
- “=*>” Indicates 0 or more derivation steps.
- “=+>” Indicates 1 or more derivation steps.
- “=p>” Indicates the result of p derivation steps.
3.2 Regular Grammars (GR’s) and Regular Languages (LR’s)
Suppose that L is an LR. A GR can be obtained from L that generates an AF M = (Q, Σ, δ, q0, F) for which L = L(M).
AF -> GR
We define G = (N, Σ, S, P): N = Q, Σ = Σ, S = q0, P = {q -> a p | δ(q, a) = p} U {q -> ε | q ∈ F}, i.e., q -> a p whenever δ(q, a) = p and q -> ε if q is an accepting state of the AF.
All LR is generated by a GR.
GR -> AF
It can be constructed from a regular grammar G = (N, Σ, P, S) to construct an AF M = (Q, Σ, δ, s, F) of the form that L(G) = L(M): Q = N U {f} where f is a new symbol, s = S, F = {f} and δ is constructed as follows from the productions of P:
- If A -> aB is a production of P with A and B as non-terminal, then Q will be added to the new states q1, q2,…, qn-1 and the following transitions: δ(A, a) = q1, δ(q1, ε) = q2,…, δ(qn-1, ε) = B.
- If A -> a is a production of P, then Q will be added to the new states q1, q2,…, qn-1 and do δ(A, a) = q1, δ(q1, ε) = q2,…, δ(qn-1, ε) = f.
Theorem: L is regular <=> L is generated by a GR.
Given an AF, you can build a CR in 2 ways:
- Get AFD equivalent to that given, and from that construct the grammar.
- Get regular expression associated with AF and from it, build the grammar.
3.3 Context-Free Grammar (GIC’s)
A GIC is a 4-tuple G = (N, Σ, P, S) where N is a finite collection of non-terminals, Σ is an alphabet (set of terminals), S is a nonterminal (start symbol) and P content N x (N U Σ)* (we can put all the terminals on the right) is a set of productions.
GIC’s are such that there is always a non-terminal and a production for it, so you can run (“I make the change”).
The language generated by the GIC G is denoted by L(G) and is called language independent context (LIC).
The LR is a subset of LR LIC.
All that is generated by a GR and GR is independent of any context -> Everything is context-independent LR.
3.4 Derivation Trees and Ambiguity
Sometimes it is useful to perform a graph that indicates the derivation such that each non-terminal has helped shape the final string of terminal symbols. Such a graph is called a derivation tree.
A derivation tree for a given derivation is constructed by creating a root node that is labeled with the initial symbol. The root node has child nodes that correspond to each symbol that appears on the right side of the production used to replace the initial symbol. Every child labeled with a nonterminal also has some children nodes labeled with symbols of the right side of the production used to replace that nonterminal. The nodes that have no children (leaf nodes) must be labeled with terminal symbols.
There is sometimes a grammar that for a derivation of a string can have different derivation trees -> a grammar is ambiguous if there are 2 or more distinct derivation trees for the same string. A grammar in which for every string w, all derivations have the same derivation tree is unambiguous.
In some cases, if the grammar is ambiguous, you can find another grammar that produces the same language (equivalent) but that is not ambiguous (not always possible).
If all the GIC’s for a language are ambiguous, we say that the language is an inherently ambiguous LIC (you cannot build an unambiguous grammar for that language).
- A derivation by the left: the nonterminal that expands is always the leftmost.
- In a derivation by the right: the nonterminal that expands is always the rightmost.
3.5 Simplification of GIC’s
Given G = (N, P, S) to transform it into a G’ = (N’, P’, S) such that L(G) = L(G’).
Algorithms for Cleaning Up
A) Elimination of Useless Non-Terminals
- Initialize N’ to all nonterminals A such that A -> w ∈ P, with w ∈ Σ*.
- Initialize P’ with every production A -> w for which A ∈ N’ and w ∈ Σ*.
- Repeat: add to N’ all the nonterminals A that A -> w, with w ∈ (Σ U N’)* is a rule of P, and add this rule to P’ also until you cannot add more non-terminals to N’ (This loop always ends because N and P are finite).