Left Recursion Elimination for Context Free Grammars (CFG) implementation in java.
In this project the context-free grammar (CFG) left-recursion elimination algorithm was implemented.
Recall that a CFG is a quadruple (V, Σ, R, S) where V and Σ are disjoint alphabets (respectively, containing variables and terminals), R ⊆ V × (V ∪ Σ)∗ is a set of rules, and S ∈ V is the start variable.
- We make the following assumptions about input CFGs for simplicity.
- The set V of variables consists of upper-case English symbols.
- The start variable is the symbol S.
- The set Σ of terminals consists of lower-case English symbols (except the letter e).
- We only consider CFGs with no cycles and no ε-rules.
- A function LRE which takes an input string encoding a CFG and returns a string encoding an equivalent CFG which is not left-recursive is implemented.
- A string encoding a CFG is a semi-colon separated sequence of items. Each item represents a largest set of rules with the same left-hand side and is a comma-separated sequence of strings. The first string of each item is a member of V , representing the common left-hand side. The first string of the first item is S.
For example, consider the CFG ({S, T, L}, {i, a, b, c, d}, R, S), where R is given by the following productions.
S −→ S c T | S a | T | b
T −→ a S b | i a L b | i
L −→ S d L | S
This CFG will have the following string encoding, spaces are added for the ease of reading.
S, ScT, Sa, T, b; T, aSb, iaLb, i; L, SdL, S
- The function LRE will assume the ordering of variables as they appear in the string encoding of the CFG. Thus, in the above example, the variables are ordered thus: S, T, L.
- LRE returns a string encoding the resulting CFG where a newly-introduced variable, for the elimination of immediate left-recursion for variable A, is the string A. The letter e denotes the empty string.
- Thus, for the above example, the output should be as follows, spaces are added for the ease of reading.
S, TS' , bS' ; S', cTS', aS', e; T, aSb, iaLb, i;L, aSbS'dL, iaLbS'dL, iS'dL, bS'dL, aSbS', iaLbS', iS', bS'