This note presumes general familiarity with the elements of a context free grammar:
If you are a professional compiler writer, you probably
already solved your recursion problems, or have found ways
around them, and either way don't need these notes...
but then again, coverage of this topic is pretty thin in the
textbooks. The techniques presented here are very general
and might help with building a more robust compiler. For
the academics, of course, the topic is well covered in
the literature. Let that be a comfort to the
rest of us.
The following notations are used:
(A, B, C)(t,u,v,w...)(a,b,c...)/1/The following fake grammar will serve as a running example through this note.
start -> A end A -> B F a B -> C w C -> A t /c/ | b D x | D c D -> q A /d/ F -> F u | G t /f/ | C x G -> /g/ F t | b
Make the following modifications to the production rules.
Applying this to the original grammar yields the following equivalent grammar.
A -> /A0/ B F a B -> /B0/ C w C -> /C1/ A t /c/ C -> /C2/ b D x C -> /C3/ D c D -> /D0/ q A /d/ F -> /F1/ F u F -> /F2/ G t /f/ F -> /F3/ C x G -> /G1/ /g/ F t G -> /G2/ b
As we further modify the grammar for purposes of analyzing left recursion, we can track which productions are involved using the null terminal tags. The production rule set is modified as follows to produce a reduced rule set:
Applying these changes to the example grammar yields the following modified set of rules for study:
A -> /A0/ B B -> /B0/ C C -> /C1/ A F -> /F1/ F F -> /F2/ G F -> /F3/ C G -> /G1/ F
These production rules can be interpreted as a list of edges in
a directed graph, and recursions will be cycles in this graph. We
find cycles in the directed graph using the following algorithm:
Set up empty path and recursion sets.
Set up a copy of the production rules.
Repeat
If there are no remaining productions,
Done.
Take a production rule from the set.
Use its expansion to start a new path in the path set.
While the path set is not empty
Remove a path from the path set.
If the final nonterminal has no remaining expansion rules,
Drop the path from the set
Else if the final nonterminal generates one of the
preceeding terms in the path list
Drop the terms prior to the matching term
Drop the final nonterminal
Place the rest as a recursion list in the recursion set
Remove the production rules that generated recursion terms.
Else
For each alternative expansion of the nonterminal
Make an extended list by replacing the final
nonterminal with its expansion
Put the expanded path into the path set
End for
Endif
Endwhile
Endrepeat
After this has been completed, the recursion set will contain all of the left recursions cycles.
To see this algorithm in action, we can play computer for a while and step though the algorithm as it is applied. As this is done, we can observe the following sequence of path sets.
{ }
{ /A0/ B }
{ /A0/ /B0/ C }
{ /A0/ /B0/ /C1/ A }
Rule A generates A0. The sequence A0, B0 and C1 form a left recursion cycle. Drop the A term, move the recursion path terms into the recursion set, and drop rules A0, B0 and C1. The rules remaining in the rule set are
F -> /F1/ F F -> /F2/ G F -> /F3/ C G -> /G1/ F
The path set is now empty. Continue with the next pass of the algorithm.
{ }
{ /F1/ F; /F2/ G; /F3/ C }
Nonterminal F generates /F1/ with is a production for F, so this is a direct recursive cycle. Add the path /F1/ to the recursion set, drop rule /F1/ from the rule set, and continue.
{ /F2/ G; /F3/ C }
{ /F2/ /G1/ F; /F3/ C }
Add cycle /F2/ /G1/ to the recursion set. Remove rules /F2/ and /G1/ from the rule set and continue.
{ /F3/ C }
{ }
Nonterminal C has no remaining production rules in the rule list, so this path is dropped. The path list is empty, and no production rules remain, so the algorithm terminates. At the completion of the algorithm, the following recursion cycles have been found.
/A0/ /B0/ /C1/ /F1/ /F2/ /G1/
Now all left recursions are known, along with all of the production rules that cause them. Now we can address the problem of removing them.
To the next part
Back to the technology page
Site: RidgeRat's Barely Operating Site http://home.earthlink.net/~ltrammell Author: Larry Trammell Created: Nov 14, 2002 Revised: Feb 3, 2003 Status: Experimental Contact: NOSPAM ltramme1476 AT earthlink DOT net NOSPAM Related: Parts 2 and 3 Restrictions: This information is public