Detecting left recursion cycles

To the next part
Back to the technology page

This note presumes general familiarity with the elements of a context free grammar:

If you are a professional compiler writer, you probably already solved your recursion problems, or have found ways around them, and either way don't need these notes... but then again, coverage of this topic is pretty thin in the textbooks. The techniques presented here are very general and might help with building a more robust compiler. For the academics, of course, the topic is well covered in the literature. Let that be a comfort to the rest of us.

The following notations are used:

UPPER CASE (A, B, C)
nonterminal, target symbol of a production in the grammar
lower case high (t,u,v,w...)
a sequence that can be mixed terminal and nonterminal symbols
lower case low (a,b,c...)
the name of a terminal token to be matched exactly
term in slashes /1/
a null terminal placeholder identifying actions or locations without matching any of the input text

The following fake grammar will serve as a running example through this note.

  start -> A end
  A -> B F a
  B -> C w
  C -> A t /c/ |  b D x  |  D c
  D -> q A /d/
  F -> F u  |  G t /f/  |  C x
  G -> /g/ F t  |  b

Preliminary transformations

Make the following modifications to the production rules.

Applying this to the original grammar yields the following equivalent grammar.

  A -> /A0/ B F a
  B -> /B0/ C w
  C -> /C1/ A t /c/
  C -> /C2/ b D x
  C -> /C3/ D c
  D -> /D0/ q A /d/
  F -> /F1/ F u
  F -> /F2/ G t /f/
  F -> /F3/ C x
  G -> /G1/ /g/ F t
  G -> /G2/  b

As we further modify the grammar for purposes of analyzing left recursion, we can track which productions are involved using the null terminal tags. The production rule set is modified as follows to produce a reduced rule set:

Applying these changes to the example grammar yields the following modified set of rules for study:

  A -> /A0/ B
  B -> /B0/ C
  C -> /C1/ A
  F -> /F1/ F
  F -> /F2/ G
  F -> /F3/ C
  G -> /G1/ F

These production rules can be interpreted as a list of edges in a directed graph, and recursions will be cycles in this graph. We find cycles in the directed graph using the following algorithm:

  Set up empty path and recursion sets.
  Set up a copy of the production rules.
  Repeat
    If there are no remaining productions,
      Done.
    Take a production rule from the set.
    Use its expansion to start a new path in the path set.
    While the path set is not empty
      Remove a path from the path set.
      If the final nonterminal has no remaining expansion rules,
        Drop the path from the set
      Else if the final nonterminal generates one of the
      preceeding terms in the path list
        Drop the terms prior to the matching term
        Drop the final nonterminal
        Place the rest as a recursion list in the recursion set
        Remove the production rules that generated recursion terms.
      Else
        For each alternative expansion of the nonterminal
          Make an extended list by replacing the final
          nonterminal with its expansion
          Put the expanded path into the path set
        End for
      Endif
    Endwhile
  Endrepeat

After this has been completed, the recursion set will contain all of the left recursions cycles.

To see this algorithm in action, we can play computer for a while and step though the algorithm as it is applied. As this is done, we can observe the following sequence of path sets.

  { }
  { /A0/ B }
  { /A0/ /B0/ C }
  { /A0/ /B0/ /C1/ A }

Rule A generates A0. The sequence A0, B0 and C1 form a left recursion cycle. Drop the A term, move the recursion path terms into the recursion set, and drop rules A0, B0 and C1. The rules remaining in the rule set are

  F -> /F1/ F
  F -> /F2/ G
  F -> /F3/ C
  G -> /G1/ F

The path set is now empty. Continue with the next pass of the algorithm.

  {  }
  { /F1/ F; /F2/ G; /F3/ C }

Nonterminal F generates /F1/ with is a production for F, so this is a direct recursive cycle. Add the path /F1/ to the recursion set, drop rule /F1/ from the rule set, and continue.

  { /F2/ G; /F3/ C }
  { /F2/ /G1/ F; /F3/ C }

Add cycle /F2/ /G1/ to the recursion set. Remove rules /F2/ and /G1/ from the rule set and continue.

  { /F3/ C }
  { }

Nonterminal C has no remaining production rules in the rule list, so this path is dropped. The path list is empty, and no production rules remain, so the algorithm terminates. At the completion of the algorithm, the following recursion cycles have been found.

  /A0/ /B0/ /C1/
  /F1/
  /F2/ /G1/

Now all left recursions are known, along with all of the production rules that cause them. Now we can address the problem of removing them.

To the next part
Back to the technology page


Site:    RidgeRat's Barely Operating Site    http://home.earthlink.net/~ltrammell
Author:  Larry Trammell   Created: Nov 14, 2002    Revised: Feb 3, 2003   Status: Experimental
Contact: NOSPAM ltramme1476 AT earthlink DOT net NOSPAM
Related: Parts 2 and 3    Restrictions: This information is public