LR parsing

views updated

LR parsing A bottom-up parsing technique, LR standing for Left-to-right Rightmost derivation sequence. Originally developed by D. E. Knuth, it is the most powerful left-to-right, no backtracking parsing method for context-free grammars.

An LR parser consists of a pushdown stack, a parsing table, and a driving routine. The driving routine is the same for all grammars. The stack is manipulated by the driving routine using the information contained in the top stack element and the next k symbols in the input stream (called the k lookahead); k is an integer ≥0, but for most practical purposes k = 1. The stack consists of a string s₀X₀s₁X₁…s_nX_ns_n+1

where each X_i is a symbol of the input grammar and each s_i is called a state.

The parsing table is indexed by pairs (s,a) where s is a state and a is the lookahead. Each entry in the table has two parts: (a) an action, which may be shift, reduce p (for some production p), accept, or error, and (b) a state, called the goto state. When the action is shift, the next input symbol and goto state are pushed onto the stack (in that order). When the action is reduce p the top 2l elements of the stack will spell the right-hand side of p but with goto states interspersed, where l is the length of this right-hand side. These 2l elements are popped from the stack and replaced by the left-hand side of p and the new goto state. This operation corresponds to adding a new node to the parse tree for the input string. The accept action is only encountered when the start symbol S is the only symbol on the stack (i.e. the stack contains s₀Ss₁ for some states s₀ and s₁) and the lookahead is the end-of-input symbol. It signifies that parsing has been successfully completed. On the other hand an error entry in the parse table indicates an error in the input string.

A grammar that can be parsed by an LR parser using k-symbol lookahead is called an LR(k) grammar. The power of the LR parsing method comes from the fact that the LR(1) grammars properly include other grammar types like precedence grammars and LL(1) grammars (see LL parsing). This and its efficiency make it a popular choice of parsing method in compiler-compilers. If a grammar is not LR(1) this will be evidenced as multiply defined entries in the parsing tables called shift-reduce conflicts or reduce-reduce conflicts.

Many different parsing tables may be constructed for one grammar, each differing in the number of states it defines. The so-called canonical LR table tends to be too long for practical purposes and it is commonly replaced by an SLR (simple LR) or LALR (lookahead LR) table. A grammar that is LR(1) may not however be SLR(1) or LALR(1).