Simple but Powerful Pratt Parsing › 递归下降和左递归 Recursive descent and left-recursion [#265]

手写解析器的最简单技术是 递归下降 ( recursive descent ) ,它将文法建模为一组相互递归的函数。例如,上述文法片段可以写成这样:
The simplest technique for hand-writing a parser is recursive descent, which models the grammar as a set of mutually recursive functions. For example, the above item grammar fragment can look like this:

fn item(p: &mut Parser) {
    match p.peek() {
        STRUCT_KEYWORD => struct_item(p),
        ENUM_KEYWORD   => enum_item(p),
        ...
    }
}

fn struct_item(p: &mut Parser) {
    p.expect(STRUCT_KEYWORD);
    name(p);
    p.expect(L_CURLY);
    field_list(p);
    p.expect(R_CURLY);
}

...

传统上,教科书指出左递归文法是这种方法的 阿喀琉斯之踵 ( Achilles heel ) ,并利用这一缺点来促使产生更高级的 LR 解析技术。一种有问题的文法示例如下:
Traditionally, text-books point out left-recursive grammars as the Achilles heel of this approach, and use this drawback to motivate more advanced LR parsing techniques. An example of problematic grammar can look like this:

Sum ::=
    Sum '+' Int
  | Int

确实,如果我们朴素地编码 sum 函数,这不会太有用:
Indeed, if we naively code the sum function, it wouldn’t be too useful:

fn sum(p: &mut Parser) {
    // Try first alternative
    sum(p); 
    p.expect(PLUS);
    int(p);

    // If that fails, try the second one
    ...
}

  1. 到这里,我们会立即进入循环并导致堆栈溢出
(1): At this point we immediately loop and overflow the stack

理论上讲,解决这个问题需要重写语法以消除左递归。然而在实际操作中,对于手写解析器,解决方案要简单得多⸺打破纯递归范式,使用循环:
A theoretical fix to the problem involves rewriting the grammar to eliminate the left recursion. However in practice, for a hand-written parser, a solution is much simpler — breaking away with a pure recursive paradigm and using a loop:

fn sum(p: &mut Parser) {
    int(p);
    while p.eat(PLUS) {
        int(p);
    }
}