Simple but Powerful Pratt Parsing › BNF [#264]

语法分析理论的顶峰在于发现将线性结构解码为树状结构的上下文无关文法表示法(通常使用 BNF 具体语法):
The pinnacle of syntactic analysis theory is discovering the context free grammar notation (often using BNF concrete syntax) for decoding linear structures into trees:

Item ::=
    StructItem
  | EnumItem
  | ...

StructItem ::=
    'struct' Name '{' FieldList '}'

...

我记得我当时对这个想法非常着迷,特别是它与自然语言句子结构的相似之处。然而,当我们开始描述表达式时,我的乐观情绪很快就消失了。自然表达式文法确实可以让人们看出什么是表达式。
I remember being fascinated by this idea, especially by parallels with natural language sentence structure. However, my optimism quickly waned once we got to describing expressions. The natural expression grammar indeed allows one to see what is an expression.

Expr ::=
    Expr '+' Expr
  | Expr '*' Expr
  | '(' Expr ')'
  | 'number'

虽然这种文法看起来很棒,但实际上它是模糊且不精确的,需要重新编写以适用于自动解析器生成。具体来说,我们需要指定操作符的 优先级 ( precedence ) 结合性 ( associativity ) 。修正后的文法如下:
Although this grammar looks great, it is in fact ambiguous and imprecise, and needs to be rewritten to be amendable to automated parser generation. Specifically, we need to specify precedence and associativity of operators. The fixed grammar looks like this:

Expr ::=
    Factor
  | Expr '+' Factor

Factor ::=
    Atom
  | Factor '*' Atom

Atom ::=
    'number'
  | '(' Expr ')'

对我来说,在这种新的表述中,表达式的“形状”完全丢失了。此外,我在学习了三四门形式语言课程后,才能够可靠地自己创建这种文法。
To me, the “shape” of expressions feels completely lost in this new formulation. Moreover, it took me three or four courses in formal languages before I was able to reliably create this grammar myself.

这就是我为什么喜欢 Pratt 解析⸺它是对递归下降解析算法的改进,使用了优先级和结合性的自然术语来解析表达式,而不是文法混淆技术。
And that’s why I love Pratt parsing — it is an enhancement of recursive descent parsing algorithm, which uses the natural terminology of precedence and associativity for parsing expressions, instead of grammar obfuscation techniques.