当前位置：首页 > news >正文

别再死记硬背了！用Python手把手教你自动构造LR(0)和SLR(1)分析表

news 2026/6/11 18:51:01

用Python自动化构建LR(0)和SLR(1)分析表：从理论到代码实现

编译原理中的语法分析阶段常常让学习者望而生畏，尤其是构造LR分析表的过程。传统的手工绘制状态机和填写表格不仅耗时耗力，还容易出错。本文将带你用Python实现LR(0)和SLR(1)分析表的自动构造，通过代码理解算法本质，告别死记硬背的学习方式。

1. 基础准备：文法表示与数据结构

在开始编码前，我们需要设计合适的数据结构来表示文法和分析表。Python的类和字典结构非常适合这种场景。

class Production: def __init__(self, head, body): self.head = head # 产生式左部 self.body = body # 产生式右部（列表形式） def __str__(self): return f"{self.head} -> {' '.join(self.body)}" class Grammar: def __init__(self): self.productions = [] # 产生式集合 self.terminals = set() # 终结符集合 self.non_terminals = set() # 非终结符集合 self.start_symbol = None # 开始符号

关键数据结构说明：

Production类封装了文法的单个产生式
Grammar类管理整个文法的元信息
使用集合存储终结符和非终结符便于快速查找

2. 拓广文法与项目表示

拓广文法是构造LR分析表的第一步，我们需要在原始文法基础上添加一个新的开始符号。

def augment_grammar(grammar): """拓广文法：添加新的开始符号S'""" augmented = Grammar() augmented.start_symbol = grammar.start_symbol + "'" augmented.productions.append(Production( augmented.start_symbol, [grammar.start_symbol] )) # 复制原始文法的产生式 for prod in grammar.productions: augmented.productions.append(prod) return augmented

LR项目是带有点标记的产生式，我们可以扩展Production类来表示：

class LRItem: def __init__(self, production, dot_pos=0): self.production = production self.dot_pos = dot_pos # 点的位置 def next_symbol(self): """获取点后面的符号""" if self.dot_pos < len(self.production.body): return self.production.body[self.dot_pos] return None def is_reduce_item(self): """判断是否为归约项目""" return self.dot_pos == len(self.production.body)

3. 构造项目集规范簇

项目集规范簇(Canonical Collection)是LR分析的核心，我们需要实现闭包(closure)和转移(goto)操作。

def closure(items, grammar): """计算项目集的闭包""" closure_set = set(items) changed = True while changed: changed = False for item in list(closure_set): next_sym = item.next_symbol() if next_sym in grammar.non_terminals: # 添加该非终结符的所有产生式 for prod in grammar.productions: if prod.head == next_sym: new_item = LRItem(prod, 0) if new_item not in closure_set: closure_set.add(new_item) changed = True return frozenset(closure_set) def goto(items, symbol, grammar): """计算转移函数""" new_items = set() for item in items: if item.next_symbol() == symbol: new_items.add(LRItem(item.production, item.dot_pos + 1)) return closure(new_items, grammar)

算法要点：

闭包操作会递归添加所有可能的项目
转移函数模拟DFA的状态转移
使用frozenset保证项目集的不可变性

4. 构建LR(0)分析表

有了项目集规范簇后，我们可以构造LR(0)分析表。分析表通常包含ACTION和GOTO两部分。

def build_lr0_table(grammar): """构造LR(0)分析表""" augmented = augment_grammar(grammar) start_item = LRItem(augmented.productions[0], 0) c0 = closure({start_item}, augmented) # 初始化分析表 action_table = {} goto_table = {} # 状态编号到项目集的映射 state_id = {c0: 0} queue = [c0] next_id = 1 while queue: current = queue.pop(0) current_id = state_id[current] # 处理归约项目 for item in current: if item.is_reduce_item(): if item.production.head == augmented.start_symbol: # 接受动作 action_table[(current_id, '#')] = ('acc',) else: # 归约动作 prod_num = augmented.productions.index(item.production) for term in augmented.terminals.union({'#'}): action_table[(current_id, term)] = ('r', prod_num) # 处理转移 symbols = set() for item in current: sym = item.next_symbol() if sym: symbols.add(sym) for sym in symbols: new_items = goto(current, sym, augmented) if new_items not in state_id: state_id[new_items] = next_id queue.append(new_items) next_id += 1 new_state = state_id[new_items] if sym in augmented.terminals: action_table[(current_id, sym)] = ('s', new_state) else: goto_table[(current_id, sym)] = new_state return action_table, goto_table

5. 扩展为SLR(1)分析表

SLR(1)在LR(0)基础上增加了FOLLOW集的处理，能解决部分冲突。我们需要先实现FOLLOW集的计算。

def compute_follow(grammar): """计算非终结符的FOLLOW集""" follow = {nt: set() for nt in grammar.non_terminals} follow[grammar.start_symbol].add('#') changed = True while changed: changed = False for prod in grammar.productions: for i, sym in enumerate(prod.body): if sym in grammar.non_terminals: # 规则1：A -> αBβ，将FIRST(β)-{ε}加入FOLLOW(B) first_beta = set() for next_sym in prod.body[i+1:]: if next_sym in grammar.terminals: first_beta.add(next_sym) break else: first_beta.update( first.get(next_sym, set()) - {None} ) if None not in first.get(next_sym, set()): break else: first_beta.add(None) if None in first_beta: first_beta.remove(None) old_size = len(follow[sym]) follow[sym].update(follow[prod.head]) if len(follow[sym]) > old_size: changed = True old_size = len(follow[sym]) follow[sym].update(first_beta) if len(follow[sym]) > old_size: changed = True return follow

基于FOLLOW集，我们可以修改分析表构造逻辑：

def build_slr1_table(grammar): """构造SLR(1)分析表""" action, goto = build_lr0_table(grammar) follow = compute_follow(grammar) # 解决冲突：检查每个状态中的归约项目 for state in range(len(set(s for s, _ in action.keys()))): reduce_items = [] for (s, sym), act in list(action.items()): if s == state and act[0] == 'r': reduce_items.append((sym, act[1])) # 检查移进-归约冲突 for (s, sym), act in list(action.items()): if s == state and act[0] == 's': for (rsym, rprod) in reduce_items: if sym == rsym: # 检查FOLLOW集是否相交 prod_head = grammar.productions[rprod].head if sym in follow[prod_head]: raise ValueError( f"SLR(1)冲突：状态{state}符号{sym}" ) return action, goto

6. 实际应用与冲突处理

在实际编码中，我们经常会遇到各种冲突情况。让我们通过一个具体例子来演示如何处理。

示例文法：

E -> E + T | T T -> T * F | F F -> ( E ) | id

首先定义文法：

grammar = Grammar() grammar.start_symbol = 'E' grammar.non_terminals = {'E', 'T', 'F'} grammar.terminals = {'+', '*', '(', ')', 'id'} grammar.productions = [ Production('E', ['E', '+', 'T']), Production('E', ['T']), Production('T', ['T', '*', 'F']), Production('T', ['F']), Production('F', ['(', 'E', ')']), Production('F', ['id']) ]

构造分析表并测试：

# 构造LR(0)分析表 lr0_action, lr0_goto = build_lr0_table(grammar) # 构造SLR(1)分析表 slr1_action, slr1_goto = build_slr1_table(grammar) # 比较两者的差异 print("LR(0) ACTION表大小:", len(lr0_action)) print("SLR(1) ACTION表大小:", len(slr1_action))

常见问题处理：

移进-归约冲突：当同一单元格需要同时填移进和归约时，SLR(1)会检查FOLLOW集
归约-归约冲突：当同一单元格有多个归约项时，通常需要更强大的分析器如LR(1)或LALR(1)
表项缺失：可能意味着文法不是LR(0)或SLR(1)的，需要考虑错误恢复机制

7. 可视化与分析表调试

为了更好理解分析表的构造过程，我们可以添加可视化功能：

def print_table(action, goto, grammar): """打印分析表""" terminals = sorted(grammar.terminals.union({'#'})) non_terminals = sorted(grammar.non_terminals) print("状态\t" + "\t".join(terminals) + "\t" + "\t".join(non_terminals)) max_state = max(s for s, _ in action.keys()) for state in range(max_state + 1): row = [str(state)] for term in terminals: key = (state, term) if key in action: act = action[key] if act[0] == 's': row.append(f"s{act[1]}") elif act[0] == 'r': row.append(f"r{act[1]}") else: row.append(act[0]) else: row.append("") for nt in non_terminals: key = (state, nt) if key in goto: row.append(str(goto[key])) else: row.append("") print("\t".join(row))

这个可视化函数可以帮助我们直观地比较LR(0)和SLR(1)分析表的差异，特别是在冲突处理上的不同。

查看全文

http://www.rkmt.cn/news/1505624.html