Closed
72 commits
6a99107
Parsing VM initial structure
gvanrossum May 26, 2020
8ffc315
Hook things up so we can test it
gvanrossum May 27, 2020
a9ba946
Add debugging printf()s in anger; fix bugs
gvanrossum May 27, 2020
8635a4c
Implement actions
gvanrossum May 27, 2020
1a34aeb
Support optional tokens
gvanrossum May 27, 2020
b4a2292
Support optional rules
gvanrossum May 27, 2020
b1f2a03
Add vmreadme.md; OP_SUCCESS has an argument
gvanrossum May 28, 2020
50aa868
Do optional items differently (with a postfix op)
gvanrossum May 28, 2020
320266b
Compute start/end line/col numbers; add some ideas to vmreadme.md
gvanrossum May 28, 2020
5bf3f9c
Tighten the code; add some speculation to vmreadme
gvanrossum May 28, 2020
816db75
Add OP_NOOP; add enums for rules & actions
gvanrossum May 28, 2020
d2a980f
Implement loops
gvanrossum May 29, 2020
2befb12
Add a few more rules to the grammar
gvanrossum May 29, 2020
175a127
Drop debug printf()s, more flexibility in parse_string()
gvanrossum May 29, 2020
f3f1665
Add memoization, some debug niceties
gvanrossum May 29, 2020
bda3517
Inline helper functions
gvanrossum May 29, 2020
881d756
Explain OP_OPTIONAL better
gvanrossum May 29, 2020
94cfb95
Skeleton of code generator
pablogsal May 29, 2020
4ba6c61
Simplify structure of OP_SUCCESS
gvanrossum May 29, 2020
db628e1
Move opcodes around
gvanrossum May 29, 2020
7798629
Add a 'grammar' for operations
gvanrossum May 29, 2020
412c741
Move generated part of vm.h into vmparse.h
gvanrossum May 29, 2020
020fbd1
Merge branch 'pegenvm_generator' into pegenvm
lysnikolaou May 29, 2020
8c7cffc
Merge branch 'master' into pegenvm
lysnikolaou May 29, 2020
ef1fabd
Clean skeleton of vm_generator
pablogsal May 30, 2020
b497b23
Merge branch 'pegenvm' of github.com:we-like-parsers/cpython into peg…
lysnikolaou May 30, 2020
8eab7e0
Better formatting of generated file; remove unneeded indentation
lysnikolaou May 30, 2020
4a5f823
Add OP_LOOP_COLLECT_NONEMPTY -- used for a+
gvanrossum May 30, 2020
3cee73c
Expand description of root rules
gvanrossum May 30, 2020
9b96df0
Initial support for repeat_0
pablogsal May 30, 2020
2f44ee9
Fix name rules for repeat0 nodes
pablogsal May 30, 2020
6dc7092
Eliminate OP_LOOP_START
gvanrossum May 30, 2020
9e7e12e
Do fewer reallocs (at the cost of an extra int per frame)
gvanrossum May 30, 2020
52f5a75
Speculate how to implement a.b+
gvanrossum May 30, 2020
3b93237
Make memo rule types distinct from token types
gvanrossum May 30, 2020
33522ae
Fix small issues in vmreadme.pm
gvanrossum May 30, 2020
cbd45a5
Add generation of root rules (very coarssely)
gvanrossum May 31, 2020
1a6531d
Add enum for rule types (R_)
gvanrossum May 31, 2020
0226dd5
Generate actions (primitively)
gvanrossum May 31, 2020
c64ff2f
Implement code generation for keywords
lysnikolaou May 31, 2020
e2c4a36
Refactor add_opcode to optionally accept a second argument
lysnikolaou May 31, 2020
21d8b83
Translate item names in actions; use the generated vmparse.h!
gvanrossum May 31, 2020
6fe4f0e
Fix mypy (in vm_generator)
gvanrossum May 31, 2020
6205002
Avoid name conflict for 'f'
gvanrossum May 31, 2020
f72c7b6
Generate code for repeat1 loops
gvanrossum Jun 1, 2020
fc3d4c4
Implement delimited loops (b.a+)
gvanrossum Jun 1, 2020
6c50468
Generate code for delimited loop
gvanrossum Jun 1, 2020
a10babb
Implement soft keywords (hand-written and code generation) (#129)
lysnikolaou Jun 1, 2020
b13169b
Update generated vmparse.h
gvanrossum Jun 1, 2020
c2a7cf6
Fix code generation for if_stmt
gvanrossum Jun 1, 2020
a862a69
Implement lookahead ops
gvanrossum Jun 1, 2020
ab863df
Generate code for lookaheads (only one token supported!)
gvanrossum Jun 1, 2020
1be4b62
Implement left-recursion (with hand-coded vmparse.h)
gvanrossum Jun 2, 2020
b53c2a4
Code generation for left-recursive rules
gvanrossum Jun 2, 2020
7a59aa0
Allow specifying different grammars
gvanrossum Jun 2, 2020
cac4149
Generate code for 'cut'
gvanrossum Jun 2, 2020
180dfff
Support groups and optional in code generator
gvanrossum Jun 2, 2020
e01e643
There's no need to special-case -> in actions
gvanrossum Jun 2, 2020
13c3bbb
Treat TYPE_COMMENT as a token (since it is)
gvanrossum Jun 3, 2020
65304e6
Generate code for Grammar/parser.gram
gvanrossum Jun 3, 2020
10f7be1
Group every opcode with its argument (#131)
pablogsal Jun 3, 2020
a9a4115
Add vm target to pegen script to generate the vm parser (#130)
lysnikolaou Jun 3, 2020
5bb2f57
Selective memoization
gvanrossum Jun 3, 2020
ba8783b
Don't call is_memoized in OP_RETURN_LEFT_REC
gvanrossum Jun 3, 2020
bccd5c8
Different way of doing left-recursion
gvanrossum Jun 5, 2020
2a138cc
Merge remote-tracking branch 'upstream/master' into pegenvm
gvanrossum Aug 16, 2020
2394b96
Remove leftover conflict markers
gvanrossum Aug 16, 2020
be34499
Fix deps for vm.o
gvanrossum Aug 16, 2020
b9394e4
Fix includes for vm.c
gvanrossum Aug 16, 2020
b64113b
Regenerated vmparse.h
gvanrossum Aug 16, 2020
9afa67e
Merge remote-tracking branch 'pegen/pegenvm'
pablogsal Aug 1, 2022
a65604e
bpo-40222: Mark exception table function in the dis module as private
pablogsal Aug 13, 2022
Add vm target to pegen script to generate the vm parser (#130)
lysnikolaou authored Jun 3, 2020
commit a9a4115a9aa4ee54688b0d4f284fa23c0117c5b5
3 changes: 3 additions & 0 deletions Tools/peg_generator/Makefile
@@ -25,6 +25,9 @@ build: peg_extension/parse.c
 peg_extension/parse.c: $(GRAMMAR) $(TOKENS) pegen/*.py peg_extension/peg_extension.c ../../Parser/pegen/pegen.c ../../Parser/pegen/parse_string.c ../../Parser/pegen/*.h pegen/grammar_parser.py
 	$(PYTHON) -m pegen -q c $(GRAMMAR) $(TOKENS) -o peg_extension/parse.c --compile-extension

+generate_vm: $(GRAMMAR) $(TOKENS) pegen/*.py ../../Parser/pegen/pegen.c ../../Parser/pegen/parse_string.c ../../Parser/pegen/*.h
+	$(PYTHON) -m pegen -q vm $(GRAMMAR) $(TOKENS) -o ../../Parser/pegen/vmparse.h
+
 clean:
 	-rm -f peg_extension/*.o peg_extension/*.so peg_extension/parse.c
 	-rm -f data/xxl.py
37 changes: 37 additions & 0 deletions Tools/peg_generator/pegen/__main__.py
@@ -16,6 +16,31 @@
 from pegen.build import Grammar, Parser, Tokenizer, ParserGenerator


+def generate_vm_code(
+    args: argparse.Namespace,
+) -> Tuple[Grammar, Parser, Tokenizer, ParserGenerator]:
+    from pegen.build import build_vm_parser_and_generator
+
+    verbose = args.verbose
+    verbose_tokenizer = verbose >= 3
+    verbose_parser = verbose == 2 or verbose >= 4
+    try:
+        grammar, parser, tokenizer, gen = build_vm_parser_and_generator(
+            args.grammar_filename,
+            args.tokens_filename,
+            args.output,
+            verbose_tokenizer,
+            verbose_parser,
+        )
+        return grammar, parser, tokenizer, gen
+    except Exception as err:
+        if args.verbose:
+            raise  # Show traceback
+        traceback.print_exception(err.__class__, err, None)
+        sys.stderr.write("For full traceback, use -v\n")
+        sys.exit(1)
+
+
 def generate_c_code(
     args: argparse.Namespace,
 ) -> Tuple[Grammar, Parser, Tokenizer, ParserGenerator]:
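Note on the verbosity thresholds in generate_vm_code above: a minimal sketch of how the two flags fall out of the verbosity level, assuming `args.verbose` is an integer count of `-v` flags. The helper name `verbosity_flags` is hypothetical and exists only for illustration.

```python
from typing import Tuple


def verbosity_flags(verbose: int) -> Tuple[bool, bool]:
    """Return (verbose_tokenizer, verbose_parser) for a given verbosity level.

    Hypothetical helper; it only restates the two comparisons used in
    generate_vm_code.
    """
    verbose_tokenizer = verbose >= 3
    verbose_parser = verbose == 2 or verbose >= 4
    return verbose_tokenizer, verbose_parser


# Tabulate the flags for levels 0 through 4.
flag_table = {level: verbosity_flags(level) for level in range(5)}
for level, flags in flag_table.items():
    print(level, flags)
```

Under that assumption, level 2 turns on parser output only, level 3 turns on tokenizer output only, and level 4 and above turns on both.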
@@ -116,6 +141,18 @@ def generate_python_code(
     "--skip-actions", action="store_true", help="Suppress code emission for rule actions",
 )

+vm_parser = subparsers.add_parser("vm", help="Generate the new VM parser generator")
+vm_parser.set_defaults(func=generate_vm_code)
+vm_parser.add_argument("grammar_filename", help="Grammar description")
+vm_parser.add_argument("tokens_filename", help="Tokens description")
+vm_parser.add_argument(
+    "-o",
+    "--output",
+    metavar="OUT",
+    default="vmparse.h",
+    help="Where to write the generated parser",
+)
+

 def main() -> None:
     from pegen.testutil import print_memstats
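Note on the new `vm` subcommand: the hunk above registers a handler with `set_defaults(func=...)`, so the caller can dispatch with `args.func(args)`. The following is a minimal standalone sketch of that argparse pattern, not the pegen script itself. The stub handler, the `prog` name, and the sample file names are assumptions made for illustration.

```python
import argparse


def generate_vm_code(args: argparse.Namespace) -> str:
    # Stub standing in for the real handler; it only reports what it would do.
    return f"vm: {args.grammar_filename} + {args.tokens_filename} -> {args.output}"


def build_argparser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pegen-sketch")  # hypothetical prog name
    subparsers = parser.add_subparsers(help="target for the generated code")

    vm_parser = subparsers.add_parser("vm", help="Generate the VM parser")
    # The chosen subcommand stores its handler on the namespace as `func`.
    vm_parser.set_defaults(func=generate_vm_code)
    vm_parser.add_argument("grammar_filename", help="Grammar description")
    vm_parser.add_argument("tokens_filename", help="Tokens description")
    vm_parser.add_argument(
        "-o",
        "--output",
        metavar="OUT",
        default="vmparse.h",
        help="Where to write the generated parser",
    )
    return parser


# Sample invocation with made-up file names and no -o, so the default applies.
args = build_argparser().parse_args(["vm", "python.gram", "Tokens"])
result = args.func(args)
print(result)
```

With no `-o` given, `args.output` falls back to `vmparse.h`, which matches the default declared in the hunk. The Makefile target overrides it with an explicit path.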
37 changes: 37 additions & 0 deletions Tools/peg_generator/pegen/build.py
@@ -8,6 +8,7 @@
 from typing import Optional, Tuple, List, IO, Set, Dict

 from pegen.c_generator import CParserGenerator
+from pegen.vm_generator import VMParserGenerator
 from pegen.grammar import Grammar
 from pegen.grammar_parser import GeneratedParser as GrammarParser
 from pegen.parser import Parser
@@ -181,6 +182,19 @@ def build_python_generator(
     return gen


+def build_vm_generator(
+    grammar: Grammar, grammar_file: str, tokens_file: str, output_file: str,
+) -> ParserGenerator:
+    with open(tokens_file, "r") as tok_file:
+        all_tokens, exact_tok, non_exact_tok = generate_token_definitions(tok_file)
+    with open(output_file, "w") as file:
+        gen: ParserGenerator = VMParserGenerator(
+            grammar, all_tokens, exact_tok, non_exact_tok, file
+        )
+        gen.generate(grammar_file)
+    return gen
+
+
 def build_c_parser_and_generator(
     grammar_file: str,
     tokens_file: str,
@@ -246,3 +260,26 @@ def build_python_parser_and_generator(
     grammar, parser, tokenizer = build_parser(grammar_file, verbose_tokenizer, verbose_parser)
     gen = build_python_generator(grammar, grammar_file, output_file, skip_actions=skip_actions,)
     return grammar, parser, tokenizer, gen
+
+
+def build_vm_parser_and_generator(
+    grammar_file: str,
+    tokens_file: str,
+    output_file: str,
+    verbose_tokenizer: bool = False,
+    verbose_parser: bool = False,
+) -> Tuple[Grammar, Parser, Tokenizer, ParserGenerator]:
+    """Generate rules, C parser, tokenizer, parser generator for a given grammar
+
+    Args:
+        grammar_file (string): Path for the grammar file
+        tokens_file (string): Path for the tokens file
+        output_file (string): Path for the output file
+        verbose_tokenizer (bool, optional): Whether to display additional output
+          when generating the tokenizer. Defaults to False.
+        verbose_parser (bool, optional): Whether to display additional output
+          when generating the parser. Defaults to False.
+    """
+    grammar, parser, tokenizer = build_parser(grammar_file, verbose_tokenizer, verbose_parser)
+    gen = build_vm_generator(grammar, grammar_file, tokens_file, output_file)
+    return grammar, parser, tokenizer, gen
28 changes: 20 additions & 8 deletions Tools/peg_generator/pegen/vm_generator.py
@@ -7,10 +7,9 @@
 import tokenize
 from collections import defaultdict
 from itertools import accumulate
-from typing import Any, Dict, Iterator, List, Optional, Tuple, Union
+from typing import Any, Dict, Iterator, List, Optional, Tuple, Set, IO, Text, Union

 from pegen import grammar
-from pegen.build import build_parser
 from pegen.grammar import (
     Alt,
     Cut,
@@ -77,9 +76,14 @@ def __init__(self, name: str, startrulename: str):

 class VMCallMakerVisitor(GrammarVisitor):
     def __init__(
-        self, parser_generator: ParserGenerator,
+        self,
+        parser_generator: ParserGenerator,
+        exact_tokens: Dict[str, int],
+        non_exact_tokens: Set[str],
     ):
         self.gen = parser_generator
+        self.exact_tokens = exact_tokens
+        self.non_exact_tokens = non_exact_tokens
         self.cache: Dict[Any, Any] = {}
         self.keyword_cache: Dict[str, int] = {}
         self.soft_keyword_cache: List[str] = []
@@ -101,8 +105,8 @@ def visit_StringLeaf(self, node: StringLeaf) -> Tuple[str, str]:
                 return self.keyword_helper(val)
             else:
                 return self.soft_keyword_helper(val)
-        tok_num: int = token.EXACT_TOKEN_TYPES[val]  # type: ignore [attr-defined]
-        return "OP_TOKEN", token.tok_name[tok_num]
+        tok_num: int = self.exact_tokens[val]
+        return "OP_TOKEN", self.gen.tokens[tok_num]

     def visit_Repeat0(self, node: Repeat0) -> str:
         if node in self.cache:
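Note on the token lookup in visit_StringLeaf: after this change, the token name comes from tables built out of the tokens file rather than from the `token` module of the running interpreter. The sketch below shows one way such tables could be built and used for the two-step lookup. Everything about it is an assumption for illustration only: the file format (one `NAME` or `NAME 'op'` per line), the helper name `sketch_token_definitions`, and the choice to map token numbers to names in the first table, which is what the `self.gen.tokens[tok_num]` lookup requires. It is not the `generate_token_definitions` function from `pegen.build`.

```python
import io
from typing import Dict, IO, Set, Tuple


def sketch_token_definitions(
    tokens: IO[str],
) -> Tuple[Dict[int, str], Dict[str, int], Set[str]]:
    """Build (all_tokens, exact_tokens, non_exact_tokens) from a tokens file.

    Hypothetical stand-in that assumes lines of the form NAME or NAME 'op'.
    """
    all_tokens: Dict[int, str] = {}
    exact_tokens: Dict[str, int] = {}
    non_exact_tokens: Set[str] = set()
    number = 0
    for line in tokens:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments without consuming a number
        pieces = line.split()
        if len(pieces) == 1:
            non_exact_tokens.add(pieces[0])
        elif len(pieces) == 2:
            exact_tokens[pieces[1].strip("'")] = number
        else:
            raise ValueError(f"Unexpected line: {line!r}")
        all_tokens[number] = pieces[0]
        number += 1
    return all_tokens, exact_tokens, non_exact_tokens


# Made-up tokens file with two non-exact and two exact tokens.
sample = io.StringIO("NAME\nNUMBER\nLPAR '('\nRPAR ')'\n")
all_tokens, exact_tokens, non_exact_tokens = sketch_token_definitions(sample)

# Two-step lookup in the style of visit_StringLeaf: literal -> number -> name.
tok_num = exact_tokens["("]
opcode = ("OP_TOKEN", all_tokens[tok_num])
print(opcode)
```

Reading the tables from the tokens file keeps the generated opcodes tied to the token numbering of the tree being built, not to whichever Python happens to run the generator.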
@@ -149,12 +153,19 @@ def can_we_inline(node: Rhs) -> int:

 class VMParserGenerator(ParserGenerator, GrammarVisitor):
     def __init__(
-        self, grammar: grammar.Grammar,
+        self,
+        grammar: grammar.Grammar,
+        tokens: Dict[str, int],
+        exact_tokens: Dict[str, int],
+        non_exact_tokens: Set[str],
+        file: Optional[IO[Text]],
     ):
-        super().__init__(grammar, token.tok_name, sys.stdout)
+        super().__init__(grammar, tokens, file)

         self.opcode_buffer: Optional[List[Opcode]] = None
-        self.callmakervisitor: VMCallMakerVisitor = VMCallMakerVisitor(self)
+        self.callmakervisitor: VMCallMakerVisitor = VMCallMakerVisitor(
+            self, exact_tokens, non_exact_tokens,
+        )

     @contextlib.contextmanager
     def set_opcode_buffer(self, buffer: List[Opcode]) -> Iterator[None]:
@@ -517,6 +528,7 @@ def visit_Gather(self, node: Gather) -> None:


 def main() -> None:
+    from pegen.build import build_parser
     filename = "../../Grammar/python.gram"
     if sys.argv[1:]:
         filename = sys.argv[1]