gh-110309: prune empty constant in format specs#110320
gh-110309: prune empty constant in format specs#110320pablogsal merged 4 commits intopython:mainfrom
Conversation
|
🤖 New build scheduled with the buildbot fleet by @sunmy2019 for commit 19d1301 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
|
Thanks a lot for the PR @sunmy2019 ! |
|
🤖 New build scheduled with the buildbot fleet by @sunmy2019 for commit 60b4666 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
|
🤖 New build scheduled with the buildbot fleet by @sunmy2019 for commit 19c68e2 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
|
I'm wondering why we cannot do this in the tokenizer directly. Wouldn't something like this work? cpython on main [!] via C v15.0.0-clang via 🐍 pyenv 3.11.3 (venv) took 8s
❯ git diff
diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
index 41d0d16a47..504bc9bed9 100644
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -2639,6 +2639,12 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
tok->first_lineno = tok->lineno;
tok->starting_col_offset = tok->col_offset;
+ int in_format_spec = (
+ current_tok->last_expr_end != -1
+ &&
+ INSIDE_FSTRING_EXPR(current_tok)
+ );
+
// If we start with a bracket, we defer to the normal mode as there is nothing for us to tokenize
// before it.
int start_char = tok_nextc(tok);
@@ -2655,6 +2661,10 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
return tok_get_normal_mode(tok, current_tok, token);
}
}
+ else if (start_char == '}' && in_format_spec) {
+ tok_backup(tok, start_char);
+ return tok_get_normal_mode(tok, current_tok, token);
+ }
else {
tok_backup(tok, start_char);
}
@@ -2726,11 +2736,6 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
end_quote_size = 0;
}
- int in_format_spec = (
- current_tok->last_expr_end != -1
- &&
- INSIDE_FSTRING_EXPR(current_tok)
- );
if (c == '{') {
int peek = tok_nextc(tok);
if (peek != '{' || in_format_spec) {Didn't thoroughly test it, but one pass through the tests only shows one failure that's got to do with the error message when we have a lambda in the expression part. |
I have no objection to this. One thing to note: the same special handling of empty str constants is at the other parts of the code. If your idea is adopted, we'd better clean up those special handling. |
|
I'd propose we land this first and backport to 3.12. Then maybe we can optimize further on the 3.13 branch. |
Agreed. Let's land this and then iterate on it in |
|
I think we need to regenerate some files (run |
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
close #110309