gh-139516: Fix lambda colon start format spec in f-string in tokenizer #139657
pablogsal merged 4 commits into python:main
Conversation
Ping @pablogsal. I added the test to
Please add a test for the f-string test file as well, as this will be a semantic test that needs to hold true even if we change the tokenizer or some other implementation doesn't have the same tokenizer.
Lib/test/test_fstring.py
Outdated
# gh-139516
# The '\n' is explicit to ensure no trailing whitespace which would invalidate the test.
# Must use tokenize instead of compile so that source is parsed by line which exposes the bug.
list(tokenize.tokenize(BytesIO('''f"{f(a=lambda: 'à'\n)}"'''.encode()).readline))
I am confused. Isn't it possible to trigger this in an exec or eval call? Or perhaps a file with an encoding?
Done. But I had to use Let me know if this test is good enough or if you want something else.
Yes, going via the tokenizer makes no sense here. The purpose of what I asked is that alternative implementations will still run these test files to check if they are compliant, and we need to provide a way to run a file or exec some code and say "this is what we expect". You are triggering the bug via a specific aspect of CPython, but I would prefer if we could trigger it end-to-end via a file. There are more tests executing python over files, check in
Running error as script.

LGTM. Thank you very much @tom-pytel!
Thanks @tom-pytel for the PR, and @pablogsal for merging it 🌮🎉. I'm working now to backport this PR to: 3.13, 3.14.
…kenizer (pythonGH-139657) (cherry picked from commit 539461d) Co-authored-by: Tomasz Pytel <tompytel@gmail.com>
Sorry, @tom-pytel and @pablogsal, I could not cleanly backport this to
GH-139701 is a backport of this pull request to the 3.14 branch.
@tom-pytel can you make the backport following the instructions?
Sure, in a bit.

GH-139726 is a backport of this pull request to the 3.13 branch.
A "=" followed by a ":" in an f-string expression could cause the tokenizer to erroneously think it was starting a format spec, leading to incorrect internal state and possible decode errors if this results in split unicode characters on copy. This PR fixes this by disallowing "=" from setting the in_debug state unless it is encountered at the top level of an f-string expression. This problem exists back to Python 3.13 and this PR can probably be backported easily enough.
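The reproducer from the PR's test can be run standalone. Feeding the source through tokenize parses it line by line, which is what exposed the bug; on a fixed interpreter (this sketch assumes Python 3.12+, where tokenize understands PEP 701 f-strings) the call completes without error.

```python
import io
import tokenize

# 'a=' must not be treated as an f-string debug marker ('='), otherwise
# the ':' of the lambda is mistaken for the start of a format spec.
# The '\n' keeps the closing paren on its own line, so the tokenizer
# processes the f-string across a line boundary.
source = '''f"{f(a=lambda: 'à'\n)}"'''

tokens = list(tokenize.tokenize(io.BytesIO(source.encode()).readline))
assert len(tokens) > 3  # tokenizes cleanly instead of raising
```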