gh-139516: Fix lambda colon start format spec in f-string in tokenizer #139657
pablogsal merged 4 commits into python:main
Conversation
Ping @pablogsal. I added the test to
Please add a test for the f-string test file as well, as this will be a semantic test that needs to hold true even if we change the tokenizer or some other implementation doesn't have the same tokenizer.
Lib/test/test_fstring.py
Outdated
# gh-139516
# The '\n' is explicit to ensure no trailing whitespace which would invalidate the test.
# Must use tokenize instead of compile so that source is parsed by line which exposes the bug.
list(tokenize.tokenize(BytesIO('''f"{f(a=lambda: 'à'\n)}"'''.encode()).readline))
I am confused. Isn't it possible to trigger this in an exec or eval call? Or perhaps a file with an encoding?
Done. But I had to use Let me know if this test is good enough or if you want something else.
Yes, going via the tokenizer makes no sense here. The purpose of what I asked is that alternative implementations will still run these test files to check if they are compliant, and we need to provide a way to run a file or exec some code and say "this is what we expect". You are triggering the bug via a specific aspect of CPython, but I would prefer if we could trigger it end-to-end via a file. There are more tests executing python over files, check in
Running error as script.

LGTM. Thank you very much @tom-pytel!
Thanks @tom-pytel for the PR, and @pablogsal for merging it 🌮🎉. I'm working now to backport this PR to: 3.13, 3.14.
…kenizer (pythonGH-139657) (cherry picked from commit 539461d) Co-authored-by: Tomasz Pytel <tompytel@gmail.com>
Sorry, @tom-pytel and @pablogsal, I could not cleanly backport this to
GH-139701 is a backport of this pull request to the 3.14 branch.
@tom-pytel can you make the backport following the instructions?
Sure, in a bit.

GH-139726 is a backport of this pull request to the 3.13 branch.
A "=" followed by a ":" in an f-string expression could cause the tokenizer to erroneously think it was starting a format spec, leading to incorrect internal state and possible decode errors if this results in split unicode characters on copy. This PR fixes this by disallowing "=" from setting the in_debug state unless it is encountered at the top level of an f-string expression. This problem exists back to Python 3.13 and this PR can probably be backported easily enough.
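The reproducer from the PR's test can be run standalone. Feeding the source through tokenize parses it line by line, which is what exposed the bug; on a fixed interpreter (this sketch assumes Python 3.12+, where tokenize understands PEP 701 f-strings) the call completes without error.

```python
import io
import tokenize

# 'a=' must not be treated as an f-string debug marker ('='), otherwise
# the ':' of the lambda is mistaken for the start of a format spec.
# The '\n' keeps the closing paren on its own line, so the tokenizer
# processes the f-string across a line boundary.
source = '''f"{f(a=lambda: 'à'\n)}"'''

tokens = list(tokenize.tokenize(io.BytesIO(source.encode()).readline))
assert len(tokens) > 3  # tokenizes cleanly instead of raising
```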