Skip to content

gh-146455: Fix O(N²) in add_const() after constant folding moved to CFG#146456

Open
zSirius wants to merge 2 commits intopython:mainfrom
zSirius:fix-add-const-quadratic
Open

gh-146455: Fix O(N²) in add_const() after constant folding moved to CFG#146456
zSirius wants to merge 2 commits intopython:mainfrom
zSirius:fix-add-const-quadratic

Conversation

@zSirius
Copy link

@zSirius zSirius commented Mar 26, 2026

Fix O(N²) performance regression in add_const() introduced by moving constant folding from AST to CFG optimizer (gh-126835).

Problem

After gh-130769 moved tuple folding to CFG, fold_tuple_of_constants() calls add_const() once per inner tuple element. add_const() does a linear scan over the consts list to find the index, so N calls × O(N) scan = O(N²).

The same issue affects unary/binary op folding moved in gh-129550 (fold_const_unaryop, fold_const_binop).

perf profiling shows add_const taking 83.76% of CPU time when compiling 100K nested constant tuples.

Fix

Maintain an auxiliary _Py_hashtable_t (pointer → index mapping) alongside the consts list, providing O(1) constant lookup. The hashtable:

  • Uses _Py_hashtable_hash_ptr / _Py_hashtable_compare_direct — pure pointer ops, no Python object overhead
  • Is created at the start of _PyCfg_OptimizeCodeUnit() and destroyed after optimize_cfg(), before remove_unused_consts() reindexes the list
  • Works correctly because _PyCompile_ConstCacheMergeOne() already guarantees identity uniqueness (equal-valued constants share the same pointer)

All modified functions are static — no public API changes.

Performance (N=100K)

Scenario 3.11 3.14 (before) 3.14 (after fix)
Nested tuples ((f, f), ...) 1.24s 10.38s 1.48s
Unary neg (-1, -2, ..., -N) 0.22s 6.86s 1.12s
Binary add (0+1, 0+2, ..., 0+N) 0.27s 2.59s 1.40s

All existing tests pass: test_compile, test_peepholer, test_ast, test_dis.

…d to CFG

The add_const() function in flowgraph.c uses a linear search over the
consts list to find the index of a constant. After pythongh-126835 moved
constant folding from the AST optimizer to the CFG optimizer, this
function is now called N times for N inner tuple elements during
fold_tuple_of_constants(), resulting in O(N²) total time.

Fix by maintaining an auxiliary _Py_hashtable_t that maps object
pointers to their indices in the consts list, providing O(1) lookup.

For a file with 100,000 constant 2-tuples:
- Before: 10.38s (add_const occupies 83.76% of CPU time)
- After:  1.48s
@bedevere-app
Copy link

bedevere-app bot commented Mar 26, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@Eclips4 Eclips4 self-requested a review March 26, 2026 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant