GH-115802: JIT "small" code for Windows #115964
gvanrossum
left a comment
I'm sorry, I realized halfway through the review that this touches upon several large blank spaces in my brain. So this is not a very useful review. I do think that _PyOptimize_Optimize() belongs in an internal header though.
  conversion_func conv_fn;
  assert(oparg >= FVC_STR && oparg <= FVC_ASCII);
- conv_fn = CONVERSION_FUNCTIONS[oparg];
+ conv_fn = _PyEval_ConversionFuncs[oparg];
We could change inst(CONVERT_VALUE to replicate(4) inst(CONVERT_VALUE and rely on Clang removing the lookup.
Is there a way to do that where a fifth generic version isn't generated too? Otherwise, it doesn't help, so we might as well stay with this?
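For readers following the trade-off, here is a minimal C sketch of what per-oparg specialization would buy. It is not the code the interpreter generator emits: the table, the conversion functions, and the `convert_*` names are hypothetical stand-ins, and the index range 1..3 is an assumption made for illustration. With a constant index, an optimizing compiler is free to fold the table lookup into a direct call; the generic form has to index the table at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the conversion functions; not CPython's API. */
typedef int (*conversion_func)(int);

static int conv_str(int v)   { return v + 1; }
static int conv_repr(int v)  { return v + 2; }
static int conv_ascii(int v) { return v + 3; }

/* Index 0 is unused; valid indices are assumed to be 1..3. */
static const conversion_func conversion_funcs[] = {
    NULL, conv_str, conv_repr, conv_ascii,
};

/* Generic form: the table is indexed at run time. */
static int convert_generic(int oparg, int v)
{
    assert(oparg >= 1 && oparg <= 3);
    return conversion_funcs[oparg](v);
}

/* Specialized forms: the index is a compile-time constant, so the
   compiler may replace the lookup with a direct call. */
static int convert_1(int v) { return conversion_funcs[1](v); }
static int convert_2(int v) { return conversion_funcs[2](v); }
static int convert_3(int v) { return conversion_funcs[3](v); }
```

If a generic version is still generated alongside the specialized ones, the table stays reachable through it, which is the concern raised above.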
markshannon
left a comment
Looks good, I've a few comments.
As a general comment regarding JIT development:
Don't be afraid to make small changes to bytecodes.c and other interpreter files to enable the JIT to work more smoothly.
PyAPI_DATA(const binaryfunc) _PyEval_BinaryOps[];
PyAPI_DATA(const conversion_func) _PyEval_ConversionFuncs[];

PyAPI_FUNC(int) _PyEval_CheckExceptStarTypeValid(PyThreadState *tstate, PyObject* right);
Rather than exposing all these symbols, could we put the function pointers in a struct, and pass that to the JIT as an argument?
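As a rough illustration of this suggestion (not code from the PR), the idea could look like the sketch below. The `jit_helpers` struct, the helper functions, and `stencil_body` are all hypothetical names; the point is only that the compiled code reaches its helpers through one table passed as an argument instead of through individually exported symbols.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*binary_fn)(int, int);

/* Hypothetical table of helpers handed to JIT-compiled code. */
typedef struct {
    binary_fn add;
    binary_fn multiply;
} jit_helpers;

static int helper_add(int a, int b)      { return a + b; }
static int helper_multiply(int a, int b) { return a * b; }

/* The runtime fills in the table once. */
static const jit_helpers default_helpers = { helper_add, helper_multiply };

/* Stand-in for a compiled stencil: it names no helper symbol directly
   and reaches everything through the table it is given. */
static int stencil_body(const jit_helpers *helpers, int a, int b)
{
    assert(helpers != NULL);
    return helpers->multiply(helpers->add(a, b), b);
}
```

With this shape, only the struct layout needs to be shared between the runtime and the templates.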
The issue isn't finding symbols when jitting the code (jit.c is linked into the main executable and can find everything just fine).
The purpose of adding PyAPI_FUNC(...) and PyAPI_DATA(...) to the declarations is so they are declared with __declspec(dllimport) when compiling the templates. This makes Clang emit indirect memory accesses on Windows (similar to position-independent code on other platforms).
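To illustrate the kind of indirection meant here, the following portable C sketch simulates an import slot by hand. It does not use `__declspec(dllimport)` itself, and `demo_add_one`, `import_slot`, and `patch_slot` are hypothetical names. The slot is a pointer-sized cell holding the target's address; code that goes through it only needs to locate the slot, and the target can live anywhere in the address space.

```c
#include <assert.h>
#include <stddef.h>

static int demo_add_one(int x) { return x + 1; }

/* Simulated import slot: plays the role of an import-table or GOT entry. */
static int (*import_slot)(int) = NULL;

/* Whoever loads the code (a loader, or a JIT patching holes) fills the slot. */
static void patch_slot(int (*target)(int))
{
    import_slot = target;
}

/* Direct call: the target is encoded relative to the call site. */
static int call_direct(int x)
{
    return demo_add_one(x);
}

/* Indirect call: load the address from the slot, then call through it. */
static int call_via_slot(int x)
{
    assert(import_slot != NULL);
    return import_slot(x);
}
```

Both paths compute the same result; the difference is only in how the call target is found.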
For 32-bit Windows, the small code model works fine out-of-the-box.
For 64-bit Windows, we don't have position-independent code with GOT relocations like Linux and macOS. Instead, we compile the stencils like a Py_BUILD_CORE_MODULE binary extension module, which creates a level of indirection similar to a GOT (__imp_Py_XXX holds the address of Py_XXX). We process this just like the GOT on other platforms, and everything just works.
This does require adding PyAPI_FUNC and PyAPI_DATA to some symbols in internal headers to get the correct visibility, which is why this PR touches so many files.
Looks like this makes the Windows JIT ~3-4% faster.