Description
Crash report
What happened?
Semi-reliable crash with this code (interestingly, Python on macOS doesn't segfault for me, but Docker/Linux aarch64 does reliably):
    from xml.parsers import expat
    p = expat.ParserCreate(encoding="utf-16")
    def start(name, attrs):
        p.CharacterDataHandler = lambda data: p.Parse(data, 0)
    p.StartElementHandler = start
    data = b"\xff\xfe<\x00a\x00>\x00x\x00"
    for i in range(len(data)):
        try:
            p.Parse(data[i:i+1], i == len(data) - 1)
        except Exception:
            pass

This code /is/ doing some pretty naughty stuff, but the main problem seems to be that the handler is being set to re-enter the parser. The expat docs do say:
To state the obvious: the three parsing functions XML_Parse, XML_ParseBuffer and XML_GetBuffer must not be called from within a handler unless they operate on a separate parser instance, that is, one that did not call the handler. For example, it is OK to call the parsing functions from within an XML_ExternalEntityRefHandler, if they apply to the parser created by XML_ExternalEntityParserCreate.
and I see that the python expat parser code tracks in_callback:
Line 91 in 52c0186:
    int in_callback; /* Is a callback active? */
So I wonder if we can avoid the segfault by preventing Parse calls when in_callback==true?
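To illustrate the idea at the Python level, here is a minimal sketch of such a guard. `GuardedParser`, its `_in_parse` flag, and the `set_handler`/`parse` methods are hypothetical names invented for this example; they are not part of pyexpat, and an actual fix would presumably live in the C code next to `in_callback`. Since handlers only run while `Parse()` is executing, the sketch assumes that tracking "a Parse call is in progress" is a reasonable stand-in for "a callback is active".

```python
from xml.parsers import expat


class GuardedParser:
    """Hypothetical wrapper that refuses Parse() while a handler is running."""

    def __init__(self, encoding=None):
        self._parser = expat.ParserCreate(encoding=encoding)
        self._in_parse = False  # stand-in for the C-level in_callback flag

    def set_handler(self, name, func):
        # name is an expat handler attribute, e.g. "CharacterDataHandler"
        setattr(self._parser, name, func)

    def parse(self, data, is_final=False):
        if self._in_parse:
            # Re-entrant call: reject it before it ever reaches expat.
            raise RuntimeError("Parse() called from within a handler")
        self._in_parse = True
        try:
            return self._parser.Parse(data, is_final)
        finally:
            self._in_parse = False


errors = []
gp = GuardedParser()


def on_chars(data):
    # Attempt the same re-entrant call as in the reproducer.
    try:
        gp.parse(data)
    except RuntimeError as exc:
        errors.append(str(exc))


gp.set_handler("CharacterDataHandler", on_chars)
gp.parse(b"<a>x</a>", True)
print(errors)
```

With this guard the re-entrant call is turned into an ordinary Python exception instead of reaching expat with the parser in the middle of a parse.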
There's also a secondary issue in play here: Parse() seems to call XML_SetEncoding (line 872 in 52c0186):
    (void)XML_SetEncoding(self->itself, "utf-8");
without the check outlined in the expat docs:
Set the encoding to be used by the parser. It is equivalent to passing a non-NULL encoding argument to the parser creation functions. It must not be called after XML_Parse or XML_ParseBuffer have been called on the given parser. Returns XML_STATUS_OK on success or XML_STATUS_ERROR on error.
This is almost certainly not going to cause issues unless the encoding is actually changing (not that common), at which point the UB will rear its head as the internal state of the parser becomes inconsistent.
Dockerfile reproducer
ARG REPO=https://github.com/python/cpython.git
ARG BRANCH=main
FROM ubuntu:24.04
ARG REPO
ARG BRANCH
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    build-essential git pkg-config \
    libssl-dev libbz2-dev libreadline-dev libsqlite3-dev \
    liblzma-dev libffi-dev zlib1g-dev uuid-dev \
    && rm -rf /var/lib/apt/lists/*
RUN git clone --branch ${BRANCH} --depth 1 \
    ${REPO} /cpython
RUN cd /cpython && \
    ./configure --prefix=/python --without-ensurepip && \
    make -j$(nproc) && \
    make install
# ── TEST SCRIPT ──────────────────────────────────────────────────
RUN cat > /test.py << 'EOF'
from xml.parsers import expat
p = expat.ParserCreate(encoding="utf-16")
def start(name, attrs):
    p.CharacterDataHandler = lambda data: p.Parse(data, 0)
p.StartElementHandler = start
data = b"\xff\xfe<\x00a\x00>\x00x\x00"
for i in range(len(data)):
    try:
        p.Parse(data[i:i+1], i == len(data) - 1)
    except Exception:
        pass
EOF
# ──────────────────────────────────────────────────────────────────────────────
CMD ["/bin/sh", "-c", "uname -m && /python/bin/python3 -VV && /python/bin/python3 /test.py"]
Gives on my pc:
docker run --rm -it expattest
aarch64
Python 3.15.0a7+ (heads/main:52c0186, Mar 19 2026, 13:06:19) [GCC 13.3.0]
52c01864c4778a351e5aa3584e86ed6fd212a5a4
Segmentation fault (core dumped)
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS
Output from running 'python -VV' on the command line:
Python 3.15.0a7+ (heads/main:52c0186, Mar 19 2026, 13:06:19) [GCC 13.3.0]