
gh-72680: Fix false positives when using zipfile.is_zipfile() #134250

Merged
gpshead merged 4 commits into python:main from thatch:is_zipfile_verify on May 21, 2025


@thatch thatch commented May 19, 2025

Rebased #5053 and fixed the implementation so that it passes the tests. The original PR description, by @jjolly, follows.

Fix zipfile validation issue by ... providing more validation!

Originally, zipfile.is_zipfile() only checked for the End of Central Directory
signature. If the signature could be found in the last 64 KiB of the file,
success! This produced false positives on any file with 'PK\x05\x06' in the
last 64 KiB of the file - including PDFs and PNGs.

This is now corrected by actually validating the Central Directory location
and size based on the information provided by the End of Central Directory
record, along with verifying the Central Directory signature of the first entry.

This should be sufficient for the vast majority of zipfiles, but more could be
done to fully validate the zipfile content - such as validating all
local file headers and Central Directory entries.
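
To make the difference concrete, here is a minimal sketch of the two checks described above, operating on an in-memory bytes object. It is not the code from this PR: the names naive_check and stricter_check are hypothetical, Zip64 and multi-disk archives are ignored, and the End of Central Directory record is located with a plain rfind rather than the comment-aware search a real implementation would need.

```python
import io
import struct
import zipfile

# Fixed part of the End of Central Directory (EOCD) record:
# signature, disk no., CD start disk, entries on disk, total entries,
# CD size, CD offset, comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes
EOCD_SIG = b"PK\x05\x06"
CD_SIG = b"PK\x01\x02"  # signature of a Central Directory entry


def find_eocd(data: bytes) -> int:
    """Return the offset of the last EOCD signature in the tail, or -1."""
    # The EOCD may be followed by a comment of up to 0xFFFF bytes.
    search_from = max(0, len(data) - (EOCD_SIZE + 0xFFFF))
    return data.rfind(EOCD_SIG, search_from)


def naive_check(data: bytes) -> bool:
    """Old-style check (hypothetical name): signature presence only."""
    return find_eocd(data) >= 0


def stricter_check(data: bytes) -> bool:
    """Sketch of the stricter check (hypothetical name, simplified)."""
    pos = find_eocd(data)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return False
    fields = struct.unpack_from(EOCD_FORMAT, data, pos)
    total_entries, cd_size, cd_offset = fields[4], fields[5], fields[6]
    if total_entries == 0 and cd_size == 0 and cd_offset == 0:
        return True  # an empty archive consists of the EOCD record alone
    # The Central Directory is assumed to end where the EOCD begins.
    cd_start = pos - cd_size
    if cd_start < cd_offset:
        return False  # recorded size/offset do not fit in front of the EOCD
    # cd_start > cd_offset is tolerated: data may be prepended to the archive.
    return data[cd_start:cd_start + 4] == CD_SIG


# A real archive, built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hi")
real = buf.getvalue()

# Not an archive: PDF-like bytes that happen to contain the EOCD signature.
fake = b"%PDF-1.7\n" + b"x" * 100 + EOCD_SIG + b"A" * 18
```

With these inputs, naive_check accepts both real and fake, while stricter_check accepts only real, because the size and offset fields decoded from the fake record cannot describe a Central Directory that fits in front of it.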
