gh-98188: Fix EmailMessage.get_payload to decode data#127547

Merged
bitdancer merged 7 commits into python:main from RanKKI:fix-issue-98188 on Jan 6, 2025

RanKKI commented on Dec 3, 2024

Fix email.message.EmailMessage.get_payload failing to decode data when the Content-Transfer-Encoding header has trailing whitespace and/or extra text after its <mechanism>, as in this example (behavior before the fix):

>>> import email, textwrap
>>> from email import policy
>>> msg = email.message_from_string(textwrap.dedent("""\
... Content-Transfer-Encoding: base64 some text
... 
... SGVsbG8uIFRlc3Rpbmc=
... """), policy=policy.default)
>>> msg.get_payload(decode=True)
b'SGVsbG8uIFRlc3Rpbmc=\n'
>>> header = msg.get("content-transfer-encoding")
>>> print(f'"{header.cte}"')
"base64"
>>> print(f'"{str(header)}"')
"base64 some text"
>>> header.defects
(InvalidHeaderDefect('Extra text after content transfer encoding'),)

The header's defects attribute records an InvalidHeaderDefect, but header.cte still holds a valid mechanism ("base64"). It is therefore better to decode the content using header.cte even though the header has a defect.
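
For readers on a Python version without this fix, the reasoning above can be applied by hand. The following is a minimal sketch, not part of the email package: decode_with_parsed_cte is a hypothetical helper that reads the parsed mechanism from header.cte (available under policy.default), ignores the header's defects, and decodes the raw payload itself. It assumes a non-multipart message and handles only base64 and quoted-printable.

```python
import base64
import quopri
import textwrap
from email import message_from_string, policy


def decode_with_parsed_cte(msg):
    """Hypothetical helper: decode a non-multipart payload using header.cte.

    Assumes the message was parsed with policy.default, so the
    Content-Transfer-Encoding header object exposes a .cte attribute.
    """
    header = msg.get("content-transfer-encoding")
    # A missing header means no transfer encoding was applied.
    cte = header.cte if header is not None else "7bit"
    raw = msg.get_payload(decode=False)  # undecoded payload text
    if cte == "base64":
        # With validate=False (the default), non-alphabet characters such
        # as the trailing newline are discarded before decoding.
        return base64.b64decode(raw)
    if cte == "quoted-printable":
        return quopri.decodestring(raw.encode("ascii"))
    # Simplification: 7bit/8bit/binary payloads are returned as UTF-8 bytes.
    return raw.encode("utf-8")


msg = message_from_string(textwrap.dedent("""\
    Content-Transfer-Encoding: base64 some text

    SGVsbG8uIFRlc3Rpbmc=
    """), policy=policy.default)
print(msg["content-transfer-encoding"].defects)  # the defect is still recorded
print(decode_with_parsed_cte(msg))  # b'Hello. Testing'
```

The defect stays visible on the header, so callers can still log or reject malformed messages while getting usable data.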

The fix in ietf-tools/mailarchive#3550 overrides the header's __str__ method to return self.cte, which resolves this issue but may break backward compatibility for code that expects str(header) to return the raw header value. It is better to keep str(header) returning the original value and instead use header.cte to retrieve the parsed CTE in get_payload(decode=True).
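
The lookup strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the merged CPython code: effective_cte is a hypothetical helper that prefers the parsed header.cte when the header object provides it and otherwise falls back to the stringified, stripped, lowercased value (as with plain str headers under the compat32 policy). str(header) itself is left untouched.

```python
from email import message_from_string, policy


def effective_cte(msg):
    """Hypothetical helper: return the transfer-encoding mechanism to use."""
    header = msg.get("content-transfer-encoding", "")
    # policy.default yields a header object with a parsed .cte attribute.
    cte = getattr(header, "cte", None)
    if cte is not None:
        return cte
    # compat32 yields a plain str (or an email.header.Header); stringify it
    # and drop surrounding whitespace so "BASE64  " still matches "base64".
    return str(header).strip().lower()


modern = message_from_string(
    "Content-Transfer-Encoding: base64 some text\n\nSGVsbG8uIFRlc3Rpbmc=\n",
    policy=policy.default,
)
legacy = message_from_string(
    "Content-Transfer-Encoding: BASE64  \n\nSGVsbG8uIFRlc3Rpbmc=\n",
    policy=policy.compat32,
)
print(effective_cte(modern), str(modern["content-transfer-encoding"]))
print(effective_cte(legacy))
```

Keeping the parsed value separate from str(header) means existing code that inspects or re-serializes the raw header continues to see exactly what was in the message.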

After this fix, msg.get_payload(decode=True) returns b'Hello. Testing'.

5 participants