Skip to content
Merged
Changes from 1 commit
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
a44faf3
Add strace helper for tracing system calls made by python running spe…
cmaloney Jun 28, 2024
a462039
Update test_subprocess to use strace_helper
cmaloney Jun 28, 2024
2ab832d
Add test to FileIO that validates set of syscalls
cmaloney Jun 29, 2024
ef298f2
Move from assert to .assertEqual
cmaloney Jun 29, 2024
283a077
Allow libc to use different fstat variants
cmaloney Jun 29, 2024
3b6c094
Exit early if strace exited non-zero
cmaloney Jun 29, 2024
97b294f
Add myself to ACKS
cmaloney Jun 29, 2024
e5bdc6b
Add tests around pathilb read_*() behavior
cmaloney Jun 29, 2024
397cd9e
Remove subprocess._USE_VFORK strace test
cmaloney Jun 29, 2024
e88d414
Merge remote-tracking branch 'origin/main' into cmaloney/systrace_hel…
cmaloney Jul 4, 2024
d99157f
Update call sequence after gh-120755
cmaloney Jul 4, 2024
5664558
Add specific python bug links
cmaloney Jul 9, 2024
736d5bc
Reduce annotations, stay bytes longer, make raw_events non-private
cmaloney Jul 9, 2024
47ed7fe
Move _strace_working checks to all be in requires_strace
cmaloney Jul 9, 2024
55d1cec
formatting fixes, reduce annotations further
cmaloney Jul 10, 2024
6fe0961
Small cleanups from self review
cmaloney Jul 10, 2024
943b07d
Merge branch 'main' into cmaloney/systrace_helper_wip
cmaloney Jul 10, 2024
2ea2bc8
Adjust test cases to match more general system call shape
cmaloney Jul 10, 2024
cdf449a
raw_events -> event_bytes
cmaloney Jul 10, 2024
c44bca6
Add bits I forgot to commit
cmaloney Jul 10, 2024
0210d16
Merge remote-tracking branch 'main' into cmaloney/systrace_helper_wip
cmaloney Aug 1, 2024
a1b4028
Merge branch 'main' into cmaloney/systrace_helper_wip
cmaloney Aug 18, 2024
0c6ebe6
Switch to functools.cache, simplifying the code
cmaloney Aug 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Add strace helper for tracing system calls made by python running spe…
…cific code
  • Loading branch information
cmaloney committed Jun 29, 2024
commit a44faf3d7cd870fdbc55faf9906d0ed7c1f5673b
170 changes: 170 additions & 0 deletions Lib/test/support/strace_helper.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
import re
import sys
import textwrap
import unittest
from dataclasses import dataclass
from test import support
from test.support.script_helper import run_python_until_end
from typing import Dict, List
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI - modern typing does not need these two. use dict[] and list[] directly. (but read below, we aren't ready to accept non-trivial annotations in test.support)


_strace_binary = "/usr/bin/strace"
_syscall_regex = re.compile(
r"(?P<syscall>[^(]*)\((?P<args>[^)]*)\)\s*[=]\s*(?P<returncode>.+)")
_returncode_regex = re.compile(
r"\+\+\+ exited with (?P<returncode>\d+) \+\+\+")

# Cached value of whether or not there is a compatible strace binary
_strace_working: bool | None = None


@dataclass
class StraceEvent:
syscall: str
args: List[str]
returncode: str


@dataclass
class StraceResult:
strace_returncode: int
python_returncode: int
_raw_events: str
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this gets used, so perhaps don't make it private. Add a comment describing what exactly it is for. for the most part it looks like it'll be the same as stderr.

stdout: str
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use str for child process output. The process emits binary output and no codec is declared or guaranteed.

It is okay to decode names once parsed out for high level use in StraceEvent.syscall, but the process itself shouldn't be expected to produce output matching any encoding.

stderr: str

def events(self) -> List[StraceEvent]:
"""Extract the call list information from _raw_events"""
matches = [
_syscall_regex.match(event)
for event in self._raw_events.splitlines()
]
return [
StraceEvent(match["syscall"],
[arg.strip() for arg in (match["args"].split(","))],
match["returncode"]) for match in matches if match
]

def sections(self) -> Dict[str:List[StraceEvent]]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a valid annotation. Use -> dict[str, list[StraceEvent]]... But it'd better to just not annotate this file at all. Mention what gets returned in doc strings instead.

BUT This is a good example of why we generally do not have type annotations anywhere within CPython. We are not setup to run multiple type checkers over the stdlib to validate them. You can see what little is configured in .github/workflows/mypy.yml - but do not try to use this PR to enable it for test.support. it isn't ready.

PEP 649 being implemented is the only reason this was not a Python syntax error.

"""Find all "MARK <X>" writes and use them to make groups of events.

This is useful to avoid variable / overhead strace events, like that
at interpreter startup, so a test can just check does the small case
under study work."""
current_section = "__startup"
sections = {current_section: []}
for event in self.events():
if event.syscall == 'write' and len(
event.args) > 2 and event.args[1].startswith("\"MARK "):
# Found a new section, don't include the write in the section
# but all events until next mark should be in that section
current_section = event.args[1].split(
" ", 1)[1].removesuffix('\\n"')
if current_section not in sections:
sections[current_section] = list()
else:
sections[current_section].append(event)

return sections


@support.requires_subprocess()
def strace_python(code: str,
strace_flags: List[str],
check: bool = True) -> StraceResult:
"""Run strace and return the trace.

Sets strace_returncode and python_returncode to `-1` on error
"""
res = None

def _make_error(reason, details):
return StraceResult(
strace_returncode=-1,
python_returncode=-1,
_raw_events=f"error({reason},details={details}) = -1",
stdout=res.out if res else "",
stderr=res.err if res else "")

# Run strace, and get out the raw text
try:
res, cmd_line = run_python_until_end(
"-c",
textwrap.dedent(code),
__run_using_command=[_strace_binary] + strace_flags)
except OSError as err:
return _make_error("Caught OSError", err)

# Get out program returncode
decoded = res.err.decode().strip()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) Always specify a codec on decode. BUT... (2) Instead of decoding here, it'd be better to leave output as bytes. There will be processes that do not emit valid encoded data.


output = decoded.rsplit("\n", 1)
if len(output) != 2:
return _make_error("Expected strace events and exit code line",
decoded[-50:])

returncode_match = _returncode_regex.match(output[1])
if not returncode_match:
return _make_error("Expected to find returncode in last line.",
output[1][:50])

python_returncode = int(returncode_match["returncode"])
if check and (res.rc or python_returncode):
res.fail(cmd_line)

return StraceResult(strace_returncode=res.rc,
python_returncode=python_returncode,
_raw_events=output[0],
stdout=res.out,
stderr=res.err)


def _get_events(code: str, strace_flags: List[str], prelude: str,
cleanup: str) -> List[StraceEvent]:
# NOTE: The flush is currently required to prevent the prints from getting
# buffered and done all at once at exit
prelude = textwrap.dedent(prelude)
code = textwrap.dedent(code)
cleanup = textwrap.dedent(cleanup)
to_run = f"""
print("MARK prelude", flush=True)
{prelude}
print("MARK code", flush=True)
{code}
print("MARK cleanup", flush=True)
{cleanup}
print("MARK __shutdown", flush=True)
"""
trace = strace_python(to_run, strace_flags)
all_sections = trace.sections()
return all_sections['code']


def get_syscalls(code: str,
strace_flags: List[str],
prelude: str = "",
cleanup: str = "") -> List[str]:
"""Get the syscalls which a given chunk of python code generates"""
events = _get_events(code, strace_flags, prelude=prelude, cleanup=cleanup)
return [ev.syscall for ev in events]


def _can_strace():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i would just stick a functools.cache on this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to functools.cache, definitely simplified a bit. Initially had been worried about adding a lot more dependencies to the I/O tests, but also realized when things like read are broken, interpreter build breaks (can't load modules, etc.)

res = strace_python("import sys; sys.exit(0)", [], check=False)
assert res.events(), "Should have parsed multiple calls"

global _strace_working
_strace_working = res.strace_returncode == 0 and res.python_returncode == 0


def requires_strace():
if sys.platform != "linux":
return unittest.skip("Linux only, requires strace.")
# Moderately expensive (spawns a subprocess), so share results when possible.
if _strace_working is None:
_can_strace()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the assignment to the global _strace_working into this function so that the is None test and code assigning it to something other than that all lives together in one scope. (have _can_strace() return a boolean to be assigned perhaps)


assert _strace_working is not None, "Should have been set by _can_strace"
return unittest.skipUnless(_strace_working, "Requires working strace")


__all__ = ["requires_strace", "strace_python", "StraceResult", "StraceEvent"]