Accelerate Scheduler with Cython, PyPy, or C #854
Description
We are sometimes bound by the administrative overhead of the distributed scheduler. The scheduler is pure Python and is essentially a bundle of core data structures (lists, sets, dicts). It generally has an overhead of a few hundred microseconds per task. When graphs become large (hundreds of thousands of tasks), this overhead can become troublesome.
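To put rough numbers on this, here is a back-of-the-envelope sketch. The 200 microsecond figure is an assumption picked from the "few hundred microseconds" range above, not a measurement, and `scheduler_overhead_seconds` is a hypothetical helper, not part of distributed.

```python
def scheduler_overhead_seconds(n_tasks: int, per_task_us: float) -> float:
    """Total scheduler overhead in seconds for n_tasks tasks,
    assuming a fixed cost of per_task_us microseconds per task."""
    return n_tasks * per_task_us / 1_000_000


# 200 us per task is an assumed value, not a measured one.
for n_tasks in (1_000, 100_000, 500_000):
    total = scheduler_overhead_seconds(n_tasks, per_task_us=200.0)
    print(f"{n_tasks:>7} tasks -> {total:.1f} s of scheduler overhead")
```

Under that assumption, a graph of 500,000 tasks spends on the order of 100 seconds in the scheduler alone, regardless of how fast the workers are.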
There are a few potential solutions:
- Use Cython in a few places
- Run the entire scheduler in PyPy (workers, clients, and user code can still be in CPython)
- Rewrite everything in C/Go/Julia/whatever
Any effort here has to be balanced against the fact that the scheduler will continue to change and we are likely to keep writing it in Python, so a performance improvement has the extra constraint that it cannot add significant development inertia or friction.
Here are a couple of cProfile-able scripts that stress scheduler performance: https://gist.github.com/mrocklin/eb9ca64813f98946896ec646f0e4a43b
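For anyone who wants to see the general shape of such a measurement without a cluster, here is a minimal, self-contained sketch. It is not one of the scripts in the gist and it does not touch the real scheduler: `make_chain_graph`, `run_toy_scheduler`, and `profile_scheduler` are hypothetical names for a toy loop built from the same kinds of dicts, sets, and lists, profiled with the standard-library `cProfile` and `pstats` modules.

```python
import cProfile
import io
import pstats


def make_chain_graph(n):
    """Hypothetical graph: task i depends on task i - 1."""
    return {i: ({i - 1} if i > 0 else set()) for i in range(n)}


def run_toy_scheduler(dependencies):
    """Process tasks in dependency order using only dicts, sets, and lists."""
    # Invert the dependency mapping so finished tasks can release dependents.
    dependents = {key: set() for key in dependencies}
    waiting = {}
    for key, deps in dependencies.items():
        waiting[key] = set(deps)
        for dep in deps:
            dependents[dep].add(key)

    ready = [key for key, deps in waiting.items() if not deps]
    order = []
    while ready:
        key = ready.pop()
        order.append(key)  # stand-in for "task finished"
        for child in dependents[key]:
            waiting[child].discard(key)
            if not waiting[child]:
                ready.append(child)
    return order


def profile_scheduler(n_tasks):
    """Run the toy scheduler under cProfile and return (order, report text)."""
    graph = make_chain_graph(n_tasks)
    profiler = cProfile.Profile()
    order = profiler.runcall(run_toy_scheduler, graph)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return order, buffer.getvalue()


if __name__ == "__main__":
    order, report = profile_scheduler(100_000)
    print(report)
```

The same pattern should carry over to Cython or PyPy experiments: keep the workload fixed and compare the profile or wall-clock time before and after.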