Accelerate Scheduler with Cython, PyPy, or C #854

We are sometimes bound by the administrative overhead of the distributed scheduler. The scheduler is pure Python and is built on a handful of core data structures (lists, sets, dicts). It generally has an overhead of a few hundred microseconds per task. When graphs become large (hundreds of thousands of tasks), this overhead can become troublesome.
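A back-of-the-envelope estimate shows why this matters. Assume a flat 300 microseconds per task (an illustrative figure inside the "few hundred microseconds" range, not a measurement) and that the cost scales linearly with task count. A 200,000-task graph then spends about a minute in pure scheduling bookkeeping. The helper name below is hypothetical:

```python
def total_overhead_seconds(n_tasks: int, per_task_us: float) -> float:
    """Aggregate scheduler overhead, assuming a fixed, linear cost per task."""
    # Multiply first, then convert microseconds to seconds (keeps the integer case exact)
    return n_tasks * per_task_us / 1_000_000


if __name__ == "__main__":
    # Illustrative numbers only: 200,000 tasks at 300 microseconds each
    print(total_overhead_seconds(200_000, 300))  # 60.0
```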

There are a few potential solutions:

1.  Use Cython in a few places
2.  Run the entire scheduler in PyPy (workers, clients, and user code can still be in CPython)
3.  Rewrite everything in C/Go/Julia/whatever
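Part of the appeal of option 2 is that it needs no code changes: a pure-Python workload runs unmodified under both CPython and PyPy, so the two can be compared directly. The sketch below is a hypothetical stand-in for scheduler bookkeeping, not the distributed scheduler itself; `make_chain_graph` and `run_toy_scheduler` are illustrative names. Saved as, say, `toy_scheduler.py`, it could be run as `python3 toy_scheduler.py` and `pypy3 toy_scheduler.py` to compare per-task cost on each interpreter:

```python
import platform
import time
from collections import defaultdict


def make_chain_graph(n_tasks: int, width: int = 10) -> dict:
    """Hypothetical layered graph: task i depends on task i - width."""
    return {i: ({i - width} if i >= width else set()) for i in range(n_tasks)}


def run_toy_scheduler(dependencies: dict) -> list:
    """Release tasks in dependency order using only lists, sets, and dicts."""
    dependents = defaultdict(set)
    waiting = {}
    for key, deps in dependencies.items():
        waiting[key] = set(deps)
        for dep in deps:
            dependents[dep].add(key)

    ready = [key for key, deps in waiting.items() if not deps]
    order = []
    while ready:
        key = ready.pop()
        order.append(key)
        # Per-task bookkeeping: tell each dependent that this input is finished
        for dependent in dependents[key]:
            remaining = waiting[dependent]
            remaining.discard(key)
            if not remaining:
                ready.append(dependent)
    return order


if __name__ == "__main__":
    graph = make_chain_graph(200_000)
    start = time.perf_counter()
    order = run_toy_scheduler(graph)
    elapsed = time.perf_counter() - start
    print(f"{platform.python_implementation()}: {len(order)} tasks, "
          f"{elapsed / len(order) * 1e6:.2f} us/task")
```

Because the hot loop is nothing but dict and set operations, it is the kind of code a JIT such as PyPy's may speed up, and also the kind of code one might later move into Cython (option 1).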

Any effort here has to be balanced against the fact that the scheduler will continue to change and will most likely continue to be written in Python. Any performance improvement therefore carries an extra constraint: it must not add significant development inertia or friction.

Here are a couple of cProfile-able scripts that stress scheduler performance: https://gist.github.com/mrocklin/eb9ca64813f98946896ec646f0e4a43b
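The gist's scripts are the reference workloads. As a generic sketch of profiling a scheduler-like function from inside Python, `cProfile.Profile.runcall` and `pstats.Stats` can be combined as below. `profile_call` and `churn` are hypothetical names, and `churn` only imitates the kind of dict/set state transitions a scheduler performs per task:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort: str = "cumulative", limit: int = 10):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by the chosen key and print only the top `limit` entries
    stats.sort_stats(sort).print_stats(limit)
    return result, stream.getvalue()


def churn(n_tasks: int) -> int:
    """Hypothetical workload: per-task dict/set updates resembling scheduler state."""
    state = {}
    processing = set()
    for key in range(n_tasks):
        state[key] = "processing"
        processing.add(key)
    for key in range(n_tasks):
        processing.discard(key)
        state[key] = "memory"
    return sum(1 for s in state.values() if s == "memory")


if __name__ == "__main__":
    count, report = profile_call(churn, 100_000)
    print(count)
    print(report)
```

For a whole script, `python -m cProfile -s cumulative script.py` produces a similar report from the command line.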
