Skip to content

Accelerate Scheduler with Cython, PyPy, or C #854

@mrocklin

Description

@mrocklin

We are sometimes bound by the administrative of the distributed scheduler. The scheduler is Pure-Python, and a bundle of core data structures (lists, sets, dicts). It generally has an overhead of a few hundred microseconds per task. When graphs become large (hundreds of thousands) this overhead can become troublesome.

There are a few potential solutions:

  1. Use Cython in a few places
  2. Run the entire scheduler in PyPy (workers, clients, and user code can still be in CPython)
  3. Rewrite everything in C/Go/Julia/whatever

Generally efforts here have to be balanced with the fact that the scheduler will continue to change, and we're likely to continue writing it in Python, so any performance improvement would have the extra constraint that it can't add significant development inertia or friction.

Here are a couple of cProfile-able scripts that stress scheduler performance: https://gist.github.com/mrocklin/eb9ca64813f98946896ec646f0e4a43b

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions