[RFC] Deadline setting performance #1234

Closed
Tronic wants to merge 2 commits

Conversation

@Tronic (Contributor) commented Oct 6, 2019

Experimenting with optimised deadline setting. The initial commit increased HTTP server performance from 7700 req/s to 8800 req/s, but it is also likely to break something and may add latency elsewhere.

The approach taken is to no longer keep deadlines in a sorted dict, but to only track the earliest deadline. The earliest deadline is recalculated by traversing all deadlined cancel scopes whenever the old deadline expires, which usually also involves cancelling a scope.

Not ready for merging; I intend to keep working on this. Posted here for comments.
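
For illustration only, here is a rough sketch of the idea described above, not the actual diff in this PR: the DeadlineTracker class and its method names are made up for this example, and scopes are assumed to expose a deadline attribute and a cancel() method.

import math

class DeadlineTracker:
    """Illustrative sketch: track only the earliest deadline and rescan
    every deadlined cancel scope when it expires."""

    def __init__(self):
        self._scopes = set()           # all cancel scopes with a finite deadline
        self._next_deadline = math.inf

    def add(self, scope):
        # Registering or tightening a deadline is a single comparison, O(1).
        self._scopes.add(scope)
        self._next_deadline = min(self._next_deadline, scope.deadline)

    def remove(self, scope):
        # The cached earliest deadline may become stale, but it remains a
        # lower bound, so it is safe to keep until the next expiry.
        self._scopes.discard(scope)

    def expire(self, now):
        # Called when `now` reaches the cached earliest deadline: cancel
        # expired scopes and recompute the minimum with an O(n) scan.
        if now < self._next_deadline:
            return
        next_deadline = math.inf
        for scope in list(self._scopes):
            if scope.deadline <= now:
                scope.cancel()
                self._scopes.discard(scope)
            else:
                next_deadline = min(next_deadline, scope.deadline)
        self._next_deadline = next_deadline

With this scheme, setting or changing a deadline becomes O(1), at the cost of an O(n) rescan every time a deadline actually expires.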

codecov bot commented Oct 6, 2019

Codecov Report

Merging #1234 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1234      +/-   ##
==========================================
+ Coverage   99.57%   99.57%   +<.01%     
==========================================
  Files         106      106              
  Lines       12823    12832       +9     
  Branches      984      989       +5     
==========================================
+ Hits        12768    12777       +9     
  Misses         35       35              
  Partials       20       20
Impacted Files       Coverage Δ
trio/_core/_run.py   99.73% <100%> (ø) ⬆️

@njsmith (Member) commented Oct 6, 2019

I'm really excited you're working on this :-). And that speedup sounds impressive!

As I mentioned on Gitter, the main thing I want to watch out for here is to make sure we don't introduce any new pathological cases that people might hit if they're unlucky. The algorithm here does great if deadlines never expire; but every time a deadline does expire, it takes O(total deadlines) time to re-scan the list of pending deadlines. So if you have a program with lots of deadlines that do expire, the overall cost becomes O(total deadlines ** 2), which can get very slow.

Here's a little test program to demonstrate – it spawns 10k background tasks that sleep for a long time, plus 1 task that does lots of short sleeps:

import trio
import trio.testing
import time

print(trio.__file__)

LONG_SLEEP_COUNT = 10000
SHORT_SLEEP_COUNT = 10000

async def deep_sleeper():
    # Background task that parks on a single far-off deadline.
    await trio.sleep(99999)

async def heavy_sleeper():
    # The one task whose deadlines actually expire, over and over.
    for _ in range(SHORT_SLEEP_COUNT):
        await trio.sleep(1e-6)

async def main():
    async with trio.open_nursery() as nursery:
        print("Starting up")
        for _ in range(LONG_SLEEP_COUNT):
            nursery.start_soon(deep_sleeper)
        print("Running")
        start = time.process_time()
        await heavy_sleeper()
        end = time.process_time()
        print(f"Running portion took: {end - start:.2f} s CPU time")
        print("Shutting down")
        nursery.cancel_scope.cancel()

# Use MockClock to make things deterministic
clock = trio.testing.MockClock(autojump_threshold=0)
trio.run(main, clock=clock)

With Trio v0.12.1, I get:

~/trio$ time python /tmp/s.py        
/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/__init__.py
Starting up
Running
Running portion took: 1.44 s CPU time
Shutting down
python /tmp/s.py  2.61s user 0.05s system 99% cpu 2.666 total

With this branch, I get:

~/trio$ time PYTHONPATH=. python /tmp/s.py
/home/njs/trio/trio/__init__.py
Starting up
Running
Running portion took: 12.94 s CPU time
Shutting down
PYTHONPATH=. python /tmp/s.py  13.94s user 0.08s system 99% cpu 14.018 total

So it's about 5-10x slower, and with more background tasks the difference would get more extreme.
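
As a rough back-of-envelope check: each of the 10,000 expiring short sleeps triggers a rescan of the roughly 10,000 long-lived deadlines, i.e. on the order of 10^8 scope visits in total, which is in the right ballpark for the extra ~11 s of CPU time seen here.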

This case isn't necessarily common, but it's not totally implausible either. (E.g., imagine a sanic server holding a bunch of idle websocket connections that each have some far-distant timeout, plus some task somewhere in the same process that does short sleeps for some reason.) So I could be convinced that it's a good tradeoff to optimize for more common cases even if they make a case like this slower... but usually that should be like, a 20% slowdown, not a 10x slowdown :-).

@njsmith (Member) commented Oct 6, 2019

Hmm, that benchmark might somewhat overestimate the impact, because the autojump clock causes more lookups of the next-expiring-deadline than you would have otherwise. But it's only a factor of 2 at most.

@Tronic (Contributor, Author) commented Dec 28, 2019

Closing in favour of a cleaner implementation in #1351 (which, however, does not use this method and does not yet achieve any speedup).

@Tronic closed this Dec 28, 2019