- 27 Mar, 2018 3 commits
-
-
Jason Madden authored
The good news is that current master is about 10-15% faster for sendall than 1.2.2 was (e.g., 301ms vs 256ms in Python 3.6). UDP sendto is roughly unaffected (within the margins, based on the native performance). Moving the chunking implementation of sendall to Cython doesn't show any improvements, so that's not a bottleneck, at least in these benchmarks.

The "bad" news is that both UDP and (especially) sendall perform much worse than native (native does about 47ms for sendall). This is probably related to the fact that we're doing everything in one process and one thread, and it is CPU bound; the native process can use 150% CPU or so, but the gevent version cannot. So the comparison is not directly meaningful. [skip ci]
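The chunking implementation of sendall mentioned above can be sketched in pure Python. This is a simplified illustration, not gevent's actual code; the `send` callable and the chunk size are assumptions:

```python
def sendall_chunked(send, data, chunk_size=64 * 1024):
    """Send all of *data* by slicing it and calling *send* repeatedly.

    *send* is any callable that, like socket.send, may transmit only
    part of its argument and returns the number of bytes actually sent.
    """
    view = memoryview(data)  # slicing a memoryview avoids copying
    total = 0
    while total < len(view):
        total += send(view[total:total + chunk_size])
    return total
```

In the cooperative version, the socket also yields to the event loop between chunks whenever a send would block, which is part of why the single-threaded gevent numbers trail the native ones.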
-
Jason Madden authored
Compile the important hub operations that use Waiters with Cython
-
Jason Madden authored
-
- 26 Mar, 2018 4 commits
-
-
Jason Madden authored
-
Jason Madden authored
Since we've come this far, might as well keep taking advantage of the effort... There are substantial improvements on the micro benchmarks for things that wait and switch:

| Benchmark | 27_hub_master2 | 27_hub_cython5 |
|---------------------|----------------|------------------------------|
| multiple wait ready | 1.96 us | 1.10 us: 1.77x faster (-44%) |
| wait ready | 1.47 us | 897 ns: 1.64x faster (-39%) |
| cancel wait | 2.93 us | 1.81 us: 1.61x faster (-38%) |
| switch | 2.33 us | 1.94 us: 1.20x faster (-17%) |

| Benchmark | 36_hub_master2 | 36_hub_cython6 |
|---------------------|----------------|------------------------------|
| multiple wait ready | 1.28 us | 820 ns: 1.56x faster (-36%) |
| wait ready | 939 ns | 722 ns: 1.30x faster (-23%) |
| cancel wait | 1.76 us | 1.37 us: 1.29x faster (-23%) |
| switch | 1.60 us | 1.35 us: 1.18x faster (-16%) |
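The Waiter being compiled here is gevent's low-level rendezvous object: one side calls `get()` and parks, the other wakes it with `switch(value)` or `throw(exc)`. A rough thread-based analogy of that protocol (the real object parks a greenlet by switching to the hub, not by blocking a thread):

```python
import threading

class Waiter:
    """Thread-based analogy of gevent's one-shot Waiter protocol."""

    def __init__(self):
        self._event = threading.Event()
        self._value = None
        self._exc = None

    def switch(self, value):
        # Called by the waking side ("the hub") to deliver a result.
        self._value = value
        self._event.set()

    def throw(self, exc):
        # Called by the waking side to deliver an exception instead.
        self._exc = exc
        self._event.set()

    def get(self):
        # Called by the waiting side; parks until a result arrives.
        self._event.wait()
        if self._exc is not None:
            raise self._exc
        return self._value
```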
-
Jason Madden authored
Compile gevent.queue and gevent.hub.waiter with Cython
-
Jason Madden authored
-
- 25 Mar, 2018 5 commits
-
-
Jason Madden authored
-
Jason Madden authored
-
Jason Madden authored
This gives massive performance benefits to queues:

| Benchmark | 27_queue_master | 27_queue_cython2 |
|----------------------------------------|-----------------|------------------------------|
| bench_unbounded_queue_noblock | 2.09 us | 622 ns: 3.37x faster (-70%) |
| bench_bounded_queue_noblock | 2.55 us | 634 ns: 4.02x faster (-75%) |
| bench_bounded_queue_block | 36.1 us | 7.29 us: 4.95x faster (-80%) |
| bench_channel | 15.4 us | 6.40 us: 2.40x faster (-58%) |
| bench_bounded_queue_block_hub | 13.6 us | 3.89 us: 3.48x faster (-71%) |
| bench_channel_hub | 7.55 us | 3.38 us: 2.24x faster (-55%) |
| bench_unbounded_priority_queue_noblock | 5.02 us | 3.18 us: 1.58x faster (-37%) |
| bench_bounded_priority_queue_noblock | 5.48 us | 3.22 us: 1.70x faster (-41%) |

In a "real" use case (pool.imap) it shows up as a 10-20% improvement:

| Benchmark | 36_pool_event5 | 36_pool_ubq_cython |
|--------------------|----------------|-----------------------------|
| imap_unordered_seq | 553 us | 461 us: 1.20x faster (-17%) |
| imap_unordered_par | 301 us | 265 us: 1.14x faster (-12%) |
| imap_seq | 587 us | 497 us: 1.18x faster (-15%) |
| imap_par | 326 us | 275 us: 1.19x faster (-16%) |
| spawn | 310 us | 284 us: 1.09x faster (-8%) |

Not significant (3): map_seq; map_par; apply
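The `*_block` benchmarks above measure the handoff cost when puts and gets must block on a bounded queue. The pattern, sketched here with the stdlib `queue` module and a thread (the gevent benchmarks use `gevent.queue.Queue` and greenlets; the function name is illustrative):

```python
import queue
import threading

def bounded_queue_roundtrip(n=100, maxsize=1):
    """Producer blocks on every put; consumer blocks on every get."""
    q = queue.Queue(maxsize=maxsize)

    def producer():
        for i in range(n):
            q.put(i)  # blocks whenever the queue is full

    t = threading.Thread(target=producer)
    t.start()
    results = [q.get() for _ in range(n)]  # blocks whenever empty
    t.join()
    return results
```

With `maxsize=1` every item forces a full producer/consumer handoff, which is exactly the switching overhead the Cython compilation reduces.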
-
Jason Madden authored
Timing as of this commit (macOS 10.13.3, MacBook Pro retina 15-inch, mid 2015, default loop impls):

| Benchmark | 27_queue_master | 27pypy_queue_master | 36_queue_master | 37_queue_master |
|----------------------------------------|-----------------|---------------------------------|------------------------------|------------------------------|
| bench_unbounded_queue_noblock | 2.09 us | 10.8 ns: 193.75x faster (-99%) | 1.34 us: 1.56x faster (-36%) | 1.24 us: 1.69x faster (-41%) |
| bench_bounded_queue_noblock | 2.55 us | 10.9 ns: 234.91x faster (-100%) | 1.67 us: 1.53x faster (-35%) | 1.55 us: 1.65x faster (-39%) |
| bench_bounded_queue_block | 36.1 us | 2.28 us: 15.81x faster (-94%) | not significant | 12.9 us: 2.80x faster (-64%) |
| bench_channel | 15.4 us | 1.91 us: 8.03x faster (-88%) | 9.96 us: 1.54x faster (-35%) | 8.17 us: 1.88x faster (-47%) |
| bench_bounded_queue_block_hub | 13.6 us | 1.07 us: 12.64x faster (-92%) | 8.61 us: 1.57x faster (-36%) | 7.66 us: 1.77x faster (-44%) |
| bench_channel_hub | 7.55 us | 760 ns: 9.94x faster (-90%) | 5.11 us: 1.48x faster (-32%) | 4.33 us: 1.75x faster (-43%) |
| bench_unbounded_priority_queue_noblock | 5.02 us | 186 ns: 26.97x faster (-96%) | 1.63 us: 3.08x faster (-68%) | 1.60 us: 3.14x faster (-68%) |
| bench_bounded_priority_queue_noblock | 5.48 us | 183 ns: 29.91x faster (-97%) | 1.98 us: 2.77x faster (-64%) | 1.79 us: 3.07x faster (-67%) |

[skip ci]
-
Jason Madden authored
Compile IMap[Unordered] with Cython
-
- 24 Mar, 2018 8 commits
-
-
Jason Madden authored
-
Jason Madden authored
This gets us another 20-30% faster:

| Benchmark | 27_pool_opts | 27_pool_cython2 |
|--------------------|--------------|-----------------------------|
| imap_unordered_seq | 897 us | 694 us: 1.29x faster (-23%) |
| imap_unordered_par | 539 us | 363 us: 1.49x faster (-33%) |
| imap_seq | 1.00 ms | 714 us: 1.41x faster (-29%) |
| imap_par | 612 us | 404 us: 1.52x faster (-34%) |
| map_seq | 382 us | 349 us: 1.09x faster (-9%) |
| map_par | 267 us | 252 us: 1.06x faster (-6%) |
| apply | 427 us | 406 us: 1.05x faster (-5%) |
| spawn | 397 us | 360 us: 1.10x faster (-9%) |
-
Jason Madden authored
Optimizations for threadpool
-
Jason Madden authored
Here's the improvement for the greenlet pools:

| Benchmark | 36_pool_master | 36_pool_opts |
|--------------------|----------------|-----------------------------|
| imap_unordered_seq | 803 us | 686 us: 1.17x faster (-15%) |
| imap_unordered_par | 445 us | 389 us: 1.14x faster (-13%) |
| imap_seq | 793 us | 729 us: 1.09x faster (-8%) |
| imap_par | 407 us | 398 us: 1.02x faster (-2%) |
| map_seq | 715 us | 293 us: 2.44x faster (-59%) |
| map_par | 388 us | 199 us: 1.96x faster (-49%) |

Not significant (2): apply; spawn
-
Jason Madden authored
-
Jason Madden authored
Compared to the previous commit:

| Benchmark | 36_threadpool_opt_PR | 36_threadpool_opt_cond10 |
|--------------------|----------------------|-----------------------------|
| imap_unordered_seq | 1.06 ms | 1.02 ms: 1.04x faster (-4%) |
| imap_unordered_par | 965 us | 928 us: 1.04x faster (-4%) |
| imap_seq | 1.08 ms | 1.03 ms: 1.04x faster (-4%) |
| map_seq | 785 us | 870 us: 1.11x slower (+11%) |
| map_par | 656 us | 675 us: 1.03x slower (+3%) |
| apply | 1.14 ms | 1.12 ms: 1.02x faster (-2%) |
-
Jason Madden authored
-
Jason Madden authored
Especially for map. None of the pools really need map to go through imap, since they have to wait for everything anyway and they return results ordered.

| Benchmark | 36_threadpool_master | 36_threadpool_opt_cond5 |
|--------------------|----------------------|-----------------------------|
| imap_unordered_seq | 1.15 ms | 1.07 ms: 1.08x faster (-7%) |
| imap_unordered_par | 1.02 ms | 950 us: 1.08x faster (-7%) |
| imap_seq | 1.17 ms | 1.10 ms: 1.06x faster (-6%) |
| imap_par | 1.07 ms | 1000 us: 1.07x faster (-7%) |
| map_seq | 1.16 ms | 724 us: 1.60x faster (-37%) |
| map_par | 1.07 ms | 646 us: 1.66x faster (-40%) |
| apply | 1.22 ms | 1.14 ms: 1.07x faster (-7%) |
| spawn | 1.21 ms | 1.13 ms: 1.07x faster (-7%) |
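The idea above, skipping imap's iterator machinery when the caller wants the whole ordered result list anyway, reduces to something like this (a thread-pool sketch using `concurrent.futures`, not gevent's code; `direct_map` is an illustrative name):

```python
from concurrent.futures import ThreadPoolExecutor

def direct_map(func, iterable, max_workers=4):
    """Spawn everything, wait for everything, return ordered results.

    No iterator bookkeeping is needed: since map must wait for every
    task before returning, collecting the futures in submission order
    already yields the results in order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, item) for item in iterable]
        return [f.result() for f in futures]  # results in submit order
```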
-
- 23 Mar, 2018 3 commits
-
-
Jason Madden authored
Add monitoring for memory usage, emitting events as it moves around the threshold.
-
Jason Madden authored
-
Jason Madden authored
-
- 22 Mar, 2018 5 commits
-
-
Jason Madden authored
Needs specific tests.
-
Jason Madden authored
Start having the monitor thread emit events for monitored conditions.
-
Jason Madden authored
This is the only reasonable path I could think of to enable memory monitoring, with gevent just being responsible for monitoring the memory and detecting overage conditions, while users plug in their own policies.
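That division of labor, gevent watching usage and firing threshold-crossing events while the application supplies the policy, can be sketched as follows. The class and event names here are illustrative, not gevent's actual event API:

```python
class ThresholdExceeded:
    """Illustrative event: usage moved above the threshold."""

class UnderThreshold:
    """Illustrative event: usage dropped back below the threshold."""

class MemoryMonitor:
    """Fires an event each time usage crosses the threshold."""

    def __init__(self, get_usage, threshold, emit):
        self.get_usage = get_usage  # e.g. a callable returning process RSS
        self.threshold = threshold
        self.emit = emit            # user-supplied policy hook
        self._over = False

    def check(self):
        # Called periodically from the monitoring thread.
        usage = self.get_usage()
        if usage > self.threshold and not self._over:
            self._over = True
            self.emit(ThresholdExceeded())
        elif usage <= self.threshold and self._over:
            self._over = False
            self.emit(UnderThreshold())
```

The key point is that the monitor only tracks crossings and emits; whatever the handler does about an overage (log, shed load, abort) is entirely the application's policy.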
-
Jason Madden authored
I wasn't even using them to debug anymore, they were too verbose to be blanket enabled. Fixes #1146. Also tweak the leakcheck tests for threadpools.
-
Jason Madden authored
-
- 21 Mar, 2018 4 commits
-
-
Jason Madden authored
Keep libuv multiplex io watchers from polling too much when one event …
-
Jason Madden authored
Keep libuv multiplex io watchers from polling too much when one event has been completely turned off. Fixes #1144.
-
Jason Madden authored
Add support for a background monitoring thread to be associated with each hub.
-
Jason Madden authored
And give it unittest coverage. (Using it as an actual thread didn't produce coverage metrics because we use 'greenlet' concurrency.) Add a rudimentary scheduler so that functions that should run less often than the monitor's minimum interval stay (very roughly) on their own period.
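A rudimentary scheduler of that shape keeps a last-run timestamp per function and, on each wakeup at the minimum interval, runs only the functions whose own period has elapsed. A sketch with assumed names, not the actual implementation:

```python
import time

class PeriodicScheduler:
    """Run each registered function roughly on its period, no more often."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._entries = []  # each entry is [period, function, last_run]

    def add(self, period, function):
        # -inf so the function runs on the very first tick.
        self._entries.append([period, function, -float("inf")])

    def tick(self):
        # Called on every wakeup of the monitoring thread,
        # i.e. at the minimum interval of all registered periods.
        now = self.clock()
        for entry in self._entries:
            period, function, last_run = entry
            if now - last_run >= period:
                entry[2] = now
                function()
```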
-
- 20 Mar, 2018 2 commits
-
-
Jason Madden authored
-
Jason Madden authored
The threadpool worker greenlet itself is meant to spend much of its time blocking, so don't report on it if it does. Only if we switch greenlets while running the user's code should we start looking for blocking.
-
- 19 Mar, 2018 3 commits
-
-
Jason Madden authored
-
Jason Madden authored
Right now, it is used to detect blocked event loops, and it is extensible by users. In the future there will be some more default monitoring options (e.g., memory). Refs #1021.
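Detecting a blocked event loop boils down to a watchdog: the hub's thread records when it last switched greenlets, and the monitor thread complains if that timestamp goes stale. A sketch with assumed names (gevent's implementation hooks greenlet switch tracing; this only shows the shape):

```python
import time

class BlockedLoopDetector:
    """Flags the event loop as blocked when greenlet switches stop."""

    def __init__(self, max_blocking_time=0.1, clock=time.monotonic):
        self.max_blocking_time = max_blocking_time
        self.clock = clock
        self._last_switch = clock()

    def mark_switch(self):
        # Called from a greenlet-switch trace hook in the hub's thread.
        self._last_switch = self.clock()

    def check(self):
        # Called periodically from the separate monitor thread.
        elapsed = self.clock() - self._last_switch
        return elapsed > self.max_blocking_time  # True => looks blocked
```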
-
Jason Madden authored
Make more performance-sensitive places use _get_hub_noargs, since under Python 3 it is twice as fast to call as get_hub.
-
- 17 Mar, 2018 2 commits
-
-
Jason Madden authored
Introduce GEVENT_TRACK_GREENLET_TREE to disable greenlet tree features
-
Jason Madden authored
As a performance optimization for applications where spawning greenlets is critical. Plus some other optimizations to speed up spawning in the general case.

CPython 3.6 with 1.2.2 vs these changes with tracking disabled:

| Benchmark | 36_122_bench_spawn | 36config_bench_spawn_tree_off |
|------------------------|--------------------|-------------------------------|
| eventlet spawn | 12.6 us | 12.2 us: 1.04x faster (-4%) |
| eventlet sleep | 5.22 us | 4.97 us: 1.05x faster (-5%) |
| gevent spawn | 4.27 us | 5.06 us: 1.19x slower (+19%) |
| gevent sleep | 2.63 us | 1.25 us: 2.11x faster (-53%) |
| geventpool spawn | 9.00 us | 8.31 us: 1.08x faster (-8%) |
| geventpool sleep | 4.82 us | 2.83 us: 1.70x faster (-41%) |
| geventraw spawn | 2.51 us | 2.81 us: 1.12x slower (+12%) |
| geventraw sleep | 649 ns | 679 ns: 1.05x slower (+5%) |
| geventpool join | 3.47 us | 1.42 us: 2.44x faster (-59%) |
| geventpool spawn kwarg | 11.0 us | 8.95 us: 1.23x faster (-19%) |
| geventraw spawn kwarg | 3.87 us | 4.20 us: 1.08x slower (+8%) |

The differences compared to master are hard to quantify because the standard deviation ends up being more than 10% of the mean in many cases, and about a 10% improvement is what we typically see, so it goes back and forth.
-
- 14 Mar, 2018 1 commit
-
-
Jason Madden authored
-