Replace gevent dep. for parallelism #55

Open
heynemann opened this issue Nov 17, 2012 · 13 comments

Comments

@heynemann
Owner

Gevent is too big of a dependency. We need a somewhat softer dependency.

Not sure what to use here, but we need something.


Edited by @Zearin (2013-01-28):
Tweaked phrasing (for clarity)

@Zearin
Collaborator

Zearin commented Nov 17, 2012

Can you elaborate on what you mean by…

  • “too big of a dependency”, and
  • “softer dependency” … ?

@heynemann
Owner Author

Installing gevent is hard. It has a dependency on libevent (or libev) depending on the version.

It is also not accomplishing what I wanted it for (parallel execution). If we just want to pretend to be parallel, we should be using plain old Python queues with threads.
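As a rough illustration of the "plain old Python queues with threads" idea, here is a minimal sketch using only the standard library. The function name `run_in_threads` and its parameters are hypothetical and not part of PyVows; the code assumes each unit of work is a zero-argument callable.

```python
import queue
import threading


def run_in_threads(tasks, num_workers=4):
    """Run zero-argument callables on worker threads; return results in task order."""
    work = queue.Queue()
    results = [None] * len(tasks)

    def worker():
        while True:
            item = work.get()
            if item is None:          # sentinel: no more work for this worker
                work.task_done()
                break
            index, task = item
            try:
                results[index] = task()
            except Exception as exc:  # record the failure instead of killing the thread
                results[index] = exc
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for index, task in enumerate(tasks):
        work.put((index, task))
    for _ in threads:
        work.put(None)                # one sentinel per worker
    work.join()                       # block until every queued item is processed
    for t in threads:
        t.join()
    return results
```

Because of the GIL, this only overlaps work that blocks (I/O, sleeps); CPU-bound tasks would still run one at a time.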

I'm thinking of a strategy using forks, but haven't quite got there.

Cheers,
Bernardo Heynemann


Edited by @Zearin (2013-01-28):
Tweaked markup

@Zearin
Collaborator

Zearin commented Dec 24, 2012

I do know that parallel execution involves minimal blocking of I/O, and that it means fast, fast, fast!

However, I’m afraid that I don’t have a thorough understanding of the problems with depending on libevent/libev, or the number and nature of tradeoffs between true parallelism and “fake” parallelism.

In the interest of helping PyVows towards this end, I’ve written up a quick list of candidates. I’m not qualified to judge whether any of these are a good fit; but hopefully the summaries will help you to decide whether any of them are worth further investigation.


Asynchronous Python Module Search: Round 1

async

https://github.com/gitpython-developers/async

Async aims to make writing asynchronous processing easier. It provides a task-graph with interdependent tasks that communicate using blocking channels, which allows actual computations to be delayed until items are requested. Tasks will automatically be distributed among 0 or more threads for the actual computation.

Even though the GIL effectively prevents true concurrency, operations which block, such as file IO, can be sped up with it already. In conjunction with custom C extensions which release the GIL, true concurrency can be obtained as well.
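The point about blocking operations can be shown with the standard library alone. The sketch below does not use the async package's API; it uses `time.sleep` as a stand-in for blocking I/O (an assumption made for illustration), and the helper names are hypothetical.

```python
import threading
import time


def blocking_call(seconds):
    # time.sleep stands in for blocking I/O; it releases the GIL while waiting.
    time.sleep(seconds)


def timed_run(n_calls, seconds, threaded):
    """Return the wall-clock time needed to make n_calls blocking calls."""
    start = time.perf_counter()
    if threaded:
        threads = [threading.Thread(target=blocking_call, args=(seconds,))
                   for _ in range(n_calls)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for _ in range(n_calls):
            blocking_call(seconds)
    return time.perf_counter() - start
```

Calling `timed_run(4, 0.1, threaded=False)` should take roughly four times as long as `timed_run(4, 0.1, threaded=True)`, since the threaded waits overlap.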

async_subprocess

http://pypi.python.org/pypi/async_subprocess/
Cross-platform wrapper around subprocess.Popen to provide an asynchronous version of Popen.communicate().

cogen

http://code.google.com/p/cogen/

cogen is a crossplatform library for network oriented, coroutine based programming using the enhanced generators from Python 2.5. The project aims to provide a simple straightforward programming model similar to threads but without all the problems and costs.

Features

  • wsgi server with coroutine extensions - enabling asynchronous wsgi apps in a regular wsgi stack
  • fast network multiplexing with epoll, kqueue, select, poll or io completion ports (on windows)
  • epoll/kqueue support via the wrappers in the python 2.6's stdlib or separate modules py-kqueue, py-epoll
  • iocp support via ctypes wrappers or pywin32
  • sendfile/TransmitFile support (the wsgi server also uses this for wsgi.file_wrapper)
  • timeouts for socket calls, signal waits etc
  • various mechanisms to work with (signals, joins, a Queue with the same features as the stdlib one) and some other stuff you can find in the docs :)

bluelet

https://github.com/sampsyo/bluelet

Bluelet is a simple, pure-Python solution for writing intelligible asynchronous socket applications. It uses PEP 342 coroutines to make concurrent I/O look and act like sequential programming.

In this way, it is similar to the Greenlet green-threads library and its associated packages Eventlet and Gevent. Bluelet has a simpler, 100% Python implementation that comes at the cost of flexibility and performance when compared to Greenlet-based solutions. However, it should be sufficient for many applications that don't need serious scalability; it can be thought of as a less-horrible alternative to asyncore or an asynchronous replacement for SocketServer (and more).
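To give a feel for what generator-based coroutines look like, here is a minimal round-robin scheduler sketch. It is not Bluelet's API; the names `counter` and `run_round_robin` are made up for this example, and it only shows cooperative switching, not socket I/O.

```python
from collections import deque


def counter(name, limit, log):
    """A cooperative task: does one step of work, then yields control."""
    for i in range(limit):
        log.append((name, i))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue          # task finished; drop it
        ready.append(task)    # not finished; schedule it again
```

Everything runs in a single thread: a task only gives up control at a `yield`, which is why this style avoids locks but also gains nothing for CPU-bound work.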

desync

https://github.com/bgilmore/desync

Decouple your asynchronous code from your event loop implementation.

The intended function of this framework is to allow developers to write async applications and components that will run without modification on a wide range of mainstream async/evented frameworks (Twisted and Tornado being the two initially targeted frameworks for support).

This is currently a pre-alpha experiment and shouldn't be used by anyone.

monocle

https://github.com/saucelabs/monocle

An async programming framework with a blocking look-alike syntax.

monocle straightens out event-driven code using Python's generators. It aims to be portable between event-driven I/O frameworks, and currently supports Twisted and Tornado.

It's for Python 2.5 and up; the syntax it uses isn't supported in older versions of Python. (Versions before 2.7 require the ordereddict module.)

teena

https://github.com/zacharyvoase/teena

Python ports of useful syscalls, using asynchronous I/O.

Teena aims to be a collection of ports of UNIX and Linux syscalls to pure Python, with an emphasis on performance and correctness. Windows support is not a primary concern—I’m initially targeting only POSIX-compliant operating systems. The library uses Tornado to do efficient asynchronous I/O.

The first version of this library will contain implementations of tee and splice which operate on files, sockets, and file descriptors. There’s also a Capture class which behaves like StringIO, but it has a fileno() and so can be used where a real file descriptor is needed.

@heynemann
Owner Author

Thanks a lot, man! As soon as I get some spare time, I'll check these.

Cheers,

Bernardo Heynemann
Developer @ globo.com

@Zearin Zearin mentioned this issue Jan 6, 2013
2 tasks
@heynemann
Owner Author

I'm thinking about trying a simple threaded approach to see how it goes, even knowing that Python does not have true parallelism.

@Zearin
Collaborator

Zearin commented Jan 28, 2013

> Even knowing that Python does not have true parallelism.

I KNOW!!!

Considering all the attention Node.js is getting these days, you’d think Python would at least do something to overcome the limitations of the GIL (Global Interpreter Lock).

Don’t get me wrong. I love what Node.js is doing. I really admire the nonblocking I/O built right into the language itself.

I’m just not a fan of JavaScript’s syntax. Node.js has done a lot to make it more palatable, but it’s still JavaScript at heart. (I’ve toyed with CoffeeScript. I like it. A lot. But, I still like it less than Python.)

I really really really, really want to have Python’s syntax with Node.js’s über-async performance.

Sigh. Maybe Python is finally succumbing to old age. ☹ The future is asynchronous and parallel.

(At least PyVows still outperforms other testing by lots. :P)

@Zearin
Collaborator

Zearin commented Feb 3, 2013

@heynemann:

The futures module (http://pypi.python.org/pypi/futures/2.1.3) looks promising!

Description:

Backport of the concurrent.futures package from Python 3.2

For documentation, it simply refers you to the official Python 3 docs. I take that as a good sign; they are aiming to make the backport so true-to-the-original that it doesn’t require its own list of caveats and warnings about its feature set.

Does this have Gevent-replacing potential?
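For reference, a minimal sketch of running independent test callables with `concurrent.futures`. The function `run_tests` and the `{name: callable}` input shape are assumptions for illustration and do not reflect how the PyVows runner is actually structured.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tests(tests, max_workers=4):
    """Run {name: callable} concurrently; return {name: 'ok' or the exception}."""
    outcomes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_name = {pool.submit(fn): name for name, fn in tests.items()}
        # as_completed yields each future once it has finished (or failed).
        for future in as_completed(future_to_name):
            name = future_to_name[future]
            error = future.exception()
            outcomes[name] = "ok" if error is None else error
    return outcomes
```

The `with` block does not exit until every submitted future has finished, so the returned dictionary always covers all tests.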


Update

Oops! Looks like it actually does have its own documentation.

Still, it looks promising…

@heynemann
Owner Author

It does look good! I'll test it as soon as I get some time.

If you want to go ahead and try to use it, I'd be happy to evaluate a pull
request.

Cheers,

Bernardo Heynemann
Developer @ globo.com

@pplante

pplante commented Apr 20, 2013

I am curious why gevent was even chosen for test parallelization. I find it to be a clumsy dependency that leads to some really difficult-to-debug issues.

For instance, we sank a few hours the other week trying to figure out why a test was broken only sometimes. We ended up finding it was because we forgot to tell our Vows.Context subclass to mark a method as ignored. The error we were experiencing was something so buried in the spaghetti gevent stuff that it literally took 2 hours to track down. When we finally found the solution, it was only because we were using the spaghetti method of debugging (throw something at the wall until it sticks).

We really love the test organization that pyVows offers since it closely mirrors our CoffeeScript/JavaScript test suites. However, this dependency choice makes using pyVows a headache at times. How difficult would it be to rip out gevent, or make an optional non-gevent test runner that runs each context and test sequentially? I think you came up with a great way to handle testing in Python without crazy bytecode or VM hacks, so I really want to see pyVows development furthered.

Thanks!

@Zearin
Collaborator

Zearin commented Apr 21, 2013

Well, check out the early parts of this thread. Although I don't know why Gevent was chosen in the first place (aside from parallel execution), it's not here to stay.

Actually, I’ve been trying to refactor PyVows to work with concurrent.futures (which is Python 2–3 compatible). The only reason I haven’t yet is that I’m completely inexperienced in this kind of programming. ☺ That’s not stopping me from trying, but it does slow me down.

Near as I can tell, using concurrent.futures is a good choice, but it will require a significant reorganization of the code in the runner module. From articles I’ve read and videos I’ve watched in order to learn more about this subject, some of runner’s giant methods need to be broken down into smaller bits of execution if we’re going to keep execution time fast.

If I’ve understood everything correctly, Gevent uses coroutines, which is an entirely different concurrency strategy than threads. I think that’s why PyVows performs so fast with Gevent with the execution code structured as is, whereas the same structure would be slow/broken using threads.

If you have experience with concurrent.futures, I’d love to learn more about it. My early attempts to use it resulted in the runtime only executing a small subset of the tests (i.e. it said “I’m done testing!” after only a few tests had completed…way too early). After that, I spent a couple days reading and learning more about this stuff, but that’s when I realized I needed to use callbacks…

Which requires all that breaking down into smaller bits of execution and stuff. ☺
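One possible cause of a runner reporting "done" too early (an assumption; the thread does not confirm it) is that nested work gets scheduled after the main thread has already stopped waiting. The sketch below counts pending tasks so that completion is only signalled once every nested task has finished. The `TreeRunner` class and the `(name, children)` tuple layout are hypothetical, not PyVows code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TreeRunner:
    """Runs a tree of tasks; finished only when every nested task is done."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._pending = 0
        self._all_done = threading.Event()
        self.finished = []

    def submit(self, name, children=()):
        with self._lock:
            self._pending += 1   # count the task before it can possibly finish
        self._pool.submit(self._run, name, children)

    def _run(self, name, children):
        try:
            # Children are scheduled before this task is counted as done,
            # so the pending count cannot reach zero prematurely.
            for child_name, grandchildren in children:
                self.submit(child_name, grandchildren)
            with self._lock:
                self.finished.append(name)
        finally:
            with self._lock:
                self._pending -= 1
                if self._pending == 0:
                    self._all_done.set()

    def wait(self):
        with self._lock:
            idle = self._pending == 0
        if not idle:
            self._all_done.wait()
        self._pool.shutdown()
```

Tasks never block waiting for their children, so a small pool cannot deadlock on deeply nested trees. The runner is single-use: once `wait()` returns, create a new instance for the next run.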

Fear not. Gevent is not here to stay.

@heynemann
Owner Author

I think a good solution would be for us to implement different runners,
instead of messing with the gevent one. That way we can do "best-case"
runners:

  • Is gevent available? Use it
  • Is futures available? Use it
  • Use sequential

What do you guys think?
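The fallback order listed above could be sketched roughly as follows. The function `pick_runner`, the runner names it returns, and the injectable `is_available` probe are all hypothetical; only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


def pick_runner(is_available=None):
    """Return the name of the first usable runner: gevent, futures, sequential."""
    if is_available is None:
        # Default probe: can the module be located on this interpreter?
        def is_available(module):
            return importlib.util.find_spec(module) is not None

    for runner_name, module in (("gevent", "gevent"),
                                ("futures", "concurrent.futures")):
        if is_available(module):
            return runner_name
    return "sequential"
```

Passing the probe as a parameter keeps the selection logic testable without installing or uninstalling gevent.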

Cheers,

Bernardo Heynemann
Developer @ globo.com

@pplante

pplante commented Apr 21, 2013

That sounds perfect!

@Zearin
Collaborator

Zearin commented Apr 22, 2013

> I think a good solution would be for us to implement different runners,
> instead of messing with the gevent one. That way we can do "best-case"
> runners:
>
>   • Is gevent available? Use it
>   • Is futures available? Use it
>   • Use sequential
>
> What do you guys think?

Agreed!

(That thought had actually occurred to me…but I’m not having a lot of success, so I decided to keep my big mouth shut. ☺)
