Discussion: the urllib3 bleach-spike branch #1
I actually can't find the |
Sorry, it's in the root of the source tree in the bleach-spike branch: notes.org raw, notes.org formatted |
Ah, OK. I thought you had changed the default branch for your repo and that I was looking at |
Questions:
|
Currently I'm treating the backend API as an internal (non-public) interface, so I haven't worried too much about such refinements :-). (In fact ATM there isn't even any ABC created in the code, only in the notes.) Now, possibly it would make sense to make this a public API, at which point we would need to think about such things, but for now I don't care that much. The
Because I needed (It turns out that the way HTTP/1.1 works – and specifically given the way @Lukasa wants to handle it in v2 – any time you're sending you also want to be watching for an early response from the server, so
Sure, those are probably better, I'll rename. |
Oh, that's also a good idea :-). Done. |
Is it possible to pull together some sample code showing how we would select a backend and interact with it? |
Here's the tiny example on the home page, currently:
>>> import urllib3
>>> http = urllib3.PoolManager()
>>> r = http.request('GET', 'http://httpbin.org/robots.txt')
>>> r.status
200
>>> r.data
'User-agent: *\nDisallow: /deny\n'
For purposes of the proof-of-concept, let's say we want it to instead look like:
>>> import urllib3
>>> from urllib3.backends.trio import TrioBackend
>>> http = urllib3.PoolManager(TrioBackend())
>>> r = await http.request('GET', 'http://httpbin.org/robots.txt') # <-- notice the await
>>> r.status
200
>>> r.data
'User-agent: *\nDisallow: /deny\n'
Oh, hmm, the
import trio
import urllib3
from urllib3.backends.trio import TrioBackend
async def main():
    http = urllib3.PoolManager(TrioBackend())
    r = await http.request('GET', 'http://httpbin.org/robots.txt', preload_content=False)
    print(r.status)  # prints "200"
    print(await r.read())  # prints "User-agent: *\nDisallow: /deny\n"
trio.run(main)
Twisted version:
# Cribbing from https://twistedmatrix.com/documents/current/core/howto/defer-intro.html#coroutines-with-async-await
import urllib3
from urllib3.backends.twisted import TwistedBackend
from twisted.internet import reactor
from twisted.internet.defer import ensureDeferred
async def main():
    http = urllib3.PoolManager(TwistedBackend())
    r = await http.request('GET', 'http://httpbin.org/robots.txt', preload_content=False)
    print(r.status)  # prints "200"
    print(await r.read())  # prints "User-agent: *\nDisallow: /deny\n"
    reactor.stop()
ensureDeferred(main())
reactor.run()
And for the sync backend:
import urllib3
from urllib3.backends.sync_backend import SyncBackend
async def main():
    http = urllib3.PoolManager(SyncBackend(100, 100))
    r = await http.request('GET', 'http://httpbin.org/robots.txt', preload_content=False)
    print(r.status)  # prints "200"
    print(await r.read())  # prints "User-agent: *\nDisallow: /deny\n"
def run_secretly_sync_async_fn(async_fn, *args):
    # Drive the coroutine by hand: with the sync backend it never actually
    # suspends, so a single send() should run it to completion.
    coro = async_fn(*args)
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    else:
        raise RuntimeError("you lied, this async function is not secretly synchronous")
run_secretly_sync_async_fn(main) |
Is this what you had in mind regarding the bleaching script?
|
That's the basic idea, yeah, and probably fine for testing a proof of
concept. However, it is possible to preserve formatting with a bit more
cleverness, and without going all the way to 2to3. Here's the trick: the
tokenize module tells us the line and column for each token it finds. So
what one could do is to first use the tokenize module to find the actual
character spans for each token that should be replaced or deleted, and then
go back to the original formatted source string and make just those
changes. It's kind of fiddly, but not *too* bad I think. And at least it
only has to be implemented once :-).
…On Dec 27, 2017 07:00, "Quentin Pradet" ***@***.***> wrote:
Is this what you had in mind regarding the bleaching script? tokenize
does not preserve formatting, I guess lib2to3 would be needed for that.
import sys
from tokenize import tokenize, untokenize, ASYNC, AWAIT, NAME

ASYNC_TO_SYNC = {
    '__aenter__': '__enter__',
    '__aexit__': '__exit__',
    '__aiter__': '__iter__',
    '__anext__': '__next__',
    # '__await__': '__iter__' ?
}

if __name__ == '__main__':
    with open(sys.argv[1], 'rb') as f:
        result = []
        g = tokenize(f.readline)
        for toknum, tokval, _, _, _ in g:
            if toknum == NAME and tokval in ASYNC_TO_SYNC:
                result.append((toknum, ASYNC_TO_SYNC[tokval]))
            elif toknum not in [ASYNC, AWAIT]:
                result.append((toknum, tokval))
    print(untokenize(result).decode('utf-8'))
|
I see! https://gist.github.com/pquentin/188f139be4c1bd5731b54bb2f7c29e41 should be closer to what you are describing. I realize this is not needed for the proof of concept but it's a nice self-contained problem and it was fun to think about it. |
Note that gist puts all the tokens in the right position on the page, but it doesn't preserve comments or anything – it just fills in the empty space with space characters. It's possible to get even more fancy, by using the original source file to fill in all the whitespace/comments around the tokens. (One way to think of it: use |
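To make that span-based approach concrete, here is a rough sketch (an assumed illustration, not the project's script or the gist above): one pass over the tokens records the character spans to delete or rename, and a second pass splices those spans out of the original text, so surrounding comments and spacing survive untouched.
import sys
import tokenize

RENAMES = {
    '__aenter__': '__enter__',
    '__aexit__': '__exit__',
    '__aiter__': '__iter__',
    '__anext__': '__next__',
}

def bleach(path):
    # Pass 1: use tokenize only to find where the interesting tokens live.
    with open(path, 'rb') as f:
        tokens = list(tokenize.tokenize(f.readline))
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()
    edits = []  # (row, start_col, end_col, replacement)
    # Depending on the Python version, async/await show up as ASYNC/AWAIT or
    # as NAME tokens, so match on the token string and accept either type.
    keyword_types = {tokenize.NAME,
                     getattr(tokenize, 'ASYNC', -1),
                     getattr(tokenize, 'AWAIT', -1)}
    for tok in tokens:
        if tok.type in keyword_types and tok.string in ('async', 'await'):
            row, col = tok.start
            end = tok.end[1]
            if lines[row - 1][end:end + 1] == ' ':
                end += 1  # also eat one trailing space: "async def" -> "def"
            edits.append((row, col, end, ''))
        elif tok.type == tokenize.NAME and tok.string in RENAMES:
            edits.append((tok.start[0], tok.start[1], tok.end[1], RENAMES[tok.string]))
    # Pass 2: splice the edits into the original source, right-to-left so that
    # earlier column numbers on the same line stay valid. Comments, blank
    # lines, and formatting are never touched.
    for row, start, end, repl in sorted(edits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:start] + repl + line[end:]
    return ''.join(lines)

if __name__ == '__main__':
    sys.stdout.write(bleach(sys.argv[1]))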
@haikuginger Not sure if you've seen this, but as the new urllib3 maintainer I figured I should give you a heads up :-). This is an experiment in porting urllib3 to support both sync and async operation within a single codebase using h11, and it's just reached the point where it managed to complete its first async http request. |
There's some more discussion of whether close should be synchronous or asynchronous in #4 |
@pquentin (and anyone else who's interested): I've been thinking about the question of what a real deployment of this would look like, and in particular how we want to expose the sync + async APIs. On Python 2, of course, we want to just export a sync API, and that has to be done by "bleaching" the async codebase and shipping that. On Python 3, it's a little more subtle: we want to provide the async API, but we also want to provide a sync API for backwards compatibility and because, well, it's useful. Here we have two options: we could wrap all of the async API in a little async→sync shim like my Going through and shimming the entire public API sounds like a fair amount of work: shimming a function is easy, shimming a method requires creating a whole proxy class to wrap around the original class and reproduce its full API, and then there are more complicated cases like the upload code that will expect an async iterator to be passed. AFAICT this is all doable, but Py2 means we have to get the bleaching trick working, and then once we have that it's not clear that implementing shimming as well will give us any additional benefits to pay back all the additional work. And since by definition they produce the same public API, we can always add shimming later if we decide it's worth it. So for now I think we should forget about shimming, and instead figure out how to ship a bleached+unbleached library on Py3. Here's one way to do it:
Anyway, the overall strategy should be clear. And it ends up giving us a single package that can be distributed as a py2 + py3 compatible wheel, provides a backwards-compatible urllib3 sync API everywhere, and provides a full async API on pythons that can handle it. Does that make sense? Am I missing anything? |
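For a sense of what the (rejected) shimming option would have involved, here is a hypothetical sketch (names are made up; this is not what the branch does): a function shim is just the drive-the-coroutine trick from run_secretly_sync_async_fn above, while shimming a method means wrapping the whole object in a proxy class.
import functools

def sync_shim(async_fn):
    # Wrap an async callable whose awaits never actually suspend, so a single
    # send() runs it to completion -- same trick as run_secretly_sync_async_fn.
    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        coro = async_fn(*args, **kwargs)
        try:
            coro.send(None)
        except StopIteration as exc:
            return exc.value
        raise RuntimeError("async function tried to suspend")
    return wrapper

class SyncResponseProxy:
    # Shimming a method requires a proxy class: every async method on the
    # wrapped object needs a sync counterpart here, plus pass-throughs for
    # plain attributes. (The wrapped response type is hypothetical.)
    def __init__(self, async_response):
        self._resp = async_response

    @property
    def status(self):
        return self._resp.status

    def read(self, *args, **kwargs):
        return sync_shim(self._resp.read)(*args, **kwargs)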
👍 on the overall strategy. To make this less confusing to contributors, maybe we can actually leave the async code in the root of the package? |
Hmm, can you elaborate on what you're thinking of? Like we'd have |
I had not realized that we would have to remove the async version. Moving more files around than needed is bad, please ignore my suggestion. I have one question left regarding your suggestion, though. We not only want to preserve With the way Python works, as far as I can tell, when you type
Did I miss anything? |
Oh ick, that's unfortunate. Good catch though. And I think you're right that the way I guess one option would be to say hey, as part of v2 we're hiding all that internal stuff! But... I have no idea if the urllib3 maintainers would go for that. At least some breakage is unavoidable – e.g. at the top of that page we have documented that I guess in general this is going to have to be a long-term discussion, with a wide range of possible outcomes including "change nothing", "change some things", "okay this is Off the top of my head, I see three options that could qualify as "plausible" (maybe there are more?):
The last actually seems simplest for right now, though we know we might have to change it in the future... |
First, I would like to say that I'm really happy that we are discussing this, it means we have made meaningful progress. Let's not forget that! 😀
To avoid a Python 3-like situation, I would like to avoid breaking anything that we don't have to break, and since we're not renaming
Regarding exceptions, I do believe it's important to have only one class for each exception, so that an except clause with the "async version" of say,
The first users of those documented internal APIs are tests, so we are going to feel the pain of migrating first hand! It looks like putting the sync version at the root of the package means most synchronous tests will continue working as before! If that's true, it looks like it's going to be so much easier that I would not be able to justify a cleaner approach.
But of course, the most important thing is to choose an approach, and be prepared to change when we have more data. |
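A tiny illustration of why the single-exception-class point matters (class names here are stand-ins, not urllib3's real ones): if bleaching produced a second copy of an exception class, handlers written against one copy would not catch the other.
class HTTPErrorAsync(Exception):  # the class as defined in the async code
    pass

class HTTPErrorSync(Exception):   # the same definition duplicated by bleaching
    pass

try:
    raise HTTPErrorAsync("boom")
except HTTPErrorSync:
    print("caught")               # never reached: they are different classes
except HTTPErrorAsync:
    print("not caught by the sync clause")  # this is what actually happens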
Hmm, so we'd have a file at
_BLEACHED = False
if _BLEACHED:
    class HTTPError(Exception):
        ...
else:
    # Unbleached
    from ..exceptions import *
and then when we build, that copies the file to
And I suspect we'll eventually end up hacking at the tests quite a bit, since we'll want to migrate them into testing both the sync and async versions. But okay, yeah, backcompat is pretty hard to argue with.
I just had another idea that I think maybe is the best; let's see what you think. What if we (a) put the async code in
# urllib3/connectionpool.py
from ._sync.connectionpool import ConnectionPool, HTTPConnectionPool, HTTPSConnectionPool
This is clunky and low tech, but there are a finite number of documented submodules, and each one contains a finite list of documented exports, so it can be done. It's also incredibly obvious and boring (which is good), and it gives a very explicit dividing line between public and private API (which seems like an important step for making urllib3 more maintainable going forward). |
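For illustration, the resulting tree might look roughly like this (only _sync and the re-exporting top-level modules come from the comment above; the directory name for the async source is an assumption):
urllib3/
    __init__.py          # public API
    connectionpool.py    # from ._sync.connectionpool import ...
    exceptions.py        # from ._sync.exceptions import ...
    _async/              # hypothetical name: the code we actually edit
    _sync/               # generated from the async code by the bleaching script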
@njsmith This sounds great! 👍 Let's try that. |
Hmm. Flask used to include code so that flask.ext.* is actually loaded from |
Right, it's possible to play games with |
I just remembered a potentially compelling reason why we might want to make shutdown async (so it'd be |
GOAWAY helps clients when the server decides it needs to gracefully shut down the connection (server restart, load balancing). The client can then transparently open another socket and use that for new traffic, thus avoiding the latency associated with looking up the name and connecting. GOAWAY helps the server because with HTTP/2 servers can spontaneously send additional resources which the client is going to need (e.g. style sheets or fonts). Not doing that on a connection that will go away ASAP is a good idea. In general, sometimes the underlying connection is more complex than a simple socket that can be closed indiscriminately. It's a good idea to always support the async version, rather than having to change client code when the server requires more complex handling. |
@smurfix if you're interested in this question you might want to check out #4 as well for context – it's a remarkably nuanced question. My bias was also initially for async close, but then I switched when we realized how many complications it would make in the API (e.g. there's a public class that implements the mapping interface and calls |
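To make that complication concrete, a toy version (not urllib3's actual class) of the kind of code that gets awkward if close() becomes a coroutine:
class ConnectionRegistry(dict):
    # A container that closes evicted connections from ordinary mapping
    # methods. __delitem__ cannot be declared async, so it has no way to
    # await an asynchronous close().
    def __delitem__(self, key):
        conn = self[key]
        super().__delitem__(key)
        conn.close()  # only works while close() stays synchronous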
@njsmith I'm sorry that you essentially had to say this twice. I don't think we have to look at the content of the file: the tokenize module does preserve comments (in the COMMENT token) and backslashes (implicitly by omitting the NEWLINE token). My script needs to be simplified (!) to handle a few edge cases like this, though. I also don't think we want to look at the content of the file. Sure, the bleaching script is only going to remove characters, so an in-place O(n) algorithm exists, but since we're dealing with Unicode and Python lists, it's not clear that this algorithm is any better than just using
If I'm not missing anything and 1/ we don't have to do it and 2/ we do not want to do it, maybe we won't do it! :) |
Oh, sorry, I just figured we were ignoring the issue until later, since it's not really crucial :-). I forgot that tokenize keeps comments though! That does potentially simplify things. (Maybe it would be interesting as a test to run the script over a bunch of code that doesn't use async/await, like the 3.4 stdlib or something, and check whether it preserves the text? I don't know that we necessarily have to preserve the text exactly, but it'd at least be nice to know what any changes look like.) I'm not worried about speed at all, just correctness. |
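One way to run that check, assuming a bleach() helper like the sketch earlier that maps a file path to bleached source:
import pathlib
import sys

def check_no_op(paths, bleach):
    # Bleach files that contain no async/await at all; ideally every one
    # comes back byte-for-byte identical to the original.
    for path in paths:
        if bleach(path) != pathlib.Path(path).read_text(encoding="utf-8"):
            print("changed:", path, file=sys.stderr)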
Caught up on the past few months of conversation. I have no input on direction at this time, but am making myself available for a few hours a week if you need someone to just get stuff done. |
@thinkjson Cool! I'm currently waiting on direction to know what the strategy is to get the 215 commits from the urllib3 master branch. But I think the next step is to work on tests, any improvement would be welcome here. |
I think for the master branch, we'll probably just need to go through them one at a time at some point and forward port each one. It'll be a bit tedious, but it's fewer PRs than it looks, and most of them looked pretty simple. But before we touch that I think we need to get the test suite and more functionality running first, because otherwise we won't be able to tell if our ports worked. On the test suite, I think the next goal is to get it running in some form, any form, from start to finish, under pytest. Like, we should define
broken = pytest.mark.xfail
extra_broken = pytest.mark.skipif(True, reason="freezes")
and then start hacking at the test suite adding |
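For example (test names here are made up), applying the markers looks like:
import pytest

broken = pytest.mark.xfail
extra_broken = pytest.mark.skipif(True, reason="freezes")

@broken
def test_not_ported_yet():        # expected to fail for now, still runs
    assert False

@extra_broken
def test_something_that_hangs():  # skipped outright so the freeze never happens
    while True:
        pass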
I guess I should post an update, since the last comment here is way out of date. We are now up to date with urllib3 master, and you can run the test suite by doing |
Playing with this today — very impressed! |
FYI pip is installing pluggy 0.7.1. It's resolving the
Workaround:
diff --git a/tox.ini b/tox.ini
index b98e128..2c3f040 100644
--- a/tox.ini
+++ b/tox.ini
@@ -2,7 +2,9 @@
envlist = flake8-py3, py27, py34, py35, py36, py37, pypy, pypy3
[testenv]
-deps= -r{toxinidir}/dev-requirements.txt
+deps=
+ pluggy==0.6.0
+ -r{toxinidir}/dev-requirements.txt
commands=
# Print out the python version and bitness
pip --version |
Edit: current status
There's a lot of discussion and updates below, making this issue hard to read if you just want to know what the current status is. So tl;dr:
This is a project to add support to urllib3 (and ultimately the packages that depend on it like requests, boto, etc.) for async mode with multiple IO library backends (currently trio + twisted, and it's easy to add more), while keeping the original sync API mostly intact. For a big picture overview and demo see: urllib3/urllib3#1323
We're definitely looking for help to move this forward. If you want to help see below for some summary of what needs to be done, and you can post in this thread or join us in our chat if you want more information.
Things that are working:
- Basic HTTP requests using sync mode, trio mode, twisted mode (again, see RFC: what do you think of this branch adding async support to urllib3? urllib3/urllib3#1323 for a demo, or check out the demo/ directory)
- The test suite runs, though lots of tests are temporarily marked xfail or skip.

Things that need to be done (incomplete list)
- Get the tests running on travis, so we can make sure new PRs aren't breaking things
- Review all the tests marked xfail or skip and one-by-one make them pass (or delete them if they no longer apply). This will require fixing a bunch of things (timeouts, https, etc.)
- Make it so trio and twisted aren't import-time requirements.
- Take the syncifier code in setup.py and factor it out into its own project. Also it would be nice to get it running under python 2.
- Get the tests running on python 2 (they should work if you can get things set up, but we currently can't build on python 2 because of the previous bullet point, so testing is awkward)
- Get tox working (for now, use runtests.py instead)
- Maybe make close methods async? (see Discussion: the urllib3 bleach-spike branch #1 (comment) and replies and links therein)
- Currently we're only testing the synchronous mode; we need to add tests for the different async backends. This probably requires making parts of the test suite async (with some kind of parametrization to test against multiple backends) and then using the syncifier script to generate a synchronous version of the tests to test the synchronous mode.
- Add tests for new features, in particular servers that send an early response.
- Eventually: we'll need to update the docs.
- Perhaps set up some kind of more useful way of tracking this todo list
Getting started
So you want to help? Awesome!
- Check out the bleach-spike branch from the python-trio/urllib3 repo
- You should be able to run the tests with python runtests.py (this currently requires python 3.5+)
- You should see lots and lots of output, and eventually get a message from pytest that looks like "683 passed, 241 skipped, 31 xfailed, 20 warnings".
- Hack on the code to make the "passed" number go up, and the other numbers go down. Or start working on any of the other suggestions up above.
The below is the original post that started off this thread.
Original post
(See urllib3/urllib3#1228 for context)
The bleach-spike branch in this repo is based off of @Lukasa's v2 branch in the main urllib3 repo. The idea is to experiment with converting urllib3 to support three backends/public APIs: plain sockets (like it does now, using select/poll internally), trio (with an async-based public API), twisted (with an async-based public API). [These were chosen as a representative sampling of approaches, i.e. if it works for these it'll probably also work for asyncio/tornado/curio/..., modulo specific limitations like asyncio's lack of starttls.] I'm putting some notes here because @merrellb asked if he could help, so :-).
Here's the diff link: urllib3/urllib3@v2...python-trio:bleach-spike
The overall goal here is not to immediately create a polished product, but just to test the viability of this approach, so a bit of sloppiness is OK; the goal is to get a proof-of-concept as efficiently as possible.
The basic idea is: we define a common "backend API" that provides a common set of socket operations, and acts as the abstraction boundary between urllib3 and our different network backends.
notes.org has some details on what my current draft of the API looks like. urllib3/backends/ has the actual backend implementations. The backend uses async/await, but the plain socket implementation of it never actually takes advantage of this -- its methods never yield. The trio and twisted backends of course assume that they're running in a trio or twisted async context.
Done:
- sync_connection.py in terms of the backend API, but again haven't actually tried running it. I think it's much more readable and reliable (once we fix the silly bugs) than the version in v2. (The filename is now something of a misnomer, but I figured I'd leave it alone for now to make the diff more readable. See above re: proof-of-concept.)
Todo:
- async/await markers out from sync_connection.py out to all their callers
- sync_connection.py that I didn't really understand, so I put in some best-guesses but it's probably wrong. Probably there are other interactions with the rest of the world that aren't quite settled as well.
- https exercises unique paths
The trio code currently assumes the existence of trio.open_tcp_stream, which, uh, doesn't exist yet. I realized today though that contrary to previous reports we don't need to robustify SSLStream to make this work (i.e., python-trio/trio#198 is not actually a blocker). (The reason is that I changed the backend interface here a little bit in a way that both simplified the code here and also removed this problem. Specifically, before I was trying to use read_and_write_for_a_while to send the request and then switch to receive_some to read the response headers; now I just use a single call to read_and_write_for_a_while to do both.)
Further items that maybe aren't on the critical path:
- tokenize module to automatically transform the library back into a synchronous library by deleting async and await tokens, plus replacing __aiter__ → __iter__, __anext__ → __next__
- XX in the source
CC: @merrellb