Improve async performance. #3215
Found some related discussions. Opening a proper issue is warranted to get better visibility for this, so it is easier for others to find. In its current state…
Oh, interesting. There are some places I can think of that we might want to dig into here...
Possible points of interest here...
Also, the tracing support in both aiohttp and httpx is likely to be extremely valuable to us here.
Thank you for the good points!
My original benchmark hit AWS S3. There I got very similar results.
Okay, thanks. Was that also testing small requests?
Yes, pretty much: GETs of a file a couple of KB in size. In the real system the sizes of course vary a lot.
@tomchristie you were right, this is the issue ^! When I just do a simple patch into…
There is another hot spot in the connection pool. Its logic is quite heavy, as it rechecks all of the connections every time requests are assigned to connections. It might be possible to skip that recheck in the common case. It would probably also be a good idea to add some performance tests to the httpx/httpcore CI.
I can probably help with a PR if you give me pointers on how to proceed :) I could e.g. replace the synchronization primitives with native asyncio ones.
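To illustrate the kind of change being discussed, here is a minimal sketch of a lock shim built directly on asyncio's native primitive instead of going through anyio. `AsyncLock` is a hypothetical name for illustration, not httpcore's actual class:

```python
import asyncio

# Hypothetical sketch: an async-context-manager lock backed directly by
# asyncio.Lock, the sort of drop-in that could replace an anyio-based shim.
class AsyncLock:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def __aenter__(self) -> "AsyncLock":
        await self._lock.acquire()
        return self

    async def __aexit__(self, exc_type, exc_value, traceback) -> None:
        self._lock.release()

async def main() -> int:
    counter = 0
    lock = AsyncLock()

    async def worker() -> None:
        nonlocal counter
        async with lock:
            counter += 1

    # 100 concurrent workers incrementing under the lock.
    await asyncio.gather(*(worker() for _ in range(100)))
    return counter

print(asyncio.run(main()))  # 100
```

On a pure-asyncio event loop this avoids anyio's dispatch layer entirely; the trade-off is losing Trio support unless a Trio-backed variant is kept alongside it.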
See encode/httpcore#344, #1511, and encode/httpcore#345 for where/why we switched over to anyio.
A good first pass at this would be to add an asyncio-native network backend. You might want to work from the last version that had one. Docs... https://www.encode.io/httpcore/network-backends/ Other context...
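As a rough sketch of what an asyncio-native network backend could look like, the following wraps `asyncio.open_connection` behind a small stream interface. Names and signatures here are illustrative assumptions, not httpcore's real backend API (see the docs link above); a tiny local echo server is included so the sketch is self-contained:

```python
import asyncio

# Illustrative stream wrapper over asyncio's StreamReader/StreamWriter pair.
class AsyncIOStream:
    def __init__(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
        self._reader = reader
        self._writer = writer

    async def read(self, max_bytes: int) -> bytes:
        return await self._reader.read(max_bytes)

    async def write(self, data: bytes) -> None:
        self._writer.write(data)
        await self._writer.drain()

    async def aclose(self) -> None:
        self._writer.close()
        await self._writer.wait_closed()

# Illustrative backend: opens TCP connections without any anyio layer.
class AsyncIOBackend:
    async def connect_tcp(self, host: str, port: int) -> AsyncIOStream:
        reader, writer = await asyncio.open_connection(host, port)
        return AsyncIOStream(reader, writer)

async def demo() -> bytes:
    # Tiny local server that upper-cases whatever it receives.
    async def handle(reader, writer):
        data = await reader.read(100)
        writer.write(data.upper())
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    stream = await AsyncIOBackend().connect_tcp("127.0.0.1", port)
    await stream.write(b"ping")
    reply = await stream.read(100)
    await stream.aclose()
    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(demo()))  # b'PING'
```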
Thanks @tomchristie. What about the case I pointed out:
There, switching the network backend won't help, as the lock is not defined by the network implementation; the lock implementation is a global one. Should we just change the synchronization to use asyncio?
I'm able to push the performance of httpx close to that of aiohttp. You can see the benchmark here, and here are the changes. There are 3 things required to get it as fast as aiohttp.
I'm happy to open PRs from these. What do you think @tomchristie?
@MarkusSintonen - Nice one. Let's work through those as individual PRs. Is it worth submitting a PR where we add a benchmark first?
I think it would be beneficial to have the benchmark run in CI, so we would see the difference. I have previously contributed to Pydantic, which uses CodSpeed. It outputs benchmark diffs on PRs when the benchmarked behaviour changes, and it should be free for open-source projects.
That's an interesting idea. I'd clearly be in agreement with adding a benchmark.
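As a sketch of the kind of micro-benchmark harness that could run in CI (a plain timing loop; a tool like CodSpeed's pytest integration would replace the manual timing), assuming nothing beyond the standard library:

```python
import asyncio
import statistics
import time

# Minimal async benchmark harness: run `concurrency` coroutines per round
# and report the mean wall-clock duration across rounds.
async def bench(coro_factory, rounds: int = 3, concurrency: int = 100) -> float:
    durations = []
    for _ in range(rounds):
        start = time.perf_counter()
        await asyncio.gather(*(coro_factory() for _ in range(concurrency)))
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)

# Example: measure the overhead of scheduling no-op sleeps. In a real CI
# benchmark the factory would issue client.get() calls against a local server.
mean_s = asyncio.run(bench(lambda: asyncio.sleep(0)))
print(f"{mean_s * 1000:.3f} ms per round")
```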
@tomchristie I have now opened the 2 fix PRs:
Maybe I'll open the network backend addition after these, as it's the most complex one.
Maybe you can refer to the implementation of aiohttp.
Isn't usage of http.cookiejar.CookieJar a part of the problem? Line 1020 in db9072f
@rafalkrupinski I haven't run benchmarks where requests/responses use cookies, but at least it doesn't cause performance issues in general. I ran similar benchmarks.
(Waiting for review from @tomchristie)
TBH I'm surprised by httpx ditching anyio. Sure, anyio comes with performance overhead, but this breaks compatibility with Trio.
I'm not aware of it ditching anyio completely. It will still be supported, just optionally. Trio will also still be supported by httpcore.
These are really cool speed-ups. Can't wait for httpx to overtake aiohttp ;)
Since the benchmark seems to be using http, I think the below is also a related issue, where creation of the SSL context in httpx had some overhead compared to aiohttp. Ref: #838
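On that point: building an `ssl.SSLContext` loads and verifies the CA bundle, which is expensive relative to a single request, so one common mitigation is to construct the context once and share it across clients (httpx accepts a pre-built context via its `verify=` argument). A minimal sketch:

```python
import functools
import ssl

# Cache a single SSLContext so the CA bundle is only loaded once,
# rather than once per client instance.
@functools.lru_cache(maxsize=None)
def shared_ssl_context() -> ssl.SSLContext:
    return ssl.create_default_context()

# Reuse the same context across clients, e.g.:
#   client = httpx.AsyncClient(verify=shared_ssl_context())
assert shared_ssl_context() is shared_ssl_context()
```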
Hi, any movement on the PRs? We're having to use both aiohttp and httpx in our project for this reason, whereas we'd like to have only one set of APIs.
I use aiohttp to encapsulate a chained-call style of client, which I personally feel is pretty good.

```python
url = "https://juejin.cn/"
resp = await AsyncHttpClient().get(url).execute()
# json_data = await AsyncHttpClient().get(url).json()
text_data = await AsyncHttpClient(new_session=True).get(url).text()
byte_data = await AsyncHttpClient().get(url).bytes()
```

example: https://github.com/HuiDBK/py-tools/blob/master/demo/connections/http_client_demo.py
httpx.AsyncClient has much worse performance than aiohttp.ClientSession with concurrent requests. Is there any progress on this issue?
In what scenarios might httpx encounter performance bottlenecks? Is there a more general explanation?
Hello guys, I think I've encountered the same issue. However, our production code heavily relies on httpx.

```python
import asyncio
import typing
import time
import aiohttp
from aiohttp import ClientSession
import httpx
import statistics

ADDRESS = "https://www.baidu.com"

async def request_with_aiohttp(session):
    async with session.get(ADDRESS) as rsp:
        return await rsp.text()

async def request_with_httpx(client):
    rsp = await client.get(ADDRESS)
    return rsp.text

# Benchmark functions
async def benchmark_aiohttp(n):
    async with ClientSession() as session:
        # make sure code is right
        print(await request_with_aiohttp(session))
        start = time.time()
        tasks = []
        for i in range(n):
            tasks.append(request_with_aiohttp(session))
        await asyncio.gather(*tasks)
        return time.time() - start

async def benchmark_httpx(n):
    async with httpx.AsyncClient(
        timeout=httpx.Timeout(
            timeout=10,
        ),
    ) as client:
        # make sure code is right
        print(await request_with_httpx(client))
        start = time.time()
        tasks = []
        for i in range(n):
            tasks.append(request_with_httpx(client))
        await asyncio.gather(*tasks)
        return time.time() - start

class AiohttpTransport(httpx.AsyncBaseTransport):
    def __init__(self, session: typing.Optional[aiohttp.ClientSession] = None):
        self._session = session or aiohttp.ClientSession()
        self._closed = False

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        if self._closed:
            raise RuntimeError("Transport is closed")
        # Convert headers
        headers = dict(request.headers)
        # Prepare request parameters
        method = request.method
        url = str(request.url)
        content = request.content
        async with self._session.request(
            method=method,
            url=url,
            headers=headers,
            data=content,
            allow_redirects=False,
        ) as aiohttp_response:
            # Read the response body
            content = await aiohttp_response.read()
            # Convert headers
            headers = [(k.lower(), v) for k, v in aiohttp_response.headers.items()]
            # Build the httpx.Response
            return httpx.Response(
                status_code=aiohttp_response.status,
                headers=headers,
                content=content,
                request=request,
            )

    async def aclose(self):
        if not self._closed:
            self._closed = True
            await self._session.close()

async def benchmark_httpx_with_aiohttp_transport(n):
    async with httpx.AsyncClient(
        timeout=httpx.Timeout(
            timeout=10,
        ),
        transport=AiohttpTransport(),
    ) as client:
        start = time.time()
        tasks = []
        for i in range(n):
            tasks.append(request_with_httpx(client))
        await asyncio.gather(*tasks)
        return time.time() - start

async def run_benchmark(requests=1000, rounds=3):
    aiohttp_times = []
    httpx_times = []
    httpx_aio_times = []
    print(f"Starting test with {requests} concurrent requests...")
    for i in range(rounds):
        print(f"\nRound {i + 1}:")
        # aiohttp test
        aiohttp_time = await benchmark_aiohttp(requests)
        aiohttp_times.append(aiohttp_time)
        print(f"aiohttp took: {aiohttp_time:.2f} s")
        # Brief pause to let the system cool down
        await asyncio.sleep(1)
        # httpx test
        httpx_time = await benchmark_httpx(requests)
        httpx_times.append(httpx_time)
        print(f"httpx took: {httpx_time:.2f} s")
        # Brief pause to let the system cool down
        await asyncio.sleep(1)
        # httpx (aiohttp transport) test
        httpx_time = await benchmark_httpx_with_aiohttp_transport(requests)
        httpx_aio_times.append(httpx_time)
        print(f"httpx (aiohttp transport) took: {httpx_time:.2f} s")
    print("\nSummary:")
    print(f"aiohttp mean: {statistics.mean(aiohttp_times):.2f} s")
    print(f"httpx mean: {statistics.mean(httpx_times):.2f} s")
    print(f"httpx (aiohttp transport) mean: {statistics.mean(httpx_aio_times):.2f} s")

if __name__ == '__main__':
    # Run the benchmark
    asyncio.run(run_benchmark(512))
```
We encountered an issue with…
Although a 500 is usually related to a server-side error, it seems that…
That's what we thought at first, but it's just a simple encode-and-return endpoint. Nothing can go wrong there.
@MarkusSintonen @tomchristie
Here is a more complete version of this workaround. I used it in my production code, and it works well.

```python
import aiohttp
import httpx
from contextvars import ContextVar
from httpx import AsyncBaseTransport, Request, Response


class AiohttpTransport(AsyncBaseTransport):
    def __init__(self, session: aiohttp.ClientSession | None = None):
        self._session = session or aiohttp.ClientSession()
        self._closed = False

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        if (
            _rsp := try_to_get_mocked_response(request)
        ) is not None:  # For compatibility with RESPX mocks
            return _rsp
        if self._closed:
            raise RuntimeError("Transport is closed")
        # Copy the request headers (auth is already applied by httpx here)
        headers = dict(request.headers)
        # Prepare request parameters
        method = request.method
        url = str(request.url)
        content = request.content
        async with self._session.request(
            method=method, url=url, headers=headers, data=content, allow_redirects=False
        ) as aiohttp_response:
            # Read the response body
            content = await aiohttp_response.read()
            # Convert headers
            headers = [
                (k.lower(), v)
                for k, v in aiohttp_response.headers.items()
                if k.lower() != "content-encoding"
            ]
            # Build the httpx.Response
            return httpx.Response(
                status_code=aiohttp_response.status,
                headers=headers,
                content=content,
                request=request,
            )

    async def aclose(self):
        if not self._closed:
            self._closed = True
            await self._session.close()


mock_router = ContextVar("mock_router")


def try_to_get_mocked_response(request: Request) -> Response | None:
    try:
        _mock_handler = mock_router.get()
    except LookupError:
        return None
    return _mock_handler(request)


def create_aiohttp_backed_httpx_client(
    *,
    headers: dict[str, str] | None = None,
    total_timeout: float | None = None,
    base_url: str = "",
    proxy: str | None = None,
    keepalive_timeout: float = 15,
    max_connections: int = 100,
    max_connections_per_host: int = 0,
    verify_ssl: bool = False,
    login: str | None = None,
    password: str | None = None,
    encoding: str = "latin1",
) -> httpx.AsyncClient:
    timeout = aiohttp.ClientTimeout(total=total_timeout)
    connector = aiohttp.TCPConnector(
        keepalive_timeout=keepalive_timeout,
        limit=max_connections,
        limit_per_host=max_connections_per_host,
        verify_ssl=verify_ssl,
        enable_cleanup_closed=True,
    )
    if login and password:
        auth = aiohttp.BasicAuth(login=login, password=password, encoding=encoding)
    else:
        auth = None
    return httpx.AsyncClient(
        base_url=base_url,
        verify=False,
        transport=AiohttpTransport(
            session=aiohttp.ClientSession(
                proxy=proxy,
                auth=auth,
                timeout=timeout,
                connector=connector,
                headers=headers,
            )
        ),
    )
```
@lizeyan Using this method, are all the APIs consistent with httpx?
I'm not entirely certain, but in my experience the features in the code (authentication, timeout, connection limits, and proxy) function effectively.
I am curious about the response API: should I also use a context manager, like in aiohttp?
@RyanMarten - I have been doing some work on this in the background which I'll share sometime soon. Our connection pooling is a little overcomplicated, and there's some serious refactoring we can dig into here. (Also, having…)
There seem to be some performance issues in httpx (0.27.0), as it has much worse performance than aiohttp (3.9.4) with concurrently running requests (in Python 3.12). The following benchmark shows how running 20 requests concurrently is over 10x slower with httpx compared to aiohttp. The benchmark uses very basic httpx usage for doing multiple GET requests with limited concurrency. The script outputs a figure showing that the duration of each GET request has a huge variance with httpx. I found the following issue, but it seems unrelated, as the workaround doesn't make a difference here: #838 (comment)