Use aiohttp for chainlet <-> chainlet communication #1194
Conversation
truss-chains/truss_chains/stub.py
# Check `_client_cycle_needed` before and after locking to avoid
# needing a lock each time the client is accessed.
if self._client_cycle_needed(self._cached_async_client):
    async with self._async_lock:
        if self._client_cycle_needed(self._cached_async_client):
            connector = aiohttp.TCPConnector(
                limit=self._client_limits.max_connections,
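The double-checked pattern in the snippet above (check, lock, re-check) can be sketched in isolation. This is a self-contained illustration with a stand-in client class, not the PR's actual code:

```python
import asyncio

class _FakeClient:
    """Stand-in for aiohttp.ClientSession, so the sketch has no dependencies."""
    closed = False

class LazyAsyncClient:
    def __init__(self):
        self._lock = asyncio.Lock()
        self._client = None
        self.creations = 0  # counts how many times a client was actually built

    def _cycle_needed(self, client):
        # Recreate when there is no client yet, or the cached one was closed.
        return client is None or client.closed

    async def get(self):
        # Fast path: skip the lock entirely when a usable client is cached.
        if self._cycle_needed(self._client):
            async with self._lock:
                # Re-check under the lock: another task may have created
                # the client while we were waiting to acquire it.
                if self._cycle_needed(self._client):
                    self._client = _FakeClient()
                    self.creations += 1
        return self._client

async def main():
    lazy = LazyAsyncClient()
    clients = await asyncio.gather(*(lazy.get() for _ in range(50)))
    print(lazy.creations, all(c is clients[0] for c in clients))  # → 1 True

asyncio.run(main())
```

The re-check under the lock is what makes this safe: without it, two tasks that both saw a stale client before acquiring the lock would each build a fresh one.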
nit: is max_connections still 1000? wondering if we should have a lower number since we saw decreased performance at 1000?
It is -- I think the tough thing with this is that the right number is highly dependent on workload/hardware run. I am strongly leaning towards keeping this high & leaving it up to users to limit the # of outgoing connections.
yeah that makes sense. 👍 Probably something we should document somewhere, either in docs or truss example (or both)
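If we do document the "leave it high, users cap their own outgoing connections" guidance, one stdlib-only pattern worth showing is a semaphore cap. This is an illustrative sketch, not part of this PR, and `run_limited` is a hypothetical helper name:

```python
import asyncio

async def run_limited(coros, max_concurrency: int):
    """Run awaitables while allowing at most `max_concurrency` in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def _guarded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(_guarded(c) for c in coros))

async def main():
    peak = 0
    active = 0

    async def fake_call(i):
        # Stand-in for an actual outgoing chainlet RPC.
        nonlocal peak, active
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0)
        active -= 1
        return i

    results = await run_limited((fake_call(i) for i in range(100)), max_concurrency=8)
    print(len(results), peak <= 8)  # → 100 True

asyncio.run(main())
```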
I want to approve, but this creates a big gotcha of sync vs async. Can we please keep sync in lockstep with async here?
@@ -81,17 +82,20 @@ def _client_sync(self) -> httpx.Client:
         assert self._cached_sync_client is not None
         return self._cached_sync_client[0]

-    async def _client_async(self) -> httpx.AsyncClient:
+    async def _client_async(self) -> aiohttp.ClientSession:
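One way to read "lockstep" here: the sync path can mirror the exact same double-checked shape as the async one, just with `threading.Lock` in place of `asyncio.Lock`. A hypothetical stand-alone sketch with a stand-in client class (names are illustrative, not the PR's code):

```python
import threading

class _FakeSyncClient:
    """Stand-in for httpx.Client, so the sketch has no dependencies."""
    is_closed = False

class LazySyncClient:
    """Sync twin of the async client cache: same check-lock-recheck shape,
    threading.Lock instead of asyncio.Lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._client = None

    def _cycle_needed(self, client):
        return client is None or client.is_closed

    def get(self):
        if self._cycle_needed(self._client):
            with self._lock:
                # Re-check: another thread may have built the client already.
                if self._cycle_needed(self._client):
                    self._client = _FakeSyncClient()
        return self._client

lazy = LazySyncClient()
print(lazy.get() is lazy.get())  # → True (cached after first build)
```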
I think the typing of everything in the BasetenSession class definition needs to change. Additionally, the sync client created above also needs to change. We should ideally try to drop the httpx import and the httpx limits definition above.
what would you propose changing sync to? `requests`? I don't think there's a problem with using httpx for sync; this bug is only apparent in async.
okay. that's fine then. Maybe worth adding a comment
I'd like to drop the sync path anyway, but there is an unresolved question around that: https://basetenlabs.slack.com/archives/C06RAC0JT5J/p1727900427264919
@@ -277,36 +306,31 @@ def handle_response(response: httpx.Response, remote_name: str) -> Any:
                "Could not get JSON from error response. Status: "
                f"`{response.status_code}`."
            ) from e
    _handle_response_error(response_json=response_json, remote_name=remote_name)
Good call to refactor this, thanks!
@@ -108,7 +109,7 @@ httpx = { extras = ["cli"], version = "*" }
 mypy = "^1.0.0"
 pytest-split = "^0.8.1"
 requests-mock = ">=1.11.0"
-types-requests = ">=2.31.0.2"
+types-requests = "==2.31.0.2"
FYI: I did this downgrade to fix an issue where `poetry lock --no-update` hangs forever. We can't use later versions of types-requests because they depend on urllib3 >= 2, which is not supported by boto.
🚀 What
While working on a chain in production, I noticed some latency issues in the chain <-> chain communication. After a bunch of debugging, I stumbled across this unresolved GitHub issue: encode/httpx#3215, which suggests that httpx performs poorly in highly concurrent scenarios.
After swapping out httpx for aiohttp, I saw a big performance increase on a toy example; see this loom: https://www.loom.com/share/fb1fbe06529e473499901766711787b4.
Note: I didn't touch the synchronous path, but it isn't appropriate for highly concurrent scenarios anyway.
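For anyone who wants to reproduce the comparison from the loom locally, a small harness along these lines fires N concurrent calls against a pluggable async `fetch` and reports tail latency. `fake_fetch` is a placeholder to swap for a real httpx or aiohttp request; nothing here is from the PR itself:

```python
import asyncio
import statistics
import time

async def measure(fetch, concurrency: int):
    """Fire `concurrency` calls at once; return (p50, p99) latencies in seconds."""
    async def timed():
        start = time.perf_counter()
        await fetch()
        return time.perf_counter() - start

    latencies = await asyncio.gather(*(timed() for _ in range(concurrency)))
    latencies.sort()
    return statistics.median(latencies), latencies[int(len(latencies) * 0.99) - 1]

async def fake_fetch():
    # Placeholder: replace with e.g. an httpx.AsyncClient or
    # aiohttp.ClientSession request to compare the two clients.
    await asyncio.sleep(0.001)

p50, p99 = asyncio.run(measure(fake_fetch, concurrency=100))
print(p50 <= p99)  # → True
```

Running the same harness against both client implementations at increasing concurrency levels is the simplest way to see where the p99 curves diverge.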
💻 How
Swap the async client in the chainlet stub from httpx.AsyncClient to aiohttp.ClientSession (the sync httpx path is unchanged), and pull response error handling out into `_handle_response_error`.
🔬 Testing
Tested this on a production chain (using an RC) and also ran the chain integration tests.
After this goes out, we should redeploy chains and see what kind of speed up we get from this.