Switch Async HTTP-client to `httpx` (HTTP/2-support) #770

thomasht86 · 2024-05-12T11:13:46Z

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

Removed @staticmethod-decorator of callback_docv1 (See comment on bug below)
Switched to httpx - Mainly to add HTTP/2 support, which presumably could improve feed performance by a lot.
Unit tests passing ✅ - BUT, working to quantify performance gain.

⚠️ Encountered this bug when tenacity retry is applied

self = <AsyncRetrying object at 0x10468fca0 (stop=<tenacity.stop.stop_after_attempt object at 0x10468fb20>, wait=<tenacity.wa...bject at 0x10468fac0>, before=<function before_nothing at 0x104463f70>, after=<function after_nothing at 0x10447b1f0>)>
retry_state = <RetryCallState 4369142976: attempt #3; slept for 3.0; last result: failed (TypeError object dict can't be used in 'await' expression)>

    def iter(self, retry_state: "RetryCallState") -> t.Union[DoAttempt, DoSleep, t.Any]:  # noqa
        fut = retry_state.outcome
        if fut is None:
            if self.before is not None:
                self.before(retry_state)
            return DoAttempt()
    
        is_explicit_retry = fut.failed and isinstance(fut.exception(), TryAgain)
        if not (is_explicit_retry or self.retry(retry_state)):
            return fut.result()
    
        if self.after is not None:
            self.after(retry_state)
    
        self.statistics["delay_since_first_attempt"] = retry_state.seconds_since_start
        if self.stop(retry_state):
            if self.retry_error_callback:
>               return self.retry_error_callback(retry_state)
E               TypeError: 'staticmethod' object is not callable

notebooks/lib/python3.9/site-packages/tenacity/__init__.py:322: TypeError

We should make unit tests for error handling. Maybe this is part of your fix?

jobergum · 2024-05-13T06:48:22Z

Last time I tried httpx with http/2 I ran into issues with that it did not handle the case when the server would shutdown the connection (we do in Vespa cloud to disallow extremely long connections)

thomasht86 · 2024-05-13T06:52:44Z

Last time I tried httpx with http/2 I ran into issues with that it did not handle the case when the server would shutdown the connection (we do in Vespa cloud to disallow extremely long connections)

That's very useful info. I will try to mock a test for that to check!

thomasht86 · 2024-06-14T12:01:09Z

Closing for now, as httpx did not improve feeding performance notably, even with HTTP/2.
Need to investigate further to justify the change.

thomasht86 · 2024-07-30T08:54:46Z

Adding these for context:
encode/httpx#2112
encode/httpx#3215

Got a prototype working. Get Error: <ConnectionTerminated error_code:ErrorCodes.NO_ERROR, last_stream_id:2147483647, additional_data:677261636566756c> for the last ~3k when running 10k POST-requests with vectors "concurrently".
1k requests is approx 2x faster than feed_iterable, which is promising.
Need to implement queue, retries and semaphore to benchmark properly.

thomasht86 · 2024-08-09T09:15:59Z

Ok, think this should be good now!

VespaAsync now uses httpxwith http/2.
Changed default max_connections-parameter to VespaAsync.__init__() to 1.
Introduced keepalive_expiry=10 to avoid connections being terminated by the server.
Added feed_async_iterable()-method. ~2-5x faster than feed_iterable() for the test case in the notebooks, better with increasing number of docs.
Added feed_performance_cloud.ipynb that compares and demonstrates different ways of feeding to Vespa Cloud.

Some things that were explored:

Using asyncio.Queue, following the pattern from feed_iterable(), but using a Semaphore and only simple batching for restricting memory usage of tasks was 2-3x faster (with the case from the notebook in the PR). Possibly due to overhead of switching between putting/getting from the queue.
Different max_queue_size does not impact performance significantly.
Increasing max_connections for the async client impacts the performance negatively, as expected with HTTP2.

Should still be possible to get better performance by parallelizing, but leaving this for later.

Tests

Added unit test.
Added integration test in docker.
Added notebook which also acts as test.
⚠️ Unit tests failing on linkcheck, will be fixed in different PR.
⚠️ Integration-cloud test cancelled due to conflicting with other PR runs.

glebashnik · 2024-08-09T11:02:13Z

tests/integration/test_integration_docker.py

+            namespace="retryapplication",
+        )
+
+    def test_retries_async(self):


Not obvious from code that it tests retires.
Is feeding 10 docs will trigger retries?
Maybe a brief comment explaining that uses special application that sleeps on indexing.

👍
I will add comments to make this more obvious.

vespa/application.py

glebashnik · 2024-08-09T13:00:20Z

vespa/application.py

@@ -582,6 +581,157 @@ def _handle_result_callback(
                queue.join()
                consumer_thread.join()

+    def feed_async_iterable(
+        self,
+        iter: Iterable[Dict],


Better name for iter arg, e.g. documents or documents_iter.

Same, iteris a reserved keyword, and shadowing it isn`t ideal, but think it is more important to follow existing pattern rather than to introduce divergent naming.
Agree?

In case of reserved words, I think it is worth the change.
Confusing for devs.

Hmm, since this is a public API, we would need to add deprecation-warning for a while first if we want to rename the keyword arg from feed_iterable. That should nevertheless be a separate PR. So, then the question is if this new method should have both the renamed arg, but also iter in the meanwhile, to avoid divergent naming.

Maybe @jobergum could enlighten us a little on best practice here? 🙏

Right, backwards compatibility for public APIs.
Separate PR makes sense.

Since this is a new method with a new signature we don't need to make the same mistake, it this works well, we can deprecate the sync version.

glebashnik

See the comments.

vespa/application.py

thomasht86 added 6 commits May 12, 2024 07:29

switch

4ee838d

fix tests ✅

c0e6426

add integration test

bc8770a

add dependency

cb09085

explicit

ece75e7

no self

3e77399

thomasht86 changed the title ~~Switch Async HTTP-client to httpx~~ Switch Async HTTP-client to httpx (HTTP/2-support) May 13, 2024

thomasht86 added 14 commits May 13, 2024 09:44

rename

93c7d73

Merge branch 'master' into thomasht86/switch-asynchttpclient-to-httpx

93f6060

fix test

7a57cb1

change client to httpx

be037d5

fix depr-warning ellipticcurve

e6d2bd8

test

3668cc4

🤦‍♂️

7bca842

await

97340bd

--

1e3b689

now

d54be6a

add comparison notebook

84dd3a3

add keepalive_expiry and add docstring

e1d173c

test longlived connection

bd2fdeb

update feed cloud notebook

0932066

thomasht86 closed this Jun 14, 2024

Merge branch 'master' into thomasht86/switch-asynchttpclient-to-httpx

dbdcee6

bratseth reopened this Jul 29, 2024

working http2 without queue handling

34bc3dd

minor adjustments to application.py

a280b49

thomasht86 added 18 commits August 7, 2024 14:40

working async_iterable

b0a0ef5

Merge branch 'master' into thomasht86/switch-asynchttpclient-to-httpx

577062c

not raise on status

5bbf67e

add test_for async feed

d3d3fb7

add callback handling

cf0ff49

add unit test

008f57e

add docker integration test

8ef7c0b

Merge branch 'master' into thomasht86/switch-asynchttpclient-to-httpx

d035de3

notebook

45c116c

improve docstring and comments

ce96d49

better docstring

31a1c9a

cleaner nb

5170f8f

improve notebook

c246e0c

adjust default params

bc3bc70

Merge branch 'master' into thomasht86/switch-asynchttpclient-to-httpx

2dcd2c3

update default params

3fbe88b

full nb run

18ddf9b

remove superfluous async in open

edaa1d6

thomasht86 marked this pull request as ready for review August 9, 2024 09:33

thomasht86 requested a review from glebashnik August 9, 2024 09:33

thomasht86 added 3 commits August 12, 2024 09:28

Merge branch 'master' into thomasht86/switch-asynchttpclient-to-httpx

c58b6e8

fix link

49d79bd

replace ! with subprocess

0914b8f

glebashnik requested changes Aug 12, 2024

View reviewed changes

vespa/application.py Show resolved Hide resolved

glebashnik approved these changes Aug 13, 2024

View reviewed changes

thomasht86 merged commit a8af934 into master Aug 13, 2024
34 of 45 checks passed

thomasht86 deleted the thomasht86/switch-asynchttpclient-to-httpx branch August 13, 2024 11:09

thomasht86 mentioned this pull request Aug 14, 2024

FR: async feed_iterable #712

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch Async HTTP-client to `httpx` (HTTP/2-support) #770

Switch Async HTTP-client to `httpx` (HTTP/2-support) #770

thomasht86 commented May 12, 2024 •

edited

Loading

jobergum commented May 13, 2024

thomasht86 commented May 13, 2024

thomasht86 commented Jun 14, 2024

thomasht86 commented Jul 30, 2024

thomasht86 commented Aug 9, 2024

glebashnik Aug 9, 2024

thomasht86 Aug 12, 2024

glebashnik Aug 9, 2024 •

edited

Loading

thomasht86 Aug 12, 2024

glebashnik Aug 12, 2024

thomasht86 Aug 12, 2024

glebashnik Aug 12, 2024

jobergum Aug 13, 2024

glebashnik left a comment

Switch Async HTTP-client to httpx (HTTP/2-support) #770

Switch Async HTTP-client to httpx (HTTP/2-support) #770

Conversation

thomasht86 commented May 12, 2024 • edited Loading

jobergum commented May 13, 2024

thomasht86 commented May 13, 2024

thomasht86 commented Jun 14, 2024

thomasht86 commented Jul 30, 2024

thomasht86 commented Aug 9, 2024

glebashnik Aug 9, 2024

Choose a reason for hiding this comment

thomasht86 Aug 12, 2024

Choose a reason for hiding this comment

glebashnik Aug 9, 2024 • edited Loading

Choose a reason for hiding this comment

thomasht86 Aug 12, 2024

Choose a reason for hiding this comment

glebashnik Aug 12, 2024

Choose a reason for hiding this comment

thomasht86 Aug 12, 2024

Choose a reason for hiding this comment

glebashnik Aug 12, 2024

Choose a reason for hiding this comment

jobergum Aug 13, 2024

Choose a reason for hiding this comment

glebashnik left a comment

Choose a reason for hiding this comment

Switch Async HTTP-client to `httpx` (HTTP/2-support) #770

Switch Async HTTP-client to `httpx` (HTTP/2-support) #770

thomasht86 commented May 12, 2024 •

edited

Loading

glebashnik Aug 9, 2024 •

edited

Loading