http stream response via http 1.1 chunked encoding #2233
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2233 +/- ##
=======================================
Coverage 70.40% 70.40%
=======================================
Files 75 75
Lines 3392 3392
Branches 57 57
=======================================
Hits 2388 2388
Misses 1001 1001
Partials 3 3
Client side receives the chunked data.
```python
def test_echo_stream_inference():
    test_utils.start_torchserve(no_config_snapshots=True, gen_mar=False)
```
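For reference, a client can consume such a chunked response with the Python standard library alone. This is a hedged sketch, not code from the PR: the host, port, and path are placeholders, and `http.client` is used because it transparently decodes `Transfer-Encoding: chunked` as data arrives:

```python
import http.client

def stream_prediction(host, port, path, payload):
    """POST payload and yield the body of a chunked response piece by piece.

    host/port/path are placeholders; any endpoint replying with
    Transfer-Encoding: chunked behaves the same way here.
    """
    conn = http.client.HTTPConnection(host, port)
    conn.request("POST", path, body=payload)
    resp = conn.getresponse()
    while True:
        chunk = resp.read(8192)  # returns b"" once the last chunk arrives
        if not chunk:
            break
        yield chunk
    conn.close()

def assemble(chunks):
    # Concatenate the streamed parts into the full response body.
    return b"".join(chunks)
```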
Can we just promote these test_utils functions to the core library? They're very useful.
Which core library? Who is the user of the core library?
I mean the `ts` namespace.
if (job.getCmd() == WorkerCommands.STREAMPREDICT
        && prediction.getHeaders().get(TS_STREAM_NEXT).equals("true")) {
    jobDone = false;
if (jobDone) {
Do all the Java-side changes need unit tests?
There is tech debt in the frontend: it lacks unit tests and only includes integration tests. This PR uses regression tests to verify the e2e flow.
Mostly minor comments, except one about the logging. I would also like to see an enum used to denote the current state of a request instead of the int (especially given the current choice of which int denotes which state), but this is more of an opinion and therefore not blocking.
@@ -38,6 +45,7 @@ public class RestJob extends Job {

private ChannelHandlerContext ctx;
private CompletableFuture<byte[]> responsePromise;
private int numStreams; // -1: stream end; 0+: number of streams received; 0: non stream
This is a bit hard to follow and slightly counterintuitive in my opinion. Why wouldn't non-streamable = -1, end of stream = 0? Now we have a disjoint set of integers like so:

| streaming | no streaming | streaming |
|---|---|---|
| (-infinity, -1] | [0, 0] | [1, infinity) |

which feels hard to keep track of mentally compared to something where the set of streaming states is not split in two by the non-streaming state.
I think this could all be alleviated with the use of an enum. You would still need an integer to track the # of parts received, but it would be a bit clearer to read IMO.
Update: looking further in the code, it doesn't appear you even use the number of parts other than to check if it's > 1 (i.e. you aren't using it as a metric). I think this makes the argument for an enum stronger.
The following 4 cases need to take different actions:
- -1: stream end.
- 0: non-stream (default).
- 1: 1st stream.
- [2, Integer.MAX_VALUE): 2nd+ stream.
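To illustrate the enum alternative discussed above, here is a hypothetical sketch (in Python rather than the frontend's Java; the state and function names are my own, not from this PR) that keeps the chunk counter separate from the state, so no disjoint integer ranges are needed:

```python
from enum import Enum

class StreamState(Enum):
    NON_STREAM = "non_stream"  # default: plain request/response
    STREAMING = "streaming"    # a chunk arrived and more are expected
    STREAM_END = "stream_end"  # the final chunk has arrived

def next_state(state, parts_received, ts_stream_next):
    """Advance the per-request state from the ts_stream_next flag.

    parts_received is tracked as a plain counter alongside the enum,
    so the state itself never has to encode "how many" in its value.
    """
    if state is StreamState.NON_STREAM and parts_received == 0 and not ts_stream_next:
        return StreamState.NON_STREAM
    return StreamState.STREAMING if ts_stream_next else StreamState.STREAM_END
```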
frontend/server/src/main/java/org/pytorch/serve/wlm/BatchAggregator.java
frontend/server/src/main/java/org/pytorch/serve/wlm/WorkerThread.java
if ts_stream_next is True:
    context.set_response_header(idx, "ts_stream_next", "true")
else:
    if "true" == context.get_response_headers(idx).get("ts_stream_next"):
Curious: why do we need to check this? When would this be false?
There are 2 scenarios when ts_stream_next is false:
- non-stream response.
- last stream response. Line 85 handles this case.

context.get_response_headers(idx).get("ts_stream_next") is a string, not a bool. Its value can be:
- "true": has next stream
- "false": last stream
- "": non-stream response.
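Those three string values can be captured in a small helper. This is a hypothetical illustration of the mapping described above, not code from TorchServe:

```python
def classify_stream_header(value):
    """Interpret the ts_stream_next response header string.

    "true"  -> a further chunk will follow
    "false" -> this is the last chunk of a stream
    ""      -> ordinary non-streaming response
    """
    if value == "true":
        return "has_next_stream"
    if value == "false":
        return "last_stream"
    if value == "":
        return "non_stream"
    raise ValueError(f"unexpected ts_stream_next value: {value!r}")
```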
frontend/server/src/main/java/org/pytorch/serve/wlm/WorkerThread.java
ctx.writeAndFlush(new DefaultHttpContent(Unpooled.wrappedBuffer(body)));
ctx.writeAndFlush(LastHttpContent.EMPTY_LAST_CONTENT);
Minor/curious: instead of writing twice, can you just write once and tell Netty that this is the last chunk? Also, assuming you can't do that, do you need to writeAndFlush here, or can you just write, knowing that you are immediately writeAndFlush'ing the LastHttpContent chunk right after?
It needs to call writeAndFlush twice, somewhat similar to gRPC (onComplete).
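For context, what those two flushes put on the wire is plain HTTP/1.1 chunked framing: each chunk is a hex length, CRLF, the payload, CRLF, and a zero-length chunk ends the stream (RFC 9112 §7.1). A minimal sketch of the framing, independent of Netty:

```python
def encode_chunk(data: bytes) -> bytes:
    # One chunk: hex size line, then the payload, then a trailing CRLF.
    return f"{len(data):x}\r\n".encode() + data + b"\r\n"

def encode_last_chunk() -> bytes:
    # Zero-length terminator, the role LastHttpContent plays in Netty.
    return b"0\r\n\r\n"
```

A two-chunk body of `b"hello"` then `b" world"` would therefore be framed as `5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n`.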
Description
Support http streaming inference response via http 1.1 chunked encoding
Fixes #2232
Type of change
Feature/Issue validation/testing
reg.txt
Checklist: