
http stream response via http 1.1 chunked encoding #2233

Merged — 13 commits, Apr 21, 2023
Conversation

lxning (Collaborator)
@lxning lxning commented Apr 16, 2023

Description


Support http streaming inference response via http 1.1 chunked encoding

Fixes #2232
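For background, HTTP/1.1 chunked transfer encoding frames the response body as length-prefixed chunks terminated by a zero-length chunk, which is what lets the server flush intermediate inference results before the full response is ready. A minimal, self-contained Python sketch of the wire mechanics (standard library only; a throwaway local server, not the TorchServe test harness):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class ChunkedHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked framing requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for part in (b"hello ", b"from ", b"the stream"):
            # each chunk is framed as "<hex length>\r\n<data>\r\n"
            self.wfile.write(b"%x\r\n%s\r\n" % (len(part), part))
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the stream
        self.close_connection = True

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), ChunkedHandler)
threading.Thread(target=server.handle_request, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
resp = conn.getresponse()
parts = []
while True:
    data = resp.read(8)  # read incrementally; http.client de-chunks for us
    if not data:
        break
    parts.append(data)
body = b"".join(parts)
conn.close()
```

The client never sees the hex length prefixes: `http.client` decodes the chunked framing, so `resp.read()` yields only payload bytes until the terminating zero-length chunk.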

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing


Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@lxning lxning added the enhancement New feature or request label Apr 16, 2023
@lxning lxning added this to the v0.8.0 milestone Apr 16, 2023
@lxning lxning self-assigned this Apr 16, 2023
codecov bot commented Apr 17, 2023

Codecov Report

Merging #2233 (4c32794) into master (e2b4511) will not change coverage.
The diff coverage is 0.00%.

❗ Current head 4c32794 differs from pull request most recent head 48867c6. Consider uploading reports for the commit 48867c6 to get more accurate results

@@           Coverage Diff           @@
##           master    #2233   +/-   ##
=======================================
  Coverage   70.40%   70.40%           
=======================================
  Files          75       75           
  Lines        3392     3392           
  Branches       57       57           
=======================================
  Hits         2388     2388           
  Misses       1001     1001           
  Partials        3        3           
Impacted Files Coverage Δ
ts/protocol/otf_message_handler.py 72.58% <0.00%> (ø)


Client side receives the chunked data (test excerpt):
```
def test_echo_stream_inference():
    test_utils.start_torchserve(no_config_snapshots=True, gen_mar=False)
```
Member:

Can we promote these test_utils functions into the core library? They're very useful.

Collaborator Author (lxning):

Which core library? Who is the user of the core library?

Member:

I mean the ts namespace.

```
if (job.getCmd() == WorkerCommands.STREAMPREDICT
        && prediction.getHeaders().get(TS_STREAM_NEXT).equals("true")) {
    jobDone = false;
}
if (jobDone) {
```
Member:

Do all the Java-side changes need unit tests?

Collaborator Author (lxning):

There is tech debt in the frontend: it lacks unit tests and has only integration tests. This PR uses the regression suite to test end to end.

@msaroufim msaroufim self-requested a review April 19, 2023 04:54
@sanjams2 left a comment:

Mostly minor comments, except one about the logging. I would also like to see an enum used to denote the current state of a request instead of the int (especially given the current choice of which int denotes which state), but this is more of an opinion and therefore not blocking.

```
@@ -38,6 +45,7 @@ public class RestJob extends Job {

    private ChannelHandlerContext ctx;
    private CompletableFuture<byte[]> responsePromise;
    private int numStreams; // -1: stream end; 0+: number of streams received; 0: non stream
```

sanjams2:

This is a bit hard to follow and slightly counterintuitive in my opinion. Why wouldn't non-streamable = -1 and end of stream = 0? As it stands we have a disjoint set of integers:

```
streaming      non-streaming   streaming
(-inf, -1]     [0, 0]          [1, inf)
```

which is hard to keep track of mentally compared to an encoding where the set of streaming states is not split in two by the non-streaming state.

I think this could all be alleviated with the use of an enum. You would still need an integer to track the number of parts received, but it would be clearer to read IMO.

Update: looking further in the code, it doesn't appear you even use the number of parts other than to check whether it's > 1 (i.e. you aren't using it as a metric). I think this makes the argument for an enum stronger.

lxning (Collaborator Author) commented Apr 21, 2023:

The following four cases need to take different actions:

  • -1: stream end.
  • 0: non-stream (default).
  • 1: first stream part.
  • [2, Integer.MAX_VALUE): second and later stream parts.
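The reviewer's enum suggestion could look like the following Python sketch (hypothetical names; the actual frontend code is Java). It keeps the four cases above as named states while mapping them from the PR's integer encoding:

```python
from enum import Enum

class StreamState(Enum):
    NON_STREAM = "non_stream"    # default: plain request/response
    FIRST_STREAM = "first"       # first chunk of a streamed response
    NEXT_STREAM = "next"         # second and later chunks
    STREAM_END = "end"           # terminal marker

def classify(num_streams: int) -> StreamState:
    """Map the integer encoding used in the PR onto the proposed enum."""
    if num_streams == -1:
        return StreamState.STREAM_END
    if num_streams == 0:
        return StreamState.NON_STREAM
    if num_streams == 1:
        return StreamState.FIRST_STREAM
    return StreamState.NEXT_STREAM
```

With named states, code paths branch on `StreamState` instead of interval checks on an int, and a separate counter can still be kept if a parts-received metric is ever needed.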

```
if ts_stream_next is True:
    context.set_response_header(idx, "ts_stream_next", "true")
else:
    if "true" == context.get_response_headers(idx).get("ts_stream_next"):
```

sanjams2:

Curious: why do we need to check this? When would this be false?

Collaborator Author (lxning):

There are two scenarios in which ts_stream_next is false:

  1. A non-stream response.
  2. The last stream response. Line 85 handles this case.

context.get_response_headers(idx).get("ts_stream_next") is a string, not a bool. Its value can be:

  • "true": there is a next stream part
  • "false": this is the last stream part
  • "": non-stream response
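The header protocol described above can be sketched with a toy stand-in for the backend request context (`FakeContext` and `send_stream` are hypothetical names for illustration, not TorchServe APIs): every part except the last carries ts_stream_next="true", and the last part carries "false".

```python
class FakeContext:
    """Toy stand-in for the backend request context; stores per-index headers."""
    def __init__(self):
        self.headers = {}

    def set_response_header(self, idx, name, value):
        self.headers.setdefault(idx, {})[name] = value

    def get_response_headers(self, idx):
        return self.headers.get(idx, {})

def send_stream(context, idx, chunks):
    """Flag every chunk except the last with ts_stream_next="true".

    Returns the flag recorded for each chunk, in order.
    """
    flags = []
    for i, chunk in enumerate(chunks):
        is_last = (i == len(chunks) - 1)
        context.set_response_header(
            idx, "ts_stream_next", "false" if is_last else "true"
        )
        flags.append(context.get_response_headers(idx)["ts_stream_next"])
    return flags
```

A non-stream response would simply never set the header, leaving the client to see the empty-string default.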

Comment on lines +164 to +165
```
ctx.writeAndFlush(new DefaultHttpContent(Unpooled.wrappedBuffer(body)));
ctx.writeAndFlush(LastHttpContent.EMPTY_LAST_CONTENT);
```

sanjams2:

Minor/curious: instead of writing twice, can you just write once and tell Netty that this is the last chunk? Also, assuming you can't do that, do you need to writeAndFlush here, or can you just write, knowing that you are immediately writeAndFlush'ing the LastHttpContent chunk right after?

Collaborator Author (lxning):

It needs to call writeAndFlush twice, similar to gRPC's onComplete.

@lxning lxning merged commit 419edb6 into master Apr 21, 2023
Labels: enhancement (New feature or request)

4 participants