
Add support for binary data extension protocol and FP16 datatype #3685

Merged
8 commits merged into kserve:master from support-fp16-datatype on Aug 24, 2024

Conversation

sivanantha321
Member

@sivanantha321 sivanantha321 commented May 13, 2024

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3643

https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md
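The binary tensor extension linked above packs the JSON inference header and the raw tensor bytes into a single HTTP body: the header length travels in the `Inference-Header-Content-Length` header, and each binary input declares its byte length in a `binary_data_size` parameter. A minimal sketch of building such a request for an FP16 input, using only the stdlib `struct` module (the tensor name `input-0` and the helper function are hypothetical, not part of this PR's API):

```python
import json
import struct


def build_binary_infer_request(values, shape):
    """Build an HTTP body for the v2 binary data extension: a JSON header
    followed immediately by raw little-endian FP16 tensor bytes."""
    # 'e' is IEEE 754 half precision: 2 bytes per FP16 element.
    raw = struct.pack(f"<{len(values)}e", *values)
    header = {
        "inputs": [
            {
                "name": "input-0",  # hypothetical tensor name
                "shape": shape,
                "datatype": "FP16",
                "parameters": {"binary_data_size": len(raw)},
            }
        ]
    }
    json_part = json.dumps(header).encode("utf-8")
    http_headers = {
        "Content-Type": "application/octet-stream",
        "Inference-Header-Content-Length": str(len(json_part)),
    }
    return http_headers, json_part + raw


# Example: a 1x4 FP16 tensor appended after the JSON header.
headers, body = build_binary_infer_request([6.8, 2.8, 4.8, 1.4], [1, 4])
```

The server splits the body at the advertised header length, so the four FP16 elements contribute exactly 8 trailing bytes here.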

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:


Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.


oss-prow-bot bot commented Jun 10, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sivanantha321
Once this PR has been reviewed and has the lgtm label, please assign yuzisun for approval by writing /assign @yuzisun in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sivanantha321 sivanantha321 force-pushed the support-fp16-datatype branch 2 times, most recently from 1035075 to 615e6b3 on June 10, 2024 20:06
@sivanantha321
Member Author

/rerun-all

@sivanantha321 sivanantha321 force-pushed the support-fp16-datatype branch 2 times, most recently from 252e322 to d14808d on June 10, 2024 21:38
@sivanantha321
Member Author

/rerun-all

@sivanantha321 sivanantha321 force-pushed the support-fp16-datatype branch 2 times, most recently from 03bef97 to 92a7b02 on June 12, 2024 08:45
@sivanantha321
Member Author

/rerun-all

1 similar comment
@sivanantha321
Member Author

/rerun-all

@sivanantha321 sivanantha321 marked this pull request as ready for review June 12, 2024 11:42
python/kserve/kserve/model.py (outdated; resolved)
Comment on lines 239 to +241
if isinstance(body, InferRequest):
return body, attributes
elif isinstance(body, InferenceRequest) or (
Member


I remember InferenceRequest was always converted to InferRequest before hitting here

@@ -174,7 +173,7 @@ def create_application(self) -> FastAPI:
r"/v2/models/{model_name}/infer",
v2_endpoints.infer,
methods=["POST"],
response_model=InferenceResponse,
response_model=None,
Member


The downside of this is that we lose the validation when returning back the InferenceResponse

Member Author


The response will be validated here

@@ -221,6 +223,32 @@ def test_sklearn_v2():
res = predict(service_name, "./data/iris_input_v2.json", protocol_version="v2")
assert res["outputs"][0]["data"] == [1, 1]

raw_res = predict(
service_name,
"./data/iris_input_v2_binary.json",
Member


How do we test sending inputs with binary data for REST ?

Member Author


Added in the custom transformer test, and also covered in the Python unit tests.
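On the response side, such a test can split the returned body at the length given by `Inference-Header-Content-Length` and decode the trailing bytes. A minimal sketch, assuming a single FP16 output (the helper name and output layout are illustrative, not the PR's actual test code):

```python
import json
import struct


def parse_binary_infer_response(headers, body):
    """Split a v2 binary-extension response into its JSON header and the
    decoded FP16 values of its first output (sketch; single-output case)."""
    json_len = int(headers["Inference-Header-Content-Length"])
    header = json.loads(body[:json_len])
    out = header["outputs"][0]
    size = out["parameters"]["binary_data_size"]
    raw = body[json_len:json_len + size]
    # 2 bytes per FP16 element; 'e' is IEEE 754 half precision.
    values = list(struct.unpack(f"<{size // 2}e", raw))
    return header, values
```

An assertion on the decoded values then plays the role of the usual `res["outputs"][0]["data"]` check in the JSON-only tests.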

@yuzisun
Member

yuzisun commented Jul 1, 2024

@sivanantha321 Need a rebase after merging the inference client PR

@sivanantha321 sivanantha321 force-pushed the support-fp16-datatype branch 3 times, most recently from 494d70f to d00e06c on July 8, 2024 08:42
@sivanantha321
Member Author

/rerun-all

3 similar comments
@sivanantha321
Member Author

/rerun-all

@sivanantha321
Member Author

/rerun-all

@sivanantha321
Member Author

/rerun-all

@sivanantha321
Member Author

/rerun-all

@sivanantha321
Member Author

/rerun-all

2 similar comments
@sivanantha321
Member Author

/rerun-all

@sivanantha321
Member Author

/rerun-all

@sivanantha321 sivanantha321 force-pushed the support-fp16-datatype branch 2 times, most recently from b353bb6 to 382ba9e on August 6, 2024 13:29
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
@sivanantha321
Member Author

/rerun-all

test/e2e/common/utils.py (outdated; resolved)
test/e2e/common/utils.py (outdated; resolved)
Comment on lines +296 to +297
# Fixme: Gets only the 1st element of the input
# inputs = get_predict_input(request)
Member


Suggested change
# Fixme: Gets only the 1st element of the input
# inputs = get_predict_input(request)

Comment on lines +329 to +330
# Fixme: Gets only the 1st element of the input
# inputs = get_predict_input(request)
Member


Suggested change
# Fixme: Gets only the 1st element of the input
# inputs = get_predict_input(request)

Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Member

@yuzisun yuzisun left a comment


/lgtm
/approve

@yuzisun yuzisun merged commit 69cdca5 into kserve:master Aug 24, 2024
57 checks passed

Successfully merging this pull request may close these issues.

Inference gRPC/Rest client to support FP16
2 participants