
[WIP] Batch inferencing tests #1249

Closed
wants to merge 10 commits into from

Conversation

@msaroufim (Member) commented Sep 21, 2021

Before this PR can be merged, we need to make sure that batch inferencing works fine without any test breaks (#1244).

As of now this PR has tests for:

  • The backend, in pytest. I still need to find a way to create a batched future request that only returns once.
  • Changes in test_utils to include batch inferencing.

Next steps:

  • A Postman test for regressions that may only show up in long-running tests.
  • A frontend test in Java.

Open question: how to error-check responses.

  • Should I make a batched future request in an async way?
  • Only assign the last inference from the batch to the response? But there is no guarantee that the last element of the batch will be used (I'm using a batch delay of 10s, but even that could be too small).
  • Append each response to responses = [] and check whether len(responses) == batch size (see the sketch after this list).
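
A minimal sketch of the third option, assuming a locally running TorchServe with a hypothetical /predictions/mnist endpoint registered with batch_size=4 (the URL, model name, and sample file path are illustrative, not from this PR). Requests are fired concurrently so they land in the same batch window:

# Sketch: fire batch_size concurrent requests and assert that every one of
# them comes back, i.e. len(responses) == batch size.
from concurrent.futures import ThreadPoolExecutor

import requests

BATCH_SIZE = 4  # assumption: model registered with batch_size=4, batch_delay=10000

def infer(path):
    with open(path, "rb") as f:
        # Each request blocks until the batch fills or the batch delay expires
        return requests.post("http://127.0.0.1:8080/predictions/mnist", data=f, timeout=120)

with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
    responses = list(pool.map(infer, ["test_data/0.png"] * BATCH_SIZE))

assert len(responses) == BATCH_SIZE
assert all(r.status_code == 200 for r in responses)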

@msaroufim requested a review from maaquib September 21, 2021 17:52
@msaroufim changed the title Batch inferencing tests [WIP] Batch inferencing tests Sep 21, 2021
@sagemaker-neo-ci-bot (Collaborator) posted AWS CodeBuild CI Reports as commits landed (build logs available for 30 days; powered by github-codebuild-logs, available on the AWS Serverless Application Repository):

  • torch-serve-build-cpu, commit 6d11d9f: SUCCEEDED
  • torch-serve-build-win, commit 6d11d9f: SUCCEEDED
  • torch-serve-build-cpu, commit 4444a37: SUCCEEDED
  • torch-serve-build-win, commit 4444a37: SUCCEEDED
  • torch-serve-build-gpu, commit 4444a37: SUCCEEDED
  • torch-serve-build-cpu, commit cfa1868: SUCCEEDED
  • torch-serve-build-win, commit cfa1868: SUCCEEDED
  • torch-serve-build-cpu, commit 88e9190: SUCCEEDED
  • torch-serve-build-win, commit 88e9190: SUCCEEDED
  • torch-serve-build-cpu, commit ee972cc: FAILED
  • torch-serve-build-win, commit ee972cc: FAILED
  • torch-serve-build-cpu, commit 7bcd77b: FAILED
  • torch-serve-build-win, commit 7bcd77b: FAILED
  • torch-serve-build-cpu, commit 46d5b7e: FAILED
  • torch-serve-build-win, commit 46d5b7e: FAILED
  • torch-serve-build-cpu, commit 0693df8: FAILED
  • torch-serve-build-win, commit 0693df8: FAILED
  • torch-serve-build-gpu, commit 0693df8: FAILED
  • torch-serve-build-cpu, commit f37869f: FAILED
  • torch-serve-build-win, commit f37869f: FAILED
  • torch-serve-build-gpu, commit f37869f: FAILED
  • torch-serve-build-cpu, commit 10d3ad7: FAILED
  • torch-serve-build-win, commit 10d3ad7: FAILED
  • torch-serve-build-gpu, commit 10d3ad7: FAILED

Comment on lines +1073 to +1074
Channel channel = TestUtils.connect(ConnectorType.MANAGEMENT_CONNECTOR, configManager);
Assert.assertNotNull(channel);
Collaborator:

This can be replaced by Channel channel = TestUtils.getManagementChannel(configManager);

@@ -157,13 +157,19 @@ public static void registerModel(
             String url,
             String modelName,
             boolean withInitialWorkers,
-            boolean syncChannel)
+            boolean syncChannel,
+            int batchSize)
             throws InterruptedException {
Collaborator:

This change will break the old test cases. The function registerModel needs to be overloaded here. In other words, the original registerModel should be kept, and the new registerModel should call the old one.

Member Author:

Ok got it

TestUtils.setResult(null);
TestUtils.setLatch(new CountDownLatch(1));

TestUtils.registerModel(channel, "noop.mar", "err_success", true, false, batch_size=batch_size);
Collaborator:

Replace it with TestUtils.registerModel(channel, "noop.mar", "err_success", true, false, batch_size); Java does not support Python-style named arguments such as batch_size=batch_size.

Assert.assertEquals(
status.getStatus(),
"Model \"success_batch\" Version: 1.0 registered with 1 initial workers");

Collaborator:

It seems that this assertion is missing modelName. Also, where is the message "success_batch" defined?

Comment on lines +1098 to +1112
for (int i = 0; i < batch_size; i++) {
DefaultFullHttpRequest req =
new DefaultFullHttpRequest(
HttpVersion.HTTP_1_1, HttpMethod.POST, "/predictions/err_success");
// req.content().writeCharSequence("data=invalid_output", CharsetUtil.UTF_8);
HttpUtil.setContentLength(req, req.content().readableBytes());
req.headers()
.set(
HttpHeaderNames.CONTENT_TYPE,
HttpHeaderValues.APPLICATION_X_WWW_FORM_URLENCODED);
channel.writeAndFlush(req);

TestUtils.getLatch().await();
Assert.assertEquals(TestUtils.getHttpStatus(), HttpResponseStatus.ACCEPTED);
}
Collaborator:

Why do these 4 inference requests confirm that batching is working?

Member Author:

Yeah, thinking about this some more, it doesn't really confirm anything. In the case of the Python backend I can create a batch size counter in the preprocess handler, pass it all the way through to postprocess, and validate there that my batch size was correct. Not sure how to do the same for the frontend yet; a rough sketch of the backend side is below.
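
A minimal sketch of that backend idea, assuming a custom TorchServe handler derived from BaseHandler (the handler class name and the expected size are hypothetical, not code from this PR):

# Sketch: record the batch size in preprocess and validate it in postprocess.
from ts.torch_handler.base_handler import BaseHandler

EXPECTED_BATCH_SIZE = 4  # assumption: matches the batch_size used at registration

class BatchCheckHandler(BaseHandler):
    def preprocess(self, data):
        # data holds one entry per request that the frontend batched together
        self.observed_batch_size = len(data)
        return super().preprocess(data)

    def postprocess(self, inference_output):
        # Fail loudly if the frontend never actually batched the requests
        assert self.observed_batch_size == EXPECTED_BATCH_SIZE, (
            f"expected batch of {EXPECTED_BATCH_SIZE}, got {self.observed_batch_size}")
        return super().postprocess(inference_output)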

@@ -92,7 +92,7 @@ def run_inference_using_url_with_data(purl=None, pfiles=None, ptimeout=120):
     else:
         return response

-def run_inference_using_url_with_data_json(purl=None, json_input=None, ptimeout=120):
+def run_inference_using_url_with_data_json(purl=None, pfiles=None, json_input=None, ptimeout=120):
Collaborator:

Why "pfiles" is needed? Will this change break existing tests?

Comment on lines +1117 to 1120
@Test(
alwaysRun = true,
dependsOnMethods = {"testSuccessBatch"})
public void testErrorBatch() throws InterruptedException {
Collaborator:

Why "testPredictionMemoryError" is replaced with "testSuccessBatch"?

Member Author:

So the dependencies here just determine the order in which the tests are run; it's not a real dependency. All tests in this file are dependent on each other even without it. We could perhaps have a separate PR where we remove unnecessary dependencies, but I believe this may be a known limitation: you can't run more than one TorchServe instance at a time, so you can't run more than one TorchServe test at a time.

Comment on lines 1068 to 1070
@Test(
alwaysRun = true,
dependsOnMethods = {"testPredictionMemoryError"})
Collaborator:

Why does this test case depend on "testPredictionMemoryError"?

Member Author:

They're not dependent; I believe dependsOnMethods is used as a way to set the order in which the tests are called.

@@ -224,6 +224,27 @@ def test_kfserving_mnist_model_register_and_inference_on_valid_model_explain():

assert np.array(json.loads(response.content)['explanations']).shape == (1, 1, 28, 28)
test_utils.unregister_model("mnist")

def test_mnist_batch_inference():
Collaborator:

Where is this test case called? Also, what's the difference between test_mnist_batch_inference and testSuccessBatch?

Member Author:

Purely a frontend vs. backend distinction; the test is called when you run pytest here.

Comment on lines +67 to +69
('batch_size', str(batch_size)),
('batch_delay', '10000')

Collaborator:

This part should not be changed if the old registerModel (shown below) is kept:
public static void registerModel(
Channel channel,
String url,
String modelName,
boolean withInitialWorkers,
boolean syncChannel)

Member Author:

My bad. I was under the impression that you could add arguments with optional values to the end of a function to make sure no backwards compatibility is broken.
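
For what it's worth, that assumption does hold on the Python side: a trailing keyword argument with a default value keeps old call sites working, which is roughly what the test_utils change relies on. A minimal sketch (the function and parameter names are assumed, not copied from this PR):

import requests

# Sketch: old callers that omit batch_size keep working unchanged.
def register_model(model_name, url, batch_size=1, batch_delay=10000):
    params = (
        ("model_name", model_name),
        ("url", url),
        ("initial_workers", "1"),
        ("synchronous", "true"),
        ("batch_size", str(batch_size)),
        ("batch_delay", str(batch_delay)),
    )
    return requests.post("http://localhost:8081/models", params=params)

Java has no default parameter values, which is why the overload suggested earlier is the backwards-compatible equivalent on the frontend.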

@msaroufim mentioned this pull request Oct 6, 2021
@msaroufim (Member Author):

Closing for now

@msaroufim closed this Oct 13, 2021
@msaroufim reopened this Oct 15, 2021
@sagemaker-neo-ci-bot (Collaborator) posted further AWS CodeBuild CI Reports after the reopen (build logs available for 30 days):

  • torch-serve-build-gpu, commit 10d3ad7: FAILED
  • torch-serve-build-win, commit 10d3ad7: FAILED

@msaroufim closed this Oct 25, 2021
@msaroufim deleted the batch-test branch June 16, 2022 01:39