
[WIP] Batch inferencing tests #1249

Closed
wants to merge 10 commits into from

Conversation

@msaroufim (Member) commented Sep 21, 2021

Before this PR can be merged, we need to make sure that batch inferencing works fine without any test breaks (#1244).

As of now this PR has tests for:

  • The backend, in pytest. I still need to find a way to create a batched future request that only returns once.
  • Changes in test_utils to include batch inferencing.

Next steps:

  • A Postman test for regressions that may only show up in long-running tests.
  • A frontend test in Java.

Open question: how to error-check responses.

  • Should I make a batched future request in an async way?
  • Only assign the last inference from the batch to the response? But there is no guarantee that the last element of the batch will be used (I'm using a batch delay of 10s, but even that could be too small).
  • Append each response to responses = [] and check whether len(responses) == batch size (see the sketch after this list).
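
A minimal sketch of the third option, assuming a locally running TorchServe with a hypothetical /predictions/mnist endpoint registered with batch_size=4 (the URL, model name, and sample file path are illustrative, not from this PR). Requests are fired concurrently so they land in the same batch window:

# Sketch: fire batch_size concurrent requests and assert that every one of
# them comes back, i.e. len(responses) == batch size.
from concurrent.futures import ThreadPoolExecutor

import requests

BATCH_SIZE = 4  # assumption: model registered with batch_size=4, batch_delay=10000

def infer(path):
    with open(path, "rb") as f:
        # Each request blocks until the batch fills or the batch delay expires
        return requests.post("http://127.0.0.1:8080/predictions/mnist", data=f, timeout=120)

with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
    responses = list(pool.map(infer, ["test_data/0.png"] * BATCH_SIZE))

assert len(responses) == BATCH_SIZE
assert all(r.status_code == 200 for r in responses)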

@msaroufim requested a review from maaquib September 21, 2021 17:52
@msaroufim changed the title Batch inferencing tests [WIP] Batch inferencing tests Sep 21, 2021
@sagemaker-neo-ci-bot (Collaborator) posted AWS CodeBuild CI Reports as commits landed (build logs available for 30 days; powered by github-codebuild-logs, available on the AWS Serverless Application Repository):

  • torch-serve-build-cpu, commit 6d11d9f: SUCCEEDED
  • torch-serve-build-win, commit 6d11d9f: SUCCEEDED
  • torch-serve-build-cpu, commit 4444a37: SUCCEEDED
  • torch-serve-build-win, commit 4444a37: SUCCEEDED
  • torch-serve-build-gpu, commit 4444a37: SUCCEEDED
  • torch-serve-build-cpu, commit cfa1868: SUCCEEDED
  • torch-serve-build-win, commit cfa1868: SUCCEEDED
  • torch-serve-build-cpu, commit 88e9190: SUCCEEDED
  • torch-serve-build-win, commit 88e9190: SUCCEEDED
  • torch-serve-build-cpu, commit ee972cc: FAILED
  • torch-serve-build-win, commit ee972cc: FAILED
  • torch-serve-build-cpu, commit 7bcd77b: FAILED
  • torch-serve-build-win, commit 7bcd77b: FAILED
  • torch-serve-build-cpu, commit 46d5b7e: FAILED
  • torch-serve-build-win, commit 46d5b7e: FAILED
  • torch-serve-build-cpu, commit 0693df8: FAILED
  • torch-serve-build-win, commit 0693df8: FAILED
  • torch-serve-build-gpu, commit 0693df8: FAILED
  • torch-serve-build-cpu, commit f37869f: FAILED
  • torch-serve-build-win, commit f37869f: FAILED
  • torch-serve-build-gpu, commit f37869f: FAILED
  • torch-serve-build-cpu, commit 10d3ad7: FAILED
  • torch-serve-build-win, commit 10d3ad7: FAILED
  • torch-serve-build-gpu, commit 10d3ad7: FAILED

Comment on lines +1073 to +1074
Channel channel = TestUtils.connect(ConnectorType.MANAGEMENT_CONNECTOR, configManager);
Assert.assertNotNull(channel);
Collaborator:

This can be replaced by Channel channel = TestUtils.getManagementChannel(configManager);

@@ -157,13 +157,19 @@ public static void registerModel(
             String url,
             String modelName,
             boolean withInitialWorkers,
-            boolean syncChannel)
+            boolean syncChannel,
+            int batchSize)
             throws InterruptedException {
Collaborator:

This change will break the old test cases. The function registerModel needs to be overloaded here. In other words, the original registerModel should be kept, and the new registerModel should call the old one.

Member Author:

Ok got it

TestUtils.setResult(null);
TestUtils.setLatch(new CountDownLatch(1));

TestUtils.registerModel(channel, "noop.mar", "err_success", true, false, batch_size=batch_size);
Collaborator:

Replace it with TestUtils.registerModel(channel, "noop.mar", "err_success", true, false, batch_size); Java does not support Python-style named arguments such as batch_size=batch_size.

Assert.assertEquals(
status.getStatus(),
"Model \"success_batch\" Version: 1.0 registered with 1 initial workers");

Collaborator:

It seems that this assertion is missing modelName. Also, where is the message "success_batch" defined?

Comment on lines +1098 to +1112
for (int i = 0; i < batch_size; i++) {
DefaultFullHttpRequest req =
new DefaultFullHttpRequest(
HttpVersion.HTTP_1_1, HttpMethod.POST, "/predictions/err_success");
// req.content().writeCharSequence("data=invalid_output", CharsetUtil.UTF_8);
HttpUtil.setContentLength(req, req.content().readableBytes());
req.headers()
.set(
HttpHeaderNames.CONTENT_TYPE,
HttpHeaderValues.APPLICATION_X_WWW_FORM_URLENCODED);
channel.writeAndFlush(req);

TestUtils.getLatch().await();
Assert.assertEquals(TestUtils.getHttpStatus(), HttpResponseStatus.ACCEPTED);
}
Collaborator:

Why do these 4 inference requests confirm that batching is working?

Member Author:

Yeah, thinking about this some more, it doesn't really confirm anything. In the case of the Python backend I can create a batch size counter in the preprocess handler, pass it all the way through to postprocess, and validate there that my batch size was correct. Not sure how to do the same for the frontend yet; a rough sketch of the backend side is below.
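
A minimal sketch of that backend idea, assuming a custom TorchServe handler derived from BaseHandler (the handler class name and the expected size are hypothetical, not code from this PR):

# Sketch: record the batch size in preprocess and validate it in postprocess.
from ts.torch_handler.base_handler import BaseHandler

EXPECTED_BATCH_SIZE = 4  # assumption: matches the batch_size used at registration

class BatchCheckHandler(BaseHandler):
    def preprocess(self, data):
        # data holds one entry per request that the frontend batched together
        self.observed_batch_size = len(data)
        return super().preprocess(data)

    def postprocess(self, inference_output):
        # Fail loudly if the frontend never actually batched the requests
        assert self.observed_batch_size == EXPECTED_BATCH_SIZE, (
            f"expected batch of {EXPECTED_BATCH_SIZE}, got {self.observed_batch_size}")
        return super().postprocess(inference_output)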

@@ -92,7 +92,7 @@ def run_inference_using_url_with_data(purl=None, pfiles=None, ptimeout=120):
     else:
         return response

-def run_inference_using_url_with_data_json(purl=None, json_input=None, ptimeout=120):
+def run_inference_using_url_with_data_json(purl=None, pfiles=None, json_input=None, ptimeout=120):
Collaborator:

Why "pfiles" is needed? Will this change break existing tests?

Comment on lines +1117 to 1120
@Test(
alwaysRun = true,
dependsOnMethods = {"testSuccessBatch"})
public void testErrorBatch() throws InterruptedException {
Collaborator:

Why "testPredictionMemoryError" is replaced with "testSuccessBatch"?

Member Author:

So the dependencies here just determine the order in which the tests are run; it's not a real dependency. All tests in this file are dependent on each other even without it. We could perhaps have a separate PR where we remove unnecessary dependencies, but I believe this may be a known limitation: you can't run more than one TorchServe instance at a time, so you can't run more than one TorchServe test at a time.

Comment on lines 1068 to 1070
@Test(
alwaysRun = true,
dependsOnMethods = {"testPredictionMemoryError"})
Collaborator:

Why does this test case depend on "testPredictionMemoryError"?

Member Author:

They're not dependent; I believe dependsOnMethods is used as a way to set the order in which the tests are called.

@@ -224,6 +224,27 @@ def test_kfserving_mnist_model_register_and_inference_on_valid_model_explain():

assert np.array(json.loads(response.content)['explanations']).shape == (1, 1, 28, 28)
test_utils.unregister_model("mnist")

def test_mnist_batch_inference():
Collaborator:

Where is this test case called? Also, what's the difference between test_mnist_batch_inference and testSuccessBatch?

Member Author:

Purely a frontend vs. backend distinction; the test is called when you run pytest here.

Comment on lines +67 to +69
('batch_size', str(batch_size)),
('batch_delay', '10000')

Collaborator:

This part should not be changed if the old registerModel (shown below) is kept:
public static void registerModel(
Channel channel,
String url,
String modelName,
boolean withInitialWorkers,
boolean syncChannel)

Member Author:

My bad. I was under the impression that you could add arguments with optional values to the end of a function to make sure no backwards compatibility is broken.
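
For what it's worth, that assumption does hold on the Python side: a trailing keyword argument with a default value keeps old call sites working, which is roughly what the test_utils change relies on. A minimal sketch (the function and parameter names are assumed, not copied from this PR):

import requests

# Sketch: old callers that omit batch_size keep working unchanged.
def register_model(model_name, url, batch_size=1, batch_delay=10000):
    params = (
        ("model_name", model_name),
        ("url", url),
        ("initial_workers", "1"),
        ("synchronous", "true"),
        ("batch_size", str(batch_size)),
        ("batch_delay", str(batch_delay)),
    )
    return requests.post("http://localhost:8081/models", params=params)

Java has no default parameter values, which is why the overload suggested earlier is the backwards-compatible equivalent on the frontend.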

@msaroufim mentioned this pull request Oct 6, 2021
@msaroufim (Member Author):

Closing for now

@msaroufim closed this Oct 13, 2021
@msaroufim reopened this Oct 15, 2021
@sagemaker-neo-ci-bot (Collaborator) posted further AWS CodeBuild CI Reports after the reopen (build logs available for 30 days):

  • torch-serve-build-gpu, commit 10d3ad7: FAILED
  • torch-serve-build-win, commit 10d3ad7: FAILED

@msaroufim closed this Oct 25, 2021
@msaroufim deleted the batch-test branch June 16, 2022 01:39