
Fixes #205 - Docker install not finding GPUs #235

Closed · wants to merge 11 commits

Conversation

@dhanainme (Collaborator) commented Apr 21, 2020

Fixes #205 - Docker install not finding GPUs

  • Invokes docker with --runtime=nvidia for GPU support
  • Adds an option to specify GPU usage and specific GPU IDs in start.sh (see the sketch below)
  • Fixes the documentation for the start.sh script
  • Fixes the JDK version in Dockerfile.gpu
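
A minimal sketch of how this wiring could look in start.sh (option handling and variable names here are illustrative assumptions, not the exact patch; device selection via NVIDIA_VISIBLE_DEVICES is also an assumption):

    #!/bin/bash
    # Sketch only: illustrative GPU option handling for start.sh
    DOCKER_RUNTIME=""
    GPU_DEVICES=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -g|--gpu)
                DOCKER_RUNTIME="--runtime=nvidia"
                shift
                ;;
            -d|--gpu_devices)
                DOCKER_RUNTIME="--runtime=nvidia"           # specifying devices implies the GPU runtime
                GPU_DEVICES="-e NVIDIA_VISIBLE_DEVICES=$2"  # e.g. 1,2,3 (assumed mechanism)
                shift 2
                ;;
            *)
                shift
                ;;
        esac
    done

    # Launch TorchServe in the background with the selected runtime and devices
    docker run --rm -d $DOCKER_RUNTIME $GPU_DEVICES \
        -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu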

Testing: This was tested on a DL AMI 27 (Ubuntu 18) p3.8xlarge instance, which has 4 CUDA-compatible GPUs. I built the Docker image with the --gpu directive.

Terminal output for all three cases:

ubuntu@ip-172-31-32-255:~/serve$ ./start.sh
Starting pytorch/torchserve:latest-gpu docker image
Successfully started torchserve in docker
Registering resnet-18 model
.....
ubuntu@ip-172-31-32-255:~/serve$ docker exec -it 4a162a13f695 head logs/ts_log.log
....
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 32

ubuntu@ip-172-31-32-255:~/serve$ ./start.sh -g
Starting pytorch/torchserve:latest-gpu docker image
Successfully started torchserve in docker
Registering resnet-18 model
....
ubuntu@ip-172-31-32-255:~/serve$ docker exec -it 281fc5744248 head logs/ts_log.log
....
Temp directory: /home/model-server/tmp
Number of GPUs: 4
Number of CPUs: 32

ubuntu@ip-172-31-32-255:~/serve$ ./start.sh --gpu --gpu_devices 1,2,3
Starting pytorch/torchserve:latest-gpu docker image
Successfully started torchserve in docker
Registering resnet-18 model
successfully registered resnet-18 model with torchserve
...

ubuntu@ip-172-31-32-255:~/serve$ docker exec -it e05ff19681f9 head logs/ts_log.log
...
Number of GPUs: 3
Number of CPUs: 32
...
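
For reference, the three invocations above correspond roughly to the following docker run commands (a sketch under the same assumption that start.sh maps --gpu_devices to NVIDIA_VISIBLE_DEVICES):

    # ./start.sh                              (CPU only, no runtime flag)
    docker run --rm -d -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu

    # ./start.sh -g                           (all GPUs via the NVIDIA runtime)
    docker run --rm -d --runtime=nvidia -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu

    # ./start.sh --gpu --gpu_devices 1,2,3    (three specific GPUs)
    docker run --rm -d --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1,2,3 \
        -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu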

@dhanainme (Collaborator, Author) commented:

Fixed Docker tags

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-cpu, commit d4f79d9, result SUCCEEDED (build logs available for 30 days).

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-gpu, commit d4f79d9, result SUCCEEDED (build logs available for 30 days).

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-cpu, commit c6fe3db, result SUCCEEDED (build logs available for 30 days).

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-gpu, commit c6fe3db, result SUCCEEDED (build logs available for 30 days).

        exit 0
        ;;
    -g|--gpu)
        DOCKER_RUNTIME="--runtime=nvidia"
(Contributor) commented:

@dhanainme The new way of running docker with GPUs is with the --gpus flag. Please change to that, as described in #191.

@dhanainme (Collaborator, Author) commented:

I was trying to use this originally, but the new format uses very awkward quoting for multiple GPUs, which does not play well with bash. I would prefer leaving it this way:

docker run --rm -it --gpus '"device=0,1,2"' -p 8080:8080 -p 8081:8081 torchserve:v0.1-gpu-latest
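
For comparison, a sketch of the --runtime=nvidia form this PR uses instead, which selects devices through an environment variable rather than nested quoting (NVIDIA_VISIBLE_DEVICES is assumed here as the selection mechanism; it is honoured by the NVIDIA container runtime):

    docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1,2 \
        -p 8080:8080 -p 8081:8081 torchserve:v0.1-gpu-latest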

@chauhang (Contributor) left a comment:
Please see comments inline

@fbbradheintz (Contributor) left a comment:

Good news first: Testing on multi-GPU machines, the updated scripts seem functionally correct.

There are two more items to fix, besides my inline comments:

  • serve/docker/README.md is missing any mention of the docker command-line flags for specifying GPU runtime and specific devices - this needs to be corrected.
  • serve/docker/README.md has no mention of the build_image.sh and start.sh scripts. Was this intentional? If not, please update the Docker README.

(Resolved review thread on README.md; outdated)
        IMAGE_NAME="pytorch/torchserve:latest-gpu"
        shift
        ;;
    -d|--gpu_devices)
(Contributor) commented:

Shouldn't -d imply -g? Right now, it's possible to do something like:

./start.sh -d <devices>

... with the net effect that GPU devices are specified, but not the Nvidia runtime.

(Contributor) commented:

Have you tested all combinations of options?

(Contributor) commented:

Also: Should there be an example command line that shows what a device specifier looks like for this command line? Is it 0? cuda:0?

@dhanainme (Collaborator, Author) commented:

> Shouldn't -d imply -g?

Yes, but I presumed this was understood implicitly. This has been fixed.

> Have you tested all combinations of options?

Yes. I tested CPU only, GPU without the devices flag, and GPU with the devices flag (with different sets of devices to confirm that it works).

> Also: Should there be an example command line that shows what a device specifier looks like for this command line? Is it 0? cuda:0?

The usage is clearly specified in the README.md.
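
A brief sketch of the resulting behaviour after the fix (the exact wiring is assumed, not quoted from the patch):

    # Specifying devices now also enables the NVIDIA runtime:
    ./start.sh --gpu_devices 1,2,3      # behaves like ./start.sh --gpu --gpu_devices 1,2,3

    # Device specifiers are plain comma-separated GPU indices, as in the test
    # output above, not cuda:0-style names:
    ./start.sh --gpu --gpu_devices 0
    ./start.sh --gpu --gpu_devices 1,2,3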

@dhanainme (Collaborator, Author) commented:

> serve/docker/README.md is missing any mention of the docker command-line flags for specifying GPU runtime and specific devices - this needs to be corrected.

I have added this.

> serve/docker/README.md has no mention of the build_image.sh and start.sh scripts. Was this intentional? If not, please update the Docker README.

I think it is intentional. start.sh and build_image.sh are more of a quick-start guide for folks trying to get things up and running quickly without even using docker commands.

In fact, build_image.sh is just a wrapper around docker build and start.sh is a wrapper around docker run, both of which are explained with the right options in serve/docker/README.md.
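
To illustrate the wrapper relationship, roughly (the underlying commands here are assumptions based on this PR's description, not quoted from the scripts):

    # build_image.sh --gpu is essentially a wrapper around something like:
    docker build -f Dockerfile.gpu -t pytorch/torchserve:latest-gpu .

    # start.sh -g is essentially a wrapper around something like:
    docker run --rm -d --runtime=nvidia -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu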

(Contributor) commented:

Also: should there be an example command line that shows what a device ID for a GPU looks like?

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-cpu, commit b1a8517, result SUCCEEDED (build logs available for 30 days).

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-gpu, commit b1a8517, result SUCCEEDED (build logs available for 30 days).

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-cpu, commit 21c761e, result SUCCEEDED (build logs available for 30 days).

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-gpu, commit 21c761e, result SUCCEEDED (build logs available for 30 days).

Manoj Rao and others added 7 commits on April 23, 2020:

  • update tags in docker readme
  • update conda binary names and pip repo names to say "latest" instead of hardcoded version
  • Updated links in pypi rst for model-archiver
  • Invokes docker with --runtime=nvidia for GPU
  • Added the option to specify the GPU / specific GPU ids in start.sh
  • Fixed the documentation for start.sh script
  • Fixed the JDK version in Dockerfile.gpu
@dhanainme dhanainme self-assigned this Apr 23, 2020
@dhanainme dhanainme closed this Apr 23, 2020
@dhanainme dhanainme deleted the issue_205 branch April 23, 2020 22:23
@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-cpu, commit a6a767f, result SUCCEEDED (build logs available for 30 days).

@sagemaker-neo-ci-bot (Collaborator): AWS CodeBuild CI report: project torch-serve-build-gpu, commit a6a767f, result SUCCEEDED (build logs available for 30 days).

@dhanainme (Collaborator, Author) commented:

Too many merge conflicts / updates. Deleting this branch for sanity.

Tracked in #262. All the feedback from this PR has been addressed.
