Fixes #205 - Docker install not finding GPUs #235
Conversation
This PR:
- Invokes docker with `--runtime=nvidia` for GPU
- Added the option to specify the GPU / specific GPU ids in start.sh (usage sketch below)
- Fixed the documentation for the start.sh script
- Fixed the JDK version in Dockerfile.gpu
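For orientation, a minimal usage sketch of the updated script, assuming the flag names shown in the diff excerpts below (`-g|--gpu`, `-d|--gpu_devices`); the exact device-id syntax is documented in serve/docker/README.md:

```bash
# CPU-only container (default behaviour)
./start.sh

# GPU container: adds --runtime=nvidia to the docker run invocation
./start.sh --gpu

# GPU container restricted to specific device ids (illustrative values)
./start.sh --gpu --gpu_devices 0,1
```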
Fixed Docker tags
```bash
        exit 0
        ;;
    -g|--gpu)
        DOCKER_RUNTIME="--runtime=nvidia"
```
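Presumably this variable is later expanded into the `docker run` command; a hedged sketch of that pattern (image name and ports taken from examples elsewhere in this thread):

```bash
# Sketch of how start.sh would consume the variable:
# $DOCKER_RUNTIME is empty for CPU runs and "--runtime=nvidia" for GPU runs,
# so it is expanded unquoted and simply disappears in the CPU case.
docker run --rm -it $DOCKER_RUNTIME -p 8080:8080 -p 8081:8081 "$IMAGE_NAME"
```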
@dhanainme The new way of running docker with GPUs is the `--gpus` flag. Please change to that, as described in #191.
I was trying to use this originally, but the new format uses very weird quoting for multiple GPUs, which does not play well with bash. I would prefer leaving it this way.
docker run --rm -it --gpus '"device=0,1,2"' -p 8080:8080 -p 8081:8081 torchserve:v0.1-gpu-latest
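To spell out the quoting problem: Docker parses the `--gpus` value as comma-separated options, so a multi-GPU device list has to reach it wrapped in literal double quotes, which means two layers of quoting once a script builds the value from a variable. A sketch of that (the `GPU_DEVICES` variable is hypothetical):

```bash
# The outer quotes are consumed by bash; the escaped inner quotes survive, so Docker
# sees "device=0,1,2" and does not treat the commas as option separators.
GPU_DEVICES="0,1,2"
docker run --rm -it --gpus "\"device=${GPU_DEVICES}\"" \
    -p 8080:8080 -p 8081:8081 torchserve:v0.1-gpu-latest
```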
Please see comments inline
Good news first: Testing on multi-GPU machines, the updated scripts seem functionally correct.
There are two more items to fix, besides my inline comments:
1. `serve/docker/README.md` is missing any mention of the `docker` command-line flags for specifying the GPU runtime and specific devices - this needs to be corrected.
2. `serve/docker/README.md` has no mention of the `build_image.sh` and `start.sh` scripts. Was this intentional? If not, please update the Docker README.
```bash
        IMAGE_NAME="pytorch/torchserve:latest-gpu"
        shift
        ;;
    -d|--gpu_devices)
```
Shouldn't `-d` imply `-g`? Right now, it's possible to do something like:

`./start.sh -d <devices>`

... with the net effect that GPU devices are specified, but not the NVIDIA runtime.
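A minimal sketch of the suggested fix, i.e. having the `-d|--gpu_devices` branch also enable the NVIDIA runtime; everything except `DOCKER_RUNTIME` and the flag names is an assumption about the script rather than its actual contents:

```bash
    -d|--gpu_devices)
        # Selecting specific devices implies GPU mode, so "./start.sh -d 0,1"
        # is enough on its own and does not also require -g.
        DOCKER_RUNTIME="--runtime=nvidia"
        GPU_DEVICES="$2"   # plain device ids, e.g. "0,1" (hypothetical variable)
        shift 2
        ;;
```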
Have you tested all combinations of options?
Also: Should there be an example command line that shows what a device specifier looks like for this command? Is it `0`? `cuda:0`?
> Shouldn't -d imply -g?

Yes, but I presumed this was understood implicitly. I have fixed this.

> Have you tested all combinations of options?

Yes. I tested CPU only, GPU without the devices flag, and GPU with the devices flag (with different sets of devices to confirm that it works).

> Also: Should there be an example command line that shows what a device specifier looks like for this command? Is it 0? cuda:0?

The usage is clearly specified in the README.md.
> serve/docker/README.md is missing any mention of the docker command-line flags for specifying the GPU runtime and specific devices - this needs to be corrected.

Have added this.

> serve/docker/README.md has no mention of the build_image.sh and start.sh scripts. Was this intentional? If not, please update the Docker README.

I think it's intentional. start.sh and build_image.sh are more of a quick-start path for folks trying to get things up and running quickly without even using docker commands directly. In fact, build_image.sh is just a wrapper around `docker build` and start.sh is a wrapper around `docker run`, both of which are explained with the right options in serve/docker/README.md.
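For readers of this thread, roughly the raw commands those two wrappers correspond to; a sketch assuming the GPU Dockerfile name and image tag mentioned elsewhere in this PR (the authoritative options are in serve/docker/README.md):

```bash
# build_image.sh is essentially a wrapper around docker build (GPU variant shown)
docker build -f Dockerfile.gpu -t pytorch/torchserve:latest-gpu .

# start.sh is essentially a wrapper around docker run, here with the NVIDIA runtime
# and the usual TorchServe inference/management ports
docker run --rm -it --runtime=nvidia -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu
```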
Also: should there be an example command line that shows what a device ID for a GPU looks like?
* update tags in docker readme
* update conda binary names and pip repo names to say "latest" instead of hardcoded version
* Updated links in pypi rst for model-archiver
Too many merge conflicts / updates. Deleting this branch for sanity. Tracked in #262. All the feedback from this PR has been addressed.
Fixes #205 - Docker install not finding GPUs
Tests:
This was tested on a DL AMI 27, Ubuntu 18 / p3.8xlarge instance (which has 4 CUDA-compatible GPUs). I built the Docker image with the --gpu directive.
Terminal output for all 3 cases:
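In addition to the terminal output, a quick hypothetical sanity check that the container actually sees the GPUs (not part of this PR; assumes the GPU image tag used above):

```bash
# With the NVIDIA runtime wired up correctly, nvidia-smi run inside the container
# should list all four GPUs of the p3.8xlarge host.
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all \
    --entrypoint nvidia-smi pytorch/torchserve:latest-gpu
```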