feature: support PyTorch 1.7.1 training, inference and data parallel #2185
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
@@ -54,7 +54,8 @@
 "1.3": "1.3.1",
 "1.4": "1.4.0",
 "1.5": "1.5.0",
-"1.6": "1.6.0"
+"1.6": "1.6.0",
+"1.7": "1.7.0"
Should this alias be to 1.7.1?
Yes. Fixed.
Pushed code with the fix for both inference and training version_aliases.
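For readers following the alias discussion, here is a minimal sketch of how a short version such as "1.7" could resolve to a full version through an alias table. The names `VERSION_ALIASES` and `resolve_version` are hypothetical and only illustrate the idea; this is not the SDK's actual code.

```python
# Hypothetical alias table shaped like the entries in the diff,
# with "1.7" pointing at 1.7.1 as agreed in the review.
VERSION_ALIASES = {
    "1.3": "1.3.1",
    "1.4": "1.4.0",
    "1.5": "1.5.0",
    "1.6": "1.6.0",
    "1.7": "1.7.1",
}


def resolve_version(version, aliases=VERSION_ALIASES):
    """Return the full version for a short alias, or the input unchanged."""
    return aliases.get(version, version)


if __name__ == "__main__":
    print(resolve_version("1.7"))    # short alias is expanded
    print(resolve_version("1.6.0"))  # full version passes through
```

With an alias table like this, a wrong alias target (for example 1.7.0 where only a 1.7.1 image exists) would silently resolve to a version with no matching container, which is presumably why the alias value matters.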
Awesome! Thank you so much!
I think we might have to update: https://github.com/aws/sagemaker-python-sdk/blob/master/tests/conftest.py#L172-L185
As it doesn't account for py36.
We can follow similar logic shown here: https://github.com/aws/sagemaker-python-sdk/blob/master/tests/conftest.py#L139-L156
As I can see in
https://github.com/aws/deep-learning-containers/blob/master/available_images.md
PyTorch 1.7.1 uses Python 3.6 (py36) for both training and inference.
Could you please explain more?
Spoke offline.
The integ test for SageMaker Data Parallel is failing because it uses latest_pytorch_version from the conftest, while a separate conditional maps to the supported versions. These should ideally be decoupled. For this PR I'll add in the changes introduced in #2039.
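One way the decoupling could look, as a rough sketch only: the test picks its version from the feature's own supported list and does not assume the latest framework version is supported. The helper names and the version lists below are hypothetical and are not taken from the conftest or from #2039.

```python
def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_test_version(latest, supported):
    """Use the latest framework version if the feature supports it,
    otherwise fall back to the newest version the feature does support."""
    if latest in supported:
        return latest
    return max(supported, key=version_key)


if __name__ == "__main__":
    # Hypothetical values: the feature has not caught up with 1.7.1 yet.
    print(pick_test_version("1.7.1", ["1.6.0"]))
    # Once 1.7.1 is added to the supported list, the latest version is used.
    print(pick_test_version("1.7.1", ["1.6.0", "1.7.1"]))
```

Under this arrangement, bumping the latest framework version would not break a feature-specific test until that feature's supported list is updated on purpose.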
The integration test is failing due to pytorch/vision#1938. The test was updated based on #986.
Issue #, if available:
PyTorch 1.7.1 DLCs for training and inference are available but not supported by the SageMaker Python SDK.
ValueError: Unsupported pytorch version: 1.7.1 You may need to upgrade your SDK version (pip install -U sagemaker) for newer pytorch versions. Supported pytorch version(s): 0.4.0, 1.0.0, 1.1.0, 1.2.0, 1.3.1, 1.4.0, 1.5.0, 1.6.0, 0.4, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6.
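To illustrate where an error of this shape could come from, here is a minimal sketch of a version check against a list of full versions plus short aliases. The function `check_supported` is hypothetical and the message only imitates the one quoted above; the SDK's real validation may differ.

```python
def check_supported(framework, version, versions, aliases):
    """Raise ValueError if `version` is neither a full version nor an alias."""
    supported = list(versions) + list(aliases)
    if version not in supported:
        raise ValueError(
            "Unsupported {fw} version: {v}. You may need to upgrade your SDK "
            "version (pip install -U sagemaker) for newer {fw} versions. "
            "Supported {fw} version(s): {s}.".format(
                fw=framework, v=version, s=", ".join(supported)
            )
        )


if __name__ == "__main__":
    # Version lists copied from the error message quoted in this PR.
    versions = ["0.4.0", "1.0.0", "1.1.0", "1.2.0", "1.3.1", "1.4.0", "1.5.0", "1.6.0"]
    aliases = ["0.4", "1.0", "1.1", "1.2", "1.3", "1.4", "1.5", "1.6"]
    try:
        check_supported("pytorch", "1.7.1", versions, aliases)
    except ValueError as err:
        print(err)
```

Adding "1.7.1" to the version list (and "1.7" to the aliases) is what would make such a check pass.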
Description of changes:
Added "1.7.1" version for training and inference sections of the json.
Testing done:
Training and inference on cifar10 dataset with 1.7.1, using this code.
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
unique_name_from_base to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.