[Bug Report] - SageMaker Pipelines to Run Jobs Locally #3635
Comments
Thanks for running the sample. This reference is supposed to locate the model file on S3: 'source=step_train.properties.ModelArtifacts.S3ModelArtifacts'. Can you check the Training Job stats and check if the "model.tar.gz" file is in fact stored in S3 at the prescribed location?
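One way to do the check suggested above is sketched here. It assumes you already have the S3 URI that the training job reported for its model artifacts, and that boto3 and AWS credentials are available; the bucket and key in the usage note are made-up placeholders, and the helper names are hypothetical, not part of the SageMaker SDK.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def model_artifact_exists(uri: str) -> bool:
    """Return True if an object exists at `uri`.

    Needs boto3 and valid AWS credentials. Any client error (including
    a permissions error, not only a missing object) is reported as False.
    """
    # Imported lazily so split_s3_uri works even without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    bucket, key = split_s3_uri(uri)
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return True
    except ClientError:
        return False
```

For example, `model_artifact_exists("s3://my-example-bucket/some-prefix/output/model.tar.gz")` would tell you whether the file is at that (placeholder) location.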
Is it possible to know the value of that reference? Based on this reference, about the properties attribute, it should point to the S3 location, right? In what place can I check the Training Job stats? The last logging info line in there is the following: The location of that file is the following: I'm not sure where the last part comes from? Does the reference
@fjpa121197 - I just ran your notebook and it worked for me with no issues. Are you consistently seeing this?
Really? I mean, I haven't been able to run the notebook successfully in like 10 tries that I did yesterday. Is there another way to make the same reference and be able to proceed with the evaluation?
This helped me, maybe it helps you: docker/docker-py#3099 I can run everything fine now, but before I was unable to get the sagemaker pipeline to work locally, even though I was able to build/run the individual containers in my sagemaker pipeline. I'm not 100% sure how this works but it seems like the sagemaker package that I used was still on an older version of Docker that did not have this fix (see link above).
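Since the comment above points at an outdated Docker Python package as a possible cause, a quick way to see which versions are installed in the notebook environment is sketched below. It only uses the standard library; `installed_versions` is a hypothetical helper name, and the sketch assumes the relevant distributions are named `sagemaker` and `docker` on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("sagemaker", "docker")):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing the reported `docker` version against the release that contains the fix linked above should show whether an upgrade is worth trying.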
@Rainymood Thanks for the update. I wonder if Kirit and I did not run into this problem because we are running the samples natively on macOS. The Docker fix you reference is for Windows sockets.
Link to the notebook
Following example from: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/local-mode/sagemaker-pipelines-local-mode.ipynb
My notebook with the error (I made some minor modifications to it): https://github.com/fjpa121197/aws-sagemaker-training/blob/main/sagemaker-pipelines-local-mode.ipynb
Describe the bug
I'm trying to follow this tutorial to run SageMaker Pipelines locally and test them before using managed resources. I have created a pipeline definition that includes preprocessing, training and evaluation. I'm able to create the pipeline without any problem, but when executing it, I encounter an error in the evaluation step. The model.tar.gz file is not being downloaded into the container, in the directory where the evaluation expects to find the model.
Error:
I understand that the evaluation step definition is as follows:
And my eval_args definition is as follows:
where source for the first input refers to the step_train defined before, and it should download the model artifacts, but it is not doing so. For the other defined input, it does download the test data to use, but not the model artifacts.
Not sure if there is a replacement for the source=step_train.properties.ModelArtifacts.S3ModelArtifacts argument.
Am I doing something wrong? I don't think it is permission/policies related since it doesn't give any AccessDenied errors.
I'm using sagemaker 2.113.0.
Thanks in advance