Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-stepfunctions-tasks: SageMakerCreateTrainingJob does not correctly support empty inputDataConfig #31132

Closed
1 task
Assignees
Labels
@aws-cdk/aws-stepfunctions-tasks bug This issue is a bug. effort/medium Medium work item – several days of effort p1

Comments

@Stoggles
Copy link

Stoggles commented Aug 16, 2024

Describe the bug

inputDataconfig is incorrectly marked as a required field on SageMakerCreateTrainingJobProps. Forcing this value to undefined throws a Cannot read properties of undefined error.

An empty list is not accepted by SageMaker:

"1 validation errors detected: Value '[]' at 'inputDataConfig' failed to satisfy constraint: Member must have length greater than or equal to 1 (Service: AmazonSageMaker; Status Code: 400; Error Code: ValidationException; Request ID: 20b3ebdb-d89d-4dbe-882b-377d50058e93; Proxy: null)"

Relevant SageMakerCreateTrainingJob docs: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#sagemaker-CreateTrainingJob-request-InputDataConfig

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

A valid SageMakerCreateTrainingJob step function task can be created when inputDataConfig is undefined.

Current Behavior

SageMakerCreateTrainingJob throws an error when inputDataConfig is undefined.

Argument of type '{ trainingJobName: string; algorithmSpecification: { trainingImage: aws_stepfunctions_tasks.DockerImage; trainingInputMode: aws_stepfunctions_tasks.InputMode.FILE; }; outputDataConfig: { ...; }; resourceConfig: { ...; }; stoppingCondition: { ...; }; hyperparameters: {}; }' is not assignable to parameter of type 'SageMakerCreateTrainingJobProps'.
  Property 'inputDataConfig' is missing in type '{ trainingJobName: string; algorithmSpecification: { trainingImage: aws_stepfunctions_tasks.DockerImage; trainingInputMode: aws_stepfunctions_tasks.InputMode.FILE; }; outputDataConfig: { ...; }; resourceConfig: { ...; }; stoppingCondition: { ...; }; hyperparameters: {}; }' but required in type 'SageMakerCreateTrainingJobProps'.ts(2345)

Reproduction Steps

import { Duration, Size } from 'aws-cdk-lib';
import { InstanceType } from 'aws-cdk-lib/aws-ec2';
import { JsonPath } from 'aws-cdk-lib/aws-stepfunctions';
import { DockerImage, InputMode, S3Location, SageMakerCreateTrainingJob } from 'aws-cdk-lib/aws-stepfunctions-tasks';
import { Construct } from 'constructs';

class Workflow extends Construct {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    const sageMakerInvocation = new SageMakerCreateTrainingJob(this, 'Start SageMaker', {
      trainingJobName: JsonPath.stringAt('$.JobName'),
      algorithmSpecification: {
        trainingImage: DockerImage.fromJsonExpression('$.DockerImagePath'),
        trainingInputMode: InputMode.FILE,
      },
      // inputDataConfig: [],
      outputDataConfig: {
        s3OutputLocation: S3Location.fromJsonExpression('$.S3OutputPath'),
      },
      resourceConfig: {
        instanceCount: 1,
        instanceType: new InstanceType(JsonPath.stringAt('$.InstanceType')),
        volumeSize: Size.gibibytes(JsonPath.numberAt('$.InstanceVolumeSize')),
      },
      stoppingCondition: {
        maxRuntime: Duration.seconds(JsonPath.numberAt('$.MaximumJobDuration')),
      },
    });
  }
}

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.151.0 (build b8289e2)

Framework Version

No response

Node.js Version

Node.js v20.12.2

OS

macOS 14.6.1

Language

TypeScript

Language Version

5.5.4

Other information

No response

@Stoggles Stoggles added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 16, 2024
@ashishdhingra
Copy link
Contributor

@Stoggles Good afternoon. Per my understanding, one needs to provide input source for training the model in Sagemaker. Although, Amazon Sagemaker > CreateTrainingJob API reference specifies InputDataConfig as not required, this could be a API reference documentation issue. Most of the examples I have searched around so far specify InputDataConfig.

For validation, are you able to create training job using any of the available AWS SDKs or AWS CLI, without providing the InputDataConfig?

Thanks,
Ashish

@ashishdhingra ashishdhingra added p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Aug 16, 2024
@ashishdhingra ashishdhingra self-assigned this Aug 16, 2024
@Stoggles
Copy link
Author

Stoggles commented Aug 16, 2024

I can confirm that SageMaker will accept and run jobs without InputDataConfig set.

The latest JavaScript SDK does not mark InputDataConfig as required.

The latest AWS CLI reference also does not mark InputdataConfig as required.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Aug 17, 2024
@ashishdhingra ashishdhingra added p1 effort/medium Medium work item – several days of effort and removed p2 labels Aug 19, 2024
@ashishdhingra ashishdhingra removed their assignment Aug 19, 2024
@ashishdhingra
Copy link
Contributor

ashishdhingra commented Aug 20, 2024

inputDataConfig is defined in L2 aws-stepfunctions-tasks construct.

@xazhao xazhao self-assigned this Aug 21, 2024
@mergify mergify bot closed this as completed in #31210 Sep 3, 2024
mergify bot pushed a commit that referenced this issue Sep 3, 2024
…ly support empty inputDataConfig (#31210)

### Issue # (if applicable)

Closes #31132.

### Reason for this change

`inputDataConfig` is not a required property in the API:
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#sagemaker-CreateTrainingJob-request-InputDataConfig

However in `SageMakerCreateTrainingJob`, it's marked as required. We should make it align with the API.

### Description of changes

Make the property optional.

### Description of how you validated changes

unit test and integration test

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Copy link

github-actions bot commented Sep 3, 2024

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 3, 2024
pahud pushed a commit to pahud/aws-cdk that referenced this issue Sep 9, 2024
…ly support empty inputDataConfig (aws#31210)

### Issue # (if applicable)

Closes aws#31132.

### Reason for this change

`inputDataConfig` is not a required property in the API:
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#sagemaker-CreateTrainingJob-request-InputDataConfig

However in `SageMakerCreateTrainingJob`, it's marked as required. We should make it align with the API.

### Description of changes

Make the property optional.

### Description of how you validated changes

unit test and integration test

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
xazhao added a commit to xazhao/aws-cdk that referenced this issue Sep 12, 2024
…ly support empty inputDataConfig (aws#31210)

### Issue # (if applicable)

Closes aws#31132.

### Reason for this change

`inputDataConfig` is not a required property in the API:
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#sagemaker-CreateTrainingJob-request-InputDataConfig

However in `SageMakerCreateTrainingJob`, it's marked as required. We should make it align with the API.

### Description of changes

Make the property optional.

### Description of how you validated changes

unit test and integration test

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
GavinZZ pushed a commit that referenced this issue Sep 12, 2024
…ly support empty inputDataConfig (#31210)

### Issue # (if applicable)

Closes #31132.

### Reason for this change

`inputDataConfig` is not a required property in the API:
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#sagemaker-CreateTrainingJob-request-InputDataConfig

However in `SageMakerCreateTrainingJob`, it's marked as required. We should make it align with the API.

### Description of changes

Make the property optional.

### Description of how you validated changes

unit test and integration test

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.