Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(glue): Timeout and workerType missing validation for Ray job types #29612

Closed
haljarrett opened this issue Mar 26, 2024 · 3 comments · Fixed by #32119
Closed

(glue): Timeout and workerType missing validation for Ray job types #29612

haljarrett opened this issue Mar 26, 2024 · 3 comments · Fixed by #32119
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/medium Medium work item – several days of effort p3

Comments

@haljarrett
Copy link

Describe the bug

When developing Ray jobs via CDK, I bumped into the fact that specifying a timeout or not specifying a worker type on Job for Ray job types will succeed in synthesis, but fail deployment.

Expected Behavior

I'd expect an error to be thrown if workerType is not provided or timeout is provided, as both of these will cause deployment failures for Ray job types.

Current Behavior

Synthesis passes and failure occurs on deployment:

 UPDATE_FAILED [...] Timeout not supported for Ray jobs

or

CREATE_FAILED [...] Worker type cannot be null and only [Z.2X] worker types are supported for glueray jobs

Reproduction Steps

#!/usr/bin/env node
import * as cdk from 'aws-cdk-lib';
import { Code, GlueVersion, Job, JobExecutable, PythonVersion, Runtime, WorkerType } from '@aws-cdk/aws-glue-alpha';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'glue-job-issue-stack');
const job = new Job(stack, 'MyJob', {
  jobName: 'glue-issue-test-job',
  executable: JobExecutable.pythonRay({
    glueVersion: GlueVersion.V4_0,
    pythonVersion: PythonVersion.THREE_NINE,
    runtime: Runtime.RAY_TWO_FOUR,
    script: Code.fromAsset('lib/test-job.py')
  }),
  workerType: WorkerType.G_025X,
  workerCount: 1,
  timeout: cdk.Duration.minutes(10),
});

Possible Solution

Additional validation in packages/@aws-cdk/aws-glue-alpha/lib/job.ts would address this and provide a slightly smoother developer experience when working with ray job types. I would be interested in contributing a fix if triage determines this is valid bug.

Additional Information/Context

No response

CDK CLI Version

2.133.0

Framework Version

2.133.0

Node.js Version

20.10.0

OS

macOS

Language

TypeScript

Language Version

5.3.3

Other information

No response

@haljarrett haljarrett added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 26, 2024
@github-actions github-actions bot added the @aws-cdk/aws-glue Related to AWS Glue label Mar 26, 2024
@haljarrett haljarrett changed the title (aws-glue-alpha): Timeout and workerType missing validation for Ray job types (glue): Timeout and workerType missing validation for Ray job types Mar 26, 2024
@tim-finnigan tim-finnigan self-assigned this Mar 26, 2024
@tim-finnigan tim-finnigan added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Mar 26, 2024
@tim-finnigan
Copy link

Thanks for creating this issue as well as the PR. I think adding validation as you described makes sense if the deployment is failing. Linking Ray job documentation for reference:

Did not see much on the topics in this issue, maybe more needs to be added on that.

@tim-finnigan tim-finnigan added p2 effort/medium Medium work item – several days of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Mar 26, 2024
@tim-finnigan tim-finnigan removed their assignment Mar 26, 2024
@haljarrett
Copy link
Author

Hey Tim, thanks for attaching the docs there - in the same vein, this is also not noted in:

The worker type constraint is mentioned, but for both, timeout is just listed as an optional parameter, no mention that it must be omitted in some circumstances. Having the only documentation be in the request failure message is not ideal, but not sure if it's something that even can be enforced at the L1 / Cfn level, so validating at the L2 construct level should hopefully avoid some frustration.

@pahud pahud added p3 and removed p2 labels Jun 11, 2024
mergify bot pushed a commit that referenced this issue Nov 20, 2024
### Issue # (if applicable)

Closes #29612.

### Reason for this change

AWS Glue Ray job has some restriction.
- must use Z.2X worker type
```sh
CREATE_FAILED [...] Worker type cannot be null and only [Z.2X] worker types are supported for glueray jobs
```
- must not specify timeout
```sh
UPDATE_FAILED [...] Timeout not supported for Ray jobs
```

### Description of changes

Add validation for above restriction.

```ts
    if (executable.type.name === JobType.RAY.name) {
      if (props.workerType !== WorkerType.Z_2X) {
        throw new Error(`WorkerType must be Z_2X for Ray jobs, got: ${props.workerType}`);
      }
      if (props.timeout !== undefined) {
        throw new Error('Timeout cannot be set for Ray jobs');
      }
    }
```

### Description of how you validated changes

Add unit test.

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@mergify mergify bot closed this as completed in #32119 Nov 20, 2024
Copy link

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/medium Medium work item – several days of effort p3
Projects
None yet
3 participants