aws-cdk/aws-batch: Skip use of optimal instance type with graviton instances #31148

Closed
1 task
jasonforte opened this issue Aug 19, 2024 · 4 comments · Fixed by #31510 or softwaremill/tapir#4137 · May be fixed by NOUIY/aws-solutions-constructs#135 or NOUIY/aws-solutions-constructs#136
Labels: @aws-cdk/aws-batch (Related to AWS Batch), bug (This issue is a bug.), effort/small (Small work item – less than a day of effort), p2

Comments

@jasonforte

Describe the bug

I'm attempting to create an EC2 ECS compute environment using the AWS Batch constructs that makes use of only Graviton instances. When I deploy the stack I get the following error:

Error executing request, Exception : arm-based instance type cannot be used with other instance types.

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

When encountering a known Arm-based instance class, the construct should not append optimal, because optimal is not supported for Arm-based instances.

Current Behavior

When deploying an EC2 ECS compute environment, the error below is thrown.

14:57:17 | CREATE_FAILED        | AWS::Batch::ComputeEnvironment | MyECSComputeEnvironment6A03089C
Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 167a0e5f-0376-4ad4-a0b6-34a85a940cb7 (Service: Batch, Status Code: 400, Request ID: 167a0e5f-0376-4ad4-a0b6-34a85a940cb7)" (RequestToken: d266edf5-f5cd-d61f-866f-5e5ecefa478c, HandlerErrorCode: InvalidRequest)

Reproduction Steps

I'm declaring my compute environment as follows:

const computeEnvironment = new ManagedEc2EcsComputeEnvironment(this, 'MyECSComputeEnvironment', {
    vpc,
    minvCpus: 0,
    maxvCpus: 8,
    instanceClasses: [InstanceClass.M6G],
})

Possible Solution

When I inspect the synthesized CloudFormation Stack I can see that optimal has been appended to the InstanceTypes list:

[Screenshot: synthesized template with optimal appended to the InstanceTypes list]

A possible solution would be to fix the logic in renderInstances so that it takes the instance architecture into account (Arm vs. x86).

Additional Information/Context

Workaround

I've found a workaround: setting useOptimalInstanceClasses to false. This should not be necessary, though, because I've explicitly set the instance classes I want.

const computeEnvironment = new ManagedEc2EcsComputeEnvironment(this, 'MyECSComputeEnvironment', {
    vpc,
    minvCpus: 0,
    maxvCpus: 8,
    instanceClasses: [InstanceClass.M6G],
    useOptimalInstanceClasses: false
})

CDK CLI Version

2.152.0 (build faa7d79)

Framework Version

No response

Node.js Version

v20.14.0

OS

ubuntu

Language

TypeScript

Language Version

No response

Other information

No response

@jasonforte jasonforte added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 19, 2024
@github-actions github-actions bot added the @aws-cdk/aws-batch Related to AWS Batch label Aug 19, 2024
@ashishdhingra ashishdhingra self-assigned this Aug 19, 2024
@ashishdhingra ashishdhingra added p2 needs-reproduction This issue needs reproduction. and removed needs-triage This issue or PR still needs to be triaged. labels Aug 19, 2024
@ashishdhingra
Contributor

@jasonforte Good morning. Thanks for reporting the issue. I was able to reproduce the issue using the provided code, where the error is thrown by CloudFormation:

11:31:22 AM | CREATE_FAILED        | AWS::Batch::ComputeEnvironment | MyECSComputeEnvironment6A03089C
Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 07f6a0b1-5003-4004-88c7-3ae2c317cb24 (Service: Batch, Status Code: 400, Request ID: 07f6a0b1-5003-4004-88c7-3ae2c317cb24)" (RequestToken: 3e651d0a-82c0-ed7b-6c77-01a111440adf, HandlerErrorCode: InvalidRequest)


 ❌  CdktestStack failed: Error: The stack named CdktestStack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 07f6a0b1-5003-4004-88c7-3ae2c317cb24 (Service: Batch, Status Code: 400, Request ID: 07f6a0b1-5003-4004-88c7-3ae2c317cb24)" (RequestToken: 3e651d0a-82c0-ed7b-6c77-01a111440adf, HandlerErrorCode: InvalidRequest)
    at FullCloudFormationDeployment.monitorDeployment (/usr/local/lib/node_modules/aws-cdk/lib/index.js:446:10568)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/usr/local/lib/node_modules/aws-cdk/lib/index.js:449:199716)
    at async /usr/local/lib/node_modules/aws-cdk/lib/index.js:449:181438

 ❌ Deployment failed: Error: The stack named CdktestStack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 07f6a0b1-5003-4004-88c7-3ae2c317cb24 (Service: Batch, Status Code: 400, Request ID: 07f6a0b1-5003-4004-88c7-3ae2c317cb24)" (RequestToken: 3e651d0a-82c0-ed7b-6c77-01a111440adf, HandlerErrorCode: InvalidRequest)
    at FullCloudFormationDeployment.monitorDeployment (/usr/local/lib/node_modules/aws-cdk/lib/index.js:446:10568)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/usr/local/lib/node_modules/aws-cdk/lib/index.js:449:199716)
    at async /usr/local/lib/node_modules/aws-cdk/lib/index.js:449:181438

The stack named CdktestStack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 07f6a0b1-5003-4004-88c7-3ae2c317cb24 (Service: Batch, Status Code: 400, Request ID: 07f6a0b1-5003-4004-88c7-3ae2c317cb24)" (RequestToken: 3e651d0a-82c0-ed7b-6c77-01a111440adf, HandlerErrorCode: InvalidRequest)

Here is an excerpt of template generated by CDK synthesis:

...
 MyECSComputeEnvironment6A03089C:
    Type: AWS::Batch::ComputeEnvironment
    Properties:
      ComputeResources:
        AllocationStrategy: BEST_FIT_PROGRESSIVE
        InstanceRole:
          Fn::GetAtt:
            - MyECSComputeEnvironmentInstanceProfile9F922264
            - Arn
        InstanceTypes:
          - m6g
          - optimal
        MaxvCpus: 8
        MinvCpus: 0
        SecurityGroupIds:
          - Fn::GetAtt:
              - MyECSComputeEnvironmentSecurityGroup7C63B7FD
              - GroupId
        Subnets:
          - subnet-045c5a5af92ce5bf5
          - subnet-0552f61c30c94db58
          - subnet-0b187fc322e757ab0
        Type: EC2
      ReplaceComputeEnvironment: false
      State: ENABLED
      Type: managed
      UpdatePolicy: {}
    Metadata:
      aws:cdk:path: CdktestStack/MyECSComputeEnvironment/Resource
...
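
A check like the one below can catch this combination in a synthesized template before deployment. It is only a sketch: findArmMixedWithOptimal and the Template interfaces are hypothetical names introduced here, not part of the CDK, and the caller has to supply the list of Arm instance classes.

```typescript
// Only the parts of a synthesized template that this check reads.
interface TemplateResource {
  Type: string;
  Properties?: { ComputeResources?: { InstanceTypes?: string[] } };
}
interface Template {
  Resources: { [logicalId: string]: TemplateResource };
}

// Hypothetical helper (not a CDK API). Returns the logical IDs of Batch compute
// environments whose InstanceTypes list contains 'optimal' together with an
// entry from armClasses. Entries may be a class ('m6g') or a type ('m6g.large').
function findArmMixedWithOptimal(template: Template, armClasses: string[]): string[] {
  return Object.keys(template.Resources).filter((logicalId) => {
    const resource = template.Resources[logicalId];
    if (resource.Type !== 'AWS::Batch::ComputeEnvironment') return false;
    const types = resource.Properties?.ComputeResources?.InstanceTypes ?? [];
    const hasArm = types.some((t) => armClasses.indexOf(t.split('.')[0]) !== -1);
    return hasArm && types.indexOf('optimal') !== -1;
  });
}

// The excerpt above, reduced to the fields the check needs.
const excerpt: Template = {
  Resources: {
    MyECSComputeEnvironment6A03089C: {
      Type: 'AWS::Batch::ComputeEnvironment',
      Properties: { ComputeResources: { InstanceTypes: ['m6g', 'optimal'] } },
    },
  },
};

// Logs the logical ID of the offending compute environment.
console.log(findArmMixedWithOptimal(excerpt, ['m6g']));
```

Passing the Arm classes in as a parameter keeps the helper free of any hard-coded assumption about which classes are Graviton-based.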

The reason optimal is added to InstanceTypes is that the useOptimalInstanceClasses property defaults to true (refer to the code here, which inspects the useOptimalInstanceClasses property). Please explicitly set useOptimalInstanceClasses to false as shown below:

const vpc = ec2.Vpc.fromLookup(this, 'myVpc', { isDefault: true });
const computeEnvironment = new batch.ManagedEc2EcsComputeEnvironment(this, 'MyECSComputeEnvironment', {
  vpc,
  minvCpus: 0,
  maxvCpus: 8,
  instanceClasses: [ec2.InstanceClass.M6G],
  useOptimalInstanceClasses: false
});

I'm unsure whether it would be feasible to infer the architecture from the instance class, since InstanceClass is an enum that gets updated as new instance classes are supported by the EC2 service.

Thanks,
Ashish

@ashishdhingra ashishdhingra added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-reproduction This issue needs reproduction. labels Aug 20, 2024
@pahud
Contributor

pahud commented Aug 20, 2024

We probably could improve:

  1. If instanceClasses is defined, x86 and Arm instance types can't be mixed. We have a similar function in aws-eks, and we could technically check for consistency.
  2. If instanceClasses contains only Arm instance types, useOptimalInstanceClasses must default to false when undefined and can't be explicitly set to true.
  3. We could implement the checks in renderInstances():

function renderInstances(types?: ec2.InstanceType[], classes?: ec2.InstanceClass[], useOptimalInstanceClasses?: boolean): string[] {
  const instances = [];
  for (const instanceType of types ?? []) {
    instances.push(instanceType.toString());
  }
  for (const instanceClass of classes ?? []) {
    instances.push(instanceClass);
  }
  if (useOptimalInstanceClasses || useOptimalInstanceClasses === undefined) {
    instances.push('optimal');
  }
  return instances;
}
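
The three points above could be sketched roughly as follows. This is not the change that was merged for this issue. It is a standalone illustration that works on plain strings instead of the ec2 types, and renderInstancesChecked, isArm and ARM_INSTANCE_CLASSES are hypothetical names. The class list is illustrative and not exhaustive.

```typescript
// Assumption: an illustrative, non-exhaustive list of Graviton (arm64) instance
// classes. A real implementation would need a maintained source of truth.
const ARM_INSTANCE_CLASSES = [
  'a1', 't4g', 'c6g', 'c6gd', 'c6gn', 'm6g', 'm6gd', 'r6g', 'r6gd', 'c7g', 'm7g', 'r7g',
];

// Accepts either an instance class ('m6g') or a full instance type ('m6g.large').
function isArm(instance: string): boolean {
  return ARM_INSTANCE_CLASSES.indexOf(instance.split('.')[0]) !== -1;
}

// Hypothetical variant of renderInstances that applies the three checks.
function renderInstancesChecked(
  types: string[] = [],
  classes: string[] = [],
  useOptimalInstanceClasses?: boolean,
): string[] {
  const instances = types.concat(classes);
  const armCount = instances.filter(isArm).length;
  const allArm = instances.length > 0 && armCount === instances.length;

  // Point 1: x86 and Arm entries cannot be mixed.
  if (armCount > 0 && !allArm) {
    throw new Error('Cannot mix Arm and x86 instance types or classes');
  }
  // Point 2: 'optimal' cannot be requested explicitly for Arm-only lists.
  if (allArm && useOptimalInstanceClasses === true) {
    throw new Error('useOptimalInstanceClasses cannot be true with Arm instances');
  }
  // When the flag is undefined, default to false for Arm-only lists, true otherwise.
  const addOptimal = useOptimalInstanceClasses ?? !allArm;
  return addOptimal ? instances.concat('optimal') : instances;
}
```

With this sketch, renderInstancesChecked([], ['m6g']) yields only m6g, while an x86 class such as m5 still gets optimal appended by default, so existing x86 behavior is unchanged.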

@ashishdhingra ashishdhingra added effort/small Small work item – less than a day of effort and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Aug 20, 2024
@ashishdhingra ashishdhingra removed their assignment Sep 20, 2024
@mergify mergify bot closed this as completed in #31510 Oct 1, 2024
@mergify mergify bot closed this as completed in 7e80cc9 Oct 1, 2024

github-actions bot commented Oct 1, 2024

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 1, 2024