os-type change breaks backward compatibility #2881

Closed
HaroonSaid opened this issue May 25, 2021 · 17 comments

Comments

@HaroonSaid

HaroonSaid commented May 25, 2021

Summary

We run mixed-mode instances in our ECS clusters, with Windows (2016, 2019, and 20H2) and Linux.
Prior to Agent 1.51.1, the os-type for Windows instances was windows, and on deployment, a task with a placement constraint on that value worked perfectly.
The new names, WINDOWS_SERVER_XXXX_FULL|CORE, break backward compatibility,
e.g. WINDOWS_SERVER_2004_CORE or WINDOWS_SERVER_20H2_CORE

 "placementConstraints": [
    {
      "type": "memberOf",
      "expression": "attribute:ecs.os-type == windows"
    }
  ]

We also use custom attributes to determine placement:

 "placementConstraints": [
    {
      "type": "memberOf",
      "expression": "attribute:ecs.os-type == windows and attribute:winver exists and attribute:winver == win20H2"
    }
  ]

The change in 1.51.1 breaks compatibility.

Description

Expected Behavior

Observed Behavior

Environment Details

Supporting Log Snippets

@shubham2892
Contributor

@HaroonSaid thanks for reporting this issue, looking into it.

@swlasse

swlasse commented May 25, 2021

Thanks for raising this @HaroonSaid - we are having the exact same problem.

Our environment is:
Region: eu-west-1
AMI: Windows_Server-2004-English-Core-ECS_Optimized-2021.05.21
ECS Agent: 1.52.2

Just for reference, the docs are still referring to the "linux" and "windows" values so I assume this could be a bug in the agent (?)

Docs: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-placement-constraints.html#attributes

ecs.os-type
The operating system for the instance. The possible values for this attribute are linux and windows.

Thanks for looking into this @shubham2892

@shubham2892
Contributor

I was able to reproduce this issue, working on the fix.

@ellenthsu

related PR: #2859

@singholt
Contributor

singholt commented May 25, 2021

Thanks a lot for reporting the issue, @HaroonSaid and @swlasse. While we're working on the fix and a new agent release, could you please try changing the placement constraint in your task definition to the following:

    {
      "type": "memberOf",
      "expression": "attribute:ecs.os-type != linux"
    }

Changing the expression to != linux, as above, will let you schedule tasks on Windows instances again. We're working on the long-term fix and will have a new agent version and a set of updated ECS_Optimized Windows AMIs out soon. Please let us know if this suggestion works for you in the meantime.
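
To see why the interim constraint helps, here is a minimal Python sketch that models the two expressions as plain string comparisons. It assumes the comparison is an exact, case-sensitive string match; the helper names are hypothetical and the sample values are the ones mentioned in this thread. It is an illustration, not how ECS evaluates constraints internally.

```python
# Illustrative model only: treats the constraint operators as exact,
# case-sensitive string comparisons (an assumption, not ECS internals).
# Sample ecs.os-type values are the ones reported in this thread.
OS_TYPES = [
    "linux",
    "windows",
    "WINDOWS_SERVER_2004_CORE",
    "WINDOWS_SERVER_20H2_CORE",
]


def matches_equals_windows(os_type: str) -> bool:
    """Models `attribute:ecs.os-type == windows` (hypothetical helper)."""
    return os_type == "windows"


def matches_not_linux(os_type: str) -> bool:
    """Models `attribute:ecs.os-type != linux` (hypothetical helper)."""
    return os_type != "linux"


for value in OS_TYPES:
    print(value, matches_equals_windows(value), matches_not_linux(value))
```

Under these assumptions, the `== windows` form rejects the new `WINDOWS_SERVER_*` values, while `!= linux` accepts both the old and the new Windows values. The trade-off is that `!= linux` would also accept any future non-Linux value.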

@swlasse

swlasse commented May 26, 2021

Thanks for the update @singholt. I assume you are going back to how the original ecs.os-type attribute was working so we can expect to see the value windows again when your fix has been applied?

@singholt
Contributor

Hi @swlasse, we're currently evaluating our options and we do plan to ensure that whatever changes come in do not break backward compatibility again. We'll keep this issue updated, thanks!

@swlasse

swlasse commented May 26, 2021

Sounds good @singholt. Thanks a lot for the update 👍

@HaroonSaid
Author

HaroonSaid commented May 28, 2021

The approach we took as a workaround:

 "placementConstraints": [
    {
      "type": "memberOf",
      "expression": "(attribute:ecs.os-type == windows or attribute:ecs.os-type =~ WINDOWS.*) and attribute:winver exists and attribute:winver == win20H2"
    }
  ]

We have a custom attribute, winver, to identify the Windows flavor, based on our build containers.
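
As a rough illustration of what the expression above accepts, the following Python sketch models it against a dictionary of instance attributes. It assumes that `=~` behaves like a full-string regular-expression match and that `==` is an exact, case-sensitive comparison; the function name and the attribute dictionary are hypothetical.

```python
import re


def windows_20h2_constraint(attributes: dict) -> bool:
    """Hypothetical model of:
    (attribute:ecs.os-type == windows or attribute:ecs.os-type =~ WINDOWS.*)
    and attribute:winver exists and attribute:winver == win20H2
    """
    os_type = attributes.get("ecs.os-type", "")
    # Assumption: `=~` is modeled as a full-string regex match.
    is_windows = (
        os_type == "windows"
        or re.fullmatch(r"WINDOWS.*", os_type) is not None
    )
    # `exists` plus `== win20H2` collapses to one lookup: a missing key
    # yields None, which never equals "win20H2".
    return is_windows and attributes.get("winver") == "win20H2"


# Old-style and new-style os-type values both pass when winver matches.
print(windows_20h2_constraint({"ecs.os-type": "windows", "winver": "win20H2"}))
print(windows_20h2_constraint({"ecs.os-type": "WINDOWS_SERVER_20H2_CORE", "winver": "win20H2"}))
print(windows_20h2_constraint({"ecs.os-type": "linux", "winver": "win20H2"}))
```

Accepting both spellings means the same task definition can be placed on instances running either agent behavior.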

@HaroonSaid
Author

HaroonSaid commented May 28, 2021

> Hi @swlasse, we're currently evaluating our options and we do plan to ensure that whatever changes come in do not break backward compatibility again. We'll keep this issue updated, thanks!

I think creating a new attribute would have been easier and simpler:
an ecs.os-family that can be used for Windows versions 2016, 2019, 2004, 20H1, 21H2 ...
and maybe even have flavors for Linux?

@swlasse

swlasse commented Jun 1, 2021

> I think creating a new attribute would have been easier and simpler:
> an ecs.os-family that can be used for Windows versions 2016, 2019, 2004, 20H1, 21H2 ...
> and maybe even have flavors for Linux?

Yes, I agree. In order to resolve this issue we have created our own custom attribute with os-family info - similar to your approach @HaroonSaid.
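
For illustration only, here is one way the idea of keeping a stable os-type alongside separate family information could be sketched in Python. It assumes the new values follow the WINDOWS_SERVER_XXXX_FULL|CORE shape described at the top of this issue; the function name and the returned keys are hypothetical and are not attributes that the agent defines.

```python
import re

# Assumed shape of the new values, taken from this thread:
# WINDOWS_SERVER_<version>_<FULL|CORE>
_NEW_STYLE = re.compile(
    r"WINDOWS_SERVER_(?P<version>[0-9A-Za-z]+)_(?P<edition>FULL|CORE)"
)


def describe_os_type(os_type: str) -> dict:
    """Split an os-type value into a stable type plus family details.

    Hypothetical helper: the returned keys are illustrative only.
    """
    if os_type in ("linux", "windows"):
        # Old-style values carry no version or edition information.
        return {"os_type": os_type, "version": None, "edition": None}
    match = _NEW_STYLE.fullmatch(os_type)
    if match is None:
        raise ValueError(f"unrecognized os-type value: {os_type!r}")
    return {
        "os_type": "windows",
        "version": match["version"],
        "edition": match["edition"],
    }


print(describe_os_type("WINDOWS_SERVER_20H2_CORE"))
print(describe_os_type("windows"))
```

Keeping the coarse value (`windows`) separate from the version and edition lets existing constraints continue to match while newer ones opt in to the finer detail.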

@HaroonSaid
Author

Just an FYI, the change had a much bigger impact on our stable production environments: as new EC2 instances came up with the new agent, ECS tasks failed to start when scaling up.
Our workaround was to redeploy the failing tasks with the newer placement constraints.

@angelcar
Contributor

Closing as the code generating the issue was reverted in v1.53.0

@swlasse

swlasse commented Jun 15, 2021

Thanks for looking into this @angelcar and team. Do you know when a new ECS Optimized Windows Server Core 2004 AMI with the 1.53.0 agent will be available?

The SSM parameter /aws/service/ami-windows-latest/Windows_Server-2004-English-Core-ECS_Optimized/image_id was last modified on May 22nd, so it seems this has not been rolled out yet?
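
For anyone who wants to check the same thing, a minimal Python sketch follows. It assumes a boto3-style SSM client is passed in (boto3 is a third-party package and needs AWS credentials and a region); the function name is hypothetical, and the parameter name is the one quoted above.

```python
# Parameter name as quoted in this thread.
ECS_WINDOWS_2004_CORE_PARAM = (
    "/aws/service/ami-windows-latest/"
    "Windows_Server-2004-English-Core-ECS_Optimized/image_id"
)


def ami_parameter_info(ssm_client, name=ECS_WINDOWS_2004_CORE_PARAM):
    """Return (AMI ID, last-modified timestamp) for an SSM parameter.

    Hypothetical helper. `ssm_client` is assumed to be a boto3 SSM
    client, i.e. an object exposing get_parameter(Name=...).
    """
    parameter = ssm_client.get_parameter(Name=name)["Parameter"]
    return parameter["Value"], parameter["LastModifiedDate"]


# Example usage (not executed here; requires boto3 and AWS credentials):
#   import boto3
#   ssm = boto3.client("ssm", region_name="eu-west-1")
#   ami_id, modified = ami_parameter_info(ssm)
#   print(ami_id, modified)
```

A last-modified date older than the agent release suggests that the AMI behind the parameter still ships the earlier agent.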

@singholt
Contributor

Hi @swlasse, the AMI release is in progress and should be out soon. I'll let you know here once it's available.

@singholt
Contributor

ECS Windows AMIs with the updated agent are now public. Please let us know if you need anything else. Thanks!

@swlasse

swlasse commented Jun 16, 2021

Thanks @singholt - I have just tried it out and things are now working as expected (the new Windows AMI is available with the new 1.53.0 ECS agent, and the value of the ecs.os-type attribute is "back" to windows). Thanks again for your help with this.
