Windows worker fleets failing to start #295
Labels: bug
Comments
horsmand added the bug label and removed the needs-triage label on Jan 22, 2021
horsmand referenced this issue in horsmand/aws-rfdk on Jan 22, 2021:
Fixes #295. This fixes a bug that is preventing the deployment of any worker instance that uses an AMI with a Windows OS.
ddneilson pushed a commit that referenced this issue on Jan 22, 2021:
Fixes #295. This fixes a bug that is preventing the deployment of any worker instance that uses an AMI with a Windows OS.
Deploying a worker fleet (or a new instance into an existing worker fleet) that uses any Windows AMI fails because of a bug in a script that we execute on the host as part of its initialization process (in the UserData). The bug affects all versions of RFDK in production and every worker fleet that is deployed using an AMI with a Windows operating system.
The failing script installs and configures the CloudWatch agent on the instance. Because it fails, the script that configures the Deadline worker to connect to the render queue and then starts the worker is never executed. The health check then fails, which causes the CDK deployment to fail and roll back.
Because the failure prevents the CloudWatch agent from being set up, no logs are uploaded to CloudWatch, so nothing can be viewed from the AWS Console, and the host is terminated.
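The failure chain described above can be illustrated with a minimal sketch. This is not the RFDK code (the real scripts are PowerShell run from UserData); the function names `run_init_steps`, `configure_cloudwatch_agent`, and `configure_worker` are hypothetical stand-ins, and the sketch assumes only that initialization steps run in order and that a failed step stops the ones after it.

```python
from typing import Callable, List, Tuple


def run_init_steps(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # A failed step aborts everything that was queued after it.
            print(f"step {name!r} failed: {exc}")
            break
        completed.append(name)
    return completed


def configure_cloudwatch_agent() -> None:
    # Hypothetical stand-in for the failing CloudWatch agent script.
    raise RuntimeError("CloudWatch agent installation failed")


def configure_worker() -> None:
    # Hypothetical stand-in for the script that connects the Deadline
    # worker to the render queue and starts it.
    pass


steps = [
    ("configure_cloudwatch_agent", configure_cloudwatch_agent),
    ("configure_worker", configure_worker),
]
completed = run_init_steps(steps)

# The worker was never configured, so a health check on it would fail.
healthy = "configure_worker" in completed
```

In this sketch the worker step never runs because the step before it raised, which mirrors why the health check fails even though the worker configuration script itself is not the one with the bug.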
Log statement observed, signaling that the script is falling back to the latest version of the CloudWatch agent rather than the version we try to pin to: https://github.com/aws/aws-rfdk/blob/mainline/packages/aws-rfdk/lib/core/scripts/powershell/configureCloudWatchAgent.ps1#L26
Error message that was observed: https://github.com/aws/aws-rfdk/blob/mainline/packages/aws-rfdk/lib/core/scripts/powershell/configureCloudWatchAgent.ps1#L52
Environment
This is 🐛 Bug Report