AMI for 2765.2.6 on AWS lost #456

Closed
christianhuening opened this issue Jul 29, 2021 · 24 comments

Comments

@christianhuening

Description

We used the AMI ami-0737c661a0881fd94 for Flatcar community version 2765.2.6 on AWS and still have some EC2 instances running with it. Starting new instances from our tooling revealed that this AMI is no longer available.

Impact

Broken deployments. We're also wondering why that AMI was removed.

Environment and steps to reproduce

Try to deploy an EC2 instance with AMI ami-0737c661a0881fd94 in region me-south-1.

Expected behavior

The AMI is still available.

@sayanchowdhury
Member

The AMI ami-0737c661a0881fd94 should be available now. Can you check and verify?

@christianhuening
Author

hm yes, it's there now. Can you tell us why it was gone? Was it a mistake? If not, is there anywhere to subscribe to receive notifications about revocations or so?

@christianhuening
Author

@sayanchowdhury hm the image is available but despite being a community image we receive the following error:

Cloud provider message - machine codes error: code = [Internal] message = [OptInRequired: In order to use this AWS Marketplace product you need to accept terms and subscribe. To do so please visit https://aws.amazon.com/marketplace/pp?sku=1d7i2p7lb26sz24e5lr090wem

This is weird and was not the case earlier, especially since the AMI dashboard does not list this AMI as a marketplace variant.

@sayanchowdhury
Member

> hm yes, it's there now. Can you tell us why it was gone? Was it a mistake? If not, is there anywhere to subscribe to receive notifications about revocations or so?

This was surely a bug and should not have happened. Strangely, it seems to have happened only in this region.

@sayanchowdhury
Member

> @sayanchowdhury hm the image is available but despite being a community image we receive the following error:
>
> Cloud provider message - machine codes error: code = [Internal] message = [OptInRequired: In order to use this AWS Marketplace product you need to accept terms and subscribe. To do so please visit https://aws.amazon.com/marketplace/pp?sku=1d7i2p7lb26sz24e5lr090wem
>
> This is weird and was not the case earlier, especially since the AMI dashboard does not list this AMI as a marketplace variant.

Hmm, indeed this is weird. Let me check.

@sayanchowdhury
Member

@christianhuening I could not reproduce the last reported issue. I tried spinning up an instance in my personal AWS account, and it worked without any issue.

Could you check whether you face the same issue when you spin up an instance directly via awscli or the AWS Management Console?

@christianhuening
Author

That's weird. We had to explicitly accept the terms and conditions of the flatcar marketplace thingy for a price of $0 to get it started. Then it worked like a charm 😵‍💫

@christianhuening
Author

@sayanchowdhury I can try to do that tomorrow. I'll probably need to use another AWS account, but yeah.

@daviddyball

daviddyball commented Aug 1, 2021

Just wanted to chime in and say that ami-0097d8b6241e9cf76 (eu-west-2 AMI) also just went missing for us at 2021-08-01T00:00 UTC+1:

[screenshot]

All our ASGs are complaining with

Launching a new EC2 instance. Status Reason: Could not launch
On-Demand Instances. AuthFailure - Not authorized for images: 
[ami-0097d8b6241e9cf76]. Launching EC2 instance failed.

DescribeImages shows no publicly visible images for use:

aws ec2 describe-images --image-ids ami-0097d8b6241e9cf76 --region eu-west-2
{
    "Images": []
}
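
For anyone who wants to script this check, here is a minimal Python sketch that interprets the JSON printed by `aws ec2 describe-images` for a single AMI ID. It assumes the usual response fields (`Images`, `ImageOwnerAlias`, `ProductCodes`, `Public`); the function name `classify_ami` and the returned labels are made up for illustration and are not part of any AWS tooling.

```python
import json


def classify_ami(describe_images_output: str) -> str:
    """Classify the JSON output of `aws ec2 describe-images` for one AMI ID.

    The labels returned here are illustrative, not an AWS convention.
    """
    images = json.loads(describe_images_output).get("Images", [])
    if not images:
        # Empty list: the AMI is deregistered, private, or otherwise
        # not visible to the calling account (the case shown above).
        return "not-visible"

    image = images[0]
    product_types = {
        code.get("ProductCodeType") for code in image.get("ProductCodes", [])
    }
    if image.get("ImageOwnerAlias") == "aws-marketplace" or "marketplace" in product_types:
        # Marketplace-backed images may need a subscription before launch.
        return "marketplace"

    return "public" if image.get("Public") else "private"


if __name__ == "__main__":
    # The empty response reported in this thread.
    print(classify_ami('{"Images": []}'))
```

Running such a check before a rollout would not prevent an AMI from disappearing, but it could surface the problem before an auto-scaling group tries to launch from it.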

@daviddyball

daviddyball commented Aug 1, 2021

@christianhuening was right.... had to subscribe on the AWS marketplace and then it worked fine. Have AWS changed something or has Kinvolk changed CI/CD for the way AMIs are refreshed? Do all images now need us to subscribe to you on AWS?

EDIT: Screenshot (attached) shows a lot of images now listed as Source: aws-marketplace/.... whereas they used to be Source: 075585003325/Flatcar-........

[screenshot]

@daviddyball

daviddyball commented Aug 1, 2021

Additional side-note... you can't clone/copy aws-marketplace/.... images into your own account... so Kinvolk's public images are now a single point of failure in people's infrastructure should they go missing 😞
[screenshot]

EDIT: Every day is a school day... just read Kinvolk got acquired by Microsoft. Firstly, congrats to the Kinvolk team... Secondly, is the move to having to accept a license agreement and aws-marketplace/... images a result of that acquisition, or just a weird unintended side-effect of something else that has changed in the background?

EDIT2: aws-marketplace/... images aren't compatible with kops tooling for managing K8s clusters. We used to specify the image as 075585003325/Flatcar-stable-2765.2.6-hvm and it would figure out exactly which AMI it needed based on the region of your kops cluster.... switching this to aws-marketplace/Flatcar-stable-2765.2.6-hvm fails with kops tooling 😢 The only workaround I can see right now is to manually resolve aws-marketplace/Flatcar-stable-2765.2.6-hvm to an AMI ID per region and hardcode that in our cluster configurations, which is less than ideal.

@hato221

hato221 commented Aug 1, 2021

Hey @daviddyball,
for kops, you need to specify the full name of the source image, as that was changed as well. In our case we changed, for instance, 075585003325/Flatcar-stable-2605.12.0-hvm -> aws-marketplace/Flatcar-stable-2605.12.0-hvm-1716ad1c-deff-42e5-86bc-228658463d0e. You can find the images in the EC2 -> Images section; look for the one you need.
Btw, before running kops, you need to be subscribed to the aws-marketplace Flatcar Linux product you need (Stable, LTS, etc.).
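
The per-region lookup discussed in the last two comments can be sketched in Python. This is only an illustration under stated assumptions: the input is the `Images` list from `aws ec2 describe-images` for each region, collected beforehand; names are matched by prefix because, as noted above, the marketplace name carries an extra suffix; and the helper names `pick_ami` and `build_region_map` as well as the AMI IDs in the example are hypothetical.

```python
from typing import Dict, List, Optional


def pick_ami(images: List[dict], name_prefix: str) -> Optional[str]:
    """Return the ImageId of the newest image whose Name starts with name_prefix."""
    matches = [img for img in images if img.get("Name", "").startswith(name_prefix)]
    if not matches:
        return None
    # CreationDate is an ISO 8601 timestamp string, so string order is time order.
    newest = max(matches, key=lambda img: img.get("CreationDate", ""))
    return newest["ImageId"]


def build_region_map(
    images_by_region: Dict[str, List[dict]], name_prefix: str
) -> Dict[str, str]:
    """Map each region to a matching AMI ID; regions without a match are omitted."""
    region_map = {}
    for region, images in images_by_region.items():
        ami_id = pick_ami(images, name_prefix)
        if ami_id is not None:
            region_map[region] = ami_id
    return region_map


if __name__ == "__main__":
    # Placeholder data; real input would come from describe-images per region.
    sample = {
        "eu-west-2": [
            {
                "ImageId": "ami-placeholder",
                "Name": "Flatcar-stable-2765.2.6-hvm-example-suffix",
                "CreationDate": "2021-08-01T00:00:00.000Z",
            }
        ],
        "me-south-1": [],
    }
    print(build_region_map(sample, "Flatcar-stable-2765.2.6-hvm"))
```

A map generated this way still has to be pinned in the cluster configuration, so it only automates the manual lookup; it does not remove the dependency on the published images.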

@daviddyball

Thanks @hato221 .... For now I've just switched to straight ami-* ids.

I must say I'm not happy with the situation at the moment. Hopefully it was just a mistake somewhere in the Kinvolk release pipelines or something, but the necessity to subscribe to marketplace images feels anti-user to me.... combined with the fact that this affected production workloads... First CoreOS gets neutered and now some weird things going on with Flatcar .... 😏

@mindw

mindw commented Aug 1, 2021

It is actually much, much worse. All AMIs linked from https://kinvolk.io/docs/flatcar-container-linux/latest/installing/cloud/aws-ec2/ lead to
[screenshot]
There are no AMIs registered for account 075585003325.

It means that even AMIs that are supposed to be there were yanked from AWS WITHOUT ANY WARNING. A lot of your users will be extremely annoyed (to put it mildly) when their CIs, development and PRODUCTION break.

@daviddyball

One by one all of my clusters are slowly succumbing to this issue as auto-scaling and spot terminations take place. Worse still each account in my AWS organisation has to subscribe to these images in order to get access... which is just an awful UX. It's a brilliant way to spend my weekend 👍

> A lot of your users will be extremely annoyed (to put it mildly) when their CIs, development and PRODUCTION break.

This is such a polite way to put it 😂

@sayanchowdhury
Member

There has been no change in the CI/CD pipeline, and the change to aws-marketplace is not intended and needs to be looked into. Thanks for your input; it will surely help with reproducing the case.

The Flatcar AWS account has been suspended (we are unsure why), due to which all the AMIs are now unavailable. We are working with the cloud provider to get things running again quickly. The suspension also blocks us from debugging the issue mentioned earlier.

Related: https://twitter.com/flatcar_linux/status/1421861030033072133

@dharmab

dharmab commented Aug 2, 2021

First off, #hugops to everyone. I hope everyone is doing as well as they can be while dealing with a frustrating problem.

Once the dust settles, a written account of the issues that resulted in the account suspension would be really helpful. It is concerning that AWS can suspend an account with difficult recourse. I'm sure we all want to prepare our own orgs and accounts for any similar future event.

Best wishes!

@computeralex92

It seems the issue is solved; the images are accessible again.
Thanks to the team for resolving this issue before Monday morning; that saves me a stressful day.

> Once the dust settles, a written account of the issues that resulted in the account suspension would be really helpful. It is concerning that AWS can suspend an account with difficult recourse. I'm sure we all want to prepare our own orgs and accounts for any similar future event.

That would be great; I think for every AWS / cloud customer this is one of the worst "worst-case scenarios".

@ahrkrak
Contributor

ahrkrak commented Aug 2, 2021

Yes, we are working with our contacts at AWS, who have been supportive and helpful, to find out what exactly happened and how to avoid it in future. We will be sharing findings ASAP, but please give us some time as we want to avoid premature conclusions.

@hligit

hligit commented Aug 2, 2021

Looks like Flatcar-alpha images for arm64 are still missing.

@sayanchowdhury
Member

> Looks like Flatcar-alpha images for arm64 are still missing.

Oh! yes. They should be public now. @hligit Can you please verify?

@hligit

hligit commented Aug 2, 2021

@sayanchowdhury, I have verified that the ARM64 images are available. Thanks!

@argvk

argvk commented Aug 19, 2021

> We will be sharing findings ASAP but please give us some time as we want to avoid premature conclusions.

Hey there, I was wondering if there is an ETA for the findings/postmortem. From the tweets and issues, it wasn't clear which users were affected.

@jepio
Member

jepio commented Jan 28, 2022

The root cause was a locked account, which caused all of our AMIs to be marked private. The account was locked by an automated system, with no clear correlation to any action on our side. That's all we know. We'll be closing this issue.
