'certify' doesn't update existing certificates #590
Comments
To +1 this: it seems that none of my certificates have been auto-renewing. I've been having to do it by manually deleting the custom domain name in APIGW and re-deploying/re-certifying. Here's an example zappa_settings.json that I've been using:
EDIT: Should clarify that while troubleshooting this, I attempted to re-run the certify command with no luck. Running on zappa 0.32.1 |
I think this is actually related: #391 |
+1 This is happening to me too. I noticed that when I run I wonder if it's related to this code in the
Is this code just changing the certificate name? Maybe we need to pass the certificate body, etc... to this method. Something like:
...with better formatting of course. I got this code searching Github so it's likely supported. I don't have time to test this at the moment, but within the next week I could try it unless this issue is resolved or someone else can try it. |
Unfortunately, this is not possible like I had hoped. Only the certificate name can be updated (https://docs.aws.amazon.com/apigateway/api-reference/link-relation/domainname-update/). I suppose this problem was known when originally writing the I'm not sure how to get around this programmatically, short of creating a new api gateway domain with the new certificate, which may involve some downtime if we have to delete the current api gateway domain first. Hopefully, there is a better way. I will test with an old version of Zappa to determine when this regression occurred, although I always had problems with re-certifying my domain. Currently, I'm updating my certificate by using the undocumented Zappa features to output the certificate data, and pasting that data into the aws console as described at http://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-custom-domains.html#how-to-rotate-custom-domain-certificate. P.S: The Terraform project also encountered this problem (hashicorp/terraform#8789) |
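For anyone following along, here is a minimal sketch of what a name-only update looks like, assuming `client` is a boto3 API Gateway client (`boto3.client("apigateway")`). The helper name `rename_domain_certificate` is made up for illustration and is not Zappa's code. It only shows that the patch touches `/certificateName`, which is why the certificate being served does not change.

```python
def rename_domain_certificate(client, domain_name, new_certificate_name):
    """Patch only the certificate *name* on an API Gateway custom domain.

    `client` is assumed to be a boto3 API Gateway client. The certificate
    body, private key, and chain are not part of this patch, so the
    certificate actually served by the domain stays the same.
    """
    patch_operations = [
        {
            "op": "replace",
            "path": "/certificateName",
            "value": new_certificate_name,
        }
    ]
    return client.update_domain_name(
        domainName=domain_name,
        patchOperations=patch_operations,
    )
```

With a real client, this would be called as `rename_domain_certificate(boto3.client("apigateway"), "api.example.com", "my-new-cert-name")`, where the domain and certificate name are placeholders.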
Yeah I'm not sure how to go about this. Just by virtue of the way APIGW/CF works, any time the certificate is changed on the fly, there's going to be downtime while it's pushed out to the edge nodes in CF. The AWS docs have something on uploading a backup certificate and then rotating it into production gradually to avoid the downtime issue. I don't know which aspects of this are exposed in the API, but maybe this will give us some clues on where to start looking. EDIT: I wonder if you could use the same method to update the certificateBody, certificatePrivateKey, and certificateChain parameters by specifying them as paths (i.e. /certificateBody). Just guessing blindly. Worth a shot maybe. |
If only it were that simple...I tried that previously but it didn't work. This is the output I got:
|
Ah it was worth a shot. I can't find anything in the documentation that points to the backup certificate functionality being exposed at all. The best we could hope for, I think, is that maybe it's undocumented and uses parameters like backupCertificateBody or something like that. Supposedly, AWS has been working on getting ACM to work with API Gateway, but there's no public timetable at the moment: Maybe we should disable the renewal feature for the time being, or at least add a warning of sorts, since from what I can tell it doesn't work as intended and it might catch people off guard when their 90 days are up (speaking from experience). I'll keep looking as time allows, but unfortunately it doesn't look good at the moment. EDIT: Well there it is in black and white, I just missed it the first time in the APIGW docs - "You cannot rotate custom domain name certificates programmatically." |
Personally I'm okay with renewing the certificate manually through the AWS console. After all, it's something you only have to do after 90 days. Of course, this should be made very clear in the documentation along with detailed instructions. One possible option to get the certificate updated programmatically would be to generate a new api gateway domain specifically for the new certificate and then switch the route53 domain to use the new api gateway. Eventually, after a sufficient amount of time for everything to propagate we'd delete the old api gateway. It's complicated but I believe this could all be done automatically in code. Just an idea. Of course, a simpler way would be much preferred. |
@Miserlou Had to renew all our certificates manually today (4 out of 5 expired). So it looks like it was not working properly for a while? |
One of the issues was that the documentation was wrong, sorry about that. It's |
@Miserlou hmm.. but that would only affect the creation of a cloudwatch trigger, won't it? I did see certificates expired even with the trigger present. |
Also, I did try to just run certify locally with no effect on the certificate (expired one wasn't replaced). |
Yeah, it needs to be chased down. I think there was a PR that introduced this bug although I can't confirm that. I think this is related: #391 |
Welp...this one just came around for me again on a different project. I'm really eager to help out with any solutions on this front. What can we do? |
First step is diagnosis! |
Roger. I'm going to crack down on this one bigtime, so I should have a PR by week's end. |
Fantastic news. |
All right, now that I'm actually getting down to business here, I can't find any way to recreate the issues people are concerned about on a fresh project. It looks to me like when certify is called, the certificate is correctly replaced with a new one, and the ANSI date in its same name seems to confirm this. Apart from how long it takes for new certs to reach edge nodes, which is not ours to change, what are we looking to improve here? |
How long did it take your update to reach "edge nodes"? |
It hasn't yet, as far as I can tell, and that's a good point -- until it arrives and that expiration date changes, I really won't have any way to tell for certain if we're sending through fresh certs or just changing the name displayed on the AWS UI and nothing else. |
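One way to check this from the outside is to look at the expiry date of the certificate the domain actually serves. Below is a rough sketch using only the Python standard library. The function names are invented for illustration, and the hostname you pass in is whatever custom domain you certified. Note that the default SSL context verifies the certificate, so an already-expired certificate raises an error instead of returning a date.

```python
import socket
import ssl
import time


def fetch_not_after(hostname, port=443, timeout=10.0):
    """Return the 'notAfter' string of the certificate served by hostname.

    Uses a verifying context, so an expired or otherwise invalid
    certificate raises ssl.SSLCertVerificationError instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a 'notAfter' string into days remaining from `now`."""
    if now is None:
        now = time.time()
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / 86400.0
```

Running `days_until_expiry(fetch_not_after("example.com"))` before and after `certify` should show the remaining days jump once the new certificate has reached the edge; if the number does not change, the old certificate is still being served.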
Bump - did it propagate? |
Looks like it!
|
So now if I have an existing site that didn't update, what do I do? |
Well nothing this second since Amazon S3 is down, but... I would update your local tools and try running certify again. If that doesn't fix the problem, it may be time for more drastic measures. You can always undeploy, go through the admin interface to make sure everything is gone, then deploy and certify like it's fresh. |
I went around this in a giant circle today. Head -> Wall -> bang. New theory: we can remove the API Gateway entry for the domain and recreate it, but leave the Route53 domain name intact, updating the record. Expecting this will cause downtime of between 0 and 40 minutes. >:[ |
Yeah given how AWS chose to not expose the backup certificate functionality in the API for APIGW, I don't see any other way. Supposedly they're going to integrate ACM with APIGW at some point, but no one knows if that's tomorrow or 10 years from now. |
I've been writing a little side project (for work) in flask using zappa, so this is my first go-around running zappa certify, and three weird things happened when I ran it which broke my project. What I did:
Note that this is the first update to prod I attempted after upgrading zappa to the most recent version (0.37.2) and the only certify I've done. Fixes I've attempted that didn't work:
Steps for fix I attempted that did work:
Let me know if you have questions or trouble reproducing any of this. I need to get it done, so I'll just set up the custom domains by hand for now. PS -- Sorry if I'm overloading this single issue, happy to split off into a different thread if appropriate. |
So I just tried my proposed solution up there ^, expecting between 0 and 40 minutes of downtime, and it seems like the change is in fact instantaneous. So, it's not as elegant as actually rotating the certificate, but the damage isn't as bad as I imagined. <1 minute * 4 times a year - that's still quite a lot of 9's of uptime. |
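To put rough numbers on the "9's" estimate, here is a back-of-the-envelope sketch. It assumes a 365-day year and four renewals per year (one per 90-day certificate), and compares one minute of downtime per renewal against the 40-minute worst case mentioned earlier in the thread. The function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_percent(downtime_minutes_per_event, events_per_year):
    """Yearly availability, in percent, given a fixed downtime per event."""
    downtime = downtime_minutes_per_event * events_per_year
    return 100.0 * (1 - downtime / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # One minute per renewal versus the 40-minute worst case.
    for minutes in (1, 40):
        print(minutes, "min/renewal:", round(availability_percent(minutes, 4), 4))
```

Under those assumptions, one minute per renewal works out to roughly 99.9992% availability, while 40 minutes per renewal is about 99.97%.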
Update: This does NOT work as intended. Gosh dang it. It works, I was able to see the page at the domain, immediately after calling certify, THEN the distribution goes down, once the new distribution has been created. So, as it stands, we can either have it work immediately, but then it breaks 40 minutes later, OR we can have it break immediately but it works 40 minutes later. Very annoying! Obviously the latter is preferred, but is there an opportunity for hacks here? Can we do some kind of domain name dance to ensure uptime? |
What's taking the 40 minutes? API Gateway? or route53? Is it possible to leave the old one in place but make a new one and then make the changeover but not nuke the old one for another 40 minutes? Basically: keep them both in existence until the changeover time is complete, so there's never downtime - the site is always served with either the old one or the new one. |
That'd be ideal, but would require either an extremely long running process, or we set up a scheduled function to do it from AWS, which might not work for everybody's setup. In the meantime, I've just added a |
This is ideally all happening via the recertify job, right? So it's already a long running process? |
Unless I'm missing something, it seems like something that could be done with no downtime. In the certify handler set up a new API gateway, leave the old one, point the route 53 domain to the new API gateway. The old API gateway could be deleted the next time the certify handler is called. |
Will this work since APIGW doesn't allow you to have two of the same custom domain names in the system at any given time? |
That's a very good point. Also, now that I think about it, we would also need to ensure that the API gateway is fully propagated before pointing the route53 domain to it to guarantee no downtime. |
To give a sense of my manual process to work around this for the moment, here's what I've had to do.
CloudFront warns you to expect that domain deletion/creation can take up to 40 minutes each way, but it's been my experience that it's usually in the 20-25 minute range. I have about 15 sites using Zappa at the moment (thankfully none that are SLA sensitive) and this is the method I've used successfully since we discovered the issue. I'm not sure how we can clear this hurdle without:
|
@chromakey - if your current strategy already involves manual deletion, you can use the new --manual feature to use the console to update the cert to avoid downtime. The updated certify behavior will do a (reordered) version of your steps automatically. |
Sweet. I'll take it for a test drive in the next couple days. |
Haven't looked into this yet, but looks like we may have a path to success |
Yep! Currently open for discussion here: #710 |
Somewhere, a regression was introduced, I think this was from a PR that I was iffy about.
Repro:
Certify on a new name
Go to domain, old cert still in production.