"Waiting for img.shields.io" #445
Comments
I see you are using Coverity and Travis badges. I'll try to monitor their response times. Could you give me information about when they are not on par with what we should expect? |
Thanks for the prompt reply, much appreciated. I am sorry that I did not note down the exact time when it happened. What I can confirm is that when it did happen, the sample badges on http://shields.io/ were also not rendered properly. I may also add that the outage did not take long to recover. It is the outage frequency that worries me. I have caught it down three or four times already since we switched to shields.io. |
It is happening now. |
It seems it has recovered itself again. |
Thanks a lot for reporting. There seems to have been a set of sudden surges in request frequency past 400 per second (from a normal 40) from servers at Amazon AWS. It has caused a lot of request time-outs, and has made Redis fail, which caused a marginal number of recovered crashes (about 50). There was also a Heroku slow-down at the same time caused by our using slightly over 512MB of memory, which made things worse. I plan on switching away from Heroku. With better infrastructure, we can hopefully use a lot more memory and absorb surges like these. |
Thanks for looking into this. |
I frequently experience this with the Gratipay shields. |
This issue still happens |
It is happening currently with rendering badges |
This may sound stupid (or arrogant, depending on how you read it), but I was wondering whether this issue existed before I raised it. More precisely, did it exist before our project website switched to using shields.io badges? Or is it just because I am the first who bothered to report it? I will not reveal the link to our project website here, to avoid the impression that I am promoting it, but I believe it contributes quite a bit of workload to your server because:
With all these combined, it does not matter where our website visitors navigate to: each page rendered generates a small workload on your server. The accumulated workload could become significant, depending on how well your server scales up on demand. Hence I am wondering whether it could be our own doing that caused this in the first place! I hope I am worrying too much, though. |
@weitjong Based on my server logs, there is no single website generating a majority of the traffic, although GitHub is the most significant. Issues related to the responsiveness of the server have existed as long as we have used the current configuration (see eg. #226), and have been solved by caches and algorithms so far. The main issue is that, as far as I know, the Heroku system doesn't give a way to the server software to detect when it enters slowdown mode. I will change server setup, however. |
Thanks for your prompt reply. Again much appreciated. That is quite a relief to hear. I just want to come clean. 😄 |
Just to add: the badges are still slow. |
I'm seeing timeouts on my Travis and npm badges today at https://github.com/rackt/history. |
The problem seems to be getting worse. I have temporarily switched our website back to the lower-resolution badges from their original sources instead of shields.io. It probably does not make any difference to shields.io performance, but who knows. |
Indeed, it won't make a difference. Unfortunately, the server became unreachable at the worst possible time — just as I started going to sleep. I'm investigating the issue with my hosting provider. |
I have just rebooted the server, and things are running again. |
@espadrine Have you considered hosting the images on S3 behind CloudFront? Couldn't you cache them there, if not actually render them there as a static site as the primary representation? |
@ms-ati There is a difference between having slow badges, having downtime, and having incorrect badges. I strive to produce badges that are as correct as possible, as fast as possible, and with as little downtime as possible. I switched hosting providers a week ago to fix the speed issue; badges should now be exactly as slow as the service that produces the information they provide (plus network lag, which is generally negligible). More importantly, while the server previously entered a severe slowdown for one hour every week at peak times, that should no longer happen. So far, that seems accurate. Today, the VPS went down and did not restart, for which I am talking to the provider. Having a CloudFront cache would not help when the machine the server runs on is not up to serve images. Cache really is not the issue right now. I have a cache that I use when vendors (Travis CI, etc.) do not respond, when data changes rarely, or when the server receives duplicate requests in rapid fire. So: are the badges still slow for you? |
Is it down again? :) |
@espadrine Shouldn't a CloudFront or Fastly cache, simply in front of the badges system as a whole, smooth over any temporary outages? In other words, isn't HTTP caching itself a very well suited mechanism for increasing the availability of the badge urls? |
Yes, the server went dark again. I'm getting annoyed at OVH. I'd rather not spend another week of holidays setting things up yet again on a new server (say Digital Ocean), but two downtimes a week is obviously unacceptable. I sent them a mail, I will see what they say. @ms-ati I use HTTP caching. Obviously, most people want badges to have accurate information, though. It is irrelevant anyway when you can't even access the IP that the DNS points your browser to. |
@espadrine Would it make more sense to use AWS with auto-scaling, load-balanced instances? That would handle large amounts of traffic quickly and easily according to your auto-scaling policy: when, say, 80% of CPU is being used, AWS automatically spins up an exact copy of that instance and starts sending traffic to both instances. |
Would it also make sense to put a CDN in front, which is configured to … |
@tankerkiller125 Shields.io is not CPU-bound. @ms-ati What is the difference between a CDN that returns the last value when the origin is unreachable, and a cache that does the same thing? We currently have the latter. |
@espadrine Good question! I believe the difference is in the area of robustness. Or downtime, like this ticket discusses. Using a cache as we have today impacts performance, and it may provide robustness against downtime of the data sources. However, as this ticket attests, it doesn't provide robustness for the shields.io service endpoints itself. I think that using a CDN "in front" of shields.io, which is configured to return successfully with the last value for any url it has cached when the origin is down, will provide robustness to the service itself being down. That way, badges that have previously been requested will continue to appear, and just won't be updated until the service is back up. |
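The behaviour described above (keep returning the last successful response for a URL while the origin is down) can be sketched as follows. This is a minimal illustration in Python, not how shields.io or any particular CDN implements it; the class name `StaleOnErrorCache`, the `fetch` callback, and the `ttl_seconds` parameter are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Serve fresh responses when possible, and the last good one when the origin fails.

    Hypothetical sketch: `fetch` stands in for a request to the origin server.
    """

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # url -> (time the body was stored, body)
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> Tuple[bytes, bool]:
        """Return (body, is_stale) for the given URL."""
        now = self._clock()
        entry = self._store.get(url)

        # Entry is still fresh: answer without contacting the origin.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], False

        try:
            body = self._fetch(url)
        except Exception:
            # Origin unreachable: fall back to the last good value, if any.
            if entry is None:
                raise
            return entry[1], True

        self._store[url] = (now, body)
        return body, False
```

With this shape, a badge that has been requested at least once keeps rendering during an outage and is only marked stale, while a URL that was never cached still fails, which matches the limitation raised in the thread.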
@hotrush I think the load that the servers are under is causing the requests to the APIs to occur too slowly for the software, which causes the "vendor unresponsive" errors. |
Note: I changed to having two servers with DNS round-robin. We'll see if there are any improvements. |
My badges still don't load (see my post & repo link above), do you have any idea what I could try? |
I have the same problem; it seems the GitHub link for the license badge inside the README has not worked for some time now. |
I just checked and was very surprised to see the license and version badges load a few minutes ago. I refreshed a few times and it was working each time. Now, 3 minutes or so later, the release badge is back to not loading. |
@patrikhuber That's probably GitHub's rate limit on one server kicking in. I suggest using |
Hello, I have the same issue with my badges. The Travis badge works well; the others (GitHub badges: License and Releases) sometimes appear, sometimes not. https://github.com/algorys/agshmne/ @espadrine I've tried your solution. It works great, but it may be too complicated to maintain for a large project. |
@espadrine Thanks. Is there a workaround/solution for the release version badges? Are all shields users experiencing these problems, and if not, why only a handful of us? Why does this rate limit not kick in for others? |
@patrikhuber You can do the same with your release badge, but you have to change it manually for each version / tag:
But as I said before, if you release often it will be hard to maintain. |
@patrikhuber I'm exchanging with GitHub to figure out a solution. The main issue is here: #529. It does affect everyone, although we do have two servers, so two IPs, presumably treated separately. |
@algorys I know, but that's not really a great option unfortunately. @espadrine Cool, that's great to hear! Awesome. And thanks for the link to #529. It just seems a bit disappointing that the conversation hasn't advanced since Sept 2015. Anyway, glad to see I'm not the only one having this issue, and I hope there will be a solution soon. |
I've just gone through the comments in this issue and those in its sibling #529. As a result I also contributed tokens using https://img.shields.io/github-auth. What I don't understand is why GitHub would replace img src URLs from https://img.shields.io/ with those from https://camo.githubusercontent.com/ at all. After all, it'd be the browser making those requests and thus hitting shields.io. Why would GitHub want to redirect (and cache) those? To me that makes particularly little sense for "static" badges (i.e. those that don't need to access GitHub resources to render) such as license or Twitter. |
@marcelstoer Originally, the point was to avoid mixed-content warnings, which browsers raise when an HTTPS page has resources (e.g. images) fetched over insecure HTTP. You can read more on the subject in their README. |
Ahh, right... "mixed-content warnings", forgot about those - except that shields.io can be accessed over HTTPS as well. |
@marcelstoer Camo is now also used to set CSP information, which they rely on to avoid a class of vulnerabilities, things that are part of XSS or CSRF. |
Would self-hosting be a sensible alternative to all of us loading from https://img.shields.io? |
@marcelstoer I've put my static badges in a separate branch and linked to them through the rawgit service after too much trouble with img.shields.io; it's just not available most of the time. |
@pkoretic I had already planned to do that for static badges as well. The majority of badges are dynamic, though (i.e. they make some API calls). |
@marcelstoer Yeah, I ignored them for now. Probably the best option is to add a reverse caching proxy in front of them for your usage. |
I believe there's a misunderstanding, either on your side or on mine 😉 What I meant was to host this project myself at let's say https://shields.mydomain.io. Then in my READMEs I'd use that domain. GitHub would still route them through camo but those requests wouldn't count against the https://img.shields.io rate limit. |
@marcelstoer I understand, but it seems too much work for just badges. That's why I recommended a reverse caching proxy (using nginx, for example). That way you can still use your domain https://shields.mydomain.io, which proxies to https://img.shields.io, so when https://img.shields.io is not available you still get cached results instead of hanging. |
The load has increased quite a bit lately: it averages 170 req/s at peak time. I'll add a server to the pool (I currently rely on two servers). GitHub rate limits are not the issue anymore. |
@espadrine There is a module for Nginx that allows the load balancing feature to be controlled by a rest interface. And you could do an HTTP check on the servers like every 20s or so. This would allow people with additional servers to help out with handling the load with the minimal configuration for you to handle. Of course, this would need some authentication so that people can add and remove servers securely. |
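The idea in this comment (a pool of servers that can be added or removed, with a periodic HTTP check deciding which ones receive traffic) can be sketched in Python as below. This is an illustration only, not the Nginx module mentioned above: `HealthCheckedPool` and the `is_healthy` callback are hypothetical names, the periodic scheduling (for example every 20 seconds) is left to the caller, and the authentication the comment asks for is out of scope.

```python
from typing import Callable, Iterable, List


class HealthCheckedPool:
    """Round-robin over the servers that passed the most recent health check.

    Hypothetical sketch: `is_healthy` stands in for an HTTP probe of a server.
    """

    def __init__(self, servers: Iterable[str], is_healthy: Callable[[str], bool]) -> None:
        self._servers: List[str] = list(servers)
        self._is_healthy = is_healthy
        # Until the first check runs, assume every configured server is usable.
        self._healthy: List[str] = list(self._servers)
        self._next = 0

    def add(self, server: str) -> None:
        """Register a server; it receives traffic after the next health check."""
        if server not in self._servers:
            self._servers.append(server)

    def remove(self, server: str) -> None:
        """Unregister a server and stop routing to it immediately."""
        if server in self._servers:
            self._servers.remove(server)
        if server in self._healthy:
            self._healthy.remove(server)

    def run_health_checks(self) -> None:
        """Probe every registered server; meant to be called periodically."""
        self._healthy = [s for s in self._servers if self._is_healthy(s)]

    def pick(self) -> str:
        """Return the next healthy server in round-robin order."""
        if not self._healthy:
            raise RuntimeError("no healthy servers available")
        server = self._healthy[self._next % len(self._healthy)]
        self._next += 1
        return server
```

A contributed server that stops responding simply drops out of rotation at the next check, so the maintainer does not have to intervene by hand.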
@espadrine Is there any action to take here? |
@paulmelnikow I believe the performance of the servers is more suitable nowadays. |
After changing our project website to use shields.io, we frequently see long waits for the badges to render. When badge rendering is slow on our website, http://shields.io/ shows a similar problem. So I guess the problem is not on our side.
Is there any way to improve this? I am considering switching back to the original badges from their respective service providers. It is better to have a reliable low-resolution badge than an unreliable high-resolution one.