"Waiting for img.shields.io" #445

Closed
weitjong opened this issue May 13, 2015 · 73 comments
Labels
operations Hosting, monitoring, and reliability for the production badge servers

Comments

@weitjong

After changing our project website to use shields.io, the badges frequently take a long time to render. Whenever we have a problem with badge rendering on our website, http://shields.io/ has a similar problem, so I guess the issue is not on our side.

Is there any way to improve this? I am considering switching back to the original badges from their respective service providers. It is better to have a reliable low-resolution version than an unreliable high-resolution badge.

@espadrine
Member

I see you are using Coverity and Travis badges. I'll try to monitor their response times. Could you let me know when they are not performing as you would expect?

@weitjong
Author

Thanks for the prompt reply, much appreciated. I am sorry that I did not note down the exact time it happened. What I can confirm is that when it did happen, the sample badges on http://shields.io/ were also not rendered properly. I should add that the outages did not take long to recover; it is their frequency that worries me. I have caught the service with its pants down three or four times already since we switched to shields.io.

@weitjong
Author

It is happening now.

@weitjong
Author

It seems to have recovered again.

@espadrine
Member

Thanks a lot for reporting. There seems to have been a series of sudden surges in request frequency past 400 per second (from a normal 40), coming from servers at Amazon AWS. This caused a lot of request time-outs and made Redis fail, which in turn caused a small number of recovered crashes (about 50).

There was also a Heroku slow-down at the same time, caused by our using slightly over 512 MB of memory, which made things worse.

I plan on switching away from Heroku. With a better infrastructure we can hopefully use a lot more memory and absorb surges like these.

[heroku metrics screenshot]

@weitjong
Author

Thanks for looking into this.

@untitaker

I frequently experience this with the Gratipay shields.

@ionelmc

ionelmc commented Jun 18, 2015

This issue still happens

@stephnr

stephnr commented Jun 18, 2015

It is currently happening with badge rendering.

@weitjong
Author

This may sound stupid (or arrogant, depending on how you read it), but I was wondering whether this issue existed before I raised it. More precisely, did it exist before our project website switched to shields.io badges? Or is it just that I am the first to bother reporting it?

I will not reveal the link to our project website here, to avoid the impression that I am promoting it, but I believe it contributes a fair amount of load on your server because:

  • Our website has picked up significant traffic recently.
  • The badges are located in our default footer page template, i.e. the badges appear in the footer of all the generated pages, including our documentation pages.
  • Before the switch, we employed a small piece of JavaScript to force the browser to always download the Travis CI build status badge remotely instead of from its local cache (a sketch of the idea follows at the end of this comment). This JavaScript is still in place after the switch.

Combined, this means that no matter where our website visitors navigate, each rendered page generates a small workload on your server. The accumulated load could become significant, depending on how well your server scales up on demand. Hence I am wondering whether it could be our own doing that caused this in the first place! I hope I am worrying too much, though.
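
For illustration only, a minimal sketch of the kind of cache-busting snippet described above; the CSS class, selector, and query parameter name are assumptions, not the actual script:

```javascript
// Illustrative sketch: re-request badge images on every page load by
// appending a timestamp query parameter, so the browser bypasses its cache.
// The CSS class "build-badge" is an assumed marker for the badge <img> tags.
document.addEventListener('DOMContentLoaded', function () {
  var badges = document.querySelectorAll('img.build-badge');
  for (var i = 0; i < badges.length; i++) {
    var src = badges[i].getAttribute('src');
    var sep = src.indexOf('?') === -1 ? '?' : '&';
    badges[i].setAttribute('src', src + sep + 't=' + Date.now());
  }
});
```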

@espadrine
Member

@weitjong Based on my server logs, there is no single website generating a majority of the traffic, although GitHub is the most significant. Issues related to the responsiveness of the server have existed as long as we have used the current configuration (see e.g. #226), and have been solved by caches and algorithms so far. The main issue is that, as far as I know, Heroku doesn't give the server software a way to detect when it enters slowdown mode. I will change the server setup, however.

@weitjong
Author

Thanks for your prompt reply. Again much appreciated.

That is quite a relief to hear. I just want to come clean. 😄
Our website is hosted on github.io too, but I guess that does not change anything, since as you said the issue existed long before.

@ms-ati

ms-ati commented Jun 24, 2015

Just to add: the badges are still slow.

@mjackson

I'm seeing timeouts on my Travis and npm badges today at https://github.com/rackt/history.

@weitjong
Author

The problem seems to be getting worse. I have temporarily switched our website back to the lower-res badges from their original sources instead of shields.io. It probably does not make any difference to shields.io's performance, but who knows.

@espadrine
Member

Indeed, it won't make a difference. Unfortunately, the server became unreachable at the worst possible time — just as I started going to sleep. I'm investigating the issue with my hosting provider.

@espadrine
Member

I have just rebooted the server, and things are running again.

@ms-ati

ms-ati commented Sep 15, 2015

@espadrine have you considered hosting the images on S3 behind CloudFront? Couldn't you cache them there, if not actually render them there as a static site as the primary representation?

@espadrine
Member

@ms-ati There is a difference between having slow badges, having downtime, and having incorrect badges.

I strive to produce badges that are as correct as possible, as fast as possible, and with as little downtime as possible.

I switched hosting providers a week ago to fix the speed issue; badges should now be exactly as slow as the service that produces the information they provide (plus network lag, which is generally negligible). More importantly, while before the server entered a severe slowdown for one hour every week at peak times, that should no longer happen. So far, that seems accurate.

Today, the VPS went down and did not restart; I am talking to the provider about it. Having a CloudFront cache would not help when the machine that serves the images is not up at all.

Cache really is not the issue right now. I have a cache that I use when vendors (Travis CI, etc) do not respond, when data changes rarely, or when the server receives duplicate requests in rapid fire.

So: are the badges still slow for you?

@gnzlbg

gnzlbg commented Sep 18, 2015

Is it down again? :)

@ms-ati

ms-ati commented Sep 18, 2015

@espadrine Shouldn't a CloudFront or Fastly cache, simply in front of the badge system as a whole, smooth over any temporary outages? In other words, isn't HTTP caching itself a very well-suited mechanism for increasing the availability of the badge URLs?

@espadrine
Member

Yes, the server went dark again. I'm getting annoyed at OVH. I'd rather not spend another week of holidays setting things up yet again on a new server (say, DigitalOcean), but two downtimes a week is obviously unacceptable. I sent them an email; I will see what they say.

@ms-ati I use HTTP caching. Obviously, most people want badges to have accurate information, though. It is irrelevant anyway when you can't even access the IP that the DNS points your browser to.

@tankerkiller125

@espadrine Would it make more sense to use AWS with auto-scaling, load-balanced instances? That would handle large amounts of traffic quickly and easily according to your auto-scaling policy, so that when, say, 80% of the CPU is in use, AWS automatically spins up an exact copy of the instance and starts sending traffic to it alongside the original.

@ms-ati

ms-ati commented Oct 5, 2015

Would it also make sense to put a CDN in front, configured to successfully return the last value when the origin is unreachable? This would seem a good case for that.


@espadrine
Member

@tankerkiller125 Shields.io is not CPU-bound.

@ms-ati What is the difference between a CDN that returns the last value when the origin is unreachable, and a cache that does the same thing? We currently have the latter.

@ms-ati

ms-ati commented Oct 5, 2015

> What is the difference between a CDN that returns the last value when the origin is unreachable, and a cache that does the same thing? We currently have the latter.

@espadrine Good question! I believe the difference is in the area of robustness. Or downtime, like this ticket discusses.

Using a cache as we have today helps performance, and it may provide robustness against downtime of the data sources. However, as this ticket attests, it doesn't provide robustness for the shields.io service endpoint itself.

I think that using a CDN "in front" of shields.io, configured to return successfully with the last value for any URL it has cached when the origin is down, would provide robustness against the service itself being down.

That way, badges that have previously been requested will continue to appear, and just won't be updated until the service is back up.
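
For reference, RFC 5861 defines a Cache-Control extension that expresses exactly this behaviour; a cache or CDN that honours it (not all do) can keep serving the last good badge during an origin outage. The values below are purely illustrative:

```
Cache-Control: public, max-age=300, stale-if-error=86400
```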

@tankerkiller125

@hotrush I think the load the servers are under is causing the requests to the vendor APIs to complete too slowly for the software, which causes the "vendor unresponsive" errors.

@espadrine
Member

Note: I changed to having two servers with DNS round-robin. We'll see if there is some improvement.
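
For readers unfamiliar with the setup: DNS round-robin simply means publishing several A records for the same name, so resolvers rotate between the server IPs. Illustrative records only (the addresses below are documentation addresses, not the real ones):

```
img.shields.io.   300   IN   A   192.0.2.10
img.shields.io.   300   IN   A   192.0.2.20
```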

@patrikhuber

My badges still don't load (see my post & repo link above); do you have any idea what I could try?

@pkoretic

@patrikhuber @espadrine

I have the same problem. It seems the GitHub link for the license badge inside the README hasn't worked for some time now; replacing it with a direct link works:

[![License](http://img.shields.io/:license-mit-blue.svg)](http...)

@patrikhuber

I just checked and was very surprised to see the license and version badges load a few minutes ago. I refreshed a few times and it was working each time.

Now, 3 minutes or so later, the release badge is back to not loading.

@espadrine
Member

@patrikhuber That's probably GitHub's rate limit kicking in on one server. I suggest using https://img.shields.io/badge/license-mit-blue.svg?style=flat-square instead. That will load essentially instantaneously.

@algorys

algorys commented Feb 22, 2016

Hello,

I have the same issue with my badges.

The Travis badge works well; the others (the GitHub license and release badges) sometimes appear, sometimes not.

https://github.com/algorys/agshmne/

@espadrine I've tried your solution. It works great, but it may be too complicated to maintain for a large project.

@patrikhuber

@espadrine Thanks. Is there a workaround/solution for the release version badges?

Are all shields users experiencing these problems, and if not, why only a handful of us? Why does this rate limit not kick in for others?

@algorys

algorys commented Feb 25, 2016

@patrikhuber you can do the same with your releases, but you have to change it manually for each version/tag:

[![GitHub release](https://img.shields.io/badge/release-v0.0.3-blue.svg)](https://github.com/algorys/agshmne/releases/latest)

But as I said before, if you release often it'll be hard to maintain.

@espadrine
Member

@patrikhuber I'm in contact with GitHub to figure out a solution. The main issue is here: #529. It does affect everyone, although we do have two servers, so two IPs, presumably treated separately.

@patrikhuber

@algorys I know, but that's not really a great option unfortunately.

@espadrine Cool, that's great to hear! Awesome. And thanks for the link to #529. It just seems a bit disappointing that the conversation hasn't advanced since Sept 2015. Anyway, glad to see I'm not the only one having this issue, and I hope there will be a solution soon.

@marcelstoer

I've just gone through the comments in this issue and those in its sibling #529. As a result I also contributed tokens using https://img.shields.io/github-auth.

What I don't understand is why GitHub would replace img src URLs from https://img.shields.io/ with ones from https://camo.githubusercontent.com/ at all. After all, it's the browser making those requests, which would thus hit shields.io directly. Why would GitHub want to redirect (and cache) those? To me that makes particularly little sense for "static" badges (i.e. those that don't need to access GitHub resources to render) such as license or Twitter.

@espadrine
Member

@marcelstoer Originally, the point was to avoid the mixed-content warnings which browsers raise when an HTTPS page has resources (e.g. images) fetching data over insecure HTTP. You can read more on the subject in their README.

@marcelstoer

marcelstoer commented Jan 20, 2017

Ahh, right... "mixed-content warnings", forgot about those - except that shields.io can be accessed over HTTPS as well.

@espadrine
Member

@marcelstoer Camo is now also used to set CSP information, which they rely on to avoid a class of vulnerabilities such as XSS and CSRF.

@marcelstoer

Would self-hosting be a sensible alternative to all of us loading from https://img.shields.io?

@pkoretic

@marcelstoer After too much trouble with img.shields.io (it's just not available most of the time), I've put my static badges in a separate branch and linked them through the rawgit service.
I can confirm it works without issues; example: https://github.com/qaap/recurse

@marcelstoer

@pkoretic I had already planned to do that for static badges as well. The majority of badges are dynamic, though (i.e. they make API calls).

@pkoretic

@marcelstoer Yeah, I've ignored them for now; probably the best option is to add a reverse caching proxy in front of them for your own usage.

@marcelstoer

I believe there's a misunderstanding, either on your side or on mine 😉 What I meant was to host this project myself at, let's say, https://shields.mydomain.io. Then in my READMEs I'd use that domain. GitHub would still route them through Camo, but those requests wouldn't count against the https://img.shields.io rate limit.

@pkoretic

@marcelstoer I understand, but it seems like too much work for just badges. It would be better if we could contribute more servers to the pool, since I too have spare dedicated servers that could take on that load.

That's why I recommended a reverse caching proxy (using nginx, for example).

That way you can still use your own domain, https://shields.mydomain.io, as a proxy to https://img.shields.io, so when https://img.shields.io is not available you still get cached results instead of a hanging request.
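
A minimal sketch of such a configuration, assuming nginx; the domain, cache path, and sizes are placeholders, and proxy_cache_use_stale is what serves the last cached badge when the upstream is down:

```nginx
# Illustrative reverse caching proxy in front of img.shields.io.
# Paths, sizes, and the server_name are placeholders.
proxy_cache_path /var/cache/nginx/shields levels=1:2 keys_zone=shields:10m
                 max_size=1g inactive=7d;

server {
    listen 80;
    server_name shields.mydomain.io;   # placeholder domain from the comment above

    location / {
        proxy_pass https://img.shields.io;
        proxy_set_header Host img.shields.io;
        proxy_ssl_server_name on;      # send SNI to the HTTPS upstream
        proxy_cache shields;
        proxy_cache_valid 200 5m;
        # Serve the last cached badge if the upstream errors out or times out.
        proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
    }
}
```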

@espadrine
Member

The load has increased quite a bit lately: it averages 170 req/s at peak time. I'll add a server to the pool (I currently rely on two servers).

GitHub rate limits are not the issue anymore.

@tankerkiller125

@espadrine There is a module for nginx that allows the load-balancing pool to be controlled through a REST interface, and you could run an HTTP health check against the servers every 20 seconds or so. This would let people with additional servers help handle the load, with minimal configuration on your end. Of course, it would need some authentication so that servers can be added and removed securely.

paulmelnikow added the operations label Apr 17, 2017
@paulmelnikow
Member

@espadrine Is there any action to take here?

@espadrine
Member

@paulmelnikow I believe the performance of the servers is more suitable nowadays.

badges locked and limited conversation to collaborators Sep 6, 2017