Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub Releases Broken #161

Closed
GrahamCampbell opened this issue Apr 21, 2014 · 55 comments
Closed

GitHub Releases Broken #161

GrahamCampbell opened this issue Apr 21, 2014 · 55 comments

Comments

@GrahamCampbell
Copy link

The GitHub releases badge for many of my repos don't work. Here is one of them: https://img.shields.io/github/release/GrahamCampbell/Bootstrap-CMS.svg.
Here are the tags for the repo: https://github.com/GrahamCampbell/Bootstrap-CMS/releases.


Latest Version

@GrahamCampbell
Copy link
Author

Wait, what? Now it is showing up. How can that be? Why does it keep switching between working and broken?

@espadrine
Copy link
Member

Sometimes, the vendor's servers (or Heroku's connection to them) lag, and the request takes over 20 seconds to be processed. I'll see how this evolves.

@chadwhitacre
Copy link
Contributor

@espadrine This sounds like a place where the caching strategy I was employing previously would be helpful: always return an image instantly, and then kick off a call to refresh the info.

@GrahamCampbell
Copy link
Author

@whit537 That would be useful for code coverage badges (coveralls). I see large delays with them, and even timeouts for image loading sometimes.

@chadwhitacre
Copy link
Contributor

There's a little more detail on the algorithm in comments here:

The first user to request a button always has to wait for the transform.

Subsequent users never have to wait for the transform. Rather than make them wait, we proceed with the last transform we did, no matter how stale it is. Before we do, however, we'll kick off a new transform in a thread, so that the next user after the thread completes will get a fresh transform.

This means that the average staleness of the buttons we serve will depend on the frequency with which they're requested. For active projects this will approach the TRANSFORM_TIMEOUT; for inactive projects, infinity. This is appropriate: stale buttons for stale projects, fresh buttons for fresh projects.

@GrahamCampbell
Copy link
Author

Coverage Status - Not working. :(
^^^ There should be a badge there...

@tedivm
Copy link

tedivm commented Apr 23, 2014

I'm having this issue as well. Maybe use some sort of caching or something so that when the vendor lags you can use slightly outdated data?

@GrahamCampbell
Copy link
Author

Yes @tedivm. That's exactly what @whit537 said.

@espadrine
Copy link
Member

This one fails way more often than any other vendor. I'd love to have a hard explanation. I'll try extending the cache in the next few days.

@BrendanThompson
Copy link

i seem to be having issues with mine as well, it's not updating the release version!

@espadrine
Copy link
Member

@BrendanThompson This sounds like a different issue than the one discussed here.

Can you open a new bug, give a link to the badge that is not updating, and to the project's repository location?

@krainboltgreene
Copy link

Having this issue with Release and Tag on Github: https://github.com/krainboltgreene/scrawl.gem

@espadrine
Copy link
Member

I made an update in which I manage a large cache that is used only when the vendor's server is unresponsive. It does mean that a badge that was never requested since the start of the Shields server will still show a "vendor unresponsive" badge.

@chadwhitacre
Copy link
Contributor

@espadrine Nice! :-)

@espadrine
Copy link
Member

As for the algorithm that @whit537 used before, I feel a bit strange about it, so I did a bit of math to explain why.

The data we provide occasionally gets changed, and the only way for us to know is to perform the request to the server. On the other hand, we receive requests for badges. Let's assume that requests for a particular badge and the data changes associated with the information provided by the badge are perfectly regular (they happen at fixed intervals with frequency rf for badge requests and df for data changes).

If we apply the suggested algorithm, a portion of the badges we provide will be outdated. The quotient of accurate badges we provide is

A(Δt) = 1 - min(# data change over Δt, # requests over Δt) / (# requests over Δt)
      = 1 - max(1, df) / rf

If the frequency of badge requests is twice as high as that of data changes, we only get half of our badges right, and our badges are outdated by up to 1 / rf seconds. In some cases, eg, when df = rf, all of our badges provide wrong information.

That said, we now have a huge cache that is hardly ever used, and in many cases it contains accurate information, which it can send fast.

Assuming we have a target minimum accuracy, say A ≥ α = 0.75, then we only need the following relation to hold true: df / rf ≤ 1 - α. Now, the part that gets interesting is that computing df / rf is rather cheap, since df / rf = (# requests since start) / (# data changes since start). We only need to increment two numbers for each badge. Better yet, to avoid getting NaN when we come close to infinity, we can reset both values simultaneously whenever one of them reaches 9007199254740992 (ie, MAX_INT in IEEE754).

@mhelvens
Copy link

Good to know you've put some thought into this. 👍

This will sound like bikeshedding after your last comment, but it might be useful to add a visible (yet unobtrusive) marking to a badge when its information was retrieved from cache, or when it is particularly stale. This should be optional, of course.

@chadwhitacre
Copy link
Contributor

👍

@tedivm
Copy link

tedivm commented Apr 30, 2014

I would rather have an outdated badge than a broken image show up on my thing.

Broken images just look unprofessional and sloppy, and I think that reflect far more poorly on a project than the occasion wrong graphic.

To add to that, most of the time when I release a new upgrade I have no problem hitting refresh on my readme page to see what it looks like- I can prime the cache myself.

I think you're completely overdoing it with that algorithm. Here's how it should work, in my opinion-

1. Request is made to load graphic.
2. Request to vendor server is made.

3a. The request to vendor times out after 20 seconds.
4a. A cached value is returned instead of the vendor one.

3a. The request to vendor times out after 20 seconds.
4a. There is no cached value to return, so "null" or "error" is passed back.

3b. The request to vendor succeeds.
4b. The value is cached before being returned.

5. A graphic is created using either the cached value, the vendor value, or a proper error.

Even if you decide it's better to show an error message instead of outdated information I feel the experience can be a lot better by fixing the error state itself. Right now we're not even getting proper errors, we're getting images that are just completely failing to load. However, I also feel that the occasionally wrong value is acceptable, especially since it should be a rather rare event that only occurs after an update is put out.

@espadrine
Copy link
Member

@tedivm I'm not discussing broken badges anymore. I have added a fix to show "server inaccessible" badges instead of broken images a couple of weeks ago, and around 12h ago I have added a cache to show the previous version of the badge instead whenever possible. Those changes are in. Therefore, the algorithm you propose is already in (except we send a proper error image instead of just "null" or "error"). If you witness failings in the current implementation, I am all ears; there might be bugs which I didn't notice.

The algorithm I suggested above is a way to make most badges snappier to load by using the cache I created to fix this issue. Therefore, it is actually partially irrelevant to this issue, which should be considered fixed.

@tedivm
Copy link

tedivm commented Apr 30, 2014

I didn't realize this was fixed because I'm still getting errors where the images do not load at all.

@espadrine
Copy link
Member

@mhelvens Do you mean an HTTP header? Which one would you suggest?

@espadrine
Copy link
Member

@tedivm Excellent! Is it reproducible? Which link does that?

(Edit: just loaded the logs, can't find an app timeout.)

@tedivm
Copy link

tedivm commented Apr 30, 2014

This link wasn't working for most of this morning, although it is now-

http://img.shields.io/coveralls/tedivm/JShrink.svg

@espadrine
Copy link
Member

@tedivm If you are in Europe or in Eastern US, that timeframe does coincide with a bug I fixed about 5h ago.

@tedivm
Copy link

tedivm commented Apr 30, 2014

The coverage thing wasn't working as recently as an hour ago, but it is working just fine right now.

@espadrine
Copy link
Member

@tedivm In which way exactly wasn't it working? Was it showing the "server inaccessible" badge, or a 500?

@tedivm
Copy link

tedivm commented Apr 30, 2014

There was no image showing at all, so I'm guessing a 500 error although I can't be sure (I wasn't tracking the headers).

@tedivm
Copy link

tedivm commented Apr 30, 2014

On a sidenote, you should add a "max-stale" header to your HTTP output so that CDN's and Proxy's will display the image even if there's a gateway timeout or something during revalidation.

@espadrine
Copy link
Member

@tedivm Hmm, can't find a 500 (or anything but a 200) for that URL in the logs. Ping me as soon as you experience this again.

I'll try to add a max-stale header.

@tedivm
Copy link

tedivm commented Jun 4, 2014

Still broken here- I'm getting "invalid" for all the releases on my projects.

espadrine added a commit that referenced this issue Jun 10, 2014
@tedivm
Copy link

tedivm commented Jun 10, 2014

None of my projects are showing the badges anymore. They are all failing by showing "invalid" as it's release.

@espadrine
Copy link
Member

@tedivm Hmm, that's really odd. I assume you mean http://img.shields.io/coveralls/tedivm/JShrink/master.svg (the others seem to look ok). "Unknown", for coveralls badges, means that the percentage did not parse as a number — or more specifically, it parsed as NaN. It seems to work locally, as well… I'll log some things, see what comes out.

@tedivm
Copy link

tedivm commented Jun 10, 2014

I'm talking about github releases, not coveralls.

Check out this image for Stash, it shows as invalid.

http://img.shields.io/github/release/tedious/stash.svg

@tedivm
Copy link

tedivm commented Jun 10, 2014

Same result for JShrink-

http://img.shields.io/github/release/tedious/jshrink.svg

@tedivm
Copy link

tedivm commented Jun 10, 2014

Coveralls is working fine, the reason you're seeing that error from yours is that I moved JShrink out of my personal account and to a new Github repository, and you're using the older link. The new one is working great for Coveralls- http://img.shields.io/coveralls/tedious/JShrink/master.svg

@espadrine
Copy link
Member

Ah, right, that was dumb on my part. I made the code work for GitHub tags, but didn't fully copy it for GitHub releases, which means it still had the 60 requests/minute limit. Should be fixed in a jiffy.

espadrine added a commit that referenced this issue Jun 10, 2014
@espadrine
Copy link
Member

It is (hopefully permanently) looking good now.

@tedivm
Copy link

tedivm commented Jun 10, 2014

Looks good, thank you!

@espadrine
Copy link
Member

Cool! ☺

@espadrine
Copy link
Member

Closing as fixed.

@pron
Copy link

pron commented Aug 11, 2014

delba added a commit to delba/JASON that referenced this issue Aug 15, 2015
delba added a commit to delba/Tactile that referenced this issue Aug 15, 2015
@gocanto
Copy link

gocanto commented Feb 26, 2016

any clue? I just came around this issue

@espadrine
Copy link
Member

The main issue left is rate limiting, which is this issue: #529.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants