A Golang http.RoundTripper optimized for caching responses from GitHub's REST API via conditional requests (ETag).
While the GitHub REST API is incredibly powerful, it enforces some strict rate-limits on incoming requests. As of February 2025, the primary rate-limits for the REST API are:
- 60 requests per hour for unauthenticated requests
- 5,000 requests per hour for authenticated requests
  - This quota is shared across all personal access tokens of the authenticated user AND any GitHub/OAuth applications that have been authorized to make requests on behalf of the authenticated user.
- 5,000 requests per hour for GitHub Apps authenticated via an installation access token
  - For installations on a repository owned by a GitHub Enterprise Cloud (GHEC) organization, this limit is increased to 15,000 requests per hour.
  - For non-GHEC repositories, the rate-limit will scale up based on the number of users and repositories the GitHub App is installed on, with an upper limit of 12,500 requests per hour.
- 1,000 requests per hour per repository for GitHub Actions workloads
  - This quota is shared across all running GitHub Actions workflows in a given repository.
  - For requests to resources that belong to a GitHub Enterprise Cloud (GHEC) organization, this limit is increased to 15,000 requests per hour.
For GitHub Enterprise Server (GHES), primary rate-limits are disabled by default but can be enabled by an administrator.
Fortunately, GitHub supports conditional requests: a caller that sends the ETag of a previous response in the If-None-Match header is not charged against the REST API rate-limit when the response has not changed since the last request, for example:
$ curl --request GET \
--url https://api.github.com/users/bored-engineer \
--header "X-GitHub-Api-Version: 2022-11-28" \
--include
HTTP/1.1 200 OK
...
ETag: "888348e1cff03510691fbf1eb221df5cf3c3c4651d7275d118372876e8cf9f5d"
X-RateLimit-Used: 1
$ curl --request GET \
--url https://api.github.com/users/bored-engineer \
--header "X-GitHub-Api-Version: 2022-11-28" \
--header 'If-None-Match: "888348e1cff03510691fbf1eb221df5cf3c3c4651d7275d118372876e8cf9f5d"' \
--include
HTTP/1.1 304 Not Modified
...
X-RateLimit-Used: 0
While this requires each client to use/implement their own RFC 7234 compliant cache for HTTP responses (ex: bored-engineer/httpcache), it can be an excellent option for applications that request REST API resources (ex: /users/{username} or /repos/{owner}/{repo}) which are unlikely to change frequently.
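The same exchange can be sketched in Go with only the standard library. This is a minimal illustration, not part of this package: conditionalGet and newStubServer are hypothetical helper names, and the stub server (with its made-up ETag "v1") stands in for api.github.com so the example runs offline.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// conditionalGet issues a GET for url and, when etag is non-empty, sends it
// in the If-None-Match header so the server can answer 304 Not Modified.
// It returns the status code and the ETag header of the response.
func conditionalGet(client *http.Client, url, etag string) (int, string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	return resp.StatusCode, resp.Header.Get("ETag"), nil
}

// newStubServer is a hypothetical stand-in for api.github.com: it serves a
// fixed body with a fixed ETag and honors If-None-Match.
func newStubServer() *httptest.Server {
	const etag = `"v1"`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		fmt.Fprint(w, `{"login":"bored-engineer"}`)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	// First request: no ETag known yet, so the full response comes back.
	status, etag, err := conditionalGet(srv.Client(), srv.URL, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(status, etag) // 200 "v1"

	// Second request: replay the ETag and expect 304 Not Modified.
	status, _, err = conditionalGet(srv.Client(), srv.URL, etag)
	if err != nil {
		panic(err)
	}
	fmt.Println(status) // 304
}
```

Against the real API, a caller would also keep the body of the first response, since a 304 carries no body.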
Unfortunately, when you actually try to implement conditional requests inside your client, you'll quickly find that they do not work well when combined with authentication.
This is because the response ETag value is based (in part) on the Authorization header included in the request, for example:
$ curl --request GET \
--url https://api.github.com/users/bored-engineer \
--header "X-GitHub-Api-Version: 2022-11-28" \
--header "Authorization: Bearer ${GITHUB_TOKEN}" \
--include
HTTP/1.1 200 OK
...
ETag: "993db4dbff350f7d8d5a92c3926fdab6311ff93963bc237343f07302c3ee3335"
$ curl --request GET \
--url https://api.github.com/users/bored-engineer \
--header "X-GitHub-Api-Version: 2022-11-28" \
--header "Authorization: Bearer ${OTHER_GITHUB_TOKEN_FOR_SAME_USER}" \
--include
HTTP/1.1 200 OK
...
ETag: "e2bf720fe34544a7dc8c92b200010af26c2b2b09801cfc9b763bd34b6a57e8bb"
On the surface, this behavior makes a lot of sense: the response from the REST API can change based on which authenticated user is making the request, or even for the same user if the token used for the request has been granted different permissions.
However, this means that whenever the Bearer token used in the request changes (such as when a new personal access token is generated), the entire cache becomes invalid and the client will quickly hit the REST API rate-limits.
This is especially problematic when using a GitHub App installation access token (one of the most common/useful authentication schemes in an enterprise context) because the access token is rotated once every hour, effectively guaranteeing there will be no benefit from using conditional requests.
When fetching public resources, you can work around this problem by authenticating using Basic Authentication, providing the GitHub/OAuth App's Client ID and Client Secret as the username and password respectively. However, this only works for public resources and still suffers from the same problem when the Client Secret is rotated.
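As a rough sketch of that workaround, Go's standard library can attach the credentials with (*http.Request).SetBasicAuth. The helper name newPublicRequest and the placeholder credentials below are made up for illustration; a real caller would load its own Client ID and Client Secret from configuration.

```go
package main

import (
	"fmt"
	"net/http"
)

// newPublicRequest builds a GET request for a public resource that
// authenticates with the app's Client ID and Client Secret via Basic
// Authentication instead of a (rotating) Bearer token.
func newPublicRequest(url, clientID, clientSecret string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(clientID, clientSecret)
	req.Header.Set("X-GitHub-Api-Version", "2022-11-28")
	return req, nil
}

func main() {
	// Placeholder credentials, not real values.
	req, err := newPublicRequest(
		"https://api.github.com/users/bored-engineer",
		"example-client-id",
		"example-client-secret",
	)
	if err != nil {
		panic(err)
	}
	user, _, ok := req.BasicAuth()
	fmt.Println(user, ok) // example-client-id true
}
```

Because the Authorization header now stays the same until the Client Secret is rotated, the ETag for an unchanged public resource should stay the same as well.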
Ideally, GitHub would fix this problem by instead using a constant value derived from the Authorization header in the calculation of the ETag value, such as the GitHub App ID or the User ID associated with a PAT.
Until that happens, I've spent some time reverse-engineering the GitHub ETag algorithm to work around this problem client-side...
Warning
This section describes the implementation of the ETag header based on observations/testing on github.com as of February 2025. Critically, this implementation is not documented by GitHub and could change without any notice, breaking/invalidating this package/section.
At its core, the ETag is a SHA-256 hash of the HTTP response body (before any compression), prepended with the value of the following headers (in this order, and only if present), separated by a colon (:)...
- Accept
- Authorization
- Cookie
For example, if we remove all of the above headers from the request (Accept is added by default by curl), the ETag is just the SHA-256 of the response body:
$ curl --request GET \
--url https://api.github.com/users/bored-engineer \
--header "X-GitHub-Api-Version: 2022-11-28" \
--header "Accept:" \
--user-agent "Go-http-client/1.1" \
--verbose |
sha256sum /dev/stdin
HTTP/1.1 200 OK
...
< etag: W/"09dc54dc03e2e556eac3b0aeec31a70ffc04dc18a985144174d184815fe6ddea"
...
09dc54dc03e2e556eac3b0aeec31a70ffc04dc18a985144174d184815fe6ddea /dev/stdin
And the more complex case where both Accept and Authorization are present:
$ {
printf "*/*"; # Accept:
printf ":";
printf "Bearer ${GITHUB_TOKEN}"; # Authorization:
printf ":";
curl --request GET \
--url https://api.github.com/users/bored-engineer \
--header "X-GitHub-Api-Version: 2022-11-28" \
--header "Accept: */*" \
--header "Authorization: Bearer ${GITHUB_TOKEN}" \
--user-agent "Go-http-client/1.1" \
--verbose
} | sha256sum /dev/stdin
HTTP/1.1 200 OK
...
< etag: "993db4dbff350f7d8d5a92c3926fdab6311ff93963bc237343f07302c3ee3335"
...
993db4dbff350f7d8d5a92c3926fdab6311ff93963bc237343f07302c3ee3335 /dev/stdin
In both examples, the User-Agent is set to Go-http-client/1.1 because the GitHub REST API pretty-prints the response JSON if the User-Agent contains "curl". Pretty-printing happens after the ETag has been calculated, so the body that curl receives would no longer match the checksum and the demos would fail.
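The calculation shown in the shell demos can be sketched in Go as follows. This is an illustration of the observed behavior, not GitHub's documented algorithm. The names expectedETag and etagHeaders are hypothetical, and the sketch assumes that an empty header counts as absent and that only the first value of each header is used.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
)

// etagHeaders lists the request headers that appear to be mixed into the
// hash, in order (based on the observations above, not on GitHub docs).
var etagHeaders = []string{"Accept", "Authorization", "Cookie"}

// expectedETag returns the quoted hex SHA-256 of each present header value
// followed by ":", then the uncompressed response body.
func expectedETag(header http.Header, body []byte) string {
	h := sha256.New()
	for _, name := range etagHeaders {
		if value := header.Get(name); value != "" {
			h.Write([]byte(value))
			h.Write([]byte(":"))
		}
	}
	h.Write(body)
	return `"` + hex.EncodeToString(h.Sum(nil)) + `"`
}

func main() {
	body := []byte("abc") // placeholder body, not a real API response

	// No Accept/Authorization/Cookie: the ETag is just SHA-256(body).
	fmt.Println(expectedETag(http.Header{}, body))

	// With headers: SHA-256("*/*:Bearer example-token:abc").
	header := http.Header{}
	header.Set("Accept", "*/*")
	header.Set("Authorization", "Bearer example-token")
	fmt.Println(expectedETag(header, body))
}
```

The sketch always returns the strong form of the ETag. The first demo above shows a response carrying a weak W/ prefix, so a real implementation would need to account for that when comparing values.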
Using this reverse-engineered ETag algorithm, we can develop an http.RoundTripper that allows a GitHub REST API response that was cached/returned for a different Authorization header to be safely reused. The logic for handling an HTTP request is roughly:
- If the HTTP request method is anything other than GET or HEAD
  - Return early, executing the request as-is because it will not be a cacheable HTTP response
- Retrieve the cached HTTP response body bytes from the cache storage using the URL as the key:
  - If no cached HTTP response is available, return early, executing the request as-is
- Calculate the expected ETag (via SHA-256) using the request HTTP headers and the cached response body bytes
- Add the expected ETag to the request via the If-None-Match header, then perform the HTTP request
- If the HTTP response code is 304 Not Modified, our cached value is still valid
  - Return the response headers (request-id, ratelimit headers, etc) but the cached HTTP response bytes
- If the HTTP response code is 200 OK or 201 Created and an ETag header is present
  - Store the response bytes in the cache storage
- Return the HTTP response
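The steps above can be sketched as a small http.RoundTripper. This is a simplified illustration under stated assumptions, not the package's actual implementation. The names cachingTransport, expectedETag and runDemo are hypothetical, and the cache is an in-memory map rather than a storage backend. HEAD requests are left out for brevity. A cache miss still stores the fresh response, a 304 is reported to the caller as a 200 with the cached body, and the conditional header used is If-None-Match, as in the curl examples. The stub server stands in for api.github.com and derives its ETag in the same way.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// expectedETag hashes Accept, Authorization, Cookie (each plus ":") and the body.
func expectedETag(header http.Header, body []byte) string {
	h := sha256.New()
	for _, name := range []string{"Accept", "Authorization", "Cookie"} {
		if v := header.Get(name); v != "" {
			io.WriteString(h, v+":")
		}
	}
	h.Write(body)
	return `"` + hex.EncodeToString(h.Sum(nil)) + `"`
}

// cachingTransport is a hypothetical in-memory stand-in for the real transport.
type cachingTransport struct {
	parent http.RoundTripper
	bodies sync.Map // URL string -> []byte response body
}

func (t *cachingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.Method != http.MethodGet { // HEAD is left out of this sketch
		return t.parent.RoundTrip(req)
	}
	key := req.URL.String()
	v, hit := t.bodies.Load(key)
	cached, _ := v.([]byte)
	if hit {
		// A RoundTripper must not mutate the caller's request, so clone it.
		req = req.Clone(req.Context())
		req.Header.Set("If-None-Match", expectedETag(req.Header, cached))
	}
	resp, err := t.parent.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	ok := resp.StatusCode == http.StatusOK || resp.StatusCode == http.StatusCreated
	switch {
	case hit && resp.StatusCode == http.StatusNotModified:
		// Fresh headers, cached bytes; reporting 200 is this sketch's assumption.
		resp.Body.Close()
		resp.StatusCode, resp.Status = http.StatusOK, "200 OK"
		resp.Body = io.NopCloser(bytes.NewReader(cached))
		resp.ContentLength = int64(len(cached))
	case ok && resp.Header.Get("ETag") != "":
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		t.bodies.Store(key, body)
		resp.Body = io.NopCloser(bytes.NewReader(body))
	}
	return resp, nil
}

// runDemo sends one GET per token to a stub server standing in for GitHub and
// returns the statuses sent upstream and the statuses seen by the caller.
func runDemo() (upstream, seen []int) {
	body := []byte(`{"login":"bored-engineer"}`)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		etag := expectedETag(r.Header, body)
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			upstream = append(upstream, http.StatusNotModified)
			w.WriteHeader(http.StatusNotModified)
			return
		}
		upstream = append(upstream, http.StatusOK)
		w.Write(body)
	}))
	defer srv.Close()
	client := &http.Client{Transport: &cachingTransport{parent: http.DefaultTransport}}
	for _, token := range []string{"token-a", "token-b"} {
		req, _ := http.NewRequest(http.MethodGet, srv.URL, nil) // srv.URL is a valid URL
		req.Header.Set("Authorization", "Bearer "+token)
		resp, err := client.Do(req)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
		seen = append(seen, resp.StatusCode)
	}
	return upstream, seen
}

func main() {
	fmt.Println(runDemo()) // [200 304] [200 200]
}
```

The second request uses a different token, yet the stub answers 304 because the If-None-Match value was recomputed from the new Authorization header and the cached body.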
Here is an example using the bbolt storage backend and the google/go-github client:
package main

import (
	"context"
	"log"
	"net/http"
	"os"

	ghtransport "github.com/bored-engineer/github-conditional-http-transport"
	bboltstorage "github.com/bored-engineer/github-conditional-http-transport/bbolt"
	"github.com/google/go-github/v68/github"
)

func main() {
	// Wrap the default transport with the conditional-request cache,
	// persisting cached responses in cache.db.
	client := github.NewClient(&http.Client{
		Transport: ghtransport.NewTransport(
			bboltstorage.MustOpen("cache.db", 0644, nil, nil),
			http.DefaultTransport,
		),
	}).WithAuthToken(os.Getenv("GITHUB_TOKEN"))

	// Fetch the same user several times and print the rate-limit usage
	// reported by GitHub; it should stop increasing once the response is cached.
	for loop := 0; loop < 3; loop++ {
		_, resp, err := client.Users.Get(context.TODO(), "bored-engineer")
		if err != nil {
			log.Fatalf("(*github.Client).Users.Get failed: %v", err)
		}
		log.Println(resp.Header.Get("X-Ratelimit-Used"))
	}
}