Feat-Implement httpcache middleware for GitHub API #203
@@ -7,10 +7,11 @@ require (
github.com/golang/protobuf v1.4.3 // indirect
github.com/google/go-github/v32 v32.1.0
github.com/kr/text v0.2.0 // indirect
github.com/naveensrinivasan/httpcache v1.2.1
This is a fork of gregjones/httpcache that includes the change from gregjones/httpcache#104.
Integration tests failure for 35f3b88dbb4e5dad84ca2916dc6db3d8e7e32d05
Integration tests success for 35f3b88dbb4e5dad84ca2916dc6db3d8e7e32d05
Integration tests success for 73ebc1a54963240d4ff9241dce169df0f5131478
Integration tests failure for e5b609b096fc67b393e928b3245a122f3919ef31
Integration tests success for e5b609b096fc67b393e928b3245a122f3919ef31
A couple tiny nits, LGTM!
@@ -153,6 +152,20 @@ Signed-Releases: Fail 0
Signed-Tags: Fail 10
```

### Caching

Scorecard uses `httpcache` with <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests> for caching httpresponse. The default cache is in-memory.
Maybe link to the etags stuff in GitHub - the real benefit is avoiding the API quota
// shouldUseDiskCache checks the env variables USE_DISK_CACHE and DISK_CACHE_PATH to determine if
// disk should be used for caching.
func shouldUseDiskCache() (string, bool) {
	if isDiskCache := os.Getenv(UseDiskCache); isDiskCache != "" {
nit: I think you can avoid this if statement and go straight into ParseBool since "" parses as false.
Thanks a lot! This is very exciting.
.gitignore
Outdated
@@ -23,3 +23,6 @@ results.json

# tools
bin

#temp
remove this.
README.md
Outdated
To use disk cache two env variables have to be set `USE_DISK_CACHE=true` and `DISK_CACHE_PATH=./cache`.

There are not TTL on cache.
"There is no"
			}
		}
	}
	return "", false
nit: maybe nil instead of ""
can't do nil for a string in go
roundtripper/roundtripper_test.go
Outdated
t.Parallel()
tests := []struct {
	name string
	want string
s/want/diskCachePath
s/want1/useDiskCache
The GitHub API supports conditional requests (https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests), and https://github.com/google/go-github supports them as well (https://github.com/google/go-github#conditional-requests). As we scale to more and more projects, this would add a lot of value. The initial run fetches information using `httpcache` as a middleware, which caches the HTTP responses, initially on a large disk (PVC); this will probably move to Redis later instead of disk. Subsequent `cron runs` use `httpcache` to check whether content has been modified and load it from the cache if it has not, which reduces the risk of hitting the GitHub API rate limit.
e5b609b to 9645825
Integration tests success for 9645825ac6716a3a988b682ddc00145bb62695df
* set GITHUB_TOKEN as default token
* updates
* Update doc
* Update doc
* updates
* updates
* update
* update
* update
* update
* updates
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Feature - Caching
What is the current behavior? (You can also link to an open issue here)
Scorecard scalability limitation: Reduce GitHub API calls #80
Reducing GitHub API calls to scale scanning repositories #202
What is the new behavior (if this is a feature change)?
The GitHub API supports conditional requests
https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests
https://github.com/google/go-github supports Conditional requests
https://github.com/google/go-github#conditional-requests
As we are scaling more and more projects this would add a lot of value.
The initial run fetches information using `httpcache` as a middleware, which caches the HTTP responses, initially on a large disk (PVC); this will probably move to Redis later instead of disk.
Subsequent `cron runs` use `httpcache` to check whether content has been modified and load it from the cache if it has not, which reduces the risk of hitting the rate limit of the GitHub API.
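For readers unfamiliar with conditional requests, the following is a rough standard-library sketch of the mechanism described above. It is not how `httpcache` is implemented and not the code in this PR: `etagTransport`, `entry`, and `fetchTwice` are hypothetical names, the cache is a plain in-memory map, and a local `httptest` server stands in for the GitHub API. The idea is that the first response is stored with its `ETag`, later requests send `If-None-Match`, and a `304 Not Modified` reply is answered from the cache.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// entry is one cached response: the validator and the body it belongs to.
type entry struct {
	etag string
	body []byte
}

// etagTransport is a hypothetical RoundTripper that revalidates GET requests with ETags.
type etagTransport struct {
	base  http.RoundTripper
	mu    sync.Mutex
	cache map[string]entry
}

func (t *etagTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.Method != http.MethodGet {
		return t.base.RoundTrip(req)
	}
	key := req.URL.String()
	t.mu.Lock()
	cached, ok := t.cache[key]
	t.mu.Unlock()
	if ok {
		// A RoundTripper must not modify the caller's request, so work on a clone.
		req = req.Clone(req.Context())
		req.Header.Set("If-None-Match", cached.etag)
	}
	resp, err := t.base.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	if ok && resp.StatusCode == http.StatusNotModified {
		// Unchanged on the server: serve the stored body as a normal 200.
		resp.Body.Close()
		resp.StatusCode, resp.Status = http.StatusOK, "200 OK"
		resp.Body = io.NopCloser(bytes.NewReader(cached.body))
		resp.ContentLength = int64(len(cached.body))
		return resp, nil
	}
	if etag := resp.Header.Get("ETag"); resp.StatusCode == http.StatusOK && etag != "" {
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		t.mu.Lock()
		t.cache[key] = entry{etag: etag, body: body}
		t.mu.Unlock()
		resp.Body = io.NopCloser(bytes.NewReader(body))
	}
	return resp, nil
}

// fetchTwice requests the same URL twice from a stand-in server and returns how many
// full (non-304) responses the server produced, plus the body of the second response.
func fetchTwice() (int, string, error) {
	const etag = `"v1"`
	var full int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		atomic.AddInt32(&full, 1)
		w.Header().Set("ETag", etag)
		fmt.Fprint(w, "payload")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &etagTransport{base: http.DefaultTransport, cache: map[string]entry{}}}
	var last string
	for i := 0; i < 2; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, "", err
		}
		b, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return 0, "", err
		}
		last = string(b)
	}
	return int(atomic.LoadInt32(&full)), last, nil
}

func main() {
	n, body, err := fetchTwice()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("full responses served: %d, second body: %q\n", n, body)
}
```

In the sketch the second request still reaches the server, but only as a cheap revalidation. The GitHub documentation linked above describes how `304` responses are treated with respect to the rate limit.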
Also fixed the golang-ci warnings.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
None
Other information:
Subsequent cached runs on 50 repositories take about 18 minutes with 3 GitHub tokens.
Folder size
Files in the folder