Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat-Implement httpcache middleware for GitHub API #203

Merged
merged 1 commit into from
Feb 22, 2021

Conversation

naveensrinivasan
Copy link
Member

  • Please check if the PR fulfills these requirements
  • Tests for the changes have been added (for bug fixes/features)

https://github.com/google/go-github supports Conditional requests
https://github.com/google/go-github#conditional-requests

As we are scaling more and more projects this would add a lot of value.

Initial run fetches information using httpcache as a middleware,
which caches the HTTP response initially in a large disk (PVC),
probably move to Redis later as a cache instead of disk.

Subsequent cron runs will utilize the httpcache for checking content modification and
load it from the cache if it isn't modified, which reduces the hitting the
Rate Limit of the GitHub API.

Also fixed the golang-ci warnings.

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    None

  • Other information:

Subsequent cache runs on 50 repositories takes about 18 minutes with 3 GitHub tokens

❯ time ./cron.sh
github.com/18F/identity-idp
github.com/1N3/Sn1per
github.com/24pullrequests/24pullrequests
github.com/AElfProject/AElf
github.com/Activiti/Activiti
github.com/AdguardTeam/AdguardFilters
github.com/AlchemyCMS/alchemy_cms
github.com/AliasIO/wappalyzer
github.com/Amanieu/parking_lot
github.com/abonas/kubeclient
github.com/abpframework/abp
github.com/acassen/keepalived
github.com/acmesh-official/acme.sh
github.com/actions/runner
github.com/activeadmin/activeadmin
github.com/activemerchant/active_merchant
github.com/activerecord-hackery/ransack
github.com/activescaffold/active_scaffold
github.com/actix/actix
github.com/actix/actix-extras
github.com/actix/actix-net
github.com/actix/actix-web
github.com/adafruit/Adafruit_CircuitPython_Bundle
github.com/adempiere/adempiere
github.com/afragen/github-updater
github.com/aframevr/aframe
github.com/ai/autoprefixer-rails
github.com/aio-libs/aiohttp
github.com/airbnb/javascript
github.com/airbrake/airbrake
github.com/akaunting/akaunting
github.com/akeneo/pim-community-dev
github.com/akka/akka
github.com/akka/akka-http
github.com/akkadotnet/akka.net
github.com/alacritty/alacritty
github.com/alanmcgovern/monotorrent
github.com/alexcrichton/cc-rs
github.com/alexcrichton/curl-rust
github.com/alexcrichton/proc-macro2
github.com/alexcrichton/toml-rs
github.com/alexreisner/geocoder
github.com/alibaba/dragonwell8
github.com/alibaba/druid
github.com/alibaba/fastjson
github.com/alibaba/nacos
github.com/allenai/allennlp
github.com/allinurl/goaccess
github.com/alphagov/whitehall
github.com/alpinelinux/aports
./cron.sh  46.68s user 19.95s system 6% cpu 18:26.70 total

Folder size

du -sh ./cache
801M	

Files in the folder

find ./cache -type f | wc -l
10347

@@ -7,10 +7,11 @@ require (
github.com/golang/protobuf v1.4.3 // indirect
github.com/google/go-github/v32 v32.1.0
github.com/kr/text v0.2.0 // indirect
github.com/naveensrinivasan/httpcache v1.2.1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fork of gregjones/httpcache#104 this PR.

@github-actions
Copy link

Integration tests failure for 35f3b88dbb4e5dad84ca2916dc6db3d8e7e32d05

@github-actions
Copy link

Integration tests success for 35f3b88dbb4e5dad84ca2916dc6db3d8e7e32d05

@github-actions
Copy link

Integration tests success for 73ebc1a54963240d4ff9241dce169df0f5131478

@github-actions
Copy link

Integration tests failure for e5b609b096fc67b393e928b3245a122f3919ef31

@github-actions
Copy link

Integration tests success for e5b609b096fc67b393e928b3245a122f3919ef31

Copy link
Contributor

@dlorenc dlorenc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple tiny nits, LGTM!

@@ -153,6 +152,20 @@ Signed-Releases: Fail 0
Signed-Tags: Fail 10
```

### Caching

Scorecard uses `httpcache` with <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests> for caching httpresponse. The default cache is in-memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link to the etags stuff in GitHub - the real benefit is avoiding the API quota

// shouldUseDiskCache checks the env variables USE_DISK_CACHE and DISK_CACHE_PATH to determine if
// disk should be used for caching.
func shouldUseDiskCache() (string, bool) {
if isDiskCache := os.Getenv(UseDiskCache); isDiskCache != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you can avoid this if statement and go straight into ParseBool since "" parses as false.

Copy link
Contributor

@inferno-chromium inferno-chromium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! This is very exciting.

.gitignore Outdated
@@ -23,3 +23,6 @@ results.json

# tools
bin

#temp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this.

README.md Outdated

To use disk cache two env variables have to be set `USE_DISK_CACHE=true` and `DISK_CACHE_PATH=./cache`.

There are not TTL on cache.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"There is no"

}
}
}
return "", false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe nil instead of ""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't do nil for a string in go

t.Parallel()
tests := []struct {
name string
want string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/want/diskCachePath
s/want1/useDiskCache

The GitHub API supports conditional requests
https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests

https://github.com/google/go-github supports Conditional requests
https://github.com/google/go-github#conditional-requests

As we are scaling more and more projects this would add a lot of value.

Initial run fetches information using `httpcache` as a middleware,
which caches the HTTP response initially in a large disk (PVC),
probably move to Redis later as a cache instead of disk.

Subsequent `cron runs` will utilize the `httpcache` for checking content modification and
load it from the cache if it isn't modified, which reduces the hitting the
Rate Limit of the GitHub API.
@github-actions
Copy link

Integration tests success for 9645825ac6716a3a988b682ddc00145bb62695df

@naveensrinivasan naveensrinivasan merged commit db81680 into main Feb 22, 2021
@naveensrinivasan naveensrinivasan deleted the feat/caching branch January 14, 2022 22:52
justaugustus pushed a commit to justaugustus/scorecard that referenced this pull request May 25, 2022
* set GITHUB_TOKEN as default token

* updates

* Update doc

* Update doc

* updates

* updates

* update

* update

* update

* update

* updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants