
Reducing GitHub API calls to scale scanning repositories #202

Closed
naveensrinivasan opened this issue Feb 21, 2021 · 5 comments
Labels
duplicate (This issue or pull request already exists) · GitHub · priority/must-do (Upcoming release)

Comments

@naveensrinivasan
Member

GitHub API calls are throttled, which makes it hard to scale the number of repositories we can scan and provide results for.

The code would have to wait for tens of minutes before continuing:
{"level":"warn","ts":1613869247.8747272,"caller":"roundtripper/roundtripper.go:139","msg":"Rate limit exceeded. Waiting 44m34.125286853s to retry..."}

These Scorecard checks don't need the GitHub API; they only require a Git API:

  1. Active
  2. Frozen-Deps
  3. CodeQLInCheckDefinitions
  4. Security-Policy
  5. Packaging

Potential solution

  1. Clone the Git repo
  2. Git pull these repos on a cron to get updates
  3. Use an API to query these repositories directly instead of the GitHub API

The https://github.com/go-git/go-git project provides a Git API that could be used to avoid the GitHub API limitations.

With httpcache #80 (comment) and a reduced number of GitHub API calls, we should be able to scale the number of repositories scanned.

related to #80

@naveensrinivasan
Member Author

Also, this approach could be used to scan non-GitHub repositories such as GitLab, which gives us one API irrespective of the provider.

@inferno-chromium
Contributor

Cloning might be too heavyweight for some big repos, and slow too.

Maybe let's start with httpcache first. We can also scale the number of GitHub tokens. Right now we have 2; we can easily go to 4-5 (separated with commas).

@naveensrinivasan
Member Author

> Cloning might be too heavyweight for some big repos, and slow too.
>
> Maybe let's start with httpcache first. We can also scale the number of GitHub tokens. Right now we have 2; we can easily go to 4-5 (separated with commas).

Cloning can be async as another cron job, and it is a one-time effort. Cloning should not run as part of Scorecard; instead, give it an option to look in a location for cached git repos and, if they are not there, fetch them from github.com.

@inferno-chromium
Contributor

> Cloning might be too heavyweight for some big repos, and slow too.
> Maybe let's start with httpcache first. We can also scale the number of GitHub tokens. Right now we have 2; we can easily go to 4-5 (separated with commas).

> Cloning can be async as another cron job, and it is a one-time effort. Cloning should not run as part of Scorecard; instead, give it an option to look in a location for cached git repos and, if they are not there, fetch them from github.com.

Makes sense.

@azeemshaikh38
Contributor

Closing this since we are already tracking this here: #318

@azeemshaikh38 added the duplicate label on May 26, 2021