This tool is used to reliably enumerate projects on GitHub.
The output of this tool is can be used as an input for the criticality_score
tool, or for input for the collect_signals
worker.
$ export GITHUB_TOKEN=ghp_x # Personal Access Token Goes Here
$ enumerate_github \
-start 2008-01-01 \
-min-stars=10 \
-workers=1 \
-out=github_projects.txt
$ go install github.com/ossf/criticality_score/v2/cmd/enumerate_github@latest
$ enumerate_github [FLAGS]...
The URL for each repository is written to the output. By default stdout
is used
for output.
FLAGS
are optional. See below for documentation.
A comma delimited environment variable with one or more GitHub Personal Access Tokens must be set
Supported environment variables are GITHUB_AUTH_TOKEN
, GITHUB_TOKEN
,
GH_TOKEN
, or GH_AUTH_TOKEN
.
Example:
$ export GITHUB_TOKEN=ghp_abc,ghp_123
-out FILE
specify theFILE
to use for output. By defaultstdout
is used.-append
appends output toFILE
if it already exists.-force
overwritesFILE
if it already exists and-append
is not set.-format {text|scorecard}
indicates the format to use for output.text
is used by default and consists of one URL per line.scorecard
outputs a CSV file compatible with the scorecard project.
If FILE
exists and neither -append
nor -force
is set the command will fail.
-start date
the start date to enumerate back to. Must be at or after2008-01-01
. Defaults to2008-01-01
.-end date
the end date to enumerate from. Defaults to today's date.
-min-stars int
only enumerates repositories with this or more of stars Defaults to10
.-query string
sets the base query to use for enumeration. Defaults tois:public
. See GitHub's search help for more detail.-require-min-stars
abort execution if-min-stars
can't be reached during enumeration. If not set some repositories created on a certain date may not be included.-star-overlap int
the number of stars to overlap between queries. Defaults to5
. A an overlap is used to avoid missing repositories whose star count changes during enumeration.
-log level
set the level of logging. Can bedebug
,info
(default),warn
orerror
.-workers int
the total number of concurrent workers to use. Default is1
.-help
displays help text.
Refer to Milestone 1 for details on the algorithm.
10 has been successfully tested, although lower may be possible.
TODO -- more detail
A single GitHub Personal Access Token took about 4 hours to return all projects with >= 20 stars.
Faster performance can be achieved with more Personal Access Tokens and additional workers.
Generally, use 1 worker for each Personal Access Token.
More workers than tokens may result in secondary rate limits.
It is possible that more restricted searches will succeed with more workers per token.
Rather than installing the binary, use go run
to run the command.
For example:
$ go run ./cmd/enumerate_github [FLAGS]...
Limiting the data allows for runs to be completed quickly. For example:
$ go run ./cmd/enumerate_github \
-log=debug \
-start=2022-06-14 \
-end=2022-06-21 \
-min-stars=20