🐛 Deduplicate projects in cron job by excluding URL queries and fragments #2201
What kind of change does this PR introduce?
bug fix
What is the current behavior?
cron/internal/data/projects.csv has duplicate entries, which leads to duplicates in the public BigQuery data. For example, in the latest BigQuery data there are two repos with the name github.com/adobe-fonts/source-code-pro. One received a score of 4.8 and the other 4.4. The difference is due only to a rate limit error that occurred during one run; both return identical results for me locally.
What is the new behavior (if this is a feature change)?
Uses https://pkg.go.dev/net/url#URL to pull out the host and path when deduplicating cron/internal/data/projects.csv entries, ignoring queries, fragments, etc.
Which issue(s) this PR fixes
NONE
Special notes for your reviewer
Changes to cron/internal/data/projects.csv were generated by running make add-projects
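For reviewers, here is a minimal sketch of the host-and-path deduplication idea described under the new behavior. It is an illustration only, not the code in this PR: the names dedupKey and dedupe are hypothetical, the sample URLs (including the ?tab=readme query and #about fragment) are made up, and prepending a scheme before parsing is an assumption about how scheme-less entries could be handled.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dedupKey is a hypothetical helper: it returns host+path for a repo URL,
// dropping the query string and fragment.
func dedupKey(raw string) (string, error) {
	// Assumption: entries may lack a scheme. Without one, url.Parse would
	// put the host into Path, so prepend a scheme before parsing.
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", raw, err)
	}
	return u.Host + u.Path, nil
}

// dedupe is a hypothetical helper: it keeps one key per distinct host+path,
// preserving the order in which keys were first seen.
func dedupe(repos []string) ([]string, error) {
	seen := make(map[string]struct{}, len(repos))
	out := make([]string, 0, len(repos))
	for _, r := range repos {
		key, err := dedupKey(r)
		if err != nil {
			return nil, err
		}
		if _, ok := seen[key]; ok {
			continue // same host and path already recorded
		}
		seen[key] = struct{}{}
		out = append(out, key)
	}
	return out, nil
}

func main() {
	// Made-up inputs that differ only by query, fragment, or scheme.
	repos := []string{
		"github.com/adobe-fonts/source-code-pro",
		"github.com/adobe-fonts/source-code-pro?tab=readme",
		"https://github.com/adobe-fonts/source-code-pro#about",
		"github.com/ossf/scorecard",
	}
	unique, err := dedupe(repos)
	if err != nil {
		panic(err)
	}
	fmt.Println(unique)
	// prints [github.com/adobe-fonts/source-code-pro github.com/ossf/scorecard]
}
```

Keying on u.Host + u.Path means two entries that differ only after a ? or # collapse into one, which is the case that produced the duplicate BigQuery rows.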
Does this PR introduce a user-facing change?
For user-facing changes, please add a concise, human-readable release note to the release-note
(In particular, describe what changes users might need to make in their application as a result of this pull request.)