🐛 Deduplicate projects in cron job by excluding URL queries and fragments #2201

Merged (1 commit) on Aug 26, 2022

spencerschrock
Member

What kind of change does this PR introduce?

bug fix

What is the current behavior?

cron/internal/data/projects.csv has duplicate entries, which lead to duplicates in the public BigQuery data, e.g.:

github.com/adobe-fonts/source-code-pro,criticality_score:0.386850
github.com/adobe-fonts/source-code-pro#release,

In the latest BigQuery data, there are two repos with the name github.com/adobe-fonts/source-code-pro. One received a score of 4.8 and the other 4.4. The difference is only due to a rate limit error that occurred during one run. They return identical results for me locally.

What is the new behavior (if this is a feature change)?

Uses https://pkg.go.dev/net/url#URL to extract the host and path when deduplicating cron/internal/data/projects.csv entries, ignoring queries, fragments, etc.
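
For reviewers who want the idea in isolation, here is a minimal sketch of host-and-path deduplication with net/url. It is not the code in this PR: the helper names repoKey and dedupe are hypothetical, the metadata column of projects.csv is ignored, and prepending "https://" to scheme-less entries is an assumption made so that url.Parse fills in the Host field.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// repoKey is a hypothetical helper (not taken from the PR). It reduces a
// repo entry to host + path, dropping any query string or fragment.
func repoKey(entry string) (string, error) {
	raw := entry
	// Assumption: entries usually have no scheme ("github.com/org/repo").
	// Adding one lets url.Parse populate Host instead of treating the
	// whole string as a path.
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", entry, err)
	}
	return u.Host + u.Path, nil
}

// dedupe keeps the first occurrence of each normalized key, in input order.
func dedupe(entries []string) ([]string, error) {
	seen := make(map[string]bool)
	var out []string
	for _, e := range entries {
		key, err := repoKey(e)
		if err != nil {
			return nil, err
		}
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, key)
	}
	return out, nil
}

func main() {
	repos, err := dedupe([]string{
		"github.com/adobe-fonts/source-code-pro",
		"github.com/adobe-fonts/source-code-pro#release",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(repos) // prints [github.com/adobe-fonts/source-code-pro]
}
```

With the two entries from the example above, both normalize to the same key, so only one survives. A real implementation would also need to decide which entry's metadata to keep when duplicates collide.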

  • Tests for the changes have been added (for bug fixes/features)

Which issue(s) this PR fixes

NONE

Special notes for your reviewer

Changes to cron/internal/data/projects.csv were generated by running make add-projects

Does this PR introduce a user-facing change?

For user-facing changes, please add a concise, human-readable release note to
the release-note

(In particular, describe what changes users might need to make in their
application as a result of this pull request.)

NONE

codecov bot commented Aug 26, 2022

Codecov Report

Merging #2201 (f34e0db) into main (9460030) will increase coverage by 2.40%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #2201      +/-   ##
==========================================
+ Coverage   42.28%   44.68%   +2.40%     
==========================================
  Files          95       95              
  Lines        7871     7871              
==========================================
+ Hits         3328     3517     +189     
+ Misses       4283     4087     -196     
- Partials      260      267       +7     

Member

@naveensrinivasan naveensrinivasan left a comment

Thanks!

@github-actions

Integration tests success for [f34e0db](https://github.com/ossf/scorecard/actions/runs/2935923579)

@azeemshaikh38 azeemshaikh38 merged commit 11ff78e into ossf:main Aug 26, 2022
@spencerschrock spencerschrock deleted the project-dedup branch August 29, 2022 17:21