-
Notifications
You must be signed in to change notification settings - Fork 625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wire up typosquatting checks when new packages are published #7206
Conversation
☔ The latest upstream changes (presumably 5210643) made this pull request unmergeable. Please resolve the merge conflicts. |
☔ The latest upstream changes (presumably #7395) made this pull request unmergeable. Please resolve the merge conflicts. |
This has been heavily rebased after the various background worker changes in the last couple of weeks — although @Turbo87 looked at this a while back, I'd appreciate a re-review (preferably from him, but I'm also grateful to anyone else who wants to cast a quick eye over this). Tested locally; all seems to work as expected based on trying to publish a |
☔ The latest upstream changes (presumably d40529a) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's give this a try :)
Co-authored-by: Tobias Bieniek <tobias@bieniek.cloud>
This extends our new typosquatting checks (see rust-lang#7206) to detect an attack vector we've seen more recently where a bad actor tries to squat an existing, popular crate by adding or removing a common suffix (such as `-rs` or `-sys`). The suffix list in the configuration has been taken _approximately_ from the most popular suffixes in the existing set of crates, with a small amount of human judgement involved on which ones are more likely to be abused based on recent incidents.
This PR adds experimental support for checking the names of newly published crates for potential typosquatting. It uses essentially the same algorithm as typogard as adapted by @dangardner for offline crates.io checks, reimplemented in Rust by me in the new typomania crate.
I want to emphasise the word experimental in the above paragraph: right now, the only thing that will happen if a new crate might be typosquatting a popular crate is that an e-mail will be sent to a list of interested parties, which at present is defined as myself and @walterhpearce (who has kindly volunteered to help me with this), acting on behalf of the Rust Foundation Security Initiative. No private information is sent in those e-mails; it's a simple notification that crate A might be squatting crates B and C. If anyone else would like to be on that list, I'm happy to update the configuration accordingly.
Should there be actual typosquatting, we'll bring it to the crates.io team via Zulip and we can handle it from there using our usual process. (Well, realistically, I'll probably take off my Rust Foundation hat, put on my crates.io hat, and handle it myself in consultation with the rest of you.)
I would like to run this experiment for about two months before taking any further action, to judge the potential impact of using these checks for anything more formal or in a way that might impact the load on the crates.io team.
To be very clear, the crates.io team can say no to this. I promise I won't be mad.
Possible courses of action after that time period might include:
To lay my cards on the table, my hope is that this ends up with course 3(i) — notifying the crates.io team in general. Using quarantine is a possibility, but it's probably something I'd prefer to keep off unless we're actively under a spam attack.
Technical detail
This is implemented as a new background job that is enqueued when a new crate is published. The background job keeps an in-memory cache of the most popular 3000 crates (as judged by download count) to make the checks speedy and minimise extra load on the database. (Regenerating the cache locally takes about 700 ms, so I'm not terribly concerned about this in practice.)
There's no TTL on the cache because (as @Turbo87 pointed out) the dynos get restarted every 24 hours anyway, which is plenty fresh enough for what we need.
With the top 3000 crates, the cache uses less than 1 MiB of heap. I think we can afford that.
Because we have to send e-mails from the background job,
Emails
now has to be plumbed through theEnvironment
that is exposed to background jobs. I don't think this is a problem in practice, but it is an extra change.Finally, as this is an experiment, the configuration is hardcoded into the new worker job module. If this becomes a longer term thing, this would be split out into our normal configuration system for easier management, but right now this isolates the changes as much as possible in
worker::typosquat
.