Wire up typosquatting checks when new packages are published #7206

LawnGnome · 2023-09-29T02:50:47Z

This PR adds experimental support for checking the names of newly published crates for potential typosquatting. It uses essentially the same algorithm as typogard, as adapted by @dangardner for offline crates.io checks, which I have reimplemented in Rust in the new typomania crate.

I want to emphasise the word experimental in the above paragraph. Right now, the only thing that will happen if a new crate might be typosquatting a popular crate is that an e-mail will be sent to a list of interested parties. At present, that list is myself and @walterhpearce (who has kindly volunteered to help me with this), acting on behalf of the Rust Foundation Security Initiative. No private information is sent in those e-mails; it's a simple notification that crate A might be squatting crates B and C. If anyone else would like to be on that list, I'm happy to update the configuration accordingly.

Should there be actual typosquatting, we'll bring it to the crates.io team via Zulip and we can handle it from there using our usual process. (Well, realistically, I'll probably take off my Rust Foundation hat, put on my crates.io hat, and handle it myself in consultation with the rest of you.)

I would like to run this experiment for about two months before taking any further action, to judge the potential impact of using these checks for anything more formal or in a way that might impact the load on the crates.io team.

To be very clear, the crates.io team can say no to this. I promise I won't be mad.

Possible courses of action after that time period might include:

1. If the notifications are way too noisy to be useful, backing this PR out. (I've intentionally kept it as separate as possible, even in places where it would have been simpler to integrate more closely with the core of crates.io, to make this straightforward.)
2. If the notifications are a little bit too noisy, I may adjust the checks to try to improve the signal-to-noise ratio. (This may be something I do sooner if it's very obviously bad.)
3. If the notifications feel just right, then we might do one or both of:
   i. Notify the crates.io team using Zendesk or some other method, and/or
   ii. If the crate quarantine RFC is ultimately finished and accepted, then we could also feed this into a potential quarantine of new crates that are potential typosquats.

To lay my cards on the table, my hope is that this ends up with course 3(i) — notifying the crates.io team in general. Using quarantine is a possibility, but it's probably something I'd prefer to keep off unless we're actively under a spam attack.

Technical detail

This is implemented as a new background job that is enqueued when a new crate is published. The background job keeps an in-memory cache of the most popular 3000 crates (as judged by download count) to make the checks speedy and minimise extra load on the database. (Regenerating the cache locally takes about 700 ms, so I'm not terribly concerned about this in practice.)
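To give a feel for the kind of check the job runs against the popular crate list, here is a minimal sketch. It is not the typomania API, and it only covers three simplified typo patterns (an omitted character, an extra character, and two swapped adjacent characters); all function names are made up for illustration.

```rust
/// True if removing exactly one character from `longer` yields `shorter`.
fn omitted_char(shorter: &[char], longer: &[char]) -> bool {
    if longer.len() != shorter.len() + 1 {
        return false;
    }
    (0..longer.len()).any(|i| {
        longer[..i]
            .iter()
            .chain(&longer[i + 1..])
            .eq(shorter.iter())
    })
}

/// True if swapping one pair of adjacent characters in `a` yields `b`.
fn swapped_adjacent(a: &[char], b: &[char]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    (0..a.len().saturating_sub(1)).any(|i| {
        let mut swapped = a.to_vec();
        swapped.swap(i, i + 1);
        swapped.as_slice() == b
    })
}

/// Hypothetical check: does `candidate` look like a one-step typo of `popular`?
/// Identical names are not reported.
fn looks_like_typo(candidate: &str, popular: &str) -> bool {
    if candidate == popular {
        return false;
    }
    let c: Vec<char> = candidate.chars().collect();
    let p: Vec<char> = popular.chars().collect();
    omitted_char(&c, &p) || omitted_char(&p, &c) || swapped_adjacent(&c, &p)
}

/// Returns every popular crate name that `candidate` might be squatting.
fn find_squatted<'a>(candidate: &str, popular: &[&'a str]) -> Vec<&'a str> {
    popular
        .iter()
        .copied()
        .filter(|p| looks_like_typo(candidate, p))
        .collect()
}
```

Calling `find_squatted("serd", &["serde", "tokio"])` would flag `serde`, because `serd` is `serde` with one character omitted. The real checks are broader than this.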

There's no TTL on the cache because (as @Turbo87 pointed out) the dynos get restarted every 24 hours anyway, which is plenty fresh enough for what we need.
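The build-once, no-TTL behaviour can be sketched with `std::sync::OnceLock`. This is an illustration under assumptions rather than the code in this PR: `load_top_crates` is a hypothetical stand-in for the database query, the crate names in it are placeholder data, and the real job may hold its cache differently.

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

/// Number of popular crates kept in memory (3000 in the description above).
const TOP_CRATES: usize = 3000;

/// Hypothetical stand-in for the query that fetches the most downloaded
/// crates. The names below are placeholder data for this sketch only.
fn load_top_crates(limit: usize) -> Vec<String> {
    let all = ["serde", "rand", "tokio", "syn"];
    all.iter().take(limit).map(|s| s.to_string()).collect()
}

/// In-memory set of popular crate names.
struct Cache {
    names: HashSet<String>,
}

/// Process-wide cache. It is filled on first use and never refreshed;
/// a process restart is the only thing that rebuilds it.
static CACHE: OnceLock<Cache> = OnceLock::new();

fn cache() -> &'static Cache {
    CACHE.get_or_init(|| Cache {
        names: load_top_crates(TOP_CRATES).into_iter().collect(),
    })
}

/// Returns true if `name` is one of the cached popular crates.
fn is_popular(name: &str) -> bool {
    cache().names.contains(name)
}
```

Every call to `cache()` after the first returns the same instance without touching the database, which is the trade-off described above: slightly stale data in exchange for no extra load.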

With the top 3000 crates, the cache uses less than 1 MiB of heap. I think we can afford that.

Because we have to send e-mails from the background job, Emails now has to be plumbed through the Environment that is exposed to background jobs. I don't think this is a problem in practice, but it is an extra change.
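Roughly, the plumbing looks like the sketch below. The types here are simplified, hypothetical stand-ins (the `Emails` in this example only records messages in memory, and `notify_typosquat` is a made-up name), so treat it as an outline of the shape of the change rather than the actual crates.io types.

```rust
use std::sync::{Arc, Mutex};

/// Simplified, hypothetical stand-in for `Emails`: it records messages in
/// memory instead of sending them.
#[derive(Default)]
struct Emails {
    sent: Mutex<Vec<(String, String)>>,
}

impl Emails {
    /// Records one message as a (recipient, body) pair.
    fn send(&self, recipient: &str, body: &str) {
        self.sent
            .lock()
            .unwrap()
            .push((recipient.to_string(), body.to_string()));
    }
}

/// Hypothetical background job environment. The new part is the `emails`
/// field, which lets jobs send notifications.
struct Environment {
    emails: Arc<Emails>,
}

/// Job body: tell each recipient that `krate` may be squatting `targets`.
/// Only crate names go into the message.
fn notify_typosquat(env: &Environment, recipients: &[&str], krate: &str, targets: &[&str]) {
    let body = format!(
        "Crate {} may be typosquatting: {}",
        krate,
        targets.join(", ")
    );
    for recipient in recipients {
        env.emails.send(recipient, &body);
    }
}
```

Sharing the mailer through an `Arc` keeps the environment cheap to pass around between jobs; whether the real code does it this way is an assumption of the sketch.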

Finally, as this is an experiment, the configuration is hardcoded into the new worker job module. If this becomes a longer-term thing, it would be split out into our normal configuration system for easier management, but right now this isolates the changes as much as possible in worker::typosquat.

Cargo.toml

src/tests/util/test_app.rs

src/email.rs

src/worker/typosquat/config.rs

src/worker/typosquat/types.rs

src/worker/typosquat/cache.rs

src/worker/typosquat.rs

bors · 2023-10-03T08:22:17Z

☔ The latest upstream changes (presumably 5210643) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2023-10-31T09:54:49Z

☔ The latest upstream changes (presumably #7395) made this pull request unmergeable. Please resolve the merge conflicts.

LawnGnome · 2023-11-11T01:57:40Z

This has been heavily rebased after the various background worker changes in the last couple of weeks — although @Turbo87 looked at this a while back, I'd appreciate a re-review (preferably from him, but I'm also grateful to anyone else who wants to cast a quick eye over this).

Tested locally; all seems to work as expected based on trying to publish a serd crate.

src/worker/typosquat/database.rs

src/worker/environment.rs

src/worker/typosquat/mod.rs

src/controllers/krate/publish.rs

bors · 2023-11-13T20:39:35Z

☔ The latest upstream changes (presumably d40529a) made this pull request unmergeable. Please resolve the merge conflicts.

Turbo87

let's give this a try :)

This extends our new typosquatting checks (see rust-lang#7206) to detect an attack vector we've seen more recently where a bad actor tries to squat an existing, popular crate by adding or removing a common suffix (such as `-rs` or `-sys`). The suffix list in the configuration has been taken _approximately_ from the most popular suffixes in the existing set of crates, with a small amount of human judgement involved on which ones are more likely to be abused based on recent incidents.
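As a rough sketch of the suffix idea (not the actual implementation): for each configured suffix, either strip it from the new crate name or append it, then see whether the result is a popular crate. The two-entry suffix list and the function names are assumptions for illustration; the real list is longer.

```rust
/// Illustrative suffix list only; the real configuration differs.
const SUFFIXES: &[&str] = &["-rs", "-sys"];

/// Names obtained from `name` by removing a listed suffix if it is present,
/// or by appending it if it is not.
fn suffix_variants(name: &str) -> Vec<String> {
    let mut out = Vec::new();
    for &suffix in SUFFIXES {
        match name.strip_suffix(suffix) {
            // Suffix present: the bare stem is a candidate target.
            Some(stem) if !stem.is_empty() => out.push(stem.to_string()),
            // The name is nothing but the suffix: no variant to report.
            Some(_) => {}
            // Suffix absent: the suffixed name is a candidate target.
            None => out.push(format!("{name}{suffix}")),
        }
    }
    out
}

/// Popular crates that `name` might be squatting via a suffix change.
fn suffix_squats(name: &str, popular: &[&str]) -> Vec<String> {
    suffix_variants(name)
        .into_iter()
        .filter(|variant| popular.contains(&variant.as_str()))
        .collect()
}
```

With this sketch, publishing `serde-rs` while `serde` is popular would be flagged, as would publishing `rand` if `rand-rs` were the popular crate.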

LawnGnome added the C-enhancement ✨, C-internal 🔧, A-publish, and A-backend ⚙️ labels Sep 29, 2023

LawnGnome requested a review from a team September 29, 2023 02:50

LawnGnome self-assigned this Sep 29, 2023

Turbo87 reviewed Sep 29, 2023

LawnGnome force-pushed the typomania branch from 4dfe2ee to 48a559c Compare October 13, 2023 15:11

LawnGnome force-pushed the typomania branch from 05603fa to 32d9dcd Compare November 11, 2023 01:45

LawnGnome marked this pull request as ready for review November 11, 2023 01:53

LawnGnome requested a review from Turbo87 November 11, 2023 01:54

Turbo87 reviewed Nov 11, 2023

src/worker/typosquat/database.rs (outdated, resolved)

src/worker/environment.rs (resolved)

src/worker/typosquat/mod.rs (outdated, resolved)

Turbo87 reviewed Nov 11, 2023

src/controllers/krate/publish.rs (outdated, resolved)

Turbo87 force-pushed the typomania branch from 32d9dcd to ab718e5 Compare November 12, 2023 09:11

Turbo87 approved these changes Nov 14, 2023

LawnGnome and others added 6 commits November 14, 2023 11:26

worker: plumb Emails through the environment

15e6f49

cargo: add typomania dependency

c7b55c5

email: add notification e-mail for potential typosquatting

a6976cf

worker: add a job to check for typosquats

019a9af

Update src/controllers/krate/publish.rs

9baa7d6

Co-authored-by: Tobias Bieniek <tobias@bieniek.cloud>

typosquat: move to top level module

c176ab6

LawnGnome force-pushed the typomania branch from 323d221 to c176ab6 Compare November 14, 2023 19:29

LawnGnome added 2 commits November 14, 2023 11:37

typosquat: fix doc lint errors

5f6298a

worker: explain why typosquat cache only builds once

1705535

LawnGnome merged commit 7608eab into rust-lang:main Nov 14, 2023

bors mentioned this pull request Nov 14, 2023

Use the same feature name validation rule from Cargo #7500

Merged

LawnGnome mentioned this pull request Nov 21, 2023

typosquat: add suffix checks #7571

Merged
