A browser extension that vastly reduces the amount of spam in YouTube comments using a variety of techniques.
By default, the following types of comments are always allowed and bypass the filters below:
- The video author's comments
- The video's pinned comment
- Verified (checkmark) YouTube accounts
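As a rough sketch, the bypass can be expressed as a single check that runs before any filter. The `CommentInfo` shape and its field names are assumptions made for illustration, not the extension's actual data model.

```typescript
// Hypothetical comment shape; the field names are assumptions for illustration.
interface CommentInfo {
  authorChannelId: string;
  isPinned: boolean;
  isVerified: boolean;
}

// Returns true when a comment should skip every spam filter:
// it was written by the video author, it is pinned, or the account is verified.
function isExempt(comment: CommentInfo, videoAuthorChannelId: string): boolean {
  return (
    comment.authorChannelId === videoAuthorChannelId ||
    comment.isPinned ||
    comment.isVerified
  );
}
```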
I manually assembled some commonly used phrases from spam comments, which act as a first line of defense.
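A minimal sketch of how such a phrase check might work is shown here. The phrases are placeholders rather than the extension's real list, and the normalization step (Unicode NFKC, lowercasing, whitespace collapsing) is an assumption about what a reasonable matcher would do.

```typescript
// Placeholder phrases for illustration only; the real list is not reproduced here.
const SPAM_PHRASES: string[] = [
  "text me on telegram",
  "claim your prize",
  "i made $",
];

// Normalize text so trivial variations (case, compatibility characters,
// repeated whitespace) do not defeat a plain substring match.
function normalize(text: string): string {
  return text.normalize("NFKC").toLowerCase().replace(/\s+/g, " ").trim();
}

// Returns true when the comment text contains any of the given phrases.
function containsSpamPhrase(text: string, phrases: string[] = SPAM_PHRASES): boolean {
  const normalized = normalize(text);
  return phrases.some((phrase) => normalized.includes(phrase));
}
```

Because a substring match is cheap, it makes sense to run a check like this before the more expensive image-based filters.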
Scammers pretending to be the video author are detected using a slightly modified version of Mapbox's pixelmatch library.
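The idea can be illustrated with a simplified per-pixel comparison of two same-sized RGBA avatar buffers. This is a stand-in written for this README, not pixelmatch's algorithm (pixelmatch uses a perceptual color difference and anti-aliasing detection), and the function names, tolerance, and mismatch threshold are all assumptions.

```typescript
// Counts pixels where any RGBA channel differs by more than `tolerance`.
// Both buffers must hold width * height * 4 bytes.
function countMismatchedPixels(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  width: number,
  height: number,
  tolerance = 16
): number {
  if (a.length !== width * height * 4 || b.length !== a.length) {
    throw new Error("Image buffers must match the given dimensions");
  }
  let mismatched = 0;
  for (let i = 0; i < a.length; i += 4) {
    const differs =
      Math.abs(a[i] - b[i]) > tolerance ||
      Math.abs(a[i + 1] - b[i + 1]) > tolerance ||
      Math.abs(a[i + 2] - b[i + 2]) > tolerance ||
      Math.abs(a[i + 3] - b[i + 3]) > tolerance;
    if (differs) mismatched++;
  }
  return mismatched;
}

// Flags a commenter whose avatar nearly matches the video author's avatar
// even though the comment comes from a different channel.
function looksLikeImpersonator(
  commenterAvatar: Uint8ClampedArray,
  authorAvatar: Uint8ClampedArray,
  width: number,
  height: number,
  sameChannel: boolean,
  maxMismatchRatio = 0.05
): boolean {
  if (sameChannel) return false; // the real author is never an impersonator
  const mismatched = countMismatchedPixels(commenterAvatar, authorAvatar, width, height);
  return mismatched / (width * height) <= maxMismatchRatio;
}
```

In practice both avatars would first be drawn to a canvas at the same size so that the buffers are directly comparable.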
NSFW detection is handled by my JavaScript port of Yahoo's OpenNSFW model. It's the most accurate (and least biased) NSFW classifier I could find, so I ported it for use in this extension. It can be disabled in the extension's options page.
Comments containing links to websites that are not on our allow list are marked as spam out of an abundance of caution. Below are the steps used to generate this list.
First, the list of allowed websites is retrieved from the Tranco project using these filters:
- Lists: Alexa, Cisco Umbrella, Majestic
- Number of days: last 30
- Combination method: Dowdall rule
- Aggregate from full list
- Only include pay-level domains
- No TLD filtering
- Output length: 1 million
See the latest Tranco list here
Second, we remove domains under the .edu, .gov, and .mil TLDs from the list. These are blanket-allowed separately, since registering them requires approval from a regulatory organization.
Third, we remove domains that are present in StevenBlack's hosts file, which is a collection of known bad sites.
Fourth, we remove domains containing some common words/phrases that are often present in spam/abusive URLs.
Fifth, we limit the list to the top 100,000 to save space and lessen the chance of malicious URLs slipping through our filters.
Lastly, we add some known good domains to the list. If a YouTuber's legitimate domain gets flagged falsely, consider submitting a pull request to add it to the allow list.
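Steps two through six above can be sketched as a small filtering pipeline. This is an illustration under stated assumptions, not the actual build script: it assumes the Tranco list has already been downloaded as an array ordered by rank, and `AllowListInputs`, `buildAllowList`, and `isAllowedDomain` are hypothetical names.

```typescript
// Hypothetical inputs; all names here are assumptions for illustration.
interface AllowListInputs {
  rankedDomains: string[];    // Tranco domains, best rank first
  blockedHosts: Set<string>;  // domains parsed from the hosts file
  blockedWords: string[];     // substrings common in spam/abusive URLs
  knownGoodDomains: string[]; // manually curated additions
  maxSize: number;            // 100,000 in the steps above
}

const BLANKET_ALLOWED_TLDS: string[] = [".edu", ".gov", ".mil"];

function buildAllowList(inputs: AllowListInputs): Set<string> {
  const filtered = inputs.rankedDomains
    .map((domain) => domain.toLowerCase())
    // Step 2: drop blanket-allowed TLDs, which are handled separately.
    .filter((domain) => !BLANKET_ALLOWED_TLDS.some((tld) => domain.endsWith(tld)))
    // Step 3: drop domains found in the hosts file.
    .filter((domain) => !inputs.blockedHosts.has(domain))
    // Step 4: drop domains containing suspicious words.
    .filter((domain) => !inputs.blockedWords.some((word) => domain.includes(word)))
    // Step 5: keep only the top entries that survived filtering.
    .slice(0, inputs.maxSize);
  // Step 6: add the manually curated known good domains.
  const knownGood = inputs.knownGoodDomains.map((domain) => domain.toLowerCase());
  return new Set([...filtered, ...knownGood]);
}

// Checks a pay-level domain against the list and the blanket-allowed TLDs.
// Assumes the caller has already reduced the hostname to its pay-level domain.
function isAllowedDomain(domain: string, allowList: Set<string>): boolean {
  const lower = domain.toLowerCase();
  return allowList.has(lower) || BLANKET_ALLOWED_TLDS.some((tld) => lower.endsWith(tld));
}
```

Truncating after the hosts-file and word filters, rather than before, means the final list holds the top surviving domains instead of a shorter remainder.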
Once one of a user's comments is marked as spam, all subsequent comments by that user are marked as spam.
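This "sticky" behavior can be sketched with a set of flagged user IDs. The `SpamTracker` class is a hypothetical name, and the sketch assumes comments are processed in order and that each user has a stable identifier such as a channel ID.

```typescript
// Remembers users who have posted at least one spam comment.
class SpamTracker {
  private flaggedUsers = new Set<string>();

  // `looksLikeSpam` is the verdict of the other filters for this comment alone.
  // Returns the final verdict after taking the user's history into account.
  classify(userId: string, looksLikeSpam: boolean): boolean {
    if (this.flaggedUsers.has(userId)) return true;
    if (looksLikeSpam) this.flaggedUsers.add(userId);
    return looksLikeSpam;
  }
}
```

A flagged user's later comments return early, so the remaining filters never need to run for them.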