
Add a spam check #11217

Merged
12 commits merged into master from feature-spam-check on Jul 13, 2019

Conversation

Gargron
Member

@Gargron Gargron commented Jun 30, 2019

When we receive a remote status that mentions local users who are not following that account (and it's not a response to something involving the sender), we want to check if a similar message has already been sent before with different recipients. For this we store hashes of the 10 most recent messages by the sender, where the hash is based on the normalized body of the status.
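For illustration, keeping such a capped per-sender history of hashes could look roughly like this (a sketch only; the Redis key layout, the use of a plain list, and the expiry constant are assumptions rather than this PR's exact code):

```ruby
require 'redis'

EXPIRATION = 60 * 60 * 24 * 90 # roughly 3 months, in seconds (assumed value)

# Remember a digest for a sender, keeping only the 10 most recent entries
# and letting the whole history expire if the sender goes quiet.
def remember_digest(redis, account_id, digest)
  key = "spam_check:#{account_id}"
  redis.lpush(key, digest)
  redis.ltrim(key, 0, 9) # keep the 10 most recent digests
  redis.expire(key, EXPIRATION)
end

# Fetch the stored digests so a new status can be compared against them.
def recent_digests(redis, account_id)
  redis.lrange("spam_check:#{account_id}", 0, -1)
end
```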

To normalize the body, we remove all mention links, then remove all HTML tags. We normalize the unicode and convert everything to lowercase. Finally, we also strip out all whitespace and line breaks.
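A normalization along those lines might look like this (a sketch, not the exact code in this PR; it assumes Nokogiri for HTML parsing and that mention links carry a "mention" class):

```ruby
require 'nokogiri'

# Drop mention links, strip the remaining HTML, normalize unicode,
# downcase, and remove all whitespace and line breaks.
def normalized_body(html)
  fragment = Nokogiri::HTML.fragment(html)
  fragment.css('a.mention').remove # assumed markup for mention links
  fragment.text.unicode_normalize(:nfkc).downcase.gsub(/\s+/, '')
end
```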

The hash is generated using the locality-sensitive Nilsimsa algorithm: slightly different inputs produce very similar digests. Digests can then be compared using the Nilsimsa Compare Value, a number between -128 and 128, where 128 indicates identical strings. This paper indicates that a threshold of 54 is mostly free of false positives.

Of course, the shorter the messages are, the more inaccurate this algorithm becomes. For this reason, for messages shorter than 10 characters, we fall back on the MD5 algorithm.
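Putting the two hashing paths together, the digest selection could be sketched like this (assuming the nilsimsa gem's Nilsimsa.new(text).hexdigest interface; the 10-character cutoff is the one described above):

```ruby
require 'digest/md5'
require 'nilsimsa' # assumed gem providing Nilsimsa.new(text).hexdigest

# Locality-sensitive hash for bodies of 10+ characters, exact MD5 for shorter ones.
def compute_digest(text)
  if text.size >= 10
    Nilsimsa.new(text).hexdigest
  else
    Digest::MD5.hexdigest(text)
  end
end
```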


The limitation of this approach is that the first person to receive a spam message will see it.


A positive spam check auto-silences the offender and generates an automated report to keep a human in the loop. Unsilencing the account raises its trust level to 1, which prevents further spam checks on it.

@Gargron Gargron force-pushed the feature-spam-check branch 4 times, most recently from c4b5340 to 320eb3a on June 30, 2019 19:36
@Gargron Gargron marked this pull request as ready for review June 30, 2019 19:36
@ClearlyClaire
Contributor

ClearlyClaire commented Jun 30, 2019

Interesting. I have a hard time figuring out how well it would perform, though, so I'm very uneasy with automated silencing based on a single “near-duplicate” post.
The flagging part could report the actual status if it is public/unlisted. I'm not sure the flagging part should flag private toots or DMs…

EDIT: Also, this is for toots coming from remote instances. We might want something similar for local users?

@ClearlyClaire
Contributor

Hm, this seems very likely to silence+report people who reply to non-followers with short messages.
This also sounds like it would be a problem for customer service-like accounts who would provide similar answers to similar questions.
And I'm very worried about accidentally leaking private/direct toots.

@ClearlyClaire
Contributor

I would also add a comment to the report, making it very clear that it is an automated report and what the criteria were. Otherwise this is going to be really confusing.

@ClearlyClaire
Contributor

ClearlyClaire commented Jun 30, 2019

Also, automatically silencing the offending account can be unexpected, especially if you skip creating a report (e.g. when there is already an unresolved report).

@Gargron
Member Author

Gargron commented Jun 30, 2019

I would also add a comment to the report, making it very clear that it is an automated report and what the criteria were.

But is this going to be a localized string? Based on DEFAULT_LOCALE?

@ClearlyClaire
Contributor

I would also add a comment to the report, making it very clear that it is an automated report and what the criteria were.

But is this going to be a localized string? Based on DEFAULT_LOCALE?

Yes, probably

@Gargron
Member Author

Gargron commented Jun 30, 2019

And I'm very worried about accidentally leaking private/direct toots.

Please do be mindful of the wording. By "leaking" you mean exposing them to the mods/admins of the instance that receives them within the admin UI. If we don't auto-report such toots while auto-silencing, it will lead to false positives never being corrected. If we exempt private/direct toots from the spam check, the spammers will switch to them, making this whole exercise pointless.

@ClearlyClaire
Contributor

Yes, leaking to the admins. Since it's automated, we really risk disclosing private toots to them. We can still open an automated report without including the status if it is private (and make a note of that in the report comments).

@trinity-1686a

trinity-1686a commented Jun 30, 2019

I don't do Ruby so I don't understand most of the code, but you seem to use the Levenshtein distance. Quoting a paper about Nilsimsa:

To determine if two messages present the same textual content, their Nilsimsa digests are compared, checking the number of bits in the same position that have the same value.

Levenshtein distance fails to account for the same-position condition. For instance, a hash of "01010101" and one of "10101010" have nothing in common according to the required metric, but are at a Levenshtein distance of only 2 (one insertion and one deletion).

@Gargron
Member Author

Gargron commented Jun 30, 2019

Thank you for pointing it out! The Ruby example I was following seems to be really wrong on that point, then. So the appropriate comparison is to simply iterate over one digest, check each character for a match at the same position in the other digest, and count up when there is a match?

@trinity-1686a

The proper way would be to do that at the bit level; doing it at the character level means the actual threshold is somewhere between 10 and 40 (assuming a hex string), or 10 and 80 (assuming arbitrary characters in an 8-bit encoding), instead of the fixed 10 you chose. In a lower-level language I would XOR the raw hashes and count the 1 bits, but that might not be very idiomatic in Ruby.
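A bit-level comparison is doable in plain Ruby; here is a sketch along those lines (illustrative only, not the code that ended up in the PR):

```ruby
# Compare two Nilsimsa hex digests bit by bit: XOR the raw bytes, count the
# differing bits, and express the result as the compare value, which for
# 256-bit digests ranges from -128 to 128 (128 meaning identical).
def nilsimsa_compare_value(hex_a, hex_b)
  bytes_a = [hex_a].pack('H*').bytes
  bytes_b = [hex_b].pack('H*').bytes
  differing = bytes_a.zip(bytes_b).sum { |a, b| (a ^ b).to_s(2).count('1') }
  (bytes_a.size * 8 - differing) - 128
end
```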

@kaniini
Contributor

kaniini commented Jul 1, 2019

fwiw in my own testing, I found the Nilsimsa strategy to yield too many false positives.

@Gargron Gargron force-pushed the feature-spam-check branch from 6d70ce7 to 9dd3503 on July 1, 2019 02:38
@Gargron
Member Author

Gargron commented Jul 1, 2019

EDIT: Also, this is for toots coming from remote instances. We might want something similar for local users?

The biggest spam vector in the fediverse is "other instances", not the local server. Local moderation is already quite sufficient with human moderators and approval-mode registrations being available.

fwiw in my own testing, I found the Nilsimsa strategy to yield too many false positives.

I have found TLSH to be unusable for our purposes due to its 256-character minimum input requirement; other algorithms either do not have Ruby bindings or require new system-wide dependencies to be installed.

I aim to mitigate false positives by limiting the circumstances under which the spam check is carried out, i.e. only when local users are mentioned, they are not following the author, and the author is not simply responding to something that mentions them.

Furthermore, human moderators are in the loop thanks to the automatic report, so false positives can be noticed and addressed. I am introducing account trust levels so that unsilencing an account once ensures the same spam check will not hit it again.
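For illustration, the "limited circumstances" gate described above could be expressed roughly like this (a hypothetical sketch; the Mastodon-style helpers used here, such as local?, following?, and thread, are assumptions and not taken from this PR):

```ruby
# A status is only worth spam-checking if it mentions local users who do not
# follow the sender, and it is not a reply to something that mentions the sender.
def spam_check_candidate?(status, sender)
  strangers = status.mentions.map(&:account).select do |recipient|
    recipient.local? && !recipient.following?(sender)
  end

  return false if strangers.empty?
  return false if status.thread.present? && status.thread.mentions.where(account_id: sender.id).exists?

  true
end
```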

@ClearlyClaire
Contributor

The biggest spam vector in the fediverse is "other instances", not the local server. Local moderation is already quite sufficient with human moderators and approval-mode registrations being available.

In nearly all cases of the recent spam waves, the offending accounts were hosted on Mastodon instances. Having the spam check run for local users would catch them with the same criteria, but would catch them faster.

@@ -88,13 +88,17 @@ def auto_silence_account!
 end
 
 def auto_report_status!
-  ReportService.new.call(Account.representative, @account, status_ids: [@status.id]) unless @account.targeted_reports.unresolved.exists?
+  ReportService.new.call(Account.representative, @account, status_ids: @status.distributable? ? [@status.id] : nil, comment: I18n.t('spam_check.spam_detected_and_silenced')) unless @account.targeted_reports.unresolved.exists?
Contributor

This will still silence without sending a report if a report is already opened for that account.

Member Author

Let's say someone else has reported that account for spamming already. Not silencing just because of that doesn't make sense to me. If you ask me, the presence of any report would be enough to highlight the fact that the account is silenced, since that is displayed on the reports screen.

@Mstrodl

Mstrodl commented Jul 1, 2019

@Gargron What happens if spam instances just use a new account to become a different "sender" each time? Won't that completely avoid your checks? Or, by "10 most recent messages by the sender", do you mean one instance when you say "sender"?

@ClearlyClaire
Contributor

@Mstrodl this is based on accounts and not instances. If a spammer keeps switching accounts, this will be ineffective. From what I understand, though, this is mainly aimed at stopping spam like the ongoing wave where accounts are manually(?) created across legitimate instances, so this approach can be effective.

@ClearlyClaire
Contributor

A couple remarks:

  • Whenever the spam checker kicks in, it should probably add something in the admin log
  • the expiration duration of 3 months seems pretty long

@Gargron
Member Author

Gargron commented Jul 1, 2019

Whenever the spam checker kicks in, it should probably add something in the admin log

The admin log is based on actor-action-target and in this case we don't have a good way of representing the system itself as actor (this comes back to having a "system" account for more general purposes)

the expiration duration of 3 months seems pretty long

What do you think would be more reasonable? I'm trying to avoid the case where accounts send spam with a super low frequency to avoid being detected.

@nolanlawson
Contributor

If I understand correctly, can't this spam filter be defeated by changing a single word in the spam message? Since that would change the checksum?

It seems like we could quickly wind up with the same kind of email spam TH4T L00KS L1K3 TH1S. Or randomly switches some words for synonyms.

Most of the time the spammers seem to include a link: go here to learn about our "movement," go here to buy our stuff, etc. So for that reason maybe

we remove all mention links

^ this should be reconsidered? Checking to see if a message keeps containing the same link to the same website may be a simpler and more effective filter. (Although since the spammer could just create multiple URL-shortened links, you'd have to follow the redirect chain to see the final URL. And filter out query params, etc.)
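As an aside, the "follow the redirect chain and strip query params" idea could be sketched like this (purely illustrative; nothing along these lines is part of this PR):

```ruby
require 'net/http'
require 'uri'

# Resolve a (possibly URL-shortened) link to its final destination and drop
# query parameters and fragments, so repeated spam links compare equal.
def canonical_link(url, redirect_limit = 5)
  uri = URI.parse(url)

  redirect_limit.times do
    response = Net::HTTP.get_response(uri)
    break unless response.is_a?(Net::HTTPRedirection) && response['location']
    uri = URI.join(uri.to_s, response['location'])
  end

  uri.query = nil
  uri.fragment = nil
  uri.to_s
end
```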

@Gargron
Member Author

Gargron commented Jul 1, 2019

If I understand correctly, can't this spam filter be defeated by changing a single word in the spam message? Since that would change the checksum?

Not quite, that's what Nilsimsa is for. The checksum would be quite similar.

we remove all mention links

^ this should be reconsidered?

Mention links are @nolan, @Gargron, etc., not just any links. More to the point, the spammer has already shown that they're willing to send messages without any links, and those are just as annoying, so focusing on links is not the right path.

@bclindner
Contributor

I think this is a good start, and with the Nilsimsa hashing it will probably catch copy-paste spam waves. However, I think it goes without saying that this will only do its job for a short period. I'm certain spammers will quickly pick up on this and move to harder-to-detect methods like image spam.

I think this PR could be improved by making it reactive to text in content warnings as well. That's probably the easiest escape hatch for current spam patterns.

@Mstrodl

Mstrodl commented Jul 1, 2019

Keep in mind also that many spammers will just swap around a few words or mess with capitalization automatically, based on random chance, which might give the message enough difference according to the algorithm to let it through (e.g. hey vs. hello, ur vs. your, and others). This is already something that happens with DM bots on messaging services like Skype.

@bclindner
Contributor

bclindner commented Jul 1, 2019

As an addendum, I am concerned about the inclusion of anti-spam measures as part of Mastodon's core development scope. Spam prevention measures are a rat race, and publicly developing anti-spam measures that are directly integrated into the next version makes it very easy to see the next move and adjust accordingly, effectively making simpler measures like these a waste of development time. Additionally, I think more complex measures would be unreasonable to maintain as part of the core application stack. Even now, I'm certain spammers have already found a way around this and switched up their tactics.

While I am woefully unequipped to speak on Mastodon's architecture, I think the best solution here might be a way to implement reactive third-party anti-spam filtering "plug-ins" within Mastodon, if possible. This would spread out and open up development of anti-spam systems that integrate directly with an instance, without having to fork the repo outright and host a custom version, and it would also allow a level of integration that the REST API's stateless design can't offer.

@Mstrodl

Mstrodl commented Jul 1, 2019

@bclindner I like the idea of being able to point mastodon at "integrations" sort of like Github's PR integrations (CircleCI and the like). This way they can be written in any language (not just Ruby) and could potentially mean a shared database could be used across instances if someone wanted

@Gargron
Member Author

Gargron commented Jul 1, 2019

I like the idea of being able to point mastodon at "integrations" sort of like Github's PR integrations (CircleCI and the like). This way they can be written in any language (not just Ruby) and could potentially mean a shared database could be used across instances if someone wanted

  1. If it's not part of Mastodon's own code, it will not benefit admins with little/no technical background
  2. Sharing moderation data with 3rd parties is a big privacy no-no

Spam prevention measures are a rat race, and publicly developing anti-spam measures that are directly integrated into the next version makes it very easy to see the next move and adjust accordingly, effectively making simpler measures like these a waste of development time.

It is a rat race, but that doesn't mean there's no point in raising the barrier to entry in the default installation.

@Gargron Gargron force-pushed the feature-spam-check branch from 096440c to 8e46e2d on July 12, 2019 02:04
@Gargron Gargron force-pushed the feature-spam-check branch from 8e46e2d to 182bb68 on July 12, 2019 02:33
@Gargron Gargron requested a review from ClearlyClaire July 12, 2019 02:39
@Gargron
Member Author

Gargron commented Jul 12, 2019

This PR has now been tested in production, and the spam check adds all matched statuses to the automatic report for diagnosis (as long as those are not private). The spam check does not run for local accounts because there doesn't seem to be an agreement over whether that is desired, but that can be changed in future PRs.

Contributor

@ClearlyClaire ClearlyClaire left a comment

LGTM. I'm still not 100% convinced about the reliability of such a filter (both in terms of false positives and false negatives), so I'd be more comfortable if there was an option to disable it. But if it ran without issues on m.s., it can be enabled by default.

Also, this seems pretty useless for small or single-user instances; checking for spam at the source instance seems to make more sense to me (the recent spam waves were done by creating a bunch of accounts on various “trustworthy” instances, which would most probably run the spam detection code and would therefore catch those accounts much faster and silence them for everyone). But as you said, that can be another PR.

@Gargron Gargron merged commit 6ff67be into master Jul 13, 2019
@Gargron Gargron deleted the feature-spam-check branch July 21, 2019 01:48
hiyuki2578 pushed a commit to ProjectMyosotis/mastodon that referenced this pull request Oct 2, 2019
* Add a spam check

* Use Nilsimsa to generate locality-sensitive hashes and compare using Levenshtein distance

* Add more tests

* Add exemption when the message is a reply to something that mentions the sender

* Use Nilsimsa Compare Value instead of Levenshtein distance

* Use MD5 for messages shorter than 10 characters

* Add message to automated report, do not add non-public statuses to automated report, add trust level to accounts and make unsilencing raise the trust level to prevent repeated spam checks on that account

* Expire spam check data after 3 months

* Add support for local statuses, reduce expiration to 1 week, always create a report

* Add content warnings to the spam check and exempt empty statuses

* Change Nilsimsa threshold to 95 and make sure removed statuses are removed from the spam check

* Add all matched statuses into automatic report