Regular expressions with domain lists #866

Closed
dissona opened this issue Feb 27, 2023 · 11 comments

@dissona

dissona commented Feb 27, 2023

Manjaro KDE 5.26.5
Kernel 5.15.93-1-MANJARO
opensnitch 1.5.7

I am using this rule:
list of domains/IPs > To this list of domains (regular expressions)

I have the mybase list (7.7 MiB) from https://github.com/DNSCrypt/dnscrypt-proxy/wiki/Public-blocklist

Issue:
It blocks sites that are not on the list, for example mozilla.org and addons.mozilla.org.

What I have tried:
Removing all lines with * or #, so it's just a list of domains
Removing all lines with mozilla

I'm not sure what the issue is, but could it be that the regular expressions used by opensnitch are too broad?

@gustavo-iniguez-goya
Collaborator

Hi @dissona ,

Set LogLevel to DEBUG under Preferences -> Nodes, and filter the log from a terminal like this:
$ tail -f /var/log/opensnitchd.log | grep "Regexp list match:"

That way you'll see what regular expression is matching a domain. Post it here so we can debug it.

If you're using a generic domain list (plain domains, no regexps) with a rule of type regular expressions, then yes: an entry such as "mozilla.org" is treated as ".*mozilla.org".

@dissona
Author

dissona commented Feb 27, 2023

Regexp list match: mozilla.org, ozilla.org

Looks like it matches anything if no expression is given.

How can I fix the mybase list with this syntax? I tried the hosts format in the other setting, but it does not let me use regexps or give me an easy way to block all subdomains from this massive list.

@gustavo-iniguez-goya
Collaborator

In a list of regular expressions every entry is compiled as a regexp, so for example "mozilla.org" will match *mozilla.org* (support.mozilla.org, www.mozilla.org, momomozilla.org, etc.).

In this case:
Regexp list match: mozilla.org, ozilla.org

"ozilla.org" is an entry in your list, so it matches "*ozilla.org" -> addons.mozilla.org, www.mozilla.org, etc.

I downloaded mybase list and "ozilla.org" does not appear in the list, only memozilla.org, supportmozilla.org and download-stats.mozilla.org
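
To make the matching behaviour described above concrete, here is a minimal Go sketch. It is not opensnitch's actual code, and `matchingEntry` is a hypothetical helper. It compiles each list entry with Go's standard `regexp` package and searches for it anywhere in the domain. Since nothing anchors the entry, `ozilla.org` is found inside both `mozilla.org` and `addons.mozilla.org`.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchingEntry is a hypothetical helper (not opensnitch code). It returns
// the first list entry that, compiled as an unanchored regular expression,
// matches somewhere inside the domain.
func matchingEntry(entries []string, domain string) (string, bool) {
	for _, e := range entries {
		re, err := regexp.Compile(e)
		if err != nil {
			continue // skip entries that are not valid regexps
		}
		if re.MatchString(domain) {
			return e, true
		}
	}
	return "", false
}

func main() {
	entries := []string{"ozilla.org"}
	for _, d := range []string{"mozilla.org", "addons.mozilla.org", "example.com"} {
		e, ok := matchingEntry(entries, d)
		fmt.Printf("%s matched=%t entry=%q\n", d, ok, e)
	}
}
```

Note that the unescaped `.` in an entry also matches any single character, which widens the match a little further.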

> How can I fix the mybase list with this syntax?

I'd start by reviewing that suspicious entry "ozilla.org" and deleting it.

Then if you want to filter all the subdomains of that list, I'd convert the domains to regular expressions:
$ cat mybase.txt | sed 's/\([0-9a-z\-].*\)/.*(^|\\.)\1$/' > mybase_regexp.txt

(which means: given a domain xyz.net, filter xyz.net itself or any subdomain, i.e. *.xyz.net)

playground: https://go.dev/play/p/JzQCeNH4OH1
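
As a sketch of what the conversion is aiming for, the Go snippet below builds an anchored pattern for a single domain. It is an illustration and not a translation of the sed command above. `anchoredPattern` and `matchesDomain` are hypothetical names. The dots are escaped with `regexp.QuoteMeta`, and the leading `.*` is left out on the assumption that the pattern is searched for anywhere in the domain, as described earlier in the thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchoredPattern (hypothetical helper) returns a regexp source that matches
// the domain itself or any subdomain of it: the domain must be preceded by
// the start of the string or a literal dot, and followed by the end.
func anchoredPattern(domain string) string {
	return `(^|\.)` + regexp.QuoteMeta(domain) + `$`
}

// matchesDomain reports whether host is domain or one of its subdomains.
func matchesDomain(domain, host string) bool {
	return regexp.MustCompile(anchoredPattern(domain)).MatchString(host)
}

func main() {
	fmt.Println("pattern:", anchoredPattern("xyz.net"))
	for _, h := range []string{"xyz.net", "www.xyz.net", "abcxyz.net", "xyz.net.example.com"} {
		fmt.Printf("%-20s %t\n", h, matchesDomain("xyz.net", h))
	}
}
```

With this shape, xyz.net and www.xyz.net match, while abcxyz.net and xyz.net.example.com do not.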

@dissona
Author

dissona commented Feb 28, 2023

> I downloaded mybase list and "ozilla.org" does not appear in the list, only memozilla.org, supportmozilla.org and download-stats.mozilla.org

It's on line 449213.

> $ cat mybase.txt | sed 's/\([0-9a-z\-].*\)/.*(^|\\.)\1$/' > mybase_regexp.txt

Thank you, this seems to have worked, but DNS requests now take so long to resolve that my internet is unusable with the list enabled.

@gustavo-iniguez-goya
Collaborator

if you're using Deny for that rule, change it to Reject.

@dissona
Author

dissona commented Feb 28, 2023

No difference. I also tried changing the process monitor method from ebpf to proc, and it still takes ~30 seconds to resolve DNS requests in Firefox.

My log has these errors, if that's any help:

eBPF Failed to load /etc/opensnitchd/opensnitch.o: open /etc/opensnitchd/opensnitch.o: no such file or directory
Error while pinging UI service: rpc error: code = DeadlineExceeded desc = context deadline exceeded, state: READY

@gustavo-iniguez-goya
Collaborator

> its on line 449213

Oops, you're right.

Hmm, after loading this list with regexps, the daemon is using 100% of the CPU.

For now this is a limitation: regexp lists only work well with small lists of regexps.

@dissona
Author

dissona commented Mar 1, 2023

OK, thank you, I will use the hosts format instead.

It says in https://github.com/DNSCrypt/dnscrypt-proxy/wiki/Filters#filter-patterns
Other pattern types are slower and should be used with moderation.

Maybe whatever code they use for domain/subdomain wildcards could be implemented in opensnitch?
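
For illustration only, a domain-plus-subdomains lookup can be done without regular expressions by storing the plain domains in a set and checking the host and each of its parent domains against it. The Go sketch below is an assumption about how such matching could look, not how dnscrypt-proxy or opensnitch implement it, and `blocked` is a hypothetical name. Each lookup costs at most one map access per label of the host, independent of the list size.

```go
package main

import (
	"fmt"
	"strings"
)

// blocked (hypothetical helper) reports whether host, or any parent domain
// of host, is present in the set of plain domains.
func blocked(set map[string]struct{}, host string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for host != "" {
		if _, ok := set[host]; ok {
			return true
		}
		// Drop the leftmost label and try the parent domain.
		i := strings.IndexByte(host, '.')
		if i < 0 {
			break
		}
		host = host[i+1:]
	}
	return false
}

func main() {
	set := map[string]struct{}{"xyz.net": {}}
	for _, h := range []string{"xyz.net", "www.xyz.net", "abcxyz.net"} {
		fmt.Printf("%-12s %t\n", h, blocked(set, h))
	}
}
```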

@gustavo-iniguez-goya
Collaborator

In principle those filters are just regular expressions, they should work fine with opensnitch.
The problem is the mybase list, it's huge, and due to how our regexp list feature is coded we've got a bottleneck there.

I've been using this list since I added regexp lists: https://github.com/mmotti/pihole-regex/blob/master/regex.list
It's small, but it's generic enough to filter unwanted domains that are not listed on popular block lists.

There's some more info on reddit:
https://www.reddit.com/r/pihole/comments/awvk13/can_anyone_recommend_some_good_regex_filters/
https://www.reddit.com/r/pihole/comments/b3fj60/regex_megathread/

Maybe we could maintain a list of regexps.

@gustavo-iniguez-goya
Collaborator

wiki updated to reflect all this: https://github.com/evilsocket/opensnitch/wiki/block-lists#lists-of-domains-with-regular-expressions

@gustavo-iniguez-goya
Collaborator

Closing, as this is a limitation right now that would require a lot of work for little benefit. But I've tagged it in case someone wants to fix it some day.

gustavo-iniguez-goya closed this as not planned on Apr 8, 2023