Change the direction for read-in of files (AdBlock sources only) #227

Closed
spirillen opened this issue Mar 11, 2021 · 1 comment

Comments

@spirillen
Contributor

Therefore, I'm willing to change the direction: what about PyFunceble trying to decode as much as possible, if not everything?

@keczuppp thank you for your table which I will use for the tests.

Is this new direction fair enough (for everyone)?

cc @kulfoon @spirillen @dnmTX @

I'm fine with this, as long as it won't eat up a lot of resources trying to figure out how to "understand" the input in order to extract the actual data. Most blacklists come in a specific form (adblock/regex/hosts/RPZ/Squid(-guard)), of which adblock is probably the hardest to decode and to write rules for in order to get the right domain to test. The domains can appear first, separated by commas, or last, separated by pipes, and can be "hidden" in a lot of other ways. I predict this would be rather harsh on the CPU if it all had to be done by one worker, versus the current way, where we determine which decoder to use for the given input.

@funilrys
Owner

The change of direction is only meant for the adblock decoder (for now) and the way it works from an end-user POV.

Such directions should be decided individually for each decoder. The discussion about the way the adblock decoder works has been going on almost since the day I decided to release an adblock decoder in PyFunceble. Nobody said it would be easy, but I took up the challenge hoping that, if it is used, I would get enough user input to improve the decoder.

I'm probably one of the few on earth who took on the virtually impossible challenge of decoding the domains out of an adblock list. There is no real way to predict what is going on in every list maintainer's mind because there are a lot of possibilities.

Therefore, I will take advantage of the major release to simply remove the --aggressive flag and implement the decoding of the missing entries/formats submitted by @keczuppp.

In fact, in 4.0.0 such a "direction" was discreetly implemented in the hosts file decoder, for example, but not in the RPZ decoder.

PyFunceble will never try to guess the format of the input source for you. It does not make sense. It is virtually impossible and resource consuming at the same time.
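
To illustrate the "no guessing" point, here is a minimal sketch in which the caller has to name the input format and an unknown format is rejected instead of being guessed. It is not PyFunceble's code: `decode_plain_line`, `decode_hosts_line`, `DECODERS` and `decode` are hypothetical names made up for this example, and the parsing is deliberately simplified.

```python
from typing import Callable, Dict, List


def decode_plain_line(line: str) -> List[str]:
    """Plain list: one subject per line, '#' starts a comment."""
    subject = line.split("#", 1)[0].strip()
    return [subject] if subject else []


def decode_hosts_line(line: str) -> List[str]:
    """Hosts file: '<ip> <name> [<name> ...]', '#' starts a comment."""
    fields = line.split("#", 1)[0].split()
    # The first field is the IP address; everything after it is a hostname.
    return fields[1:]


# Hypothetical registry: the caller names the format, nothing is guessed.
DECODERS: Dict[str, Callable[[str], List[str]]] = {
    "plain": decode_plain_line,
    "hosts": decode_hosts_line,
}


def decode(line: str, source_format: str) -> List[str]:
    """Decode one line with the decoder the caller explicitly asked for."""
    try:
        decoder = DECODERS[source_format]
    except KeyError:
        # Refuse instead of trying every decoder until one "looks right".
        raise ValueError(f"unknown source format: {source_format!r}") from None
    return decoder(line)
```

With this shape, `decode("0.0.0.0 example.com", "hosts")` runs exactly one decoder, so the cost per line does not grow with the number of supported formats.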

What I meant by the change of direction regarding the adblock decoder is that we will decode everything instead of trying to guess what is relevant or not, which is what that specific decoder currently does.


@spirillen changed the title from "Change the direction for read-in of files" to "Change the direction for read-in of files (AdBlock sources only)" on Mar 13, 2021
funilrys added a commit that referenced this issue Mar 14, 2021
This patch closes #227.
This patch fixes #13.

To quote @keczuppp (#13):

> [...] but it seems you extract way too much in this mode on your own
> and it might cause troubles...

Therefore, I decided to rewrite the decoder completely.

This patch introduces a real split between what is normally decoded and
what is decoded within the aggressive mode.

Within the "standard" mode, we only decode what is supposed to be
blocked.
On the other hand, within the "aggressive" mode, we decode
everything provided by the "standard" mode, plus everything behind a
'domain=' option or an 'href=' directive - if effective.
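
The split described above can be sketched roughly as follows. This is only an illustration, not the decoder from this patch: `decode_rule` and `_BLOCKED` are hypothetical names, only `||host^` style rules are recognised, the 'href=' directive is left out, and skipping negated (`~`) domains is an assumption of this sketch.

```python
import re
from typing import List

# Simplified: a rule that starts with '||host', optionally followed by '^'.
_BLOCKED = re.compile(r"^\|\|([a-z0-9.-]+)\^?", re.IGNORECASE)


def decode_rule(rule: str, aggressive: bool = False) -> List[str]:
    """Return the domains decoded from one filter rule."""
    rule = rule.strip()
    # Comments ('!') and exception rules ('@@') do not block anything.
    if not rule or rule.startswith("!") or rule.startswith("@@"):
        return []

    # Everything after the first '$' is the comma-separated option list.
    pattern, _, options = rule.partition("$")
    found: List[str] = []

    # Standard mode: only what is supposed to be blocked.
    match = _BLOCKED.match(pattern)
    if match:
        found.append(match.group(1).lower())

    # Aggressive mode: additionally, everything behind 'domain='.
    if aggressive:
        for option in options.split(","):
            name, _, value = option.partition("=")
            if name.strip().lower() != "domain":
                continue
            for domain in value.split("|"):
                domain = domain.strip().lower()
                # Assumption of this sketch: negated domains are skipped.
                if domain and not domain.startswith("~") and domain not in found:
                    found.append(domain)

    return found


rule = "||ads.example.com^$third-party,domain=example.org|~example.net"
print(decode_rule(rule))                   # ['ads.example.com']
print(decode_rule(rule, aggressive=True))  # ['ads.example.com', 'example.org']
```

The tests shipped with the patch remain the reference for what each mode actually extracts.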

Please refer to the tests to understand the differences on a
deeper level, and keep in mind that this new "direction" will evolve
over time.

Decoding AdBlock or filter lists is not an easy job, and I hope to get
much more feedback in the future. I didn't implement this because
I have a use for it, but rather because someone asked for it and
I wanted to see if I was capable of implementing it.

Now it's fully part of PyFunceble and people using it shouldn't be
afraid to submit the "weird things" they find while using the decoder.

Contributors:
  * @dnmTX
  * @jawz101
  * @keczuppp
  * @kulfoon
  * @spirillen