Change the direction for read-in of files (AdBlock sources only) #227

spirillen · 2021-03-11T22:01:21Z

Therefore, I'm willing to change the direction: what about PyFunceble trying to decode as much as possible, if not everything?

@keczuppp thank you for your table which I will use for the tests.

Is this new direction fair enough (for everyone)?

cc @kulfoon @spirillen @dnmTX @

I'm fine with this, as long as it won't eat up a lot of resources trying to figure out what and how to "understand" the input in order to extract the actual data. Most blacklists come in a specific form (adblock, regex, hosts, RPZ, Squid(-guard)), of which adblock is probably the hardest to decode and write rules for to get the right domain to test. The domain can come first, separated by commas, or last, separated by pipes, or be "hidden" in a lot of other ways. I predict this would be rather harsh on the CPU if it all had to be done by one worker, versus the current way, where we determine which decoder to use for the given input.
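To illustrate the "current way" described above, where the decoder is chosen for the given input instead of being guessed by a single worker, here is a minimal sketch. It is not PyFunceble's code: the names `DECODERS`, `decode_hosts_line`, `decode_plain_line` and `decode` are hypothetical, and only two trivial formats are covered.

```python
from typing import Callable, Dict, Iterable, List


def decode_hosts_line(line: str) -> List[str]:
    """Decode a hosts-style line such as '0.0.0.0 example.org'."""
    # Drop trailing comments, then split on whitespace.
    fields = line.split("#", 1)[0].split()
    # The first field is the IP address; the rest are hostnames.
    return fields[1:] if len(fields) > 1 else []


def decode_plain_line(line: str) -> List[str]:
    """Decode a plain list: one domain per line."""
    domain = line.split("#", 1)[0].strip()
    return [domain] if domain else []


# Hypothetical registry: the caller names the format explicitly,
# so no CPU time is spent guessing what the input looks like.
DECODERS: Dict[str, Callable[[str], List[str]]] = {
    "hosts": decode_hosts_line,
    "plain": decode_plain_line,
}


def decode(lines: Iterable[str], fmt: str) -> List[str]:
    """Run every line through the decoder registered for `fmt`."""
    if fmt not in DECODERS:
        raise ValueError(f"unknown input format: {fmt!r}")
    decoder = DECODERS[fmt]
    subjects: List[str] = []
    for line in lines:
        subjects.extend(decoder(line))
    return subjects
```

Each decoder stays small and cheap because it only has to understand one format; the cost of an unknown format is an immediate error rather than a round of guessing.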

funilrys · 2021-03-12T05:12:26Z

The change of direction was only meant for the ad block decoder (for now) and the way it works from an end-user POV.

Such directions should be decided individually for each decoder. The discussion about the way the adblock decoder works has been ongoing almost since the day I decided to release an adblock decoder in PyFunceble. Nobody said it was easy, but I took the challenge hoping that, if it is used, I will get enough user input to improve the decoder.

I'm probably one of the few on earth who took on the virtually impossible challenge of decoding the domains of an adblocker. There is no real way to predict what is going on in every list maintainer's mind because there are a lot of possibilities.

Therefore, I am taking advantage of the major release to just remove the --aggressive flag and implement the decoding of the missing entries/formats submitted by @keczuppp.

In fact, in 4.0.0 such "direction" was discreetly implemented in the hosts file decoder for example but not in the RPZ decoder.

PyFunceble will never try to guess the format of the input source for you. It does not make sense. It is virtually impossible and resource consuming at the same time.

What I meant by the change of direction regarding the ad block decoder is that we will decode everything instead of trying to guess what is relevant or not, which is what that specific decoder currently does.


@keczuppp

This patch closes #227.
This patch fixes #13.

To quote @keczuppp (#13):

> [.. ] but it seems you extract way too much in this mode on your own
> and it might cause troubles...

Therefore, I decided to rewrite the decoder completely.

This patch introduces a real split between what is normally decoded and what is decoded within the aggressive mode.

Within the "standard" mode, we only decode what is supposed to be blocked. On the other side, within the "aggressive" mode, we decode everything provided by the "standard" mode, plus everything behind a 'domain=' option or an 'href=' directive - if effective.

Please refer to the tests to understand the differences on a deeper level, and keep in mind that this new "direction" will evolve with time.

Decoding AdBlock or Filter lists is not an easy job and I hope to get much more feedback in the future. I didn't implement this because I have a use for it, but rather because it was asked by someone and I wanted to see if I was capable of implementing it. Now it's fully part of PyFunceble, and people using it shouldn't be afraid to submit the "weird things" they find while using the decoder.

Contributors:

* @dnmTX
* @jawz101
* @keczuppp
* @kulfoon
* @spirillen
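The split between the "standard" and "aggressive" modes described in the commit message can be sketched roughly as follows. This is an illustration only, not the rewritten PyFunceble decoder: `decode_rule` and the regular expressions are hypothetical, only the simple `||domain^` form is recognized as "what is supposed to be blocked", and skipping `@@` exceptions and negated (`~`) domains is an assumption of this sketch.

```python
import re
from typing import List

# '||ads.example.org^' -> the domain the rule is meant to block.
_BLOCKED = re.compile(r"^\|\|([a-z0-9.-]+)\^", re.IGNORECASE)
# '$domain=a.example|b.example' -> value of the 'domain=' option.
_DOMAIN_OPTION = re.compile(r"(?:^|,)domain=([^,]+)", re.IGNORECASE)
# 'href^="http://ads.example.net/"' -> host inside an 'href=' selector.
_HREF = re.compile(
    r"href[\^*$~|]?=[\"']?(?:https?:)?(?://)?([a-z0-9.-]+)", re.IGNORECASE
)


def decode_rule(rule: str, aggressive: bool = False) -> List[str]:
    """Return the domains found in one filter rule (simplified sketch)."""
    rule = rule.strip()
    # Assumption: comments and exception rules yield nothing.
    if not rule or rule.startswith("!") or rule.startswith("@@"):
        return []

    found: List[str] = []

    if "##" in rule:
        # Cosmetic rule: nothing is blocked at network level, so only
        # the aggressive mode looks behind 'href=' selectors.
        if aggressive:
            found.extend(m.group(1) for m in _HREF.finditer(rule))
    else:
        pattern, _, options = rule.partition("$")
        blocked = _BLOCKED.match(pattern)
        if blocked:
            # Standard mode: only what is supposed to be blocked.
            found.append(blocked.group(1))
        if aggressive:
            option = _DOMAIN_OPTION.search(options)
            if option:
                # Aggressive mode: also everything behind 'domain='.
                found.extend(
                    d for d in option.group(1).split("|")
                    if d and not d.startswith("~")
                )

    # Lowercase and de-duplicate while preserving order.
    return list(dict.fromkeys(d.lower() for d in found))
```

For example, `decode_rule("||ads.example.org^$domain=site-a.example")` yields only `ads.example.org`, while the same call with `aggressive=True` additionally yields `site-a.example`.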

spirillen mentioned this issue Mar 11, 2021

Adblock decoder ignore some portion when decoding #13

Closed

spirillen changed the title ~~Change the direction for read-in of files~~ Change the direction for read-in of files (AdBlock sources only) Mar 13, 2021

funilrys closed this as completed Mar 14, 2021
