Change the direction for read-in of files (AdBlock sources only) #227
Comments
The change of direction was only meant for the AdBlock decoder (for now) and the way it works from an end-user point of view. Such directions should be decided individually for each decoder.

The discussion about the way the AdBlock decoder works has been ongoing almost since the day I decided to release an AdBlock decoder in PyFunceble. Nobody said it would be easy, but I took the challenge hoping that, if it is used, I will get enough user input to improve the decoder. I'm probably one of the few on earth who took on the virtually impossible challenge of decoding the domains of an AdBlock list. There is no real way to predict what goes on in every list maintainer's mind, because there are a lot of possibilities. Therefore, I take advantage of the major release to just remove the [...]

In fact, PyFunceble will never try to guess the format of the input source for you. It does not make sense: it is virtually impossible and resource-consuming at the same time.

What I meant by the change of direction regarding the AdBlock decoder is that we will decode everything instead of trying to guess what is relevant or not - which is currently the case for that specific decoder.
This patch closes #227.
This patch fixes #13.

To quote @keczuppp (#13):

> [...] but it seems you extract way too much in this mode on your own
> and it might cause troubles...

Therefore, I decided to rewrite the decoder completely.

This patch introduces a real split between what is normally decoded and what is decoded within the aggressive mode. Within the "standard" mode, we only decode what is supposed to be blocked. On the other hand, within the "aggressive" mode, we decode everything provided by the "standard" mode, plus everything behind a 'domain=' option or an 'href=' directive - if effective.

Please refer to the tests to understand the differences on a deeper level, and keep in mind that this new "direction" will evolve over time.

Decoding AdBlock or filter lists is not an easy job, and I hope to get much more feedback in the future. I didn't implement this because I have a use for it, but rather because someone asked for it and I wanted to see whether I was capable of implementing it. Now it's fully part of PyFunceble, and people using it shouldn't be afraid to submit the "weird things" they find while using the decoder.

Contributors:

* @dnmTX
* @jawz101
* @keczuppp
* @kulfoon
* @spirillen
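To make the standard/aggressive split more concrete, here is a minimal Python sketch. It is not PyFunceble's actual decoder: the function name `decode_line`, the regular expressions, and the handling of exception rules (`@@`) and negated domains (`~`) are all assumptions made for illustration, and real AdBlock syntax has many more forms than this covers.

```python
import re
from typing import List

# Hypothetical, heavily simplified patterns (assumptions, not PyFunceble's).
_BLOCK_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^?", re.IGNORECASE)
_DOMAIN_OPTION = re.compile(r"domain=([^,\s]+)", re.IGNORECASE)
_HREF_DIRECTIVE = re.compile(
    r"""href[\^*$]?=["']?(?:https?://)?([a-z0-9.-]+)""", re.IGNORECASE
)


def _add(found: List[str], domain: str) -> None:
    """Append a normalized domain once, preserving discovery order."""
    domain = domain.lower().strip(".")
    if domain and domain not in found:
        found.append(domain)


def decode_line(line: str, aggressive: bool = False) -> List[str]:
    """Extract domains from one AdBlock-style line.

    Standard mode: only the domain of a '||domain^' blocking rule.
    Aggressive mode: additionally, what sits behind 'domain=' and 'href='.
    """
    line = line.strip()
    # Assumption: comments and exception rules block nothing, so skip them.
    if not line or line.startswith("!") or line.startswith("@@"):
        return []

    found: List[str] = []
    rule, _, options = line.partition("$")

    match = _BLOCK_RULE.match(rule)
    if match:
        _add(found, match.group(1))

    if aggressive:
        for option in _DOMAIN_OPTION.finditer(options):
            for domain in option.group(1).split("|"):
                # Assumption: negated entries ('~') are kept as well.
                _add(found, domain.lstrip("~"))
        for href in _HREF_DIRECTIVE.finditer(line):
            _add(found, href.group(1))

    return found
```

With a rule such as `||ads.example.com^$third-party,domain=site.example.org`, the standard call would yield only `ads.example.com`, while `aggressive=True` would also yield `site.example.org`. The tests shipped with PyFunceble remain the reference for the real behaviour.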
I'm fine with this, as long as it won't eat up a lot of resources trying to figure out what the input is and how to "understand" it in order to extract the actual data. Most blacklists come in a specific format: AdBlock, regex, hosts, RPZ, or Squid(-guard). Of these, AdBlock is probably the hardest to decode and write rules for in order to get the right domain to test. The domain can come first, separated by commas, or last, separated by pipes, or be "hidden" in a lot of other ways. I predict this would be rather harsh on the CPU if it all had to be done by one worker, versus the current way, where we determine which decoder to use for the given input.
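The "one decoder per declared format" approach discussed here, as opposed to guessing the format from the content, can be sketched as follows. This is only an illustration: the names `DECODERS`, `decode_plain`, `decode_hosts`, and `decode_lines` are hypothetical and do not correspond to PyFunceble's API, and only two trivial formats are covered.

```python
from typing import Callable, Dict, Iterable, List


def decode_plain(line: str) -> List[str]:
    """One subject per line; anything after '#' is a comment."""
    subject = line.split("#", 1)[0].strip()
    return [subject] if subject else []


def decode_hosts(line: str) -> List[str]:
    """Hosts format: '<ip> <domain> [<domain> ...]', '#' starts a comment."""
    parts = line.split("#", 1)[0].split()
    # The first token is the IP address; the rest are the domains.
    return parts[1:] if len(parts) > 1 else []


# The caller names the format explicitly; nothing is guessed from the input.
DECODERS: Dict[str, Callable[[str], List[str]]] = {
    "plain": decode_plain,
    "hosts": decode_hosts,
}


def decode_lines(lines: Iterable[str], fmt: str) -> List[str]:
    """Decode every line with the single decoder selected by `fmt`."""
    try:
        decoder = DECODERS[fmt]
    except KeyError:
        raise ValueError(f"unknown input format: {fmt!r}") from None

    subjects: List[str] = []
    for line in lines:
        subjects.extend(decoder(line))
    return subjects
```

Each line is then handled by exactly one cheap decoder, rather than being run through every possible format in turn, which is the CPU concern raised above.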