Adblock decoder ignore some portion when decoding #13

Closed
funilrys opened this issue Oct 2, 2018 · 76 comments

@funilrys
Owner

funilrys commented Oct 2, 2018

As reported by @dnmTX at Ultimate-Hosts-Blacklist/dev-center#9:

everything with ##[href^=....

are ignored.

@funilrys funilrys added the bug label Oct 2, 2018
@funilrys funilrys self-assigned this Oct 2, 2018
@dnmTX

dnmTX commented Oct 2, 2018

Caught another bug:
This section of the original list contains rules that only remove elements from legit sites:
[image: original]

After PyFunceble filters the lists, the end result in domain.list is:
[image: end_result]
Notice how legit sites are being blocked?

@funilrys
Owner Author

funilrys commented Oct 2, 2018

Okay, you'll have to explain AdBlock to me then @dnmTX 😸 I'm not a big fan of it, as its syntax is confusing.

So how do I tell a legit site from a bad one in AdBlock? I thought that AdBlock was only about blocking, not whitelisting 🤔

@dnmTX

dnmTX commented Oct 2, 2018

OK......
I'll cover only the basics to keep it clear:
If you want to block a domain, you need to add || in front and ^ at the end (it will catch the subdomains as well).
If you want to block just an element on that website, you need to find it (Chrome DevTools helps a lot with that) and add ## after the domain name. Example:
Open yahoo.com (I removed soooo many elements from that page you wouldn't recognize it).
Now... look at your yahoo page and compare it to mine:
[image: yahoo]

Much cleaner, no videos, no annoyances.
Rule examples:
yahoo.com###applet_p_50000278
yahoo.com###applet_p_32209491
yahoo.com###applet_p_50000277
yahoo.com###applet_p_63802
yahoo.com###applet_p_63796
yahoo.com###sticky-lrec2-footer
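
To make the two rule shapes above concrete, here is a minimal Python sketch that builds them. The helper names and ads.example.com are made up for illustration; only the || ... ^ and ## conventions come from the explanation above.

```python
def block_domain(domain: str) -> str:
    """Build a rule that blocks a domain and its subdomains."""
    return f"||{domain}^"


def hide_element(domain: str, selector: str) -> str:
    """Build a rule that hides one element, but only on the given domain."""
    return f"{domain}##{selector}"


print(block_domain("ads.example.com"))               # ||ads.example.com^
print(hide_element("yahoo.com", "#applet_p_63802"))  # yahoo.com###applet_p_63802
```

The three # characters in the yahoo.com rules are the ## separator followed by a CSS id selector (#applet_p_63802), which is why such a rule says nothing bad about yahoo.com itself.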

@funilrys
Owner Author

funilrys commented Oct 2, 2018

Okay, so what about this format? Which of the following mark the domain as a bad boy, and which as a good one?

||google.com$script,image
||api.google.com/papi/action$popup
facebook.com###player-above-2
~github.com,hello.world##
@@||cnn.com/*ad.xml
!||world.hello/*ad.xml
!@@||funceble.world/js
yahoo.com,msn.com,api.hello.world#@#awesomeWorld
!funilrys.com##body
hubgit.com|oohay.com|ipa.elloh.dlorw#@#awesomeWorld

I know you will not find them in the real world, but they are part of the tests for the decoder.

@dnmTX

dnmTX commented Oct 2, 2018

The ##[href^=.... case is different: the link is embedded in an iframe, and this is how you block those domains.

@funilrys
Owner Author

funilrys commented Oct 2, 2018

Okay I'm working on that implementation.

So in this

hubgit.com|oohay.com|ipa.elloh.dlorw#@#awesomeWorld

they are all legit right ?

@dnmTX

dnmTX commented Oct 2, 2018

||google.com$script,image - this rule will not allow any scripts or images from that domain to be shown or executed
||api.google.com/papi/action$popup - this rule will stop the popup coming from that link
facebook.com###player-above-2 - this one will hide an element (looks like a video player) on that page
~github.com,hello.world## - hmmmm, haven't seen this one
@@||cnn.com/*ad.xml - this rule will whitelist that link on the webpage (@@ in front means whitelisting)
!||world.hello/*ad.xml - this would block it, but the ! in front makes the line a comment
!@@||funceble.world/js - this would whitelist that js script, but the ! in front makes the line a comment
yahoo.com,msn.com,api.hello.world#@#awesomeWorld - don't know
!funilrys.com##body - this would hide an element, but the ! in front makes the line a comment
hubgit.com|oohay.com|ipa.elloh.dlorw#@#awesomeWorld - don't know

@dnmTX

dnmTX commented Oct 2, 2018

Stay put, let me do some research on the #@# rule, because I'm using AdGuard and haven't seen such a rule there.

@dnmTX

dnmTX commented Oct 2, 2018

Ok... in the above example, the #@# rule allows (whitelists) that particular element on the listed domains, so yes, all those domains are legit.
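
The markers discussed so far (! for comments, @@ for whitelisting, ## for element hiding, #@# for element whitelisting) can be told apart with a few prefix and substring checks. This is only a rough sketch: the function name and the labels are invented here, and it is not how PyFunceble classifies lines.

```python
def classify(rule: str) -> str:
    """Give a rough label to one line of an AdBlock-style list."""
    rule = rule.strip()
    if rule.startswith("["):
        return "header"              # e.g. [AdBlock Plus 2.0]
    if rule.startswith("!"):
        return "comment"             # the whole line is inactive
    if rule.startswith("@@"):
        return "network-exception"   # whitelists a request
    if "#@#" in rule:
        return "cosmetic-exception"  # whitelists an element (check before "##")
    if "##" in rule:
        return "cosmetic"            # hides an element, the domain is not bad
    return "network-block"


for line in ("@@||cnn.com/*ad.xml", "!funilrys.com##body",
             "hubgit.com|oohay.com|ipa.elloh.dlorw#@#awesomeWorld"):
    print(classify(line), "<-", line)
```

Only lines labelled network-block are candidates for a blocklist; everything else either whitelists something or targets an element on an otherwise legit site.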

@funilrys
Owner Author

funilrys commented Oct 2, 2018

Okay, let me fix this issue first with the current format; I'll then review all the tests with you, as they need some hotfixes. I never thought about whitelisting 😹

@dnmTX

dnmTX commented Oct 2, 2018

I know, it's Java, more complex. It took me a while to get my head around it, but I'm getting there.

@dnmTX

dnmTX commented Oct 2, 2018

@funilrys keep it simple. Everything that has || in front and ^ at the end stays.
Everything that has href in it stays (filtered, of course, to leave only the domain). The rest should be removed, as those rules don't really concern those of us who use the lists in hosts format.
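
As a rough illustration of the first half of that suggestion (keep only ||domain^ lines), here is a minimal Python sketch. The regular expression and the function name are assumptions made for this example, and the href part is deliberately left out.

```python
import re

# Hypothetical pattern: "||", a bare host name, then a closing "^" and nothing else.
DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$", re.IGNORECASE)


def keep_for_hosts(lines):
    """Return the hosts of the lines that look exactly like ||domain^."""
    kept = []
    for line in lines:
        match = DOMAIN_RULE.match(line.strip())
        if match:
            kept.append(match.group(1).lower())
    return kept


sample = [
    "||ads.example.com^",
    "yahoo.com###applet_p_63802",
    "||api.google.com/papi/action$popup",
    "@@||cnn.com/*ad.xml",
]
print(keep_for_hosts(sample))  # ['ads.example.com']
```

A filter this strict drops rules with a path or options, such as ||api.google.com/papi/action$popup, so it trades coverage for fewer false positives.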

@funilrys
Owner Author

funilrys commented Oct 2, 2018

Yeah, but if I do that, I'll invalidate AdBlock/filter lists like https://github.com/MajkiIT/polish-ads-filter 😸

@funilrys
Owner Author

funilrys commented Oct 2, 2018

I only need to take some time to understand how it works properly, then I'll clean up the mess I created!

@dnmTX

dnmTX commented Oct 2, 2018

Look at this one, for example. It contains only legit domains, with rules that block certain elements only.

@dnmTX

dnmTX commented Oct 2, 2018

Ok, I know it will take time, but meanwhile, for everyone who uses the lists with dnsmasq etc. and not adblockers: can you PLEASE add https://raw.githubusercontent.com/Dawsey21/Lists/master/main-blacklist.txt to be filtered properly?

@dnmTX

dnmTX commented Oct 2, 2018

Also, you can start here. It's very well explained and will help you understand the basics:
https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters

@dnmTX

dnmTX commented Oct 2, 2018

Ok, I know it will take time, but meanwhile, for everyone who uses the lists with dnsmasq etc. and not adblockers: can you PLEASE add https://raw.githubusercontent.com/Dawsey21/Lists/master/main-blacklist.txt to be filtered properly?

PLEASE
[image: untitled]

funilrys added a commit that referenced this issue Oct 2, 2018
This patch fix #13.

Reverse:
  * Of the last patch for the way we check for URL.

Introduction:
  * Of new test cases.
  * Of the force update for all version which are older than `0.94.3`.
    * Because of this patch.

Review:
  * Of the way we extract domain and URL from the given adblock file.

Deprecation:
  * Of all version which are equal or older than `0.109.0`.

Thanks:
  * To @dnmTX
  * @adblockplus for their documentation
    * cf: https://adblockplus.org/filter-cheatsheet
@funilrys
Owner Author

funilrys commented Oct 2, 2018

@dnmTX ,

PyFunceble is fixed, please look at the tests for details.

As you mentioned, there really was an issue with the way I handled adblock lists. Therefore, here is the erratum:

Please read self.expected as the list of domains extracted from the given input (self.lines).

self.lines = [
    "||funilrys.github.io$script,image",
    "||google.com^$script,image",
    "||twitter.com^helloworld.com",
    "||api.google.com/papi/action$popup",
    "facebook.com###player-above-2",
    "~github.com,hello.world##.wrapper",
    "@@||cnn.com/*ad.xml",
    "!||world.hello/*ad.xml",
    "bing.com,bingo.com#@##adBanner",
    "!@@||funceble.world/js",
    "yahoo.com,~msn.com,api.hello.world#@#awesomeWorld",
    "!funilrys.com##body",
    "hello#@#badads",
    "hubgit.com|oohay.com|ipa.elloh.dlorw#@#awesomeWorld",
    '##[href^="https://funceble.funilrys.com/"]',
    "[AdBlock Plus 2.0]",
    '##div[href^="http://funilrys.com/"]',
    'com##[href^="ftp://funceble.funilrys-funceble.com/"]',
    "/banner/*/img^" "|github.io|",
    "|github.io|",
    "||api.funilrys.com/widget/$",
]

self.expected = [
    "funilrys.github.io",
    "google.com",
    "twitter.com",
    "api.google.com",
    "funceble.funilrys.com",
    "funilrys.com",
    "github.io",
    "api.funilrys.com",
]

As the tests passed without any issue, I can confirm that the current development version and the next release no longer produce these false positives.

Please let me know if there is something else.

This issue will be closed on next release!

Cheers,
Nissar

@funilrys
Owner Author

funilrys commented Oct 2, 2018

@dnmTX

dnmTX commented Oct 2, 2018

@funilrys from what I can tell, self.lines is an example of domains that should not be added for filtering because they are legit? Am I close?

What about anything with ##div[href^=...? Those are usually the bad ones that need blocking.

Another thing (just to make sure). Example:
||api.funilrys.com/widget/$ is a partial link that could be api.funilrys.com/widget/bla/bla/bla/ad.js, and the adblocker will catch it. But just because that domain hosts some ad or telemetry script (Google usually does that) doesn't mean the domain itself is bad. The question is whether there is a rule for how that domain will be classified: as bad or as good?

@funilrys
Owner Author

funilrys commented Oct 2, 2018

@dnmTX self.lines contains random lines that can be found in a regular AdBlock list. The objective of the code I wrote is to output the list self.expected, which is, in practical terms, what we are going to test (the bad ones).

So from your point of view, self.expected represents the bad ones we have to test.

About ##div[href^=...
It's there because you usually have ##[href^=... but these variants also exist:

  • ##div[href^=...
  • com##div[href^=...
  • com##[href^=...

With my review, the domain in the href attribute is extracted and formatted (the protocol and "decorators" are removed) 😸
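
A minimal sketch of that href extraction, assuming a regular expression of my own and the standard urllib.parse module. It handles the listed variants syntactically (optional prefix before ##, optional tag name) and makes no attempt to reproduce which of them PyFunceble finally keeps.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: optional scope, "##", optional tag, then [href^="..."].
HREF_SELECTOR = re.compile(
    r'^(?P<scope>[^#]*)##(?P<tag>[A-Za-z0-9]*)\[href\^="(?P<url>[^"]+)"\]$'
)


def host_from_href_rule(rule):
    """Return the host inside an href selector, or None if the rule differs."""
    match = HREF_SELECTOR.match(rule.strip())
    if match is None:
        return None
    # urlsplit drops the protocol, path and other "decorators" around the host.
    return urlsplit(match.group("url")).hostname


print(host_from_href_rule('##[href^="https://funceble.funilrys.com/"]'))
print(host_from_href_rule('##div[href^="http://funilrys.com/"]'))
print(host_from_href_rule("facebook.com###player-above-2"))
```

The first two calls print funceble.funilrys.com and funilrys.com; the last prints None because the rule hides an element by id and contains no href.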

@dnmTX

dnmTX commented Oct 2, 2018

Actually, from my point of view, self.expected should be considered the good ones, with the exception of everything that has href in it.
The bad ones should start with || and end with ^, including all the href variations.

@funilrys
Owner Author

funilrys commented Oct 2, 2018

Wow you lost me 😹

For clarification, those are only examples of formats. Don't consider those specific domains; we are only talking about the domain extracted from a matched format 😸

|  Expected/Extracted/Tested by PyFunceble 	| Line (example)                             	|
|------------------------------------------	|--------------------------------------------	|
| funilrys.github.io                       	| ||funilrys.github.io$script,image          	|
| google.com                               	| ||google.com^$script,image                 	|
| twitter.com                              	| ||twitter.com^helloworld.com               	|
| api.google.com                           	| ||api.google.com/papi/action$popup         	|
| funceble.funilrys.com                    	| ##[href^="https://funceble.funilrys.com/"] 	|
| funilrys.com                             	| ##div[href^="http://funilrys.com/"]        	|
| github.io                                	| |github.io|                                	|
| api.funilrys.com                         	| ||api.funilrys.com/widget/$                	|


Also if we match for example hello.world##ad-selector we do not extract hello.world as a bad one.
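
The network-rule rows of the table above can be reproduced with a small sketch like the one below. The pattern and the function name are assumptions of this example (not PyFunceble's decoder): one or two leading pipes, then a host that ends at the first ^, /, $ or |.

```python
import re

# Hypothetical pattern: leading "|" or "||", then the host up to a separator.
NETWORK_RULE = re.compile(r"^\|{1,2}([a-z0-9.-]+)(?=[\^/$|]|$)", re.IGNORECASE)


def extract_host(rule):
    """Return the blocked host of a network rule, or None for other lines."""
    rule = rule.strip()
    # Comments, exceptions and element-hiding rules never yield a bad domain.
    if rule.startswith(("!", "@@")) or "##" in rule or "#@#" in rule:
        return None
    match = NETWORK_RULE.match(rule)
    return match.group(1).lower() if match else None


for line in ("||twitter.com^helloworld.com", "||api.funilrys.com/widget/$",
             "|github.io|", "hello.world##ad-selector"):
    print(line, "->", extract_host(line))
```

The href rows of the table are not covered here, and hello.world##ad-selector yields None, matching the remark that hello.world is not extracted as a bad one.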

@funilrys
Owner Author

funilrys commented Oct 2, 2018

Maybe I misunderstood something 🤔

@dnmTX

dnmTX commented Oct 2, 2018

Also if we match for example hello.world##ad-selector we do not extract hello.world as a bad one.

Ok, that's good, that's how it's supposed to be, but......
If we match ||api.google.com/papi/action$popup, do we extract api.google.com as a bad one or not?
This is where the tricky part is, because in this example api.google.com is a very legit domain that hosts ad scripts and so on, but also hosts things without which the web page will be broken.

@spirillen
Contributor

OT to @keczuppp

I must have clicked on the wrong "version"...

[image]

@spirillen
Contributor

spirillen commented Mar 11, 2021

Moved my answer to #227, as it's OT to the OP's post and I hope there will be more activity in replies to this topic. Therefore, I'm willing to change the direction:

@keczuppp

keczuppp commented Mar 12, 2021

(my reply is also a reply to #227 (comment) at the same time):

  • Yeah, creating a good-quality Adblock Decoder is not an easy task. It's tricky, and it's a good challenge for programming skills to extract good domains while avoiding false positives. If you don't feel up to it, or you think it's too complicated and not worth continuing the development, you can replace it with an easier and simpler "decode everything" Adblock Decoder mode. That is not a magic potion for the Adblock Decoder problem, though: such a mode will not end up better, because it will give too many useless false positives that clutter the output list. There seems to be no easy way to go (no "magic bullet")...
  • The default "HOSTS" mode should stay, because it's limited to parsing only a few kinds of HOSTS-compatible filters, and that should not cause any problems. It's easy to implement and very useful for hosts users as a great supplement to their HOSTS lists. If anything causes problems in this mode, just limit extraction to what is listed here: Adblock decoder ignore some portion when decoding #13 (comment), but it seems you extract way too much in this mode on your own and it might cause troubles...

funilrys added a commit that referenced this issue Mar 14, 2021
This patch closes #227.
This patch fixes #13.

To quote @keczuppp (#13):

> [.. ] but it seems you extract way too much in this mode on your own
> and it might cause troubles...

Therefore, I decided to rewrite the decoder completely.

This patch introduces a real split between what is normally decoded and
what is decoded within the aggressive mode.

Within the "standard" mode, we only decode what is supposed to be
blocked.
On the other side, within the "aggressive" mode, we decode
everything provided by the "standard" mode, plus everything behind a
'domain=' option or an 'href=' directive - if effective.

Please refer to the tests to understand the differences on a
deeper level, and keep in mind that this new "direction" will evolve
over time.

Decoding AdBlock or Filter lists is not an easy job and I hope to get
much more feedback in the future. I didn't implement this because
I have a use for it. But rather because it was asked by someone and
I wanted to see if I was capable of implementing it.

Now it's fully part of PyFunceble and people using it shouldn't be
afraid to submit the "weird things" they find while using the decoder.

Contributors:
  * @dnmTX
  * @jawz101
  * @keczuppp
  * @kulfoon
  * @spirillen
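
The split between the two modes described in the commit message can be pictured with the following sketch. It is not the rewritten decoder: the function name is hypothetical, only the domain= option is modelled (not href=), and skipping negated ~ entries is a choice made for this example.

```python
def decode(rule, aggressive=False):
    """Return the host names a single network rule yields in each mode."""
    pattern, _, options = rule.strip().partition("$")
    hosts = []

    # Standard mode: only the blocked host right after the leading "||".
    if pattern.startswith("||"):
        host = pattern[2:].split("/", 1)[0].split("^", 1)[0]
        if host:
            hosts.append(host)

    # Aggressive mode: additionally read the hosts named by domain=...
    if aggressive:
        for option in options.split(","):
            if option.startswith("domain="):
                for name in option[len("domain="):].split("|"):
                    # Assumption of this sketch: skip negated (~) entries.
                    if name and not name.startswith("~"):
                        hosts.append(name)
    return hosts


rule = "||ads.example.com^$script,domain=site-a.example|~site-b.example"
print(decode(rule))                   # ['ads.example.com']
print(decode(rule, aggressive=True))  # ['ads.example.com', 'site-a.example']
```

The hosts behind domain= are the sites where the rule applies, not the thing being blocked, which is why reading them is reserved for the aggressive mode.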
@funilrys
Owner Author

Please take my commit and the underlying tests as the response. Is it still too much @keczuppp ?

Let's discuss the future of that specific decoder. I'll inject any future report about missing decoding into the tests. So the more reports, the better that decoder will be 😄

As I wrote, I'm not one of those who write a filter list... So help or directions are welcome!

@keczuppp

keczuppp commented Mar 17, 2021

Hello, I was already trying to test the new version of Adblock Decoder (4.0.0b35) but:

error
 Finished processing dependencies for PyFunceble-dev==4.0.0b35

D:\download_big_temp\_koding\PyFunceble-dev>pyfunceble
Traceback (most recent call last):
  File "D:\download_big_temp\_koding\Python37\Scripts\pyfunceble-script.py", line 33, in <module>
    sys.exit(load_entry_point('PyFunceble-dev==4.0.0b35', 'console_scripts', 'pyfunceble')())
  File "D:\download_big_temp\_koding\Python37\lib\site-packages\pyfunceble_dev-4.0.0b35-py3.7.egg\PyFunceble\cli\entry_points\pyfunceble\cli.py", line 1022, in tool
  File "D:\download_big_temp\_koding\Python37\lib\site-packages\pyfunceble_dev-4.0.0b35-py3.7.egg\PyFunceble\config\loader.py", line 370, in start
  File "D:\download_big_temp\_koding\Python37\lib\site-packages\pyfunceble_dev-4.0.0b35-py3.7.egg\PyFunceble\config\loader.py", line 331, in get_config_file_content
  File "D:\download_big_temp\_koding\Python37\lib\site-packages\pyfunceble_dev-4.0.0b35-py3.7.egg\PyFunceble\helpers\dict.py", line 290, in from_yaml_file
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\user\\AppData\\Local\\Temp\\tmp_oox9mr2'
  • Also, even if you somehow help me fix the Python errors, I still don't know how to use it from PyFunceble without doing any DNS queries etc. (how to use it just like the standalone decoder: put in an input file and get an output file). I want to test and decode some filter list, but I'm not interested in spending hours waiting for it to finish DNS queries, which are useless garbage when testing the Adblock Decoder. I was unable to find any info in the documentation about skipping DNS queries in Adblock Decoder mode.

@funilrys
Owner Author

@keczuppp Thanks for the notice. I'll update the AdBlock decoder project as soon as possible.

The simple way is to use the pyfunceble --syntax --adblock --aggressive -f [file] arguments. 😄

Note to self: Cleanup documentation.

@keczuppp

keczuppp commented Mar 18, 2021

So I've tried the newest version, v4.0.0b36, and:

Errors 1 log (EasyList)
D:\download_big_temp\_koding>pyfunceble --syntax --adblock --aggressive -f easylist.txt

########  ##    ## ######## ##     ## ##    ##  ######  ######## ########  ##       ########
##     ##  ##  ##  ##       ##     ## ###   ## ##    ## ##       ##     ## ##       ##
##     ##   ####   ##       ##     ## ####  ## ##       ##       ##     ## ##       ##
########     ##    ######   ##     ## ## ## ## ##       ######   ########  ##       ######
##           ##    ##       ##     ## ##  #### ##       ##       ##     ## ##       ##
##           ##    ##       ##     ## ##   ### ##    ## ##       ##     ## ##       ##
##           ##    ##        #######  ##    ##  ######  ######## ########  ######## ########

You are using the Beta version of PyFunceble 4.0.0!
Please take the time to communicate with us when you notice
something unusual.


Fatal Error: 'bool' object has no attribute 'replace'
Traceback (most recent call last):
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 864, in start
    self.fill_to_test_queue_from_protocol()
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 593, in fill_to_test_queue_from_protocol
    handle_file(protocol)
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 533, in handle_file
    cidr2subject=self.cidr2subject,
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\utils\testing.py", line 228, in get_subjects_from_line
    .set_data_to_convert(line)
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 429, in get_converted
    result.update(self._decode_v5(self.data_to_convert))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 382, in _decode_v5
    result.update(self._decode_options(options.split(",")))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 211, in _decode_options
    result.add(self.extract_base(matched))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 156, in extract_base
    subject = subject.replace("*", "").replace("~", "")
AttributeError: 'bool' object has no attribute 'replace'

Process pyfunceble_tester_worker_2:
Process pyfunceble_tester_worker_1:
Traceback (most recent call last):
  File "d:\download_big_temp\_koding\python37\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\download_big_temp\_koding\python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()

D:\download_big_temp\_koding>
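
The traceback above ends in extract_base calling .replace() on a bool, which suggests that a failed match (False) reached a function expecting a string. The sketch below shows one generic way to guard against that. It is a guess at the failure mode, not the actual PyFunceble fix, and both function names are hypothetical.

```python
def extract_base_safely(subject):
    """Return a cleaned subject, or None when the input is not a string."""
    if not isinstance(subject, str):
        # A failed lookup that returned False or None ends up here.
        return None
    return subject.replace("*", "").replace("~", "")


def collect_subjects(candidates):
    """Clean every candidate and silently skip the ones that did not match."""
    result = set()
    for candidate in candidates:
        cleaned = extract_base_safely(candidate)
        if cleaned:
            result.add(cleaned)
    return result


print(collect_subjects(["~example.com", False, "example.org"]))
```

With the guard in place, the False entry is skipped instead of raising AttributeError, and the call returns a set holding example.com and example.org.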
Errors 2 log (EasyList Polish)

D:\download_big_temp\_koding>pyfunceble --syntax --adblock --aggressive -f easylistpolish.txt

########  ##    ## ######## ##     ## ##    ##  ######  ######## ########  ##       ########
##     ##  ##  ##  ##       ##     ## ###   ## ##    ## ##       ##     ## ##       ##
##     ##   ####   ##       ##     ## ####  ## ##       ##       ##     ## ##       ##
########     ##    ######   ##     ## ## ## ## ##       ######   ########  ##       ######
##           ##    ##       ##     ## ##  #### ##       ##       ##     ## ##       ##
##           ##    ##       ##     ## ##   ### ##    ## ##       ##     ## ##       ##
##           ##    ##        #######  ##    ##  ######  ######## ########  ######## ########

You are using the Beta version of PyFunceble 4.0.0!
Please take the time to communicate with us when you notice
something unusual.




Subject                        Status      Source
------------------------------ ----------- ----------
czasdzieci.pl                  VALID       SYNTAX
wlbetathome.adsrv.eacdn.com    VALID       SYNTAX
mbank.pl                       VALID       SYNTAX
rtb.4finance.com               VALID       SYNTAX
app.freshmail.com              VALID       SYNTAX
mandarinodesign.eu             VALID       SYNTAX
ad-work.pl                     VALID       SYNTAX
wirtualnyregion.pl             VALID       SYNTAX
adsearch.pl                    VALID       SYNTAX
affiliates-solutions.com       VALID       SYNTAX
adsnet.pl                      VALID       SYNTAX
banmax.com                     VALID       SYNTAX
converti.se                    VALID       SYNTAX
convertiser.com                VALID       SYNTAX
hub.com.pl                     VALID       SYNTAX
incontext.pl                   VALID       SYNTAX
leadstar.pl                    VALID       SYNTAX
media.stsaff.pl                VALID       SYNTAX
netsalesmedia.pl               VALID       SYNTAX
pocketads.pl                   VALID       SYNTAX
reklamawadowice24.pl           VALID       SYNTAX
rewords.pl                     VALID       SYNTAX
renormaliseras.xyz             VALID       SYNTAX
shopeneo.network               VALID       SYNTAX
spacead.pl                     VALID       SYNTAX
tvtoss.com                     VALID       SYNTAX
waytogrow.pl                   VALID       SYNTAX
ad.admitad.com                 VALID       SYNTAX
ad.e-lider.pl                  VALID       SYNTAX
ad.eko-7.com.pl                VALID       SYNTAX
aff.bstatic.com                VALID       SYNTAX
businessclick.biz.pl           VALID       SYNTAX
avanti.fashion                 VALID       SYNTAX
cdn.leadbit.com                VALID       SYNTAX
comperialead.pl                VALID       SYNTAX
contexthub.net                 VALID       SYNTAX
czasdzieci.home.pl             VALID       SYNTAX
ec.hub2.com.pl                 VALID       SYNTAX
euphoniserent.xyz              VALID       SYNTAX
lnaff.pl                       VALID       SYNTAX
offersprovider.widget.onet.pl  VALID       SYNTAX
partnerzyapi.ceneo.pl          VALID       SYNTAX
ppwidget.skapiec.pl            VALID       SYNTAX
qwerty1.co.pl                  VALID       SYNTAX
r.pless.nazwa.pl               VALID       SYNTAX
reklamy.hostings.pl            VALID       SYNTAX
smartclick.pl                  VALID       SYNTAX
solutions4ad.com               VALID       SYNTAX
static.travelist.pl            VALID       SYNTAX
system.mondeos.pl              VALID       SYNTAX
thc-thc.com                    VALID       SYNTAX
tmefekt.pl                     VALID       SYNTAX
wydawca.lead.network           VALID       SYNTAX
popups.afftrack001.com         VALID       SYNTAX
24gliwice.pl                   VALID       SYNTAX
24opole.pl                     VALID       SYNTAX
filmweb.com                    VALID       SYNTAX
40ton.net                      VALID       SYNTAX
7dni.com.pl                    VALID       SYNTAX
ad.polskiprzemysl.com.pl       VALID       SYNTAX
ad.prv.pl                      VALID       SYNTAX
300polityka.pl                 VALID       SYNTAX
adform.net                     VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aferyprawa.eu                  VALID       SYNTAX
aktyw14.net                    VALID       SYNTAX
aktyw14.net                    VALID       SYNTAX
aktyw14.net                    VALID       SYNTAX
alicdn.com                     VALID       SYNTAX
telchina.pl                    VALID       SYNTAX
allebiznes.pl                  VALID       SYNTAX
wlodawa.net                    VALID       SYNTAX
androidpolska.pl               VALID       SYNTAX
angielskieespresso.pl          VALID       SYNTAX
belekaj.eu                     VALID       SYNTAX
app.travellead.pl              VALID       SYNTAX
archeton.pl                    VALID       SYNTAX
arpass.nazwa.pl                VALID       SYNTAX
atthost.pl                     VALID       SYNTAX
warownie.pl                    VALID       SYNTAX
audiostereo.pl                 VALID       SYNTAX
autoline.com.pl                VALID       SYNTAX
automotivesuppliers.pl         VALID       SYNTAX
automotivesuppliers.pl         VALID       SYNTAX
autorak.com.pl                 VALID       SYNTAX
b24tv.pl                       VALID       SYNTAX
nadwisla24.pl                  VALID       SYNTAX
b24tv.pl                       VALID       SYNTAX
baby-shower.pl                 VALID       SYNTAX
bankier.pl                     VALID       SYNTAX
ziemiakepinska.pl              VALID       SYNTAX
batuu.pl                       VALID       SYNTAX
wizaz.pl                       VALID       SYNTAX
baxu.pl                        VALID       SYNTAX
beerpubs.pl                    VALID       SYNTAX
betonline.net.pl               VALID       SYNTAX
bezale.pl                      VALID       SYNTAX
bezale.pl                      VALID       SYNTAX
bezale.pl                      VALID       SYNTAX
bezale.pl                      VALID       SYNTAX
bezale.pl                      VALID       SYNTAX
bezale.pl                      VALID       SYNTAX
bezale.pl                      VALID       SYNTAX
bezale.pl                      VALID       SYNTAX
bielskiedrogi.pl               VALID       SYNTAX
bielskiedrogi.pl               VALID       SYNTAX
bielskiedrogi.pl               VALID       SYNTAX
biotechnologia.pl              VALID       SYNTAX
bitcoin-online.pl              VALID       SYNTAX
bithub.pl                      VALID       SYNTAX
blendy.pl                      VALID       SYNTAX
blogomaniak.pl                 VALID       SYNTAX
blogprezesa.pl                 VALID       SYNTAX
blogspot.com                   VALID       SYNTAX
mistrzbranzy.pl                VALID       SYNTAX
bobrowniki.tv                  VALID       SYNTAX
bobrowniki.tv                  VALID       SYNTAX
bokser.org                     VALID       SYNTAX
bolec.info                     VALID       SYNTAX
booking.com                    VALID       SYNTAX
kazimierzdolny24.pl            VALID       SYNTAX
bronradom.pl                   VALID       SYNTAX
bronradom.pl                   VALID       SYNTAX
bronradom.pl                   VALID       SYNTAX
bronradom.pl                   VALID       SYNTAX
bronradom.pl                   VALID       SYNTAX
bstok.pl                       VALID       SYNTAX
burdadigital.pl                VALID       SYNTAX
focus.pl                       VALID       SYNTAX
busiarze.com.pl                VALID       SYNTAX
busiarze.com.pl                VALID       SYNTAX
bytomski.pl                    VALID       SYNTAX
c.spolecznosci.net             VALID       SYNTAX
dlastudenta.pl                 VALID       SYNTAX
cba.pl                         VALID       SYNTAX
cdn-lubimyczytac.pl            VALID       SYNTAX
lubimyczytac.pl                VALID       SYNTAX
cdn.dcsaas.net                 VALID       SYNTAX
sklepbazant.pl                 VALID       SYNTAX
swiatmodeli.eu                 VALID       SYNTAX
archigame.pl                   VALID       SYNTAX
szczecinek.com                 VALID       SYNTAX
ceneo.pl                       VALID       SYNTAX
exerim.pl                      VALID       SYNTAX
komputery-pc.info              VALID       SYNTAX
pmi24.info                     VALID       SYNTAX
pch24.info                     VALID       SYNTAX
nowinylokalne.pl               VALID       SYNTAX
pgo24.pl                       VALID       SYNTAX
ppw.fishing                    VALID       SYNTAX
tromil.pl                      VALID       SYNTAX
grojec24.net                   VALID       SYNTAX
centrumkultury.eu              VALID       SYNTAX
centrumkultury.eu              VALID       SYNTAX
ceny-zlomu.pl                  VALID       SYNTAX
chemiabudowlana.info           VALID       SYNTAX
ceny-zlomu.pl                  VALID       SYNTAX
chemiaibiznes.com.pl           VALID       SYNTAX
chlodnictwoiklimatyzacja.pl    VALID       SYNTAX
ciechanowinaczej.pl            VALID       SYNTAX
ciechanowinaczej.pl            VALID       SYNTAX
cmas.pl                        VALID       SYNTAX
cn-tryton.pl                   VALID       SYNTAX
codziennikmlawski.pl           VALID       SYNTAX
codziennikmlawski.pl           VALID       SYNTAX
codziennikmlawski.pl           VALID       SYNTAX
contentstream.pl               VALID       SYNTAX
shareinfo.pl                   VALID       SYNTAX
cowwilanowie.pl                VALID       SYNTAX
czestochowskie24.pl            VALID       SYNTAX
cyfrowaekonomia.pl             VALID       SYNTAX
czestochowskie24.pl            VALID       SYNTAX
dentoforum.pl                  VALID       SYNTAX
di.com.pl                      VALID       SYNTAX
di.com.pl                      VALID       SYNTAX
direct.money.pl                VALID       SYNTAX
dobrewiadomosci.eu             VALID       SYNTAX
dodajauto.pl                   VALID       SYNTAX
dodajauto.pl                   VALID       SYNTAX
dogosfera.pl                   VALID       SYNTAX
dogosfera.pl                   VALID       SYNTAX
domenergo.com                  VALID       SYNTAX
dopilar.pl                     VALID       SYNTAX
dopilar.pl                     VALID       SYNTAX
drzewkozabutelke.pl            VALID       SYNTAX
dx-team.org                    VALID       SYNTAX
dynacrems.wp.pl                VALID       SYNTAX
dz-ow.pl                       VALID       SYNTAX
dziennikpolski24.pl            VALID       SYNTAX
dziennikzwiazkowy.com          VALID       SYNTAX
dzierzgon-twojemiasto.pl       VALID       SYNTAX
dzisiajwgliwicach.pl           VALID       SYNTAX
e-hotelarz.pl                  VALID       SYNTAX
e-hotelarz.pl                  VALID       SYNTAX
e-kg.pl                        VALID       SYNTAX
e-kolo.pl                      VALID       SYNTAX
e-kolo.pl                      VALID       SYNTAX
e-petrol.pl                    VALID       SYNTAX
e-pingpong.pl                  VALID       SYNTAX
e-pingpong.pl                  VALID       SYNTAX
e-play.eu                      VALID       SYNTAX
e-pingpong.pl                  VALID       SYNTAX
e-play.eu                      VALID       SYNTAX
e-stargard.pl                  VALID       SYNTAX
ebarlinek.pl                   VALID       SYNTAX
ebookpoint.pl                  VALID       SYNTAX
swiatczytnikow.pl              VALID       SYNTAX
autofanatyk.pl                 VALID       SYNTAX
ebroker.pl                     VALID       SYNTAX
icyfrowypolsat.pl              VALID       SYNTAX
mojeanonse.pl                  VALID       SYNTAX
ec.bankier.pl                  VALID       SYNTAX
mavelo.pl                      VALID       SYNTAX
miedziak.info.pl               VALID       SYNTAX
tutajglogow.pl                 VALID       SYNTAX
tutajlegnica.pl
           VALID       SYNTAX
←[30m←[42mtutajpolkowice.pl
           VALID       SYNTAX
←[30m←[42mec.bankier.pl
           VALID       SYNTAX
←[30m←[42micyfrowypolsat.pl
           VALID       SYNTAX
←[30m←[42mechogorzowa.pl
           VALID       SYNTAX
←[30m←[46mechogorzowa.pl^
           INVALID     SYNTAX
←[30m←[42medunews.pl
           VALID       SYNTAX
←[30m←[42meduson.pl
           VALID       SYNTAX
←[30m←[42mefilmy.tv
           VALID       SYNTAX
Fatal Error: 'bool' object has no attribute 'replace'
←[30m←[46meduson.pl^
           INVALID     SYNTAX
Traceback (most recent call last):
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 864, in start
    self.fill_to_test_queue_from_protocol()
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 593, in fill_to_test_queue_from_protocol
    handle_file(protocol)
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 533, in handle_file
    cidr2subject=self.cidr2subject,
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\utils\testing.py", line 228, in get_subjects_from_line
    .set_data_to_convert(line)
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 429, in get_converted
    result.update(self._decode_v5(self.data_to_convert))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 382, in _decode_v5
    result.update(self._decode_options(options.split(",")))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 211, in _decode_options
    result.add(self.extract_base(matched))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 156, in extract_base
    subject = subject.replace("*", "").replace("~", "")
AttributeError: 'bool' object has no attribute 'replace'
←[30m←[42mekologia.guru
           VALID       SYNTAX
←[30m←[42megorzow.pl
           VALID       SYNTAX

←[30m←[42mekologia.guru
           VALID       SYNTAX
←[30m←[42mekologia.pl
           VALID       SYNTAX
←[30m←[42mekorodzice.pl
           VALID       SYNTAX
←[30m←[42mekstrastats.pl
           VALID       SYNTAX
Process pyfunceble_tester_worker_2:
Traceback (most recent call last):
Process pyfunceble_producer_worker_1:
  File "d:\download_big_temp\_koding\python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
Process pyfunceble_tester_worker_1:
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\processes\workers\base.py", line 434, in run
    raise exception
Traceback (most recent call last):
Errors 3 log (Official Polish Filters for AdBlock, uBlock Origin & AdGuard)
D:\download_big_temp\_koding>pyfunceble --syntax --adblock --aggressive -f adblock_ublock.txt

########  ##    ## ######## ##     ## ##    ##  ######  ######## ########  ##       ########
##     ##  ##  ##  ##       ##     ## ###   ## ##    ## ##       ##     ## ##       ##
##     ##   ####   ##       ##     ## ####  ## ##       ##       ##     ## ##       ##
########     ##    ######   ##     ## ## ## ## ##       ######   ########  ##       ######
##           ##    ##       ##     ## ##  #### ##       ##       ##     ## ##       ##
##           ##    ##       ##     ## ##   ### ##    ## ##       ##     ## ##       ##
##           ##    ##        #######  ##    ##  ######  ######## ########  ######## ########

You are using the Beta version of PyFunceble 4.0.0!
Please take the time to communicate with us when you notice
something unusual.


Fatal Error: 'bool' object has no attribute 'replace'
Traceback (most recent call last):
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 864, in start
    self.fill_to_test_queue_from_protocol()
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 593, in fill_to_test_queue_from_protocol
    handle_file(protocol)
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\system\launcher.py", line 533, in handle_file
    cidr2subject=self.cidr2subject,
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\cli\utils\testing.py", line 228, in get_subjects_from_line
    .set_data_to_convert(line)
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 429, in get_converted
    result.update(self._decode_v5(self.data_to_convert))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 382, in _decode_v5
    result.update(self._decode_options(options.split(",")))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 211, in _decode_options
    result.add(self.extract_base(matched))
  File "d:\download_big_temp\_koding\python37\lib\site-packages\PyFunceble\converter\adblock_input_line2subject.py", line 156, in extract_base
    subject = subject.replace("*", "").replace("~", "")
AttributeError: 'bool' object has no attribute 'replace'

D:\download_big_temp\_koding>
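
For context, the last frame of the traceback above shows `extract_base` calling `.replace()` on a value that turned out to be a `bool` rather than a `str`. The sketch below only illustrates that failure mode and one defensive way to handle it. It is not the actual PyFunceble patch, and the function name `extract_base_safely` is hypothetical.

```python
from typing import Optional


def extract_base_safely(subject: object) -> Optional[str]:
    """Strip adblock wildcards from a subject, tolerating non-string input.

    Hypothetical helper: it mirrors the failing line from the traceback
    (`subject.replace("*", "").replace("~", "")`) but checks the type first.
    """
    # An upstream matcher that returns False instead of a string would hit
    # this branch instead of raising AttributeError.
    if not isinstance(subject, str):
        return None

    cleaned = subject.replace("*", "").replace("~", "")

    # Treat an empty result as "nothing to extract".
    return cleaned or None


if __name__ == "__main__":
    print(extract_base_safely("~example.org*"))
    print(extract_base_safely(False))
```

With a guard like this, the caller can skip `None` results instead of crashing the whole run on a single unexpected line.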

funilrys added a commit that referenced this issue Mar 18, 2021
This patch fixes the error provided by @keczuppp at #13.

Contributors:
  * @keczuppp
@funilrys
Owner Author

@keczuppp, b37 is available and it should fix the error you reported.

Thanks again for testing !

@funilrys
Owner Author

@keczuppp, the adblock-decoder has also been upgraded to use PyFunceble 4.0.0bX.

@keczuppp

keczuppp commented Mar 18, 2021

yep, good work:

more tests later

@keczuppp

And don't laugh at me, fvcktard.

@keczuppp

keczuppp commented Mar 19, 2021

  • Strange: today I tried the standalone Adblock Decoder, and it crashes and throws errors.
  • It wasn't crashing yesterday, and I haven't changed anything since. However, I'm not sure whether I tested the same filter lists yesterday, so the bug may already have been there.
spoiler errors log

D:\download_big_temp\_koding>adblock2plain --aggressive -o output2.txt easylistpolish.txt
Traceback (most recent call last):
  File "d:\download_big_temp\_koding\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\download_big_temp\_koding\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\download_big_temp\_koding\Python37\Scripts\adblock2plain.exe\__main__.py", line 7, in <module>
  File "d:\download_big_temp\_koding\python37\lib\site-packages\adblock_decoder\cli.py", line 104, in adblock2plain
    args.input_file, args.aggressive, output=args.output
  File "d:\download_big_temp\_koding\python37\lib\site-packages\adblock_decoder\core\adblock2plain.py", line 80, in process_conversion
    for line in self.input:
  File "d:\download_big_temp\_koding\python37\lib\encodings\cp1250.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1350: character maps to <undefined>

D:\download_big_temp\_koding>adblock2plain --aggressive -o output2.txt easylist.txt
Traceback (most recent call last):
  File "d:\download_big_temp\_koding\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\download_big_temp\_koding\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\download_big_temp\_koding\Python37\Scripts\adblock2plain.exe\__main__.py", line 7, in <module>
  File "d:\download_big_temp\_koding\python37\lib\site-packages\adblock_decoder\cli.py", line 104, in adblock2plain
    args.input_file, args.aggressive, output=args.output
  File "d:\download_big_temp\_koding\python37\lib\site-packages\adblock_decoder\core\adblock2plain.py", line 80, in process_conversion
    for line in self.input:
  File "d:\download_big_temp\_koding\python37\lib\encodings\cp1250.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x83 in position 5977: character maps to <undefined>
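
Both tracebacks above end in `encodings\cp1250.py`, which suggests the input file was opened with the Windows default code page rather than an explicit encoding. The sketch below reproduces that situation with a temporary file and shows that passing `encoding="utf-8"` to `open()` avoids it. It assumes the filter lists are UTF-8 encoded, and it is not taken from the adblock-decoder source.

```python
import os
import tempfile

# "Ł" is encoded in UTF-8 as the bytes C5 81, and 0x81 is undefined in cp1250.
sample = "! Title: przykład Łódź\n||example.org^\n"

with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as handle:
    handle.write(sample.encode("utf-8"))
    path = handle.name

try:
    # Simulate the platform default seen in the traceback (cp1250).
    try:
        with open(path, encoding="cp1250") as stream:
            stream.read()
        failed = False
    except UnicodeDecodeError:
        failed = True

    # Naming the encoding explicitly makes the result platform independent.
    with open(path, encoding="utf-8") as stream:
        lines = [line.rstrip("\n") for line in stream]
finally:
    os.remove(path)

if __name__ == "__main__":
    print("cp1250 decoding failed:", failed)
    print(lines)
```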

@funilrys
Owner Author

And don't laugh at me, fvcktard.

Who is laughing at you?

We are all here for some constructive work, enhancement, and discussion in our free time.
I personally take any input I can get regarding the decoder. I don't have time to laugh at someone when they are giving constructive input.

If it is because of my emoji, sorry if it offended you. It wasn't meant to harm.


Your last 3 cases are now in the source code that's going to be deployed next.

I'm going to look into the issue of the standalone decoder later.

funilrys added a commit to PyFunceble/adblock-decoder that referenced this issue Mar 19, 2021
@funilrys
Owner Author

@keczuppp Please update and test the adblock-decoder

@keczuppp

keczuppp commented Mar 20, 2021

funilrys :
I'm going to look into the issue of the standalone decoder later.
@keczuppp Please update and test the adblock-decoder

  • it's fixed now in 1.2.0

funilrys :
Your last 3 cases are now in the source code that's going to be deployed next.

  • it's fixed now in the standalone Adblock Decoder 1.2.0
  • but in the Adblock Decoder embedded in PyFunceble 4.0.0b39, only 2 of the 3 cases (66%) are fixed,
    because site21.com is still not being extracted

Also:

  • I know you have a reason not to do so in the embedded Adblock Decoder,
    but shouldn't at least the standalone Adblock Decoder remove duplicates and sort the result in a-z order?
    If not internally, then at least in the output.
    A parameter like --clean could be added.
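
The deduplicate-and-sort behaviour proposed above could look roughly like the sketch below. `--clean` is only the parameter name suggested in this comment, not an existing option, and `clean_subjects` is a hypothetical helper. Lowercasing before comparison is an extra assumption, based on domain names being case-insensitive.

```python
from typing import Iterable, List


def clean_subjects(subjects: Iterable[str]) -> List[str]:
    """Return the unique, non-empty subjects in a-z order."""
    # A set drops duplicates; stripping and lowercasing first makes
    # "E-Play.eu " and "e-play.eu" count as the same entry.
    unique = {subject.strip().lower() for subject in subjects if subject.strip()}
    return sorted(unique)


if __name__ == "__main__":
    decoded = ["e-play.eu", "di.com.pl", "E-Play.eu", "", "di.com.pl"]
    for subject in clean_subjects(decoded):
        print(subject)
```

Doing this only on the final output keeps the decoding step itself unchanged, which matches the "at least in the output" suggestion.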

==================================================

OFF-TOPIC

funilrys :
Who is laughing at you?
If it is because of my emoji,

Yes, it was about "simple way" + the emoji. Your comment looks like you wanted to show how stupid I am just because I missed something you describe as "simple" (a parameter, plus the fact that it should be used in conjunction with other parameters) in the (big) documentation, which might not be so obvious.

Was it funnier to you than your friend being unable to view the history of my comment #13 (comment)?
Why didn't you laugh at him the same way you did at me?
Oh, because he is your friend, so you can't laugh at your friends the way you did at me.
I could laugh at him (by putting a laugh emoji) just as you did at me.
Furthermore, I could laugh at you (by putting a laugh emoji) every time the Adblock Decoder or PyFunceble crashes, making you look like a fool (but you are a good developer, and bugs are a normal thing in programming, unavoidable for a human being).
And then say: "sorry if it offended you. It wasn't meant to harm." But I'm not that kind of person.

funilrys : sorry if it offended you. It wasn't meant to harm.

Really? Then why didn't you explain what the purpose of the emoji in your comment was?
Do you want to tell me you put the emoji there for no reason, or just because you were in a happy mood and it only accidentally looked like you were laughing at me? Saying "It wasn't meant to harm you" while avoiding explaining the purpose of the emoji is not very clear. I don't believe your cheap explanation; lie to yourself. I spent a lot of time analysing whether your intention was to laugh at me or not, and something told me very clearly that it was. Just don't do it to me again, or watch where you put laugh emojis. This is not Facebook, but since emojis were implemented in GitHub it seems to have turned into FaceHub, abused by trolls who use emojis to troll other people at every occasion. It's a plague and an infection of GitHub. I consider you a positive person and a great developer overall, but just don't do it again.

@keczuppp

keczuppp commented Mar 21, 2021

As for the domain= filters (for ex. domain=page1.com|page2.com):

  • the last 3 failures from my previous comment were just example failures. The point is that the decoder, instead of fixing particular domain= failures, should extract all domains regardless of what is on the left or right side of domain=. This includes @@ filters (because in --aggressive mode we should extract all domains)
  • but currently the decoder extracts about half of the domains. To prove it, I copy-pasted all domain= lines from several popular / big adblock filter lists plus two Polish lists (EasyList, EasyPrivacy, AdGuard Base, AdGuard Tracking Protection, Official Polish Filters for AdBlock, uBlock Origin & AdGuard, EasyList Polish) into a single file / list (see domain=.zip), which contains about 11425 domains, but Adblock Decoder extracts only about half (5426 domains)
  • by the way, the number of false positives is reasonably low, 39 out of 5426 domains:
false hits
&Type=Event.CPT&
-300x250.
-Background-1280x10241.
-spotify-com.akamaized.net
-tag.js
.
.cdn.digitaloceanspaces.com
.cdnjquery.com
.ch
.com
.criteo.com
.criteo.net
.digitaloceanspaces.com
.engageya.com
.filma24.
.gif
.html
.html|
.imagetwist.com
.impact-ad.jp
.jpg
.js
.js|
.m3u8
.min.js
.mp3
.mp4
.mp4.kakaoad.
.mp4|
.netdna-ssl.com
.php
.pl
.pornhub.com
.r.msn.com
.roofandfloor.com
.smithsonian.museum
.ssl-images-amazon.com
.ts
.xml
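
Most of the false hits listed above start with a dot or hyphen, or contain characters such as `&` and `|`, so a purely syntactic check would already reject them. The sketch below is one such check. `looks_like_domain` is a hypothetical helper, not part of PyFunceble or adblock-decoder. It cannot catch strings that are syntactically valid but are not domains (a file name such as `min.js` would still pass).

```python
import re

# One DNS label: letters, digits and inner hyphens, at most 63 characters.
LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def looks_like_domain(candidate: str) -> bool:
    """Return True when every dot-separated part is a plausible DNS label."""
    labels = candidate.lower().split(".")

    # Require at least "name.tld"; a bare word is not a usable subject here.
    if len(labels) < 2:
        return False

    # A purely numeric last label is not treated as a TLD in this sketch.
    if labels[-1].isdigit():
        return False

    # Leading or trailing dots produce empty labels, which never match.
    return all(LABEL.fullmatch(label) is not None for label in labels)


if __name__ == "__main__":
    for sample in (".criteo.com", "-tag.js", ".html|", "&Type=Event.CPT&", "e-play.eu"):
        print(sample, looks_like_domain(sample))
```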

domain=.zip
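
To make the request concrete: extracting every entry of a `domain=` option, whatever surrounds it, could be sketched as below. This is an illustration of the idea only, not the decoder's implementation. The name `extract_domain_option` and the regular expression are assumptions, and negated entries (`~page2.com`) are included on purpose, because the comment asks for all domains in `--aggressive` mode.

```python
import re
from typing import List

# "domain=" must follow the option separator "$" or "," (or start the line).
# The value runs up to the next comma or whitespace.
DOMAIN_OPTION = re.compile(r"(?:^|[$,])domain=([^,\s]+)")


def extract_domain_option(line: str) -> List[str]:
    """Return all entries of every domain= option found in a filter line."""
    found: List[str] = []

    for match in DOMAIN_OPTION.finditer(line):
        # Entries are separated by "|", and a leading "~" marks a negation.
        for entry in match.group(1).split("|"):
            entry = entry.lstrip("~").strip()
            if entry and entry not in found:
                found.append(entry)

    return found


if __name__ == "__main__":
    print(extract_domain_option("||ads.example^$script,domain=page1.com|~page2.com"))
    print(extract_domain_option("@@||cdn.example^$domain=site21.com"))
```

Because the pattern does not look at the left side of the rule, exception rules starting with `@@` are handled the same way as blocking rules.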

@keczuppp

keczuppp commented Apr 1, 2021

funilrys, can we clean up this thread? Could you put your OFF-TOPIC #13 (comment) in a spoiler, just like I did with my OFF-TOPICs? Thx.

@keczuppp

keczuppp commented Aug 24, 2021

OK, I've just tested the newest PyFunceble dev, and the issues reported in
#13 (comment) and #13 (comment) have been fixed.


Summary:

keczuppp:
As for the last 3 failures, many of such failures can be found in
https://easylist-downloads.adblockplus.org/easylistpolish.txt
The list contains about 2961 domains, but only 2459 are found by
Adblock Decoder (with --aggressive option), which gives 83% efficiency.

  • the current EasyList Polish contains 3190 domains, and the newest PyFunceble extracted 3106, which gives 97% efficiency compared to 83% previously

keczuppp:
currently the decoder extracts about a half of domains, to prove it I copy-pasted all domain= lines from several popular / big adblock filter lists + two polish lists: EasyList, EasyPrivacy, AdGuard Base, AdGuard Tracking Protection, Official Polish Filters for AdBlock, uBlock Origin & AdGuard, EasyList Polish and put into a single file / list (see domain=.zip), which contain about 11425 domains, but Adblock Decoder extracts only about a half (5426 domains)

  • the newest PyFunceble extracted 8298, which gives 73% compared to roughly 47% (5426 of 11425) previously

Good improvement.
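
For anyone who wants to recheck the figures, the efficiency values can be recomputed from the counts quoted in this thread. The helper name `efficiency` is just for illustration.

```python
def efficiency(extracted: int, total: int) -> float:
    """Share of domains the decoder extracted, in percent."""
    return 100.0 * extracted / total


# (extracted, total) pairs as quoted in the comments above.
counts = {
    "EasyList Polish, before": (2459, 2961),
    "EasyList Polish, now": (3106, 3190),
    "domain= collection, before": (5426, 11425),
    "domain= collection, now": (8298, 11425),
}

if __name__ == "__main__":
    for label, (extracted, total) in counts.items():
        print(f"{label}: {efficiency(extracted, total):.1f}%")
```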
