[Question] Creating a List of Tags to Ignore/Use for Booru sites #2446

Closed
ShiroyukiX opened this issue Mar 26, 2022 · 12 comments

@ShiroyukiX

(Please excuse my inexperience with GitHub. It's my first time ever posting here.)

How would I go about creating a blacklist of tags for booru sites (rule34us, gelbooru, danbooru, safebooru, etc) or any that use tags/tag-like systems in organizing artwork?

I checked the Issues page and the configuration docs extensively and could not find a solution. I know you can use --filter "'TAG' not in tags", but I haven't seen any mention of handling multiple tags other than --filter "'TAG1' not in tags and 'TAG2' not in tags". I tried image-filter, but my understanding of Python and JSON is nonexistent. I tend to search these sites and download by artist or character, blocking any tags I dislike through a blacklist; I can't catch every single weird tag used for the same subject, so the blacklist is big. I don't have an account on any of these websites, if that information is relevant.

Is this even possible to begin with? I want to believe there is a way to filter the results without having to chain multiple "and" conditions in a single command.

@mikf
Owner

mikf commented Mar 28, 2022

I wanted to suggest using something like

"image-filter": "not any(t in tags for t in ('tag1', 'tag2', 'tag3', 'tag4'))"

and that would theoretically work, but it doesn't due to how Python handles variable look-ups. You get a "NameError: name 'tags' is not defined" if you try, even though it is defined.
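
The lookup behaviour behind that NameError can be reproduced outside gallery-dl. The sketch below assumes the filter is run through eval() with the file's metadata passed as the locals mapping; that setup and the sample metadata are assumptions for illustration, not a copy of gallery-dl's code. A generator expression gets its own scope, and inside it a name such as tags is looked up only in the globals mapping and the builtins, so a value that exists only in the locals mapping is never found.

```python
# Hypothetical reproduction: the metadata dict and the way it is handed
# to eval() are assumptions made for illustration.
FILTER = "not any(t in tags for t in ('tag1', 'tag2'))"
code = compile(FILTER, "<filter>", "eval")

metadata = {"tags": "tag0 tag2"}


def evaluate(globals_, locals_=None):
    """Evaluate the compiled filter and report a NameError as text."""
    try:
        return eval(code, globals_, locals_)
    except NameError as exc:
        return f"NameError: {exc}"


# Metadata passed as locals: the generator's inner scope cannot see 'tags'.
as_locals = evaluate({}, metadata)

# Metadata passed as globals: the lookup succeeds, and the filter rejects
# the file because 'tag2' occurs in the tag string.
as_globals = evaluate(dict(metadata))

print(as_locals)
print(as_globals)
```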

Fixing the root cause of this is possible, but complicated.
Would it be OK to have a function that does this check, e.g. something like

"image-filter": "not contains(tags, ('tag1', 'tag2', 'tag3', 'tag4'))"

because adding just that is rather easy.
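
A minimal sketch of what such a helper could look like follows. The name contains comes from the proposal above, but the body and the way it is exposed to the expression are assumptions for illustration, not gallery-dl's actual implementation. Because the loop lives in an ordinary function that receives tags as an argument, the expression itself contains no generator and the lookup problem does not arise.

```python
def contains(values, elements):
    """Return True if any item of 'elements' is found in 'values'."""
    for element in elements:
        if element in values:
            return True
    return False


# Make the helper visible to filter expressions through the globals mapping,
# while the per-file metadata is supplied as locals (assumed setup).
FILTER_GLOBALS = {"contains": contains}
code = compile("not contains(tags, ('tag1', 'tag2'))", "<filter>", "eval")

# Made-up metadata for two files; the tags are given as lists here.
keep_clean = eval(code, FILTER_GLOBALS, {"tags": ["tag0", "tag3"]})
keep_tagged = eval(code, FILTER_GLOBALS, {"tags": ["tag0", "tag2"]})

print(keep_clean, keep_tagged)
```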

@Hrxn
Contributor

Hrxn commented Mar 28, 2022

[..]. Would it be OK to have a function that does this check, e.g. something like

"image-filter": "not contains(tags, ('tag1', 'tag2', 'tag3', 'tag4'))"

because adding just that is rather easy.

I think this is the best solution for such cases, yes..

@ShiroyukiX
Author

I tried to use this for both rule34us and the base gelbooru module, but the program spits out the following error:

gelbooru: FilterError: Evaluating filter expression failed (NameError: name 'contains' is not defined)
rule34us: FilterError: Evaluating filter expression failed (NameError: name 'contains' is not defined)

This is how I have it written in the config file, which is a copy of gallery-dl-example.conf with no other changes. Let me know if you need the whole document. Maybe I'm placing it in the wrong spot?

NOTE: I tried these two tags with/without the underscore as a test for this.

        "rule34us":
        {
            "image-filter": "not contains(tags, ('azur_lane', 'genshin_impact'))"
        },

        "gelbooru":
        {
            "image-filter": "not contains(tags, ('azur_lane', 'genshin_impact'))"
        },

@Hrxn
Contributor

Hrxn commented Mar 29, 2022

No, you're simply a bit too early, this function is not added yet! 😄

@github-account1111

github-account1111 commented Mar 29, 2022

Is this request for excluding certain tags from e.g. filenames or just straight up not downloading the files that contain certain tags?

@ShiroyukiX
Author

Is this request for excluding certain tags from e.g. filenames or just straight up not downloading the files that contain certain tags?

This is to avoid downloading files that contain certain tags.

mikf added a commit that referenced this issue Mar 30, 2022
and add it to globals() in compiled expressions for --filter etc

@mikf
Owner

mikf commented Mar 30, 2022

What you tried in #2446 (comment) should now work (413b777), but be aware that different boorus have different tag structures. Some use underscores, some spaces, and for some it is called tag_string instead of tags.
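
Since the field name and layout vary by site, a small normalising step can make the differences explicit. The sketch below is illustrative only: tag_list is a hypothetical helper, and the sample dictionaries are made-up stand-ins for per-file metadata, not real output from any site.

```python
def tag_list(metadata):
    """Return the tags of one file as a list of strings.

    Looks for 'tags' first and falls back to 'tag_string'; a string value
    is split on whitespace, a list or tuple is copied as-is.
    """
    for key in ("tags", "tag_string"):
        value = metadata.get(key)
        if value is None:
            continue
        if isinstance(value, str):
            return value.split()
        return list(value)
    return []


# Made-up samples covering the layouts mentioned above.
sample_a = {"tags": "azur_lane animal_ears"}
sample_b = {"tag_string": "genshin_impact 1girl"}
sample_c = {"tags": ["azur lane", "animal ears"]}

for sample in (sample_a, sample_b, sample_c):
    print(tag_list(sample))
```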

@ShiroyukiX
Author

What you tried in #2446 (comment) should now work (413b777), but be aware that different boorus have different tag structures. Some use underscores, some spaces, and for some it is called tag_string instead of tags.

Had to install the latest dev version to test it out and it appears to work. On that note, for the different boorus, what command do I use to determine the tag structure before downloading? I assume -j.

Also, for boorus that use tag_string, I replace (tags, ('tag1', 'tag2')) with (tag_string, ('tag1', 'tag2')), if I'm understanding you correctly.

@ShiroyukiX
Author

ShiroyukiX commented Apr 2, 2022

I may have run into an issue, though I cannot say whether it's a filtering or tag-reading problem (or whatever the heck it is).

While attempting to download shinano_(azur_lane) artwork from rule34us, it for some reason only grabs 10 files. I "sync" my image-filter and site blacklist to make sure both are up to date, and with my current setup the search should download about 90 files; it does not. I'm certain from previous attempts that what appears in my search will download, so I don't believe it's a tag I filtered out.

EDIT: It does appear that gallery-dl is skipping these files. I tried downloading a specific result from the search, and it appears in neither the log nor my directory. I checked the JSON with -j and none of the tags listed are ones I filter out.

EDIT2: I checked a new search with an artist and, again, it downloads fewer files than the search returns (41 / 53). Commenting out the tags fixes the result, so I'm assuming it's a tagging issue of some sort; I'm not sure whether only rule34us does this.

@ShiroyukiX
Author

I may have figured out the problem, though results may vary.

Gallery-dl skips all sub-categorized versions of a tag if it's filtered with image-filter. For example, if you blacklist animal, gallery-dl will skip any image with a related tag (animal_ears, animal_humanoid, etc), even if animal is not used for the image. You have to blacklist the sub-categorized versions only to avoid the issue.

This only appears to happen with rule34us as their tagging system is janky (character tags are under general tags, for instance), but it may happen for other booru sites. Also, defining the first argument as tags_general seems to work as well.

I'm still not sure if this problem is site-specific or program-specific, though my conjecture favors the former. You may want to test it out for other sites to confirm your search result numbers match the number of files downloaded, and make sure to define your filter as mentioned in #2446 (comment) to catch the tags accurately.

mikf added a commit that referenced this issue Apr 8, 2022
add a third argument that gets used
when the values to search are given as a string

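
The over-matching reported above (blacklisting animal also skipping animal_ears) is what Python's in operator does when the tags arrive as one string: it tests for a substring, so 'animal' is found inside 'animal_ears'. Splitting the string on a separator first restricts the test to whole tags. The sketch below shows both behaviours; contains_whole and its separator parameter are hypothetical names modelled on the commit message, not gallery-dl's actual signature.

```python
def contains_whole(values, elements, separator=" "):
    """Whole-tag membership test.

    If 'values' is a string it is split on 'separator' so that each element
    is compared against complete tags rather than substrings.
    """
    if isinstance(values, str):
        values = values.split(separator)
    return any(element in values for element in elements)


tags = "animal_ears long_hair"
blacklist = ("animal",)

# Substring test: 'animal' occurs inside 'animal_ears', so this is True.
substring_hit = any(tag in tags for tag in blacklist)

# Whole-tag test: no tag equals 'animal', so this is False.
whole_hit = contains_whole(tags, blacklist)

print(substring_hit, whole_hit)
```
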
@YooPita

YooPita commented Mar 29, 2023

I will leave this information here, since I could not find similar topics elsewhere on the Internet; perhaps it will be useful to someone. I needed to exclude posts that have two tags at the same time on e621 and did not understand how. After researching the problem, I came up with the following filter:

--filter "not ('tag1' in tags['general'] and 'tag2' in tags['general'])"

You can also use it in the config file gallery-dl.conf:

"image-filter": "not ('tag1' in tags['general'] and 'tag2' in tags['general'])"
Or, as an additional example:
"image-filter": "not ('tag1' in tags['general'] and ('tag2' in tags['general'] or 'tag3' in tags['general']))"
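
Note that not (A and B) skips a file only when both tags are present; a file carrying just one of them is still downloaded. To skip a file when either tag is present, the inner operator would be or. The sketch below evaluates both forms against made-up metadata shaped like the tags['general'] layout used above; the sample values are assumptions.

```python
BOTH = "not ('tag1' in tags['general'] and 'tag2' in tags['general'])"
EITHER = "not ('tag1' in tags['general'] or 'tag2' in tags['general'])"

# Made-up metadata for three files.
samples = {
    "none": {"tags": {"general": ["tag0"]}},
    "one": {"tags": {"general": ["tag0", "tag1"]}},
    "both": {"tags": {"general": ["tag1", "tag2"]}},
}


def kept(expression):
    """Return the names of the samples that the filter lets through."""
    return [name for name, meta in samples.items() if eval(expression, {}, meta)]


kept_both = kept(BOTH)      # skips only the file that has both tags
kept_either = kept(EITHER)  # skips any file that has at least one tag

print(kept_both, kept_either)
```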

@rautamiekka
Contributor

"image-filter": "not ('tag1' in tags['general'] and ('tag2' in tags['general'] or 'tag3' in tags['general']))"

Code like that insta-boils my piss no matter the language, instead try this untested adaptation of this StackOverflow answer:

"image-filter": "any(True for _e in ('tag1','tag2','tag3') if _e in tags['general'])"

Not sure if you need a list comprehension (any([...])) for this trick since I didn't test it, I sure hope not.

It should return immediately if a blacklisted keyword is found and being a generator, should use minimal RAM.
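
Two cautions on the any(...) form, which is described as untested. As written it is true when a listed tag is present, so as a blacklist it would need a leading not, and it tests for any one of the tags rather than a combination. It is also a generator expression that refers to tags, the pattern reported earlier in this thread to raise NameError. A set-based test performs the same "none of these tags" check without a generator. The sketch below assumes metadata shaped like tags['general'] above, with the tags given as a list; the sample values are made up.

```python
BLACKLIST = "{'tag1', 'tag2', 'tag3'}.isdisjoint(tags['general'])"
code = compile(BLACKLIST, "<filter>", "eval")

# Made-up metadata for two files.
clean = {"tags": {"general": ["tag0", "tag9"]}}
tagged = {"tags": {"general": ["tag0", "tag3"]}}

# isdisjoint() is True when no blacklisted tag appears, i.e. keep the file.
# The metadata is passed as locals; the expression has no nested scope,
# so 'tags' is found there.
keep_clean = eval(code, {}, clean)
keep_tagged = eval(code, {}, tagged)

print(keep_clean, keep_tagged)
```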

6 participants