Questions, Feedback, and Suggestions #4 #5262

Open
mikf opened this issue Mar 1, 2024 · 229 comments

Comments

@mikf
Owner

mikf commented Mar 1, 2024

Continuation of the previous issue as a central place for any sort of question or suggestion not deserving their own separate issue.

Links to older issues: #11, #74, #146.

@BakedCookie

For most sites I'm able to sort files into year/month folders like this:

"directory": ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]

However for redgifs it doesn't look like there's a date keyword available for directory. There's only a date keyword available for filename. Is this an oversight?

@mikf
Owner Author

mikf commented Mar 2, 2024

Yep, that's a mistake that happened when adding support for galleries in 5a6fd80.
Will be fixed with the next git push.

edit: 82c73c7

@taskhawk

taskhawk commented Mar 6, 2024

There's a typo in extractor.reddit.client-id & .user-agent:

"I'm not a rebot"

@the-blank-x
Contributor

There's also another typo in extractor.reddit.client-id & .user-agent, "reCATCHA"

@biggestsonicfan

Can you grab all the media from quoted tweets? Example.

mikf added a commit that referenced this issue Mar 7, 2024
#5262 (comment)

It's implemented as a search for 'quoted_tweet_id:…' on Twitter.
mikf added a commit that referenced this issue Mar 7, 2024
#5262 (comment)

This on was on the same line as the previous one ... (9fd851c)
@mikf
Owner Author

mikf commented Mar 7, 2024

Regarding typos, thanks for pointing them out.
I would be surprised if there aren't at least 10 more somewhere in this file.

@biggestsonicfan
This is implemented as a search for quoted_tweet_id:… on Twitter's end.
I've added an extractor for it similar to the hashtags one (40c0553), but it only does said search under the hood.

@BakedCookie

BakedCookie commented Mar 7, 2024

Normally %-encoded characters in the URL get converted nicely when running gallery-dl, eg.

https://gelbooru.com/index.php?page=post&s=list&tags=nighthawk_%28circle%29
gives me a nighthawk_(circle) folder

but for this url:
https://gelbooru.com/index.php?page=post&s=list&tags=shin%26%23039%3Bya_%28shin%26%23039%3Byanchi%29

I'm getting a shin&#039;ya_(shin&#039;yanchi) folder. Shouldn't I be getting a shin'ya_(shin'yanchi) folder instead?

EDIT: Actually, I think there's just something wrong with that URL. I had it saved for a long time and searching that tag normally gives a different URL (https://gelbooru.com/index.php?page=post&s=list&tags=shin%27ya_%28shin%27yanchi%29). I still got valid posts from the weird URL so I didn't think much of it.

@mikf
Owner Author

mikf commented Mar 7, 2024

%28 and so on are URL escaped values, which do get resolved.
&#039; is the HTML escaped value for '.

You could use {search_tags!U} to convert them.
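
The two escaping layers can be reproduced outside gallery-dl with Python's standard library. This is only an illustrative sketch using the tag from the URL above; it does not show how gallery-dl itself implements the !U conversion.

```python
from html import unescape
from urllib.parse import unquote

# Tag value taken from the URL in the question above
raw = "shin%26%23039%3Bya_%28shin%26%23039%3Byanchi%29"

# Step 1: resolve URL (percent) escapes such as %28 and %26
url_decoded = unquote(raw)

# Step 2: resolve HTML entities such as &#039;
html_decoded = unescape(url_decoded)

print(url_decoded)
print(html_decoded)
```

The first print shows the folder name with the HTML entity still in it; the second shows the name with a plain apostrophe.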

@taskhawk

taskhawk commented Mar 8, 2024

Is there support to remove metadata like this?

gallery-dl -K https://www.reddit.com/r/carporn/comments/axo236/mean_ctsv/

...
preview['images'][N]['resolutions'][N]['height']
  144
preview['images'][N]['resolutions'][N]['url']
  https://preview.redd.it/mcerovafack21.jpg?width=108&crop=smart&auto=webp&s=f8516c60ad7fa17c84143d549c070738b8bcc989
preview['images'][N]['resolutions'][N]['width']
  108
...

Post-processor:

"filter-metadata":
    {
      "name": "metadata",
      "mode": "delete",
      "event": "prepare",
      "fields": ["preview[images][0][resolutions]"]
    }

I've tried a few variations but no dice.

"fields": ["preview[images][][resolutions]"]
"fields": ["preview[images][N][resolutions]"]
"fields": ["preview['images'][0]['resolutions']"]

@YuanGYao

YuanGYao commented Mar 8, 2024

Hello, I left a comment in #4168. Does the _pagination method of the WeiboExtractor class in weibo.py return when data["list"] is an empty list?
When I used gallery-dl to batch download a Weibo album page, the download also turned out incomplete.
Through testing on the web page, I found that Weibo's getImageWall API sometimes returns an empty list when the images are not completely loaded. I think this may be what causes gallery-dl to terminate the download.

@mikf
Owner Author

mikf commented Mar 8, 2024

@taskhawk
fields selectors are quite limited and can't really handle lists.
You might want to use a python post processor (example) and write some code that does this.

def remove_resolutions(metadata):
    for image in metadata["preview"]["images"]:
        del image["resolutions"]

(untested, might need some check whether preview and/or images exists)
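
A slightly more defensive variant of the function above, as a sketch: it assumes the metadata is a plain dict shaped like the -K output shown earlier, and it tolerates posts where preview or images is missing. It has not been tested against gallery-dl itself, and the sample dict is hand-written rather than real Reddit data.

```python
def remove_resolutions(metadata):
    """Drop the 'resolutions' list from every preview image, if present."""
    preview = metadata.get("preview") or {}
    for image in preview.get("images") or []:
        # pop() with a default avoids a KeyError when the key is absent
        image.pop("resolutions", None)


# Hand-written example data (not real Reddit metadata)
post = {"preview": {"images": [{"id": "a", "resolutions": [{"width": 108}]}]}}
remove_resolutions(post)
print(post)

# A post without a 'preview' key is left untouched
remove_resolutions({"title": "no preview here"})
```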

@YuanGYao
Yes, the code currently stops when Weibo's API returns no more results (empty list).
This is probably not ideal, as I've hinted at in #4168 (comment)

@YuanGYao

YuanGYao commented Mar 9, 2024

@mikf
Well, I think for Weibo's album page, since_id should be used to determine whether the image is fully loaded.
I updated my comment in #4168(comment) and attached the response returned by Weibo's getImageWall API.
I think this should help solve this problem.
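
The idea of relying on since_id rather than on an empty list could look roughly like the following. This is a sketch only: fetch_page is a hypothetical callable standing in for the getImageWall request, and the response shape (a list plus a since_id that becomes falsy on the last page) is an assumption based on the comments here, not the actual weibo.py code.

```python
def paginate(fetch_page):
    """Yield items page by page until the API stops handing out a since_id.

    fetch_page(since_id) is a hypothetical function returning a dict like
    {"list": [...], "since_id": "..."}.
    """
    since_id = None
    while True:
        data = fetch_page(since_id)
        # An empty list alone does not end the loop; the page may simply
        # not have been fully loaded on the server side.
        yield from data.get("list") or []
        since_id = data.get("since_id")
        if not since_id:
            break


# Fake responses: the second page is empty but still carries a since_id
pages = {
    None: {"list": ["a", "b"], "since_id": "1"},
    "1": {"list": [], "since_id": "2"},
    "2": {"list": ["c"], "since_id": 0},
}
result = list(paginate(lambda since_id: pages[since_id]))
print(result)
```

A real implementation would probably also want a guard against a since_id that never changes, so the loop cannot run forever.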

@BakedCookie

Not sure if I'm missing something, but are directory specific configurations exclusive to running gallery-dl via the executable?

Basically, I have a directory for regular tags, and a directory for artist tags. For regular tags I use "directory": ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"] since the tag number is manageable. For artist tags though, there's way more of them so this "directory": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"] makes more sense.

So right now the only way I know to get this per-directory configuration to work, is to copy the gallery-dl executable everywhere I want to use a master configuration override. Am I missing something? It feels like there should be a better way.

@Hrxn
Contributor

Hrxn commented Mar 11, 2024

Huh? No, the configuration works always in the same way. You're simply using different configuration files?

@BakedCookie

@Hrxn

From the readme:

When run as executable, gallery-dl will also look for a gallery-dl.conf file in the same directory as said executable.

It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones.

I want to override my master configuration %APPDATA%\gallery-dl\config.json in specific directories with a local gallery-dl.conf but it seems like that's only possible with the standalone executable.

@taskhawk

taskhawk commented Mar 11, 2024

You can load additional configuration files from the console with:

-c, --config FILE           Additional configuration files

You just need to specify the path to the file and any options there will overwrite your main configuration file.

Edit: From my understanding, yeah, automatic loading of local config files in each directory is only possible with the standalone executable in each directory. Are different directory options the only thing you need?
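
As a rough mental model of how options from a file passed with -c override the main configuration, here is a minimal sketch of a recursive dictionary merge. The function name and the sample settings are made up for illustration; gallery-dl's actual merging code may differ in detail.

```python
def merge_config(base, override):
    """Return a copy of 'base' with values from 'override' merged in."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


main = {"extractor": {"gelbooru": {
    "directory": ["{category}", "{search_tags}"], "tags": True}}}
local = {"extractor": {"gelbooru": {
    "directory": ["{category}", "{search_tags[0]!u}", "{search_tags}"]}}}

combined = merge_config(main, local)
print(combined)
```

In this sketch only the directory value is replaced; the tags setting from the main file survives because the nested sections are merged key by key.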

@BakedCookie

@taskhawk

Thanks, that's exactly what I was looking for! Guess I didn't read the documentation thoroughly enough.

For now the only thing I'd want to override is the directory structure for artist tags. I don't think it's possible to determine from the metadata alone if a given tag is the name of an artist or not, so I thought the best way to go about it is to just have a separate directory for artists, and use a configuration override. So yeah, loading that override with the -c flag works great for that purpose, thanks again!

@taskhawk

taskhawk commented Mar 11, 2024

You kinda can, but you need to enable tags for Gelbooru in your configuration to get them, which will require an additional request:

    "gelbooru": {
      "directory": {
        "search_tags in tags_artists": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        ""                           : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
      },
      "tags": true
    },

Set "tags": true in your config and run a test with gallery-dl -K "https://gelbooru.com/index.php?page=post&s=list&tags=TAG" so you can see the tags_* keywords.

Of course, this depends on the artists being correctly tagged. Not sure if it happens on Gelbooru, but at least in other boorus and booru-like sites I've come across posts with the artist tagged as a general tag instead of an artist tag. Another limitation is that your search tag can only include one artist at a time, doing more will require a more complex expression to check all tags are present in tags_artists.

What I do instead is that I inject a keyword to influence where it will be saved, like this:

gallery-dl -o keywords='{"search_tags_type":"artists"}' "https://gelbooru.com/index.php?page=post&s=list&tags=ARTIST"

And in my config I have

    "gelbooru": {
      "directory": ["boorus", "{search_tags_type}", "{search_tags}"]
    },

You can have:

    "gelbooru": {
      "directory": {
        "search_tags_type == 'artists'": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        ""                             : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
      }
    },

You can do this for other tag types, like general, copyright, characters, etc.

Because it's a chore to type that option every time, I made a wrapper script. Artists is my default, so I just call it like this:

~/script.sh "TAG"

For other tag types I can do:

~/script.sh --copyright "TAG"
~/script.sh --characters "TAG"
~/script.sh --general "TAG"
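
The original script.sh is not shown in this thread, so the following Python sketch is only one hypothetical way such a wrapper could work. It maps an optional flag to the search_tags_type keyword and builds the gallery-dl command using the -o keywords=... option from above. The flag names and URL template mirror the examples in this comment; the function names are invented.

```python
import argparse
import json
from urllib.parse import quote

URL = "https://gelbooru.com/index.php?page=post&s=list&tags={}"
TAG_TYPES = ("artists", "copyright", "characters", "general")


def parse_args(argv):
    """Return (tag, tag_type); the type defaults to 'artists'."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    for name in TAG_TYPES:
        # --artists, --copyright, ... all store their name in 'tag_type'
        group.add_argument("--" + name, dest="tag_type",
                           action="store_const", const=name)
    parser.add_argument("tag")
    args = parser.parse_args(argv)
    return args.tag, args.tag_type or "artists"


def build_command(tag, tag_type="artists"):
    """Build the gallery-dl argument list for one tag."""
    keywords = json.dumps({"search_tags_type": tag_type})
    # quote() percent-encodes characters such as parentheses in the tag
    return ["gallery-dl", "-o", "keywords=" + keywords, URL.format(quote(tag))]


command = build_command(*parse_args(["--copyright", "TAG"]))
print(command)
```

To actually run the download, the resulting list could be handed to subprocess.run(command), with sys.argv[1:] passed to parse_args instead of the hard-coded example.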

@BakedCookie

Thanks for pointing out there's a tags option available for the gelbooru extractor. I already used it in the kemono extractor to get the name of the artist, but it didn't occur to me that gelbooru might also have such an option (and just accepted that the tags aren't categorized).

For artists I store all the URLs in their respective gelbooru.txt, rule34.txt, etc. files like so:

https://gelbooru.com/index.php?page=post&s=list&tags=john_doe
https://gelbooru.com/index.php?page=post&s=list&tags=blue-senpai
https://gelbooru.com/index.php?page=post&s=list&tags=kaneru
.
.
.

And then just run gallery-dl -c gallery-dl.conf -i gelbooru.txt. Since the search_tags ends up being the artist anyway, getting tags_artists is probably not worth the extra request. Same for general tags, and copyright tags, in their respective directories. With this workflow I can't immediately see where I'd be able to utilize keyword injection, but it's definitely a useful feature that I'll keep in mind.

@Wiiplay123
Contributor

When I'm making an extractor, what do I do if the site doesn't have different URL patterns for different page types? Every single page is just a numerical ID that could be a forum post, image, blog post, or something completely different.

@mikf
Owner Author

mikf commented Mar 19, 2024

@Wiiplay123 You handle everything with a single extractor and decide what type of result to return on the fly. The gofile code is a good example for this I think, or aryion.

@I-seah

I-seah commented Mar 20, 2024

Hi, what options should I use in my config file to change the format of dates in metadata files? I would like to use "%Y-%m-%dT%H:%M:%S%z" for the values of "date" and "published" (from coomer/kemono downloads).

And would it also be possible to do this for json files that ytdl creates? I downloaded some videos with gallery-dl but the dates got saved as "upload_date": "20230910" and "timestamp": 1694344011, so I think it might be better to convert the timestamp to a date to get a more precise upload time, but I'm not sure if it's possible to do that either.
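
Independently of what gallery-dl's configuration offers here, the conversion itself is simple. As an illustrative sketch using only the Python standard library, the timestamp from the question can be turned into the requested format like this (interpreting it as UTC is an assumption, and the function name is made up):

```python
from datetime import datetime, timezone

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def format_timestamp(timestamp):
    """Format a Unix timestamp as a date string in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime(DATE_FORMAT)


print(format_timestamp(1694344011))  # 2023-09-10T11:06:51+0000
```

The result falls on the same day as the "upload_date": "20230910" value, but keeps the time of day that the date-only field drops.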

@topchaser

Trying to download this:
https://misskey.gg/notes/9yp3zt35c3

using:
gallery-dl misskey:https://misskey.gg/notes/9yp3zt35c3

produces this error:
[downloader.http][warning] ('Connection broken: IncompleteRead(0 bytes read, 58762 more expected)', IncompleteRead(0 bytes read, 58762 more expected)) (1/5)

until it hits 5/5 then fails. It happens for all misskey.gg links. In contrast, misskey.io links work without even needing to preface the link with "misskey:". For example:
https://misskey.io/notes/9ru7yqi5u4j6070a

Is there anything I can do to make misskey.gg links work?

@HilalSoorty

Trying to download resources from Imgur:
URL: https://imgur.com/search/score?q=code
(CMD: gallery-dl --range 1-10 https://imgur.com/search/score?q=code)
But the --range flag is not working properly: even with this flag set, it keeps downloading an unlimited number of resources.

@hunter-gatherer8
Contributor

Is there a "correct" way to convert large deviantart "*.gif" files to webm?

I suppose this is doable with "exec" post-processor, but this seems quite tricky, especially given this functionality "almost" exists for ugoira:

  1. You need to check extension somehow, to convert only gifs.
  2. Makes sense to check size, to only convert gifs that are actually 50+MB videos, not pictures.
  3. You have to manually add "rm", "{_path}" to the command, I guess, which seems hacky, and gallery-dl won't "know" anything about it.

So, maybe I'm missing the correct way to do that? And if there is none, maybe it makes sense to add an ugoira-like filter specifically for that to gallery-dl?
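
For illustration, the checks from points 1 and 2 could live in a small helper script that an "exec" post processor calls. This is a hypothetical sketch, not an existing gallery-dl feature: the function names are invented, and the libvpx-vp9 codec choice is just one plausible ffmpeg setting.

```python
import os

MIN_SIZE = 50 * 1024 * 1024  # 50 MB, the threshold mentioned above


def should_convert(path, size, min_size=MIN_SIZE):
    """Return True for .gif files of at least 'min_size' bytes."""
    is_gif = os.path.splitext(path)[1].lower() == ".gif"
    return is_gif and size >= min_size


def ffmpeg_command(path):
    """Build an ffmpeg command that writes a .webm next to the source."""
    target = os.path.splitext(path)[0] + ".webm"
    return ["ffmpeg", "-i", path, "-c:v", "libvpx-vp9", target]


print(should_convert("art/big.gif", 60 * 1024 * 1024))
print(ffmpeg_command("art/big.gif"))
```

In a real script the size would come from os.path.getsize(path), and the original file would be deleted only after ffmpeg exits successfully. This does not address point 3: gallery-dl still would not know about the converted file.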

@biggestsonicfan

I'm finding case-differences in my twitter directory ("Username" vs "UserName"). It's a btrfs partition under linux so it can handle that, but what's the best way to find out what twitter currently considers the case of the username? Should I take a current tweet, convert it to i/web/status and dump json info?

@biggestsonicfan

I realize I am double posting here, but I think I have a solution for at least fanbox posts for this. Fanbox metadata has isSupported and I think the supported plan fee amount. Perhaps a new argument to metadata-skip should be supported in which metadata will not be overwritten unless currently supported for that tier/plan?

@mikf
Owner Author

mikf commented Oct 16, 2024

@biggestsonicfan
When there is a metadata field like this, you can use a filter statement to control if a post processor should run and potentially overwrite a previous file. You could even use it to put metadata into different supported/not supported directories.

{
    "extractor": {
        "fanbox": {
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{id}.json",
                    "directory": ["metadata", "supported"],
                    "filter": "locals().get('isSupported')"
                },
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{id}.json",
                    "directory": ["metadata", "unsupported"],
                    "filter": "not locals().get('isSupported')"
                }
            ]
        }
    }
}

The metadata post processor also supports archive functionality, by the way.
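
To see why the filters in the config above use locals().get('isSupported') rather than the bare name, the following sketch mimics evaluating a filter string as a Python expression with the metadata as local variables. That evaluation model is an assumption made for illustration, and evaluate_filter is a made-up helper, not a gallery-dl function.

```python
def evaluate_filter(expression, metadata):
    """Evaluate a Python expression with 'metadata' as its local names."""
    code = compile(expression, "<filter>", "eval")
    return bool(eval(code, {}, metadata))


supported = {"id": 1, "isSupported": True}
missing = {"id": 2}  # field absent entirely

print(evaluate_filter("locals().get('isSupported')", supported))
print(evaluate_filter("not locals().get('isSupported')", missing))

try:
    evaluate_filter("isSupported", missing)
except NameError as exc:
    # A bare name fails when the field is missing from the metadata
    print("NameError:", exc)
```

Under this model, a post whose metadata lacks the field entirely is treated the same as one where it is false, so it would land in the "unsupported" branch.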

@biggestsonicfan

Holy Christ, I've only recently started using filters with gallery-dl and I didn't realize it had potential like this.

@topchaser

Trying to download this: https://misskey.gg/notes/9yp3zt35c3

using: gallery-dl misskey:https://misskey.gg/notes/9yp3zt35c3

produces this error: [downloader.http][warning] ('Connection broken: IncompleteRead(0 bytes read, 58762 more expected)', IncompleteRead(0 bytes read, 58762 more expected)) (1/5)

until it hits 5/5 then fails. It happens for all misskey.gg links. In contrast, misskey.io links work without even needing to preface the link with "misskey:". For example: https://misskey.io/notes/9ru7yqi5u4j6070a

Is there anything I can do to make misskey.gg links work?

I would like to amend this to say that misskey.gg links actually do successfully download, but only sometimes? Specific links appear to seemingly always fail in the way I explained in this reply, but others will succeed no problem. It appears that I simply didn't let the command run long enough to reach media it would successfully download. Maybe I need to be logged in to download everything? I haven't tried that yet, but also I don't think I will, since the profile I tried to scrape hosts their media elsewhere, so I have no incentive to make an account just to test this.

@biggestsonicfan

@mikf Looks like adding both those filters as a fanbox postprocessor is throwing everything in "unsupported":

            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "prepare",
                    "mode": "json",
                    "directory": ["json", "supported"],
                    "extension-format": "json",
                    "filter": "locals().get('isSupported')"
                },
                {
                    "name": "metadata",
                    "event": "prepare",
                    "mode": "json",
                    "directory": ["json", "unsupported"],
                    "extension-format": "json",
                    "filter": "not locals().get('isSupported')"
                },
                {
                    "name": "mtime",
                    "event": "file,post"
                }
            ]

Also, I'm not entirely sure this would work out well anymore anyway, since higher-tier metadata that I don't support also sets isSupported to True. I feel like the filter for supported/unsupported should check against my current pledged tier, and the files should be put in a directory like ["json", "supported", "{feeRequired}"].

I'll play with the filter system a bit to see if I can fine tune it.
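
One way to express "supported at my current tier" is to compare the post's required fee with the pledged amount. The sketch below is hypothetical: feeRequired is the field name mentioned in this comment, while PLEDGED_FEE and the helper name are invented for illustration, and it is written as plain Python rather than as a tested gallery-dl filter.

```python
PLEDGED_FEE = 500  # hypothetical: the amount of the currently pledged plan


def metadata_directory(metadata, pledged_fee=PLEDGED_FEE):
    """Pick a directory list based on whether the pledge covers the post."""
    fee = metadata.get("feeRequired") or 0
    state = "supported" if fee <= pledged_fee else "unsupported"
    return ["json", state, str(fee)]


print(metadata_directory({"feeRequired": 300}))
print(metadata_directory({"feeRequired": 1000}))
```

The same comparison could presumably be written inline as a filter expression, with the pledged amount hard-coded in the config.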

@fireattack
Contributor

fireattack commented Oct 20, 2024

When installing from source using python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz, why does it always say "Successfully installed gallery_dl-1.26.7.dev0" or "Successfully installed gallery_dl-1.26.8" (yes, it somehow even changes after I uninstall and try again!) even though we've been on 1.27.x for a while?

D:\>python -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz
     | 731.8 kB 4.8 MB/s 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=3d320818d2c31ee30f0b1225cc3504bb7654c740640bf33ef53048f0bd8cce09
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-em7n5c60\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.26.7.dev0

D:\>pip uninstall gallery-dl
Found existing installation: gallery-dl 1.26.7.dev0
Uninstalling gallery-dl-1.26.7.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.26.7.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\*
    c:\users\ikena\appdata\local\programs\python\python311\scripts\gallery-dl.exe
  Would not remove (might be manually added):
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\archive.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\2ch.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\agnph.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\bluesky.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cien.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\hentainexus.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\koharu.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\wikimedia.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\update.py
Proceed (Y/n)? y
  Successfully uninstalled gallery-dl-1.26.7.dev0

D:\>python -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz
     | 731.8 kB 4.8 MB/s 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=1558b598d4f62ed90cd0e8a60ff03659fc55991b3678cc90bf289ab37e6d2677
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-p2jedlq2\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.26.8

D:\>

I understand it's inaccurate but I can't figure out why. There is no 1.26.7 or 1.26.8 string left in this repo from what I can tell:

(screenshot attached)

@biggestsonicfan

biggestsonicfan commented Oct 20, 2024

@fireattack I personally use this command on Windows, with the --force-reinstall flag:

python -m pip install --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz

Might help.

@fireattack
Contributor

fireattack commented Oct 20, 2024

>python3 -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz
     - 731.8 kB 643.2 kB/s 0:00:01
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=9fb25b7eefd00dcd729bc26fa937ad531a0741f26d9c7fc7fbdb86bc578611e5
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-be3ub4hf\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.26.9.dev0

Now it says 1.26.9.dev0, even though the built wheel is clearly named gallery_dl-1.27.7.dev0-py3-none-any.whl. Did pip just randomly calculate these version numbers on its own?

@biggestsonicfan

I feel like something has gone awry for sure. Try creating a fresh venv and installing in that, just in case?

@fireattack
Contributor

fireattack commented Oct 20, 2024

Ah thanks, I figured it out. Apparently I have billions of gallery_dl (some are called gallery-dl, even) installed in my system.

And doing pip uninstall gallery_dl will only uninstall one of them; the others will happily continue to exist (pip list will only list one of them, too).

So, I have to run pip uninstall gallery_dl multiple times until pip list reports none and then re-install.
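
Leftover installations like these can be listed directly from Python. The sketch below uses importlib.metadata from the standard library and treats gallery-dl and gallery_dl as the same project by normalizing the name; the helper names are made up for illustration.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name so '-', '_' and '.' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_installs(project):
    """Return (name, version) for every installed copy of 'project'."""
    wanted = normalize(project)
    found = []
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:  # broken or incomplete metadata directory
            continue
        if name and normalize(name) == wanted:
            found.append((name, version))
    return found


for name, version in find_installs("gallery-dl"):
    print(name, version)
```

If this prints more than one line, several .dist-info directories for the same project are present in the environment.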

I suspect this is caused by the -I argument in the command given in the README:

-I, --ignore-installed
Ignore the installed packages, overwriting them. This can break your system if the existing package is of a different version or was installed with a different package manager!

(environment variable: PIP_IGNORE_INSTALLED)

Maybe we shouldn't let the users use it unless really needed, @mikf ? (Or change to --force-reinstall instead.)

Log if interested
Microsoft Windows [Version 10.0.19045.5011]
(c) Microsoft Corporation. All rights reserved.

D:\3>pip uninstall gallery-dl
Found existing installation: gallery-dl 1.26.9.dev0
Uninstalling gallery-dl-1.26.9.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.26.9.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\*
    c:\users\ikena\appdata\local\programs\python\python311\scripts\gallery-dl.exe
  Would not remove (might be manually added):
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\archive.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\agnph.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cien.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\hentainexus.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\koharu.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\update.py
Proceed (Y/n)? y
  Successfully uninstalled gallery-dl-1.26.9.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.0.dev0

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.0.dev0
Uninstalling gallery_dl-1.27.0.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.0.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\archive.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.0.dev0

D:\3>python3 -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/m
Collecting https://github.com/mikf/gallery-dl/archive/m
  ERROR: HTTP error 404 while getting https://github.com/mikf/gallery-dl/archive/m
ERROR: Could not install requirement https://github.com/mikf/gallery-dl/archive/m because of HTTP error 404 Client Error: Not Found for url: https://github.com/mikf/gallery-dl/archive/m for URL https://github.com/mikf/gallery-dl/archive/m

D:\3>python -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz (731 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.8/731.8 kB 2.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=9695c9ea1c21c83ed4dfa7d9c9ad91a1692c7adb7936fe7ea17dbbbdf28a1485
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-_uiz8zay\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.27.2.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.2.dev0

D:\3>pip uninstall gallery-dl gallery_dl
Found existing installation: gallery_dl 1.27.2.dev0
Uninstalling gallery_dl-1.27.2.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.2.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\*
    c:\users\ikena\appdata\local\programs\python\python311\scripts\gallery-dl.exe
  Would not remove (might be manually added):
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.2.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.2

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.2
Uninstalling gallery_dl-1.27.2:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.2.dist-info\*
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.2

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.7.dev0
Uninstalling gallery_dl-1.27.7.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.7.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? n

D:\3>pip list | findstr gallery
gallery_dl                     1.27.7.dev0

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.7.dev0
Uninstalling gallery_dl-1.27.7.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.7.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.7.dev0

D:\3>pip list | findstr gallery

D:\3>python -m pip install -U --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz (731 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.8/731.8 kB 6.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=8c3fef44c47c3c0f5b123a50a76a60798804bdfc81509d242dc745c3b561e186
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-b8nlpxh6\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.27.7.dev0

@501stRookie

Is there a way to download specifically the revisions on an artist's page on kemono.su? For example, one artist has had many of their posts updated with a revision that removed the content, while the original revision retains them. There are hundreds of posts on their page like that, so I was wondering if there was a way to set it to download the original revisions for all of them automatically.

@topchaser

topchaser commented Oct 27, 2024

I am getting the error pixiv: Unable to download work 59915441 ('sanity_level' warning) when I try to download this link (NSFW, but you cannot see it unless logged in): https://www.pixiv.net/en/artworks/59915441

I see many mentions of this error: https://github.com/mikf/gallery-dl/issues?q=sanity+level+warning

but I read through many of them trying to understand what to do, and I cannot figure it out. Would someone please tell me how to fix this?

Also, just to vent, I had no idea how long this had been happening, or if any of my attempts to download pixiv profiles prior had been subject to this. I can't retroactively check any logs, since I think I used to have logs, but it would cause redownloading profiles to skip media it already downloaded, which annoyed me. I didn't know if I could disable that specifically, so I just gave up on having logs. So, I potentially am missing media when I intended to get everything. I am a bit sad about it. Also, the "logs" I am describing might actually be something entirely different, and might not have told me of this error anyway. I don't know. I barely manage to get gallery-dl working for myself, so it working at all is essentially where my knowledge on the program ends.

I just noticed that the latest gallery-dl release made this "just werk":
https://github.com/mikf/gallery-dl/releases/tag/v1.27.7

Improvements
[pixiv] implement sanity_level workaround for user artworks results (#4327, #5435, #6339)

I still don't know whether it was possible to download such artwork using gallery-dl before (I thought it was, so I was just asking for someone to explain to me in simple terms how to do it), but, again, it "just werks" now, so, much appreciated.

@biggestsonicfan

biggestsonicfan commented Oct 30, 2024

So Seiga is now region-locked. Can I proxy/wireguard just that extractor?

EDIT: I've managed to get Wireguard locally to proxy via a port using wireproxy, but I just need a post(pre)processor to launch it as a daemon and close it when it's done.

EDIT2: Figured it out:

            "actions": {
            "*": "exec wireproxy -c ~/.config/wireproxy/wp-config.conf -d"
            },
            "postprocessors": [
                "json_metadata",
                {
                    "name": "exec",
                    "command": "pkill wireproxy",
                    "event": "finalize"
                }
            ]
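
For anyone who would rather wrap the whole run in a script instead of using actions and an exec post processor, here is a minimal Python sketch of the same start/stop idea. The function name `run_with_helper`, the fixed startup delay, and the proxy address in the comment are assumptions for illustration. It also assumes the helper stays in the foreground, so the `-d` flag from the config above is left out of the commented example.

```python
import subprocess
import time


def run_with_helper(helper_cmd, main_cmd, startup_delay=1.0):
    """Start helper_cmd, run main_cmd to completion, then stop the helper.

    Returns the exit code of main_cmd.
    """
    helper = subprocess.Popen(helper_cmd)
    try:
        # Crude wait for the helper to come up (assumption: a fixed delay is enough)
        time.sleep(startup_delay)
        return subprocess.run(main_cmd).returncode
    finally:
        # Always stop the helper, even if the main command fails or is interrupted
        helper.terminate()
        try:
            helper.wait(timeout=5)
        except subprocess.TimeoutExpired:
            helper.kill()
            helper.wait()


# Hypothetical invocation (config path and proxy port are placeholders):
# run_with_helper(
#     ["wireproxy", "-c", "/path/to/wp-config.conf"],
#     ["gallery-dl", "--proxy", "socks5://127.0.0.1:1080", "URL"],
# )
```

The `finally` block plays the role of the `pkill wireproxy` post processor: the helper is stopped no matter how the download ends.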

@biggestsonicfan

biggestsonicfan commented Nov 1, 2024

I hate posting so frequently here but I hate making new issues more. This is once again an issue for me.

I've just supported a user that has a preview image and download URLs in their post. I normally parse the json files with a Python script, however this preview image had been downloaded previously and I don't overwrite json data anymore. So I will re-run the user with skip set to false, but I really need a way to separate the data by whether a post is supported or not, and by which support tier.

EDIT: I don't get how the metadata archive works either. Will the metadata entry be the same as the one for the extractor?

@mikf
Owner Author

mikf commented Nov 3, 2024

@biggestsonicfan
It should be possible to use the feeRequired value and/or isRestricted in a filter statement to determine whether you can access a post or not. You can also use the metadata option to get plan data ("metadata": "plan") and potentially use that in a filter as well.
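
As a rough illustration of how such a filter behaves, the sketch below evaluates a Python expression with the metadata fields available as local names. It is a simplified model and not gallery-dl's actual implementation; `evaluate_filter` and the sample metadata values are made up for the example, and only the field names `isRestricted` and `feeRequired` come from the discussion.

```python
def evaluate_filter(expression, metadata):
    """Evaluate a filter expression with the metadata fields as local names."""
    code = compile(expression, "<filter>", "eval")
    return bool(eval(code, {}, dict(metadata)))


# Hypothetical metadata for two posts
free_post = {"isRestricted": False, "feeRequired": 0}
locked_post = {"isRestricted": True, "feeRequired": 500}

accessible = "not isRestricted"
free_only = "not isRestricted and feeRequired == 0"

print(evaluate_filter(accessible, free_post))    # True
print(evaluate_filter(accessible, locked_post))  # False
print(evaluate_filter(free_only, free_post))     # True

# A bare name raises NameError when the field is missing,
# while locals().get(...) simply yields None
print(evaluate_filter("not locals().get('isRestricted')", {}))  # True
```

In this model, `locals().get('key')` is the safe way to reference a field that is not present in every post's metadata.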

however this preview image had been downloaded previously and I don't overwrite json data anymore

A metadata post processor by default runs only when a file gets downloaded ("event": "file"), but you can also have it run for skipped downloads ("event": "file,skip") or just for the post itself ("event": "post") which is never "skipped", although that requires a custom filename.

Will the metadata entry be the same as the one for the extractor?

Metadata archive IDs are by default different from the actual file's archive ID, but you can always change that with archive-format and archive-prefix.

@baodrate

baodrate commented Nov 7, 2024

Could the default config also be allowed to be a TOML file, so the user does not have to specify --config-toml FILE on the command line every time?

i.e. add to gallery_dl.config._default_configs:

  • ${XDG_CONFIG_HOME}/gallery-dl/config.toml
  • ~/.config/gallery-dl/config.toml

(And it would probably make sense to also add the equivalent yaml paths)

@topchaser

Trying to download this: https://misskey.gg/notes/9yp3zt35c3

using: gallery-dl misskey:https://misskey.gg/notes/9yp3zt35c3

produces this error: [downloader.http][warning] ('Connection broken: IncompleteRead(0 bytes read, 58762 more expected)', IncompleteRead(0 bytes read, 58762 more expected)) (1/5)

until it hits 5/5 then fails. It happens for all misskey.gg links. In contrast, misskey.io links work without even needing to preface the link with "misskey:". For example: https://misskey.io/notes/9ru7yqi5u4j6070a

Is there anything I can do to make misskey.gg links work?

The example link I provided no longer seems to be online, but I just noticed when downloading a profile on misskey.gg that the 5/5 timeout error no longer happened. But, it also doesn't appear that it added any older media that I assumed was being skipped. I didn't actually look at what was causing the 5/5 timeout error to see if it was media, but, since it appears to "just werk" at this point, I assume what was timing out simply was not media at all. I don't know. Either way, I am saying that I just noticed this is no longer reproducible.

@biggestsonicfan

@mikf
A bit of banging my head against my metadata issues again only to find I am running into this issue. If I find a restricted post, they don't always have a preview image, so no metadata is downloaded. However if I use the "event": "post", my filename convention isn't honored as the metadata is gone.

@God-damnit-all
Contributor

--retries -1 is apparently considered invalid now, maybe due to some new argument parsing that tries to perceive -1 as a parameter?

mikf added a commit that referenced this issue Nov 13, 2024
#5262 (comment)

fixes regression introduced in 9e72968

'argparse' sets a flag and changes its behavior when using something
that looks like a negative number as option string, '-4' and '-6' in
this case.
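
The argparse behavior described in this commit message can be reproduced in isolation. The sketch below is not gallery-dl's parser; the `demo` parser and its `-4` flag are stand-ins for any option string that looks like a negative number.

```python
import argparse
import contextlib
import io


def build_parser(with_numeric_flag):
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--retries", type=int)
    if with_numeric_flag:
        # Stand-in for an option string that looks like a negative number
        parser.add_argument("-4", dest="ipv4", action="store_true")
    return parser


def try_parse(parser, argv):
    """Return the parsed --retries value, or None if argparse rejects argv."""
    try:
        # argparse prints a usage message to stderr before exiting; silence it
        with contextlib.redirect_stderr(io.StringIO()):
            return parser.parse_args(argv).retries
    except SystemExit:
        return None


plain = build_parser(with_numeric_flag=False)
numeric = build_parser(with_numeric_flag=True)

print(try_parse(plain, ["--retries", "-1"]))    # -1
print(try_parse(numeric, ["--retries", "-1"]))  # None: "-1" is now read as an option
print(try_parse(numeric, ["--retries=-1"]))     # -1
```

On affected versions, writing the value as `--retries=-1` should sidestep the problem, since the value is then attached to the option explicitly.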
@biggestsonicfan

Can the postprocessor use multiple filters? I'm trying but I'm getting TypeError: compile() arg 1 must be a string, bytes or AST object

@mikf
Owner Author

mikf commented Nov 14, 2024

@God-damnit-all
Fixed in cd6d6ea. Thanks for reporting, I wouldn't have caught this otherwise.

@biggestsonicfan
Post processor filter expressions can currently only be specified as a single string and not as a list, as is possible for image-filter. You can manually combine conditional expressions with ( cond1 ) and ( cond2 ) and ... though.
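
A small sketch of that manual combination, and of where the reported TypeError comes from. `combine_filters` is a hypothetical helper and the sample conditions are only examples. The error is what Python's built-in `compile()` raises when it is handed a list instead of a string.

```python
def combine_filters(conditions):
    """Join several filter expressions into one string with 'and'."""
    if isinstance(conditions, str):
        return conditions
    return " and ".join(f"({cond})" for cond in conditions)


conditions = ["not isRestricted", "feeRequired == 0"]

# Passing the list itself to compile() reproduces the reported error
try:
    compile(conditions, "<filter>", "eval")
except TypeError as exc:
    print("TypeError:", exc)

combined = combine_filters(conditions)
print(combined)  # (not isRestricted) and (feeRequired == 0)

# The combined string compiles and evaluates like any single expression
sample = {"isRestricted": False, "feeRequired": 0}
print(eval(compile(combined, "<filter>", "eval"), {}, sample))  # True
```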

@biggestsonicfan

biggestsonicfan commented Nov 14, 2024

@mikf

Post processor filter expressions can currently only be specified as a single string and not as a list, as is possible for image-filter.

Gotcha. It might be nice to clarify that in the post-processor docs, as that's where I got the idea to use it as a list.

My idea is to use filters to run specific postprocessors in order if:

  1. The post is not restricted, has a paid plan, has a filename (run as prepare in /paid/plan-cost directory)
  2. The post is not restricted, has a paid plan, has no filenames (run as post in /paid/plan-cost directory)
  3. The post is not restricted and has a filename (run as prepare in /free directory)
  4. The post is not restricted, has a paid plan of 0, and no filename (run as post in /free directory)
  5. The post is restricted (run as post in /not-paid/plan-cost directory)

Which I think would resolve to:

  1. not locals().get('isRestricted') and 'filename' in locals()
  2. not locals().get('isRestricted') and 'filename' not in locals()
  3. not locals().get('isRestricted') and locals().get('feeRequired') == 0
  4. not locals().get('isRestricted') and locals().get('feeRequired') == 0 and 'filename' not in locals()
  5. locals().get('isRestricted')
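
To sanity-check the intended routing before turning it into filter strings, the five cases in the first list can be written as one plain Python function. This is only a sketch of the decision logic: `classify` and the sample posts are hypothetical, the directory names follow the list above, and treating a missing `feeRequired` as 0 is an assumption.

```python
def classify(post):
    """Map post metadata to (event, directory) following the five cases above."""
    restricted = bool(post.get("isRestricted"))
    fee = post.get("feeRequired") or 0  # assumption: a missing fee counts as free
    has_file = "filename" in post

    if restricted:
        # Case 5: restricted posts are always handled at the post level
        return "post", f"not-paid/{fee}"
    # Cases 1 and 3 have a file (prepare); cases 2 and 4 do not (post)
    event = "prepare" if has_file else "post"
    if fee > 0:
        return event, f"paid/{fee}"
    return event, "free"


print(classify({"isRestricted": False, "feeRequired": 500, "filename": "a"}))  # ('prepare', 'paid/500')
print(classify({"isRestricted": False, "feeRequired": 0}))                     # ('post', 'free')
print(classify({"isRestricted": True, "feeRequired": 500}))                    # ('post', 'not-paid/500')
```

Checking the restricted case first keeps the remaining branches independent of each other, which is also a reasonable order for the corresponding post processors.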

mikf added a commit that referenced this issue Nov 15, 2024
#5262 (comment)

allow (theoretically*) all filter expression statements
to be a list of individual filters

(*) except for 'filename' and 'directory' conditionals,
as dict keys cannot be lists
@mikf
Owner Author

mikf commented Nov 15, 2024

@biggestsonicfan
All filters can now consist of multiple statements, including post processor filters: 5bc3657
