Deviantart | Gallery-dl only downloading Journals and not Polls, Status Updates, from User Posts #3539

Open
Aidanjosiah02 opened this issue Jan 16, 2023 · 24 comments · Fixed by #3541

Comments

@Aidanjosiah02

Aidanjosiah02 commented Jan 16, 2023

I am attempting to download all posts made by some artists on DeviantArt. However, on the "Posts" page it only grabs the "Journals" and excludes "Polls" and "Status Updates". Using a direct link such as "https://www.deviantart.com/<user>/posts/polls" returns [gallery-dl][error] Unsupported URL '<URL>' even though those posts exist. The --list-keywords option also shows no sign of these other posts.

I use Windows 10, gallery-dl pip version 1.24.2, and the related settings in my config are:

        {
            "client-id": "<id>",
            "client-secret": "<secret>",
            "extra": true,
            "folders": true,
            "group": true,
            "include": ["all", "journal", "scraps"],
            "refresh-token": "<token>"
        }

Removing "journal" from "include" also does not work.

I have attached the verbose of one of my runs.
verbose-t1na-posts.txt

@Aidanjosiah02 Aidanjosiah02 changed the title from "Deviantart | Gallery-dl only downloading Journals and not Polls, etc. from User Posts" to "Deviantart | Gallery-dl only downloading Journals and not Polls, Status Updates, from User Posts" on Jan 16, 2023
@ClosedPort22
Contributor

Can confirm gallery-dl doesn't support these types yet.

I'm working on this, but the situation seems to be rather complicated:

  • Some of the statuses can be retrieved by using the /user/profile/posts/ endpoint, which, according to dA's documentation, should supersede the /browse/user/journals endpoint currently used by the journal extractor
  • Other statuses (in a different format) can be retrieved from /user/statuses/
  • The OAuth API doesn't seem to support polls. It might be necessary to use the Eclipse API instead.
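The findings above can be summarized as a small lookup table. This is only a sketch of the list, not gallery-dl code: POST_ENDPOINTS and endpoints_for are hypothetical names, and listing /user/profile/posts/ under journals rests on the documentation's statement that it should supersede /browse/user/journals.

```python
# Hypothetical summary of which OAuth API endpoints were found to return
# each post type. The paths are the ones named in this comment; nothing
# here performs a request.
POST_ENDPOINTS = {
    "journal": ("/browse/user/journals", "/user/profile/posts/"),
    "status": ("/user/profile/posts/", "/user/statuses/"),
    "poll": (),  # no OAuth endpoint found; the Eclipse API may be needed
}


def endpoints_for(post_type):
    """Return the candidate endpoints for a post type (empty if none)."""
    return POST_ENDPOINTS.get(post_type, ())
```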

@ClosedPort22
Contributor

ClosedPort22 commented Jan 17, 2023

I've added support for some status posts (#3541). The executables can be found here:
https://github.com/ClosedPort22/gallery-dl/actions/workflows/executables.yml

You can try this out by using gallery-dl https://www.deviantart.com/<user>/posts/statuses. Or you can use "include": ["status"] or "include": "all" (not ["all"]) in your config file to enable this for all user URLs.
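The difference between "all" and ["all"] may be easier to see in code. The following is a simplified sketch of how an include value could be resolved, not gallery-dl's actual implementation. resolve_include and the AVAILABLE tuple are hypothetical, and the assumption is that only the bare string "all" acts as a wildcard, while list entries must each name a sub-extractor.

```python
# Illustrative list of sub-extractor names (an assumption, not exhaustive).
AVAILABLE = ("gallery", "scraps", "journal", "favorite", "status")


def resolve_include(include, available=AVAILABLE):
    """Return the sub-extractor names selected by an 'include' value."""
    if include == "all":
        # Only the bare string is treated as "everything".
        return list(available)
    if isinstance(include, str):
        # A comma-separated string is split into individual names.
        include = include.split(",")
    # Unknown names, including a literal "all" inside a list, are dropped.
    return [name for name in include if name in available]


everything = resolve_include("all")
reported = resolve_include(["all", "journal", "scraps"])
```

Under these assumptions, ["all", "journal", "scraps"] selects only journal and scraps, which would be consistent with the behaviour reported at the top of this issue.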

Please let me know if you find any bugs or have suggestions on how this could be improved (apart from the currently missing features).

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 17, 2023

Thank you so much; it worked amazingly! That was also pretty quick with the update.

succesful_stauses-verbose.txt

@ClosedPort22
Contributor

No problem. I think you closed the issue too soon, though. I'm not the maintainer of the repo and this hasn't been merged into master yet. I'm still working on it.

@Aidanjosiah02 Aidanjosiah02 reopened this Jan 18, 2023
@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 18, 2023

When it encounters posts whose items contain neither "deviation" nor "status", it throws a "KeyError" when looking up either key. In the section starting at line 787 in deviantart.py, the problem occurs when "gallery" is in item rather than one of the only two handled keys. Instead of key = "deviation" if "deviation" in item else "status" on line 791, it should include something like an elif "gallery" in item: branch to deal with "gallery" as well. I am not very familiar with the code of this program, so for now I can't really think of a proper solution.

@ClosedPort22
Contributor

When encountering posts that contain neither "deviation" nor "status" in the status type, it throws a "KeyError" when searching for either key, namely, "deviation" or "status". In the section starting at line 787 in deviantart.py, the problem occurs when "gallery" is in item as opposed to the only two defined keys. Instead of key = "deviation" if "deviation" in item else "status", it should include some sort of elif "gallery" in item branch to deal with "gallery" as well. I am not very familiar with the code of this program, so for now I can't really think of a proper solution.

Hm, the official documentation made no mention of the gallery field. Can you provide a link to the post that triggered the error?

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 18, 2023

This is one of the accounts that did it, but no longer does?
https://www.deviantart.com/maxeralfa017/posts/statuses
I made a quick workaround earlier, but can't seem to get it to fail anymore.

Instead now it looks like this:
other-error-verbose.txt

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 18, 2023

Found one:
https://www.deviantart.com/dsana/posts/statuses
error-verbose.txt

Here it can't find "status". But it will find "gallery" if you tell it to search for that.
My quick workaround was to say

# Assumed to run inside the loop over a post's items (hence `continue`).
if "deviation" in item:
    yield item["deviation"]
elif "status" in item:
    yield item["status"]
else:
    continue  # skip items that have neither key

But I suspect it should be handled in a better way. I'm not sure how many posts it misses using this that it shouldn't.

@ClosedPort22
Contributor

ClosedPort22 commented Jan 19, 2023

Got KeyError at the same position in the API response, but there was no gallery field:

            "items": [
                {
                    "type": "thumb_background_deviation"
                }
            ]

The error should be fixed in c4aeca7, and unexpected fields will simply be ignored for now (you can enable the metadata postprocessor if you don't want to lose this information).
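For illustration, here is a minimal sketch of the "ignore unexpected fields" behaviour described above. It is not the code from c4aeca7: iter_shared is a hypothetical helper, and the item layout is assumed from the API excerpt in this comment.

```python
def iter_shared(items):
    """Yield (key, payload) for items carrying a deviation or a status.

    Items with neither key, such as {"type": "thumb_background_deviation"},
    are skipped instead of raising KeyError.
    """
    for item in items:
        for key in ("deviation", "status"):
            if key in item:
                yield key, item[key]
                break


# Placeholder data; the IDs and URL are made up for the example.
items = [
    {"type": "thumb_background_deviation"},  # no payload: skipped
    {"type": "thumb_background_deviation", "deviation": {"url": "placeholder-url"}},
    {"status": {"statusid": "placeholder-id"}},
]
found = list(iter_shared(items))
```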

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 19, 2023

Thanks for the update! Now when I get the version from yesterday to print the item, it says {'type': 'thumb_background_deviation'} [deviantart][error] An unexpected error occurred: KeyError - 'status'. I don't know why I thought it was "gallery". Anyway, this new version works much better now; none of the statuses and images therein appear to be skipped, though they are placed outside the "Status" subfolder. This isn't really a problem for me since I can rename their paths in the batch script I made, but just a heads up anyway.

@ClosedPort22
Contributor

ClosedPort22 commented Jan 19, 2023

This isn't really a problem for me since I can rename their paths in the batch script I made, but just a heads up anyway.

You can specify directory and archive-fmt for status like this (see also: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractordirectory):

"deviantart": {
    "status": {
        ...
    }
}

I assume people tend to share their own deviations in status updates, and that's why status is disabled by default. I haven't put this version into production yet, but I would probably override the default archive-fmts and use the same archive format for all DeviantArt extractors.

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 19, 2023

Not sure if this is possible with that. I see I didn't explain the problem correctly. The images that come from the user's "stash" are placed in the user's root while the statuses are kept in "Status". At the time of describing the problem I didn't know the images came from the user's "stash", but now that I see that is the case, I'm not so sure the problem I raised is actually a problem. For example, someone may want to keep all the stash files separate from all other items.

I did notice something else that may be a minor problem if someone is concerned with preventing duplicates. Images from artists who are not the poster of the status update are placed into the poster's "Status" subdirectory rather than the original artist's directory. The metadata of the status updates appears to be enough to create softlinks across different artists, so a switch to always "respect des fonds" (keep each file under its original artist) would still allow consistent access.

These images I'm referring to all seem to have this in their metadata:

"items": [
    {
        "deviation": {
            "url": "https://www.deviantart.com/<artist>/art/<image>"
        },
        "type": "thumb_background_deviation"
    }
]

Still, this is a very minor problem as it seems very rare for this to occur, and we're talking like under 20MB of duplicates per artist.
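A minimal sketch of how the original artist could be read from that metadata, for example to decide where a softlink should point. shared_artists is a hypothetical helper, and it assumes the URL layout shown above, with the artist name as the first path segment followed by "art".

```python
from urllib.parse import urlparse


def shared_artists(status_metadata):
    """Return the artist names found in the deviation URLs of a status."""
    artists = []
    for item in status_metadata.get("items", []):
        url = item.get("deviation", {}).get("url")
        if not url:
            continue  # item carries no deviation payload
        parts = urlparse(url).path.strip("/").split("/")
        # Expected path layout: artist / "art" / image
        if len(parts) >= 2 and parts[1] == "art":
            artists.append(parts[0])
    return artists


# Made-up metadata following the structure of the excerpt above.
metadata = {
    "items": [
        {
            "deviation": {"url": "https://www.deviantart.com/someartist/art/some-image-1"},
            "type": "thumb_background_deviation",
        },
        {"type": "thumb_background_deviation"},
    ]
}
artists = shared_artists(metadata)
```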

@ClosedPort22
Contributor

ClosedPort22 commented Jan 19, 2023

The images that come from the user's "stash" are placed in the user's root

Yeah, it's been like that since the beginning. After using gallery-dl for a while I decided to change the directory format to {username}/Stash for clarity. The stash extractor can be configured in the same way as described above.

You might see some stashed deviations in the "Status" folder as well, and that's because shared deviations are directly extracted by the status extractor rather than delegated to the stash extractor. It's even possible to merge the output into one directory by using conditional directory naming:

"deviantart": {
    "stash": {
        "directory" : ["{username}", "Stash"]
    },
    "status": {
        "directory": {
            "'sta.sh' in url": ["{username}", "Stash"],
            "": ["{username}", "Status"]
        }
    }
}

Images from artists who are not the poster of the status update are placed into the poster's "Status" subdirectory rather than the original artist's directory.

This can also be achieved through configuration, thanks to gallery-dl's flexibility in this regard.

"deviantart": {
    "status": {
        "directory": ["{author[username]}", "Status"]
    }
}

Or even (the condition is a Python expression, so the dictionary key has to be quoted):

"deviantart": {
    "status": {
        "directory": {
            "author['username'] != username": ["{author[username]}", "shared"],
            "": ["{username}", "Status"]
        }
    }
}
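To make the conditional directory idea concrete, here is a simplified sketch of how such rules could be evaluated. It is not gallery-dl's implementation: pick_directory is a hypothetical helper, conditions are assumed to be Python expressions evaluated against the metadata (which is why the key is written as author['username']), and the empty string is assumed to be the fallback.

```python
def pick_directory(rules, metadata):
    """Return the formatted path segments for the first matching rule.

    rules maps a Python expression to a list of format strings; the
    empty-string key is used when no expression evaluates to true.
    """
    for condition, segments in rules.items():
        # eval() is tolerable here only because the rules come from the
        # user's own configuration file.
        if condition and eval(condition, {"__builtins__": {}}, dict(metadata)):
            break
    else:
        segments = rules.get("", [])
    return [segment.format_map(metadata) for segment in segments]


stash_rules = {
    "'sta.sh' in url": ["{username}", "Stash"],
    "": ["{username}", "Status"],
}
author_rules = {
    "author['username'] != username": ["{author[username]}", "shared"],
    "": ["{username}", "Status"],
}

# Made-up metadata for a status posted by "alice".
own_post = {"username": "alice", "url": "https://www.deviantart.com/alice/status-update/1",
            "author": {"username": "alice"}}
shared_post = {"username": "alice", "url": "https://sta.sh/0abc",
               "author": {"username": "bob"}}
```

With this data, the second set of rules would place the shared post under bob/shared and alice's own post under alice/Status.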

@Aidanjosiah02
Author

Holy smokes you are a genius! I'll get on applying this after I get some sleep

@ClosedPort22
Contributor

By the way, if you would like to minimize the chance of getting duplicate files, you can check out the archive function. I personally recommend including filesize in the archive format because it helps to detect modifications, re-uploads, etc. I'm currently using {_username}_{index}_{download_filesize|content[filesize]}.{extension}.
If you download from artists from the same fandom or topic, you can even use a common archive database for them (I do this for Tumblr and Twitter). This way the shared content between them (e.g. retweets, reblogs, shared deviations) can be collectively managed and will only ever be downloaded once.
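The idea of a shared archive can be sketched with Python's sqlite3 module. This is an illustration only: the one-column table layout is an assumption modelled on a typical archive database, open_archive and should_download are hypothetical helpers, and the entries are made-up strings following the username_index_filesize.extension pattern.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) an archive database with one unique entry per file."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return db


def should_download(db, entry):
    """Record entry and return True, or return False if already archived."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO archive (entry) VALUES (?)", (entry,))
    except sqlite3.IntegrityError:
        return False  # primary key already present
    return True


archive = open_archive()
first = should_download(archive, "artist_12345_2048.png")    # reached via artist A
again = should_download(archive, "artist_12345_2048.png")    # shared by artist B
changed = should_download(archive, "artist_12345_4096.png")  # re-upload, new size
```

Because the file size is part of the entry, the re-uploaded file counts as new, while the copy shared by a second artist is skipped.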

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 20, 2023

For the earlier post, this seems to achieve what I need:

"deviantart": {
    "stash": {
        "directory": ["deviantart", "{author[username]}-[{author[userid]}]", "Stash"]
    },
    "status": {
        "directory": {
            "'stash' in subcategory": ["deviantart", "{author[username]}-[{author[userid]}]", "Stash"],
            "'/art/' in url": ["deviantart", "{author[username]}-[{author[userid]}]", "All"],
            "": ["deviantart", "{author[username]}-[{author[userid]}]", "Status"]
        }
    },
    "journal": {
        "directory": {
            "'stash' in subcategory": ["deviantart", "{author[username]}-[{author[userid]}]", "Stash"],
            "'/art/' in url": ["deviantart", "{author[username]}-[{author[userid]}]", "All"],
            "": ["deviantart", "{author[username]}-[{author[userid]}]", "Journal"]
        }
    }
}

I can still create links to the correct files from the status update despite the target image being somewhere else using:

"items": [
    {
        "deviation": {
            "author": {
                "userid": "D3DBBBAF-E006-8D38-8687-0F15E669E9E8"
            },
            "deviationid": "902FA586-DA0F-7E01-F65C-1C163EADEF00"
        }
    }
]

included in the metadata, respecting the fonds and preventing duplicates.

For your last post, I currently do use the archive file, but that's cool you can tell it how to store the info. Also is there any downside you know of to using {author[username]} over {username} for the archive file?
Again, thank you for the help you have given me so far!

@ClosedPort22
Contributor

ClosedPort22 commented Jan 20, 2023

Also is there any downside you know of to using {author[username]} over {username} for the archive file?

There really shouldn't be any. One thing that I can think of is that for posts without author[username], the field will simply become None and that may, assuming {index} is not always unique, cause some posts to be skipped erroneously. But in reality I've never seen any posts without it.

I'd also recommend getting a SQLite viewer so you can verify that your archive-fmt is working as intended.

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 20, 2023

One thing that I can think of is that for posts without author[username], the field will simply become None and that may, assuming {index} is not always unique, cause some posts to be skipped erroneously.

I see what you're saying since there are some files with an index of 0. Perhaps using {deviationid}_{download_filesize|content[filesize]}.{extension} would work? I assume the {deviationid} will always be unique regardless of author. And yes I have an SQLite viewer.

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 20, 2023

Never mind, {deviationid} doesn't work with non-images, and I can't seem to find a good replacement. Have you ever seen a post without {author[username]} available, or is this just theoretical?
If {author[username]} can be unavailable I would assume {author[userid]}_{index} also wouldn't work.

Edit: {subcategory}_{deviationid|statusid}_{download_filesize|content[filesize]}.{extension} appears to be reliable. Stash, Journals, deviations, all have a {deviationid} keyword. It only seems to be Statuses that are different and use {statusid}.
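A rough sketch of how the alternatives in that format could be resolved. It mimics, but is not, gallery-dl's formatter: first_available and archive_entry are hypothetical helpers, and the assumption is that an alternative is skipped when it is missing or empty.

```python
def first_available(meta, *names):
    """Return the first non-empty value among the given field names.

    A name such as "content[filesize]" is looked up as
    meta["content"]["filesize"].
    """
    for name in names:
        value = meta
        try:
            for part in name.replace("]", "").split("["):
                value = value[part]
        except (KeyError, TypeError):
            continue  # field missing: try the next alternative
        if value:
            return value
    return None


def archive_entry(meta):
    """Build an entry like subcategory_id_filesize.extension."""
    post_id = first_available(meta, "deviationid", "statusid")
    size = first_available(meta, "download_filesize", "content[filesize]")
    return f"{meta['subcategory']}_{post_id}_{size}.{meta['extension']}"


# Made-up metadata: a journal with a deviationid and a status with a statusid.
journal = {"subcategory": "journal", "deviationid": "D1",
           "download_filesize": 2048, "extension": "htm"}
status = {"subcategory": "status", "statusid": "S1", "index": 0,
          "content": {"filesize": 512}, "extension": "htm"}
entries = [archive_entry(journal), archive_entry(status)]
```

Since the entry no longer depends on {index}, two posts that both report an index of 0 would still get distinct entries as long as their IDs differ.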

@ClosedPort22
Contributor

Nevermind deviationid doesn't work with non-images

For now it's possible to use {deviationid|statusid} to get a UUID for every post (see https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md), but this might change when the PR is reviewed by @mikf.

Have you ever seen a post without {author[username]} available, or is this just theoretical?

It's purely theoretical.

@ClosedPort22
Contributor

there are some files with an index of 0

Have you actually seen that happening? I always thought it was to prevent the program from crashing in case the API returned something unexpected. It was an issue during development, but it should've been fixed by 013733c.

@Aidanjosiah02
Author

Aidanjosiah02 commented Jan 20, 2023

Have you actually seen that happening? I always thought it was to prevent the program from crashing in case the API returned

Yes, actually. These posts in particular:
https://www.deviantart.com/dsana/status-update/20937531
https://www.deviantart.com/dsana/status-update/21573532
I cannot replicate the problem testing with my account's stash, so I really don't know. The metadata literally says "index": 0 in both. One thing they have in common is they both use images from the artist's stash, but other images from stash don't do this.
I keep using this artist for examples since it's the only one I stumbled on that is able to create these problems.
Here's an image:
[image: example]

@ClosedPort22
Contributor

Maybe you forgot to update to the latest commit? I can't reproduce the issue on 013733c.

@Aidanjosiah02
Author

Oh yeah that fixed it.

@mikf mikf reopened this Jan 23, 2023