[pixiv] postprocessors uses wrong {extension} if there is --ugoira-conv #1507

Closed
TestPolygon opened this issue Apr 26, 2021 · 12 comments

@TestPolygon

TestPolygon commented Apr 26, 2021

1. postprocessors uses wrong {extension} if there is --ugoira-conv

For example:

"pixiv": {
     "postprocessors": [{
          "filename": "{id}-0.{extension}.html",
          "name": "metadata",
          "mode": "custom",
          "format": "<a href='https://www.pixiv.net/artworks/{id}'>{title}</a>"
     }],
     "ugoira": true,
     "translated-tags": true,
     "directory": ["gallery-dl"],
     "filename": "{id}-{num}.{extension}"
},

I run gallery-dl https://www.pixiv.net/en/artworks/89406523 --ugoira-conv.
This produces the following files:

  • 89406523-0.webm
  • 89406523-0.zip.html <-- "zip"

The expected result should be:

  • 89406523-0.webm
  • 89406523-0.webm.html

Note: I use -0 instead of -{num} in order to get a single HTML file per gallery (which may contain multiple images).


2. --no-download does not work with --ugoira-conv:

I also frequently use --no-skip --no-download to create HTML files with meta info without downloading the content (if I already have it). I need --no-skip because I also use an archive (--download-archive).

But I can't use --no-skip --no-download with --ugoira-conv:

[pixiv][error] Unable to download data:  FileNotFoundError: [Errno 2] 
No such file or directory: '\\\\?\\C:\\Example\\gallery-dl\\89406523.zip'

The expected result for gallery-dl https://www.pixiv.net/en/artworks/89406523 --ugoira-conv --no-skip --no-download is:

  • 89406523-0.webm.html

Note: webm, not zip.

@TestPolygon TestPolygon changed the title [pixiv] postprocessors uses wrong {extension} if there is --ugoira-conv [pixiv] postprocessors uses wrong {extension} if there is --ugoira-conv Apr 26, 2021
@TestPolygon
Author

TestPolygon commented Apr 26, 2021

3. By the way: #1354 (comment) (Remove duplicates for the translated tags)

@TestPolygon
Author

4. Fetching of metainfo for ugoira is slow

For example:
gallery-dl --no-download --no-skip https://www.pixiv.net/en/users/24036178 (3 ugoira + image).
Compare it with:
gallery-dl --no-download --no-skip https://www.pixiv.net/en/users/53180320 (image, ugoira, 10 images).

Fetching metainfo for images is instant, but for ugoira it takes 1+ second each.

Maybe it's a bug? Or is it intended?

@TestPolygon
Author

5. frames[][delay]

By the way, there is a frames[][delay] property for ugoira files, but it does not work in filename format strings: [pixiv][error] FilenameFormatError: Applying filename format string failed (ValueError: Empty attribute in format string).

(from this comment: #1297 (comment))

@mikf
Owner

mikf commented Apr 26, 2021

  1. postprocessors uses wrong {extension} if there is --ugoira-conv

This currently depends on the order in which post processors are specified and run.
metadata first and ugoira second -> .zip
ugoira first and metadata second -> .webm

This also happens because the download process temporarily changes the filename extension from webm to zip, since the file it is downloading is a ZIP archive. It should be fixable by disabling this behavior for Ugoira files.
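
The order dependence described above can be illustrated with a toy model. This is only a sketch: metadata_pp, ugoira_pp and run are hypothetical names invented for this example and do not mirror gallery-dl's internal classes. It assumes that each post processor sees whatever extension value is current when it runs.

```python
# Toy model only: hypothetical functions, not gallery-dl's real internals.

def metadata_pp(kwdict, written):
    """Write an HTML sidecar named after the *current* extension."""
    written.append("{id}-0.{extension}.html".format_map(kwdict))


def ugoira_pp(kwdict, written):
    """Pretend to convert the ZIP to WebM and update the extension."""
    kwdict["extension"] = "webm"
    written.append("{id}-0.{extension}".format_map(kwdict))


def run(postprocessors):
    # While downloading, the extension is temporarily "zip".
    kwdict = {"id": 89406523, "extension": "zip"}
    written = []
    for pp in postprocessors:
        pp(kwdict, written)
    return written


print(run([metadata_pp, ugoira_pp]))
# ['89406523-0.zip.html', '89406523-0.webm']
print(run([ugoira_pp, metadata_pp]))
# ['89406523-0.webm', '89406523-0.webm.html']
```

With metadata first, the sidecar is named before the extension is switched to webm, which matches the .zip.html file from the report.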

  2. --no-download does not work with --ugoira-conv:

Because the ugoira post processor currently expects to find a convertible ZIP archive.
That's an easy fix.

  3. By the way: [pixiv] Translated tags #1354 (comment) (Remove duplicates for the translated tags)

I guess converting the tag list to a set and back to a list would work?
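
A set round trip does remove duplicates, but it does not guarantee that the tags keep their original order. As a minimal sketch (dedupe_keep_order is a hypothetical helper name and the tag list is illustrative), an order-preserving variant could look like this:

```python
def dedupe_keep_order(tags):
    """Remove duplicates, keeping the first occurrence of each tag."""
    # dict keys keep insertion order (Python 3.7+); a plain set does not.
    return list(dict.fromkeys(tags))


tags = ["pencil", "drawing", "doodle", "pencildrawing", "drawing"]
print(dedupe_keep_order(tags))
# ['pencil', 'drawing', 'doodle', 'pencildrawing']
```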

  4. Fetching of metainfo for ugoira is slow

Each Ugoira needs an extra HTTP request to fetch its URL and metadata, regular images don't.
Ideally, this would use some sort of lazy evaluation.
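
One possible shape for such lazy evaluation is sketched below. Everything here is an assumption for illustration: LazyUgoiraMeta and fake_fetch are hypothetical, and fake_fetch merely stands in for the extra HTTP request. The idea is that the request only happens when a value is actually read.

```python
class LazyUgoiraMeta:
    """Hypothetical wrapper: defer the extra request until first access."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the slow request
        self._data = None

    def __getitem__(self, key):
        if self._data is None:        # first access triggers the request
            self._data = self._fetch()
        return self._data[key]


calls = []


def fake_fetch():
    # Stand-in for the real HTTP request; records how often it runs.
    calls.append(1)
    return {"frames": [{"file": "000000.jpg", "delay": 100}]}


meta = LazyUgoiraMeta(fake_fetch)
print(len(calls))   # 0, nothing fetched yet
meta["frames"]
meta["frames"]
print(len(calls))   # 1, fetched once and then cached
```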

  5. frames[][delay]

See #1388 (comment)
You could use {frames[0][delay]} for the first frame, {frames[1][delay]} for the second, etc., but there is currently no good way to deal with a list of dictionaries.
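
Python's built-in str.format shows the same behavior: an explicit index works, while an empty [] raises the ValueError quoted in the report. This sketch uses plain str.format with illustrative delay values; that gallery-dl's own formatter behaves identically in every case is an assumption.

```python
frames = [{"delay": 100}, {"delay": 120}]  # illustrative values

# An explicit index selects one frame's dictionary, then its key.
print("{frames[0][delay]}".format(frames=frames))  # 100
print("{frames[1][delay]}".format(frames=frames))  # 120

# An empty index is rejected by the format string parser.
try:
    "{frames[][delay]}".format(frames=frames)
except ValueError as exc:
    print(exc)  # Empty attribute in format string

# Outside a format string, a list of dictionaries is easy to aggregate.
total_delay = sum(frame["delay"] for frame in frames)
print(total_delay)  # 220
```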

@thatfuckingbird
Contributor

thatfuckingbird commented Apr 26, 2021

  3. By the way: [pixiv] Translated tags #1354 (comment) (Remove duplicates for the translated tags)

I guess converting the tag list to a set and back to a list would work?

One drawback of doing that is that the untranslated_tags and tags arrays would no longer have the same length, so we would lose the information about which untranslated tag has which translation (could simply be mitigated though by saving the original tag list with the untranslated/translated pairs instead of untranslated_tags...)

@Twi-Hard

Twi-Hard commented Apr 27, 2021

It's important to me that I'm able to tell which tag translated to what.

(could simply be mitigated though by saving the original tag list with the untranslated/translated pairs instead of untranslated_tags...)

If you mean having both the translated and non-translated tags merged into the same list that would mean I can't tell which tags are translated and which aren't (and there'll be duplicate tags but in different languages).

@thatfuckingbird
Contributor

If you mean having both the translated and non-translated tags merged into the same list that would mean I can't tell which tags are translated and which aren't (and there'll be duplicate tags but in different languages).

The original data is in a format like

[
{"original":"example1","translated":"translated1"},
{"original":"example2","translated":"translated2"},
...
]

So we could just save that whole thing if dupes get pruned from "tags".

@mikf
Owner

mikf commented Apr 27, 2021

  1. and 2. should be fixed with 221015e and e5123f5

Concerning tags, I think it would be better to redo the current tag situation. My proposal:

  • rename translated-tags to just tags and let its value control the format and content of the tags metadata entry
  • "original" -> a list of original Japanese-only tags, same as the current translated-tags: false
  • "translated" -> a list of translated tags, same as the current translated-tags: true with duplicates removed
  • null -> don't modify the tags info returned by Pixiv, i.e. have it be a list of {"name": "…", "translated_name": "…"} objects.
  • ("all" -> combination of "original" and "translated", with an additional tags_translated list)
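
The proposed option values could be sketched roughly as follows. build_tags is a hypothetical helper, not gallery-dl code, and the sample data is illustrative. The fallback to the original name when no translation exists is an assumption here, and the tentative "all" value is left out.

```python
def build_tags(raw_tags, mode):
    """Hypothetical helper mirroring the proposed option values."""
    if mode is None:
        # Leave the list of {"name", "translated_name"} objects untouched.
        return raw_tags
    if mode == "original":
        return [tag["name"] for tag in raw_tags]
    if mode == "translated":
        # Assumption: fall back to the original name without a translation.
        names = [tag["translated_name"] or tag["name"] for tag in raw_tags]
        # Drop duplicates while keeping the first occurrence.
        return list(dict.fromkeys(names))
    raise ValueError(f"unsupported mode: {mode!r}")


raw = [
    {"name": "ドローイング", "translated_name": "drawing"},
    {"name": "drawing", "translated_name": None},
    {"name": "鉛筆", "translated_name": "pencil"},
]
print(build_tags(raw, "original"))    # ['ドローイング', 'drawing', '鉛筆']
print(build_tags(raw, "translated"))  # ['drawing', 'pencil']
```

With None, the pairing between original and translated names stays available because the raw objects are passed through unchanged.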

@thatfuckingbird
Contributor

Sounds good. I'm happy with any solution that allows for getting both translated and untranslated and preserves the pairings.

@TestPolygon
Author

TestPolygon commented Apr 27, 2021

#1 and #2 look fixed.


"translated" -> a list of translated tags, same as the current translated-tags: true with duplicates removed

And the original tags if there is no translation for them. (just a note)


By the way, Pixiv now shows the original tags together with their translations:

[image]

Some time ago it looked like this:

[image]

(84677043)


An example of tags from the raw Pixiv API response:
    "tags": [
      {
        "tag": "うごイラ",
        "locked": true,
        "deletable": false,
        "userId": "17764675",
        "romaji": "ugoira",
        "translation": {
          "en": "ugoira"
        },
        "userName": "ほたる"
      },
      {
        "tag": "メロンソーダ",
        "locked": true,
        "deletable": false,
        "userId": "17764675",
        "romaji": "meronnso-da",
        "translation": {
          "en": "melon soda"
        },
        "userName": "ほたる"
      },
      {
        "tag": "炭酸",
        "locked": true,
        "deletable": false,
        "userId": "17764675",
        "romaji": "tannsann",
        "userName": "ほたる"
      },
      {
        "tag": "シネマグラフ",
        "locked": false,
        "deletable": true
      }
    ]

(89406523)


@mikf
Owner

mikf commented Apr 27, 2021

And the original tags if there is no translation for them. (just a note)

That's what the current translated-tags: true already does.
This behavior also seems to be what causes the duplicate entries. Your example has drawing as a regular tag with no translation as well as ドローイング, which also becomes drawing when using the translated version.

tags example from the Pixiv Mobile API used by gallery-dl:
      "tags": [
        {
          "name": "鉛筆",
          "translated_name": "pencil"
        },
        {
          "name": "ドローイング",
          "translated_name": "drawing"
        },
        {
          "name": "落書き",
          "translated_name": "doodle"
        },
        {
          "name": "pencildrawing",
          "translated_name": null
        },
        {
          "name": "モノクロ",
          "translated_name": "black&white"
        },
        {
          "name": "drawing",
          "translated_name": null
        },
        {
          "name": "人物",
          "translated_name": "character"
        },
        {
          "name": "アナログ",
          "translated_name": "traditional"
        },
        {
          "name": "portrait",
          "translated_name": null
        },
        {
          "name": "sketch",
          "translated_name": null
        }
      ],

mikf added a commit that referenced this issue Apr 29, 2021
- rename to 'tags'
- use string-values: "japanese", "translated", "noop"
- remove duplicate entries for "translated" tags
@mikf
Owner

mikf commented May 7, 2021

So translated-tags got renamed to just tags and accepts the following options:

Let me know if anything should be added/changed here.
