[Feature request] Save the URL of each image to a text file by making only one HTTP request (or two if the gallery has two pages, or three if it has three pages, etc) #303

Open
a84r7a3rga76fg opened this issue Aug 31, 2024 · 9 comments

Comments

@a84r7a3rga76fg

a84r7a3rga76fg commented Aug 31, 2024

Edit: Instead of saving the URL, it'd be even better if it could save the page number, image token and the extension of every image file, e.g. https://exhentai.org/s/d1b07750bd/3042776-1 will be saved as 0001_d1b07750bd.jpg.

@ccloli
Owner

ccloli commented Aug 31, 2024

If you want the page links, just make sure Settings -> Advanced -> "Record and save gallery info as" is set to "File info.txt" and that Page Links is checked under "includes", which is the default setting.

Saving the image links is not available, since each link is only valid for a short time; after that you'll see an error.

Renaming the files is not possible for now. If you just want the image token, I guess you can compute the SHA-1 hash of the image file and take the first 10 characters.
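
A minimal Node.js sketch of the hashing idea above. It assumes, as the comment guesses, that the image token is the first 10 hex characters of the file's SHA-1; the function names are made up for illustration.

```javascript
const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');

// Assumption from the comment above: token = first 10 hex characters
// of the SHA-1 digest of the image file's bytes.
function imageTokenFromBuffer(data) {
  return createHash('sha1').update(data).digest('hex').slice(0, 10);
}

// Convenience wrapper that reads the whole file into memory first.
function imageTokenFromFile(filePath) {
  return imageTokenFromBuffer(readFileSync(filePath));
}
```

Calling imageTokenFromFile on a downloaded image would then give a 10-character string to compare against the token part of the page link.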

@a84r7a3rga76fg
Author

I don't want the image URL. I forgot to add that I want it to perform the action without wasting any GP, credits or hath, and without making too many page requests. I think it should just make one page request if all of the image URLs are in one page.

@ccloli
Owner

ccloli commented Aug 31, 2024

So do you mean you want the page links or the image links without costing GP or downloading images?

  • For page links: Search the source code for the line getAllPagesURLFin = true; (there are 2 matches, one for the normal page grid and one for MPV). Add the following lines after both of them to output the page links and stop downloading:

    pushDialog(pageURLsList.join('\n'));
    return;
  • For image URLs: There's no way to do that.

  • For image thumbnail URLs: The script doesn't use that information, so it's not available.

@a84r7a3rga76fg
Author

Preferably none of those. Does my edit not show up? What I'm really looking for is saving the page number, image token (SHA-1) and the extension of every image file, e.g. https://exhentai.org/s/d1b07750bd/3042776-1 will be saved as 0001_d1b07750bd.jpg.

If you're wondering why, it's because the latest restriction has made it impossible for anyone without GP, credit or hath to download the original images. Most people don't have enough GP, credit or hath to download a single gallery. Our only option is to use torrents, and these torrents often have unsorted image files.

@ccloli
Owner

ccloli commented Aug 31, 2024

> Preferably none of those. Does my edit not show up?

I did see it but didn't understand it; it's nearly 6 AM in my timezone and I need some sleep. 😴

> What I'm really looking for is saving the page number, image token (SHA-1) and the extension of every image file, e.g. https://exhentai.org/s/d1b07750bd/3042776-1 will be saved as 0001_d1b07750bd.jpg.

So do you want to rename the downloaded files? For now it's not possible, but Soon™.

If you just want to extract them from the page links and get a plain-text list with that naming (d1b07750bd/3042776-1 -> 0001_d1b07750bd.jpg), you can probably use the code from the first option to get the page links, then ask GPT to write an automated script for you.

However, the page link only contains the page number and the image token; the file extension is not available there. You'd need to extract the file extension from the thumbnail URL, but that URL doesn't include the page number, and the script doesn't actually grab thumbnail URLs, so you'd need to do that part yourself.
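
The conversion mentioned above (d1b07750bd/3042776-1 -> 0001_d1b07750bd) could be sketched like this in plain JavaScript. It assumes page links of the form .../s/<token>/<gallery id>-<page>, as in the example link, and a 4-digit zero-padded page number; the function names are hypothetical. The extension is left out because the page link doesn't carry it.

```javascript
// Assumed page link shape: https://exhentai.org/s/<token>/<galleryId>-<page>
const PAGE_LINK_PATTERN = /\/s\/([0-9a-f]+)\/(\d+)-(\d+)/;

// Turn one page link into "<page padded to 4 digits>_<token>",
// or return null if the text doesn't look like a page link.
function pageLinkToStem(link) {
  const match = PAGE_LINK_PATTERN.exec(link);
  if (!match) return null;
  const token = match[1];
  const page = match[3];
  return `${page.padStart(4, '0')}_${token}`;
}

// Convert a newline-separated list of page links (e.g. the text shown
// by the pushDialog snippet) into a newline-separated list of stems.
function pageLinksToList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => pageLinkToStem(line))
    .filter((stem) => stem !== null)
    .join('\n');
}
```

Lines that don't match the pattern are silently skipped, so blank lines in the pasted list are harmless.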

An example of the image grid's page source code:

[image]

> it's because the latest restriction has made it impossible for anyone without GP, credit or hath to download the original images.

That restriction wasn't introduced in the latest update; it dates from last year. The latest change only hides the image limits from normal users. All the other rules are still the same as in the 2023-08 update (recent galleries can be downloaded using image limits only, unless the peak-hour and/or old-gallery rules apply).

> Our only option is to use torrents, and these torrents often have unsorted image files.

If you mean downloading via torrents, calculating the hash of each file, and comparing it with the link, that would make sense, but in that case you probably only need the page number and the token. I still don't quite understand why you need the file extension since you've already got the files from torrents. To put them in order you'll almost certainly need to write a script anyway, and that script can just take the extension from the file name.
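
A rough sketch of that torrent workflow in plain JavaScript: given the page links and a token for each torrent file (for example the first 10 hex characters of its SHA-1, which is only the guess made earlier in this thread), work out a page-ordered name for each file while keeping the file's own extension. The input shape and the function name are assumptions for illustration.

```javascript
// Assumed page link shape: https://exhentai.org/s/<token>/<galleryId>-<page>
const PAGE_LINK = /\/s\/([0-9a-f]+)\/(\d+)-(\d+)/;

// pageLinks: array of page link strings.
// files: array of { name, token } objects, one per torrent file.
// Returns the planned renames sorted by page, plus files with no match.
function buildRenamePlan(pageLinks, files) {
  // The same image can appear on several pages, so keep a list per token.
  const pagesByToken = new Map();
  for (const link of pageLinks) {
    const match = PAGE_LINK.exec(link);
    if (!match) continue;
    const pages = pagesByToken.get(match[1]) || [];
    pages.push(Number(match[3]));
    pagesByToken.set(match[1], pages);
  }

  const plan = [];
  const unmatched = [];
  for (const file of files) {
    const pages = pagesByToken.get(file.token);
    if (!pages || pages.length === 0) {
      unmatched.push(file.name);
      continue;
    }
    const page = pages.shift(); // consume one page per matching file
    const dot = file.name.lastIndexOf('.');
    const ext = dot > 0 ? file.name.slice(dot) : '';
    const to = `${String(page).padStart(4, '0')}_${file.token}${ext}`;
    plan.push({ page, from: file.name, to });
  }
  plan.sort((a, b) => a.page - b.page);
  return { plan, unmatched };
}
```

The result only describes the renames; actually applying them (for example with fs.renameSync in Node.js) is left out on purpose.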


Time to sleep. If you have anything to add, I'll reply in ~10 hours; sorry to keep you waiting. 😴

@a84r7a3rga76fg
Author

> So do you want to rename the downloaded files?

No, I want the page number, the image token and the extension of every image saved to a text file.

> I still don't quite understand why you need the file extension since you've already got the files from torrents

Sometimes whoever creates the torrent likes to change the extension from png to jpg or jpg to jpeg. You'd be surprised how often they do it.

@ccloli
Owner

ccloli commented Aug 31, 2024

> Sometimes whoever creates the torrent likes to change the extension from png to jpg or jpg to jpeg. You'd be surprised how often they do it.

Then I'm afraid it's not available. To avoid exactly the case you describe (see #2, which is just your case), the script takes the file name from the HTTP response header of the image file request, so that it uses the original file name and the correct file extension. That request definitely costs image limits or GP.

Since the script is focused on downloading files, I'm not going to add a feature that extracts the file extension from the thumbnail URL (and the thumbnail URL is only available when you switch to the large thumbnail grid layout…).
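
For reference, a rough sketch of reading a file name from a response header. The thread does not say which header the script reads; Content-Disposition is an assumption here, and the function names are hypothetical. Only the plain filename= form is handled, not the encoded filename*= form.

```javascript
// Assumption: the file name arrives in a Content-Disposition header value,
// e.g. 'attachment; filename="0001.png"'. Returns null if none is found.
function filenameFromContentDisposition(headerValue) {
  if (!headerValue) return null;
  // Quoted form: filename="0001.png" (backslash-escaped characters allowed).
  const quoted = /(?:^|;)\s*filename\s*=\s*"((?:\\.|[^"\\])*)"/i.exec(headerValue);
  if (quoted) return quoted[1].replace(/\\(.)/g, '$1');
  // Bare form: filename=0001.png
  const bare = /(?:^|;)\s*filename\s*=\s*([^;\s]+)/i.exec(headerValue);
  return bare ? bare[1] : null;
}

// Extension including the leading dot, or '' if the name has none.
function extensionOf(fileName) {
  const dot = fileName.lastIndexOf('.');
  return dot > 0 ? fileName.slice(dot) : '';
}
```

Reading this header still requires requesting the image file itself, so it does not avoid the cost described above.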

@a84r7a3rga76fg
Author

It can be a separate script.

@ccloli
Owner

ccloli commented Aug 31, 2024

> It can be a separate script.

Then I'd say do it yourself, since it's not related to what this script does, and I really do need some sleep. Truly sorry about that. 🥲
