Hitomi.la URL Determination Bug #142

Closed
earfluffy opened this issue Dec 14, 2018 · 2 comments

earfluffy commented Dec 14, 2018

So, I'll start by saying that I don't know if this is a problem specific to the gallery I tried to download, something that affects a significant portion of the website, or some weird interaction with Windows again.

In any case, what happened was I attempted to download the gallery found here (NSFW), but got error messages that looked like so:

C:\Users\earfl>gallery-dl -v https://hitomi.la/galleries/1036181.html
gallery-dl: Version 1.6.1
gallery-dl: Python 3.7.1 - Windows-10-10.0.17134-SP0
gallery-dl: requests 2.21.0 - urllib3 1.24.1
gallery-dl: Starting DownloadJob for 'https://hitomi.la/galleries/1036181.html'
hitomi: Using HitomiGalleryExtractor for 'https://hitomi.la/galleries/1036181.html'
urllib3.connectionpool: Starting new HTTPS connection (1): hitomi.la:443
urllib3.connectionpool: https://hitomi.la:443 "GET /galleries/1036181.html HTTP/1.1" 200 2924
urllib3.connectionpool: Starting new HTTPS connection (1): ba.hitomi.la:443
urllib3.connectionpool: https://ba.hitomi.la:443 "GET /galleries/1036181/000.png HTTP/1.1" 403 None
downloader.http: 403 Client Error: Forbidden for url: https://ba.hitomi.la/galleries/1036181/000.png
download: Failed to download hitomi_1036181_001_000.png
urllib3.connectionpool: Starting new HTTPS connection (2): ba.hitomi.la:443
urllib3.connectionpool: https://ba.hitomi.la:443 "GET /galleries/1036181/001.png HTTP/1.1" 403 None
downloader.http: 403 Client Error: Forbidden for url: https://ba.hitomi.la/galleries/1036181/001.png
download: Failed to download hitomi_1036181_002_001.png
urllib3.connectionpool: Starting new HTTPS connection (3): ba.hitomi.la:443
urllib3.connectionpool: https://ba.hitomi.la:443 "GET /galleries/1036181/002.png HTTP/1.1" 403 None
downloader.http: 403 Client Error: Forbidden for url: https://ba.hitomi.la/galleries/1036181/002.png
download: Failed to download hitomi_1036181_003_002.png

     [And so on]
     ...

I went and checked the gallery to see what the URL should have been, and found that it was aa.hitomi.la instead of ba.hitomi.la. I knew it wasn't a config issue, since I don't have any config options for hitomi.la, so I went to look at the extractor file for hitomi.la.

Long story short, I "fixed" the problem by manually changing the part of the extractor file that determines the image URL, i.e., these two lines:

    subdomain = chr(97 + self.gid % 2) + "a"
    base = "https://" + subdomain + ".hitomi.la/galleries/"

into this:

    subdomain = "a" + "a"
    base = "https://" + subdomain + ".hitomi.la/galleries/"

and then changing it back when I was done. Since this is obviously not the best way to do things, and I don't really understand how the value of self.gid is determined, I figured I ought to report the bug and see if we could figure out the problem. I should note that the other gallery I was interested in downloading worked just fine.

Contributor

Hrxn commented Dec 14, 2018

It seems like you already found the culprit, good.
Something like this is expected in case of a recent site change, so I'd guess that's what happened here.

Owner

mikf commented Dec 14, 2018

Hitomi sets the URLs for displayed images in a few JavaScript functions. The important ones can be found at https://ltn.hitomi.la/common.js.

chr(97 + self.gid % 2) is basically what subdomain_from_galleryid(g) does, with number_of_frontends being hard coded to 2 and self.gid representing the gallery ID g (1036181 in this case). You either get aa. or ba. as subdomain, depending on the value of g / self.gid.

What I didn't consider when I last looked at this is that g is not the whole gallery ID, but just the last digit of it, and it gets changed to 0 if it originally was 1:

    if (g === 1) {
        g = 0;
    }

So for any gallery whose ID ends in 1, gallery-dl currently uses the wrong subdomain.
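
As a rough illustration of the difference, here is a minimal Python sketch of the two computations as described in this comment. It is not the code from the actual fix, and it only mirrors the parts of common.js mentioned above. The names `legacy_subdomain`, `subdomain_from_gallery_id`, `image_base_url` and `NUMBER_OF_FRONTENDS` are made up for this example.

```python
# Hypothetical constant: the thread says number_of_frontends is hard coded to 2.
NUMBER_OF_FRONTENDS = 2


def legacy_subdomain(gallery_id: int) -> str:
    """Old behaviour: use the whole gallery ID modulo the frontend count."""
    return chr(97 + gallery_id % NUMBER_OF_FRONTENDS) + "a"


def subdomain_from_gallery_id(gallery_id: int) -> str:
    """Behaviour as described above (assumed, not copied from common.js):
    only the last digit of the ID is used, and a last digit of 1 becomes 0.
    """
    g = gallery_id % 10  # last digit of the gallery ID
    if g == 1:
        g = 0
    return chr(97 + g % NUMBER_OF_FRONTENDS) + "a"


def image_base_url(gallery_id: int) -> str:
    """Build the base URL the same way the two quoted extractor lines do."""
    return "https://" + subdomain_from_gallery_id(gallery_id) + ".hitomi.la/galleries/"


if __name__ == "__main__":
    gid = 1036181  # the gallery from this report
    print(legacy_subdomain(gid))           # ba (the host that returned 403)
    print(subdomain_from_gallery_id(gid))  # aa (the host seen on the site)
    print(image_base_url(gid))
```

Under these assumptions the two versions only disagree for IDs ending in 1: every other odd last digit still maps to ba. and every even one to aa., which would explain why the other gallery downloaded without problems.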

@mikf mikf added the bug label Dec 14, 2018
@mikf mikf closed this as completed in 0be7ee3 Dec 14, 2018