[Lensdump] Reddit posts linking to Lensdump files are being processed with the directlinks extractor instead. #5293

Closed
taskhawk opened this issue Mar 6, 2024 · 6 comments


taskhawk commented Mar 6, 2024

Reddit posts that link directly to Lensdump media files on the https://*.l3n.co/ subdomains ignore the lensdump configuration and are processed by the directlink extractor instead.

For example, some of the newest posts in this subreddit use these links (NSFW):
https://new.reddit.com/r/NewYorkNine/new/

https://a.l3n.co/i/K4cEg9.gif
https://b.l3n.co/i/Kh72OA.jpeg
https://c.l3n.co/i/KhKxwK.jpeg

Contributor

Hrxn commented Mar 6, 2024

Well, yes. It's technically correct.

The links you've posted are all direct links, and gallery-dl detects them as such.

As an additional feature, it would be possible to hand such a URL over to another extractor by matching its pattern before gallery-dl treats it as a generic "direct link", but the first question here should be: is it actually worth it?

For example, what kind of metadata do the lensdump direct links provide here?
Versus the metadata provided by the poster of said links on reddit?
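The idea of moving such URLs to another extractor can be illustrated with a minimal sketch of first-match dispatch. It assumes, as a simplification, that extractor patterns are tried in order and that the catch-all direct-link pattern comes after the site-specific ones. Both patterns and the `find_extractor` helper below are hypothetical stand-ins, not gallery-dl's real code.

```python
import re

# Hypothetical, simplified extractor table of (name, compiled pattern).
# Assumption: site-specific patterns are tried before the catch-all
# direct-link pattern, and the first pattern that matches wins.
EXTRACTORS = [
    # Stand-in for a Lensdump pattern that only knows lensdump.com pages
    ("lensdump", re.compile(r"(?:https?://)?lensdump\.com/i/(\w+)")),
    # Stand-in for a generic "URL ending in an image extension" pattern
    ("directlink", re.compile(
        r"(?i)https?://([^/?#]+)/((?:[^?#]*/)*)([^/?#]+)\.(jpe?g|png|gif|webp)")),
]


def find_extractor(url):
    """Return the name of the first extractor whose pattern matches url."""
    for name, pattern in EXTRACTORS:
        if pattern.match(url):
            return name
    return None


print(find_extractor("https://a.l3n.co/i/K4cEg9.gif"))   # directlink
print(find_extractor("https://lensdump.com/i/K4cEg9"))   # lensdump
```

Under this assumption, widening the first pattern to also accept `*.l3n.co` hosts would be enough to route these URLs to the site-specific extractor, because it is tried first.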

Owner

mikf commented Mar 6, 2024

Supporting direct links with a site-specific extractor has been done for other sites like Flickr, Imgur, Reddit, and probably more, so it might as well be implemented for Lensdump as well.

It's only one real line of code that needs to be updated:

```diff
diff --git a/gallery_dl/extractor/lensdump.py b/gallery_dl/extractor/lensdump.py
index d4ccf33b..8ca9d88e 100644
--- a/gallery_dl/extractor/lensdump.py
+++ b/gallery_dl/extractor/lensdump.py
@@ -104,7 +104,7 @@ class LensdumpImageExtractor(LensdumpBase, Extractor):
     filename_fmt = "{category}_{id}{title:?_//}.{extension}"
     directory_fmt = ("{category}",)
     archive_fmt = "{id}"
-    pattern = BASE_PATTERN + r"/i/(\w+)"
+    pattern = r"(?:https?://)?(?:lensdump\.com|\w\.l3n\.co)/i/(\w+)"
     example = "https://lensdump.com/i/ID"
 
     def __init__(self, match):
```
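One quick way to check the proposed pattern is to run it against the URLs from the report. This sketch copies only the regular expression from the diff. `image_id` is a hypothetical helper, and `re.match` is used on the assumption that the pattern is applied from the start of the URL.

```python
import re

# Regular expression copied from the proposed LensdumpImageExtractor.pattern
LENSDUMP_IMAGE = re.compile(r"(?:https?://)?(?:lensdump\.com|\w\.l3n\.co)/i/(\w+)")


def image_id(url):
    """Return the image ID captured by the pattern, or None if it does not match."""
    m = LENSDUMP_IMAGE.match(url)
    return m.group(1) if m else None


for url in (
    "https://a.l3n.co/i/K4cEg9.gif",        # -> K4cEg9
    "https://b.l3n.co/i/Kh72OA.jpeg",       # -> Kh72OA
    "https://lensdump.com/i/ID",            # -> ID
    "https://i.lensdump.com/i/kXr9kv.gif",  # -> None
):
    print(url, "->", image_id(url))
```

`(\w+)` stops at the dot, so the file extension is not part of the captured ID. The last URL, in the older `i.lensdump.com` format, does not match because the pattern only accepts the bare `lensdump.com` host and single-character `*.l3n.co` hosts.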

Author

taskhawk commented Mar 7, 2024

> Is it actually worth it?
>
> For example, what kind of metadata do the lensdump direct links provide here? Versus the metadata provided by the poster of said links on reddit?

In my case, it's so that an entry gets inserted into the Lensdump archive I already have set up, to avoid downloading the same file again.

> It's only one real line of code that needs to be updated:

Oh cool, glad it's simple. Thanks.

Contributor

Hrxn commented Mar 7, 2024

> > Is it actually worth it?
> > For example, what kind of metadata do the lensdump direct links provide here? Versus the metadata provided by the poster of said links on reddit?
>
> In my case it is to insert an entry in the Lensdump archive already set up to avoid redownloading again.

The archive for the "directlink" extractor works just as well here. Just saying.
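Both archives prevent repeat downloads, but they would record different keys for the same file. The sketch below illustrates this under two assumptions not stated in this thread: that an archive key is the extractor category followed by its formatted `archive_fmt`, and that directlink's format is `{domain}/{path}/{filename}.{extension}`. The Lensdump format `{id}` comes from the diff above; `archive_key` is a hypothetical helper.

```python
# Assumed archive formats (not verified against a specific gallery-dl release):
#   lensdump:   "{id}" (matches archive_fmt in the diff above)
#   directlink: "{domain}/{path}/{filename}.{extension}" (assumption)
ARCHIVE_FORMATS = {
    "lensdump": "{id}",
    "directlink": "{domain}/{path}/{filename}.{extension}",
}


def archive_key(category, metadata):
    """Build a key as category prefix + formatted archive format (assumed scheme)."""
    return category + ARCHIVE_FORMATS[category].format_map(metadata)


lensdump_key = archive_key("lensdump", {"id": "K4cEg9"})
directlink_key = archive_key("directlink", {
    "domain": "a.l3n.co", "path": "i", "filename": "K4cEg9", "extension": "gif",
})
print(lensdump_key)    # lensdumpK4cEg9
print(directlink_key)  # directlinka.l3n.co/i/K4cEg9.gif
```

If the keys are built this way, a file recorded under the directlink key would not be recognized when the same image is later reached through the Lensdump extractor, and vice versa.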

> Supporting direct links with a site-specific extractor has been done for other sites like Flickr, Imgur, Reddit, and probably more, so it might as well be implemented for Lensdump as well.

True, but to be fair, a Reddit direct link is also hosted on Reddit, so it's a kind of "first-party" direct link, as opposed to the common case of a direct link hosted on an external service. Although I will admit that this distinction is not really that important.

Author

taskhawk commented Mar 9, 2024

I found a few Reddit posts linking to Lensdump files with an older URL format that also ended up going through the directlink extractor, for example:

NSFW
https://i.lensdump.com/i/kXr9kv.gif
https://i1.lensdump.com/i/63MU57.gif
https://i2.lensdump.com/i/6Hf3tD.gif
https://i3.lensdump.com/i/kFdVr2.gif

Loading such a URL in the browser redirects to the media page, where the files now use the new URL format, for example:

https://lensdump.com/i/kXr9kv ==> https://a.l3n.co/i/kXr9kv.gif
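Because the image ID stays the same across the old and new formats (`kXr9kv` in both URLs above), one way to cover the older hosts would be to widen the pattern and key everything on that ID. The sketch below is hypothetical and is not the change made in ac4e29f; `media_page` is an illustrative helper.

```python
import re

# Hypothetical widening of the pattern from the earlier diff so that it also
# accepts i.lensdump.com, i1.lensdump.com, i2.lensdump.com, ... hosts.
# This is NOT taken from commit ac4e29f.
PATTERN = re.compile(
    r"(?:https?://)?(?:(?:i\d*\.)?lensdump\.com|\w\.l3n\.co)/i/(\w+)")


def media_page(url):
    """Map a matching image URL to its https://lensdump.com/i/<id> page, else None."""
    m = PATTERN.match(url)
    return "https://lensdump.com/i/" + m.group(1) if m else None


print(media_page("https://i.lensdump.com/i/kXr9kv.gif"))  # https://lensdump.com/i/kXr9kv
print(media_page("https://a.l3n.co/i/kXr9kv.gif"))        # https://lensdump.com/i/kXr9kv
```

Since both URL forms reduce to the same ID, an archive keyed on `{id}` would treat them as the same file.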

Owner

mikf commented Mar 9, 2024

> Found a few Reddit posts linking to Lensdump files with an older URL format

Fixed in ac4e29f


3 participants