[bug] [reddit] Some v.redd.it links on User Profiles (possibly others) fail to download, 'NoneType' is not iterable error. #3258

Closed
Silent-Soldier opened this issue Nov 19, 2022 · 4 comments · Fixed by #3306
@Silent-Soldier

I recently ran into this bug while parsing a subreddit, but so far I can only reproduce it reliably with an NSFW video link on a user's profile. Elsewhere the issue is intermittent or does not occur at all, and I have no idea why.

Verbose output:

>gallery-dl --verbose "https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/"
2022-11-19 05:29:01 [gallery-dl][debug] Version 1.24.0-dev
2022-11-19 05:29:01 [gallery-dl][debug] Python 3.11.0 - Windows-10-10.0.19045-SP0
2022-11-19 05:29:01 [gallery-dl][debug] requests 2.28.1 - urllib3 1.26.12
2022-11-19 05:29:01 [gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/'
2022-11-19 05:29:03 [cookies][debug] Extracting cookies from C:\Users\*****\*****\*****\Mozilla\Firefox\Profiles\*****\cookies.sqlite
2022-11-19 05:29:03 [reddit][debug] Using RedditSubmissionExtractor for 'https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/'
2022-11-19 05:29:03 [urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
2022-11-19 05:29:03 [urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/x8p3yf/.json?limit=0&raw_json=1 HTTP/1.1" 200 2307
2022-11-19 05:29:03 [reddit][debug] Using download archive '*****/gallery-dl/.archives/reddit.sqlite3'
2022-11-19 05:29:03 [postprocessor.metadata][debug] Using download archive '*****/gallery-dl/.archives/reddit-metadata.sqlite3'
2022-11-19 05:29:03 [postprocessor.ugoira][debug] using mkvmerge demuxer
2022-11-19 05:29:03 [reddit][debug] Active postprocessor modules: [ClassifyPP, MetadataPP, MtimePP, UgoiraPP]
2022-11-19 05:29:04 [downloader.ytdl][debug] [generic] ypr3fhcnzjm91: Downloading webpage
2022-11-19 05:29:05 [downloader.ytdl][debug] [redirect] Following redirect to https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/
2022-11-19 05:29:05 [downloader.ytdl][debug] [generic] eufrat: Downloading webpage
2022-11-19 05:29:05 [downloader.ytdl][warning] [generic] Falling back on generic information extractor
2022-11-19 05:29:06 [downloader.ytdl][debug] [generic] eufrat: Extracting information
2022-11-19 05:29:06 [downloader.ytdl][error] ERROR: Unsupported URL: https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/
2022-11-19 05:29:06 [reddit][error] An unexpected error occurred: TypeError - argument of type 'NoneType' is not iterable. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
2022-11-19 05:29:06 [reddit][debug]
Traceback (most recent call last):
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\job.py", line 84, in run
    self.dispatch(msg)
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\job.py", line 128, in dispatch
    self.handle_url(url, kwdict)
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\job.py", line 248, in handle_url
    if not self.download(url):
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\job.py", line 380, in download
    return downloader.download(url, self.pathfmt)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\downloader\ytdl.py", line 69, in download
    if "entries" in info_dict:
       ^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable
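
The last frame of the traceback fails because `info_dict` is `None` when the membership test runs, presumably because the yt-dlp extraction call returned nothing after logging "Unsupported URL". A minimal sketch of that failure and of a guard against it follows; `has_entries` is a hypothetical helper, not a function from gallery-dl:

```python
def has_entries(info_dict):
    """Return True only when info_dict is usable and contains 'entries'."""
    # Checking for None first avoids the TypeError seen in the traceback.
    return info_dict is not None and "entries" in info_dict


# Reproduce the unguarded check from the traceback with a None result.
info_dict = None
try:
    "entries" in info_dict
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

A guard like this only turns the crash into an ordinary failed download; it does not make the video downloadable.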
@mikf
Owner

mikf commented Nov 19, 2022

Without cookies I only get a non-fatal error:

[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/x8p3yf/.json?limit=0&raw_json=1 HTTP/1.1" 200 2360
[downloader.ytdl][debug] [generic] ypr3fhcnzjm91: Downloading webpage
[downloader.ytdl][debug] [redirect] Following redirect to https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/
[downloader.ytdl][debug] [generic] eufrat: Downloading webpage
[downloader.ytdl][warning] [generic] Falling back on generic information extractor
[downloader.ytdl][debug] [generic] eufrat: Extracting information
[downloader.ytdl][error] ERROR: Unsupported URL: https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/
[download][error] Failed to download ytdl:https://v.redd.it/ypr3fhcnzjm91

mikf added the bug label Nov 19, 2022
@InterruptSpeed
Contributor

It looks like the reddit extractor hands off to yt-dlp because the JSON has is_video=true, but it passes the value of the JSON "url" key
"url" : "https://v.redd.it/ypr3fhcnzjm91"
rather than the correct key/value
"fallback_url" : "https://v.redd.it/ypr3fhcnzjm91/DASH_720.mp4?source=fallback"
found within the media->reddit_video elements.

A proposed fix would be to check for fallback_url when the domain is v.redd.it and pass that value to yt-dlp instead. I can work on that if it makes sense.

@InterruptSpeed
Contributor

InterruptSpeed commented Nov 22, 2022

Which is the more Pythonic fix?
a)

try:
  url = submission["media"]["reddit_video"]["fallback_url"]
except KeyError:
  pass

b)

if "media" in submission \
  and "reddit_video" in submission["media"] \
  and "fallback_url" in submission["media"]["reddit_video"]:
  url = submission["media"]["reddit_video"]["fallback_url"]

Either one would be inserted in the RedditExtractor items() method, right before the yield in the elif submission["is_video"]: block.

How can I test that the change doesn't break other scenarios? I can submit a pull request for the fix if we are on the right track.
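
One way to compare the two options is to run both against a few submission shapes. The sketch below wraps (a) and (b) in hypothetical functions and feeds them hand-written dicts; the case where "media" is present but null is an assumption about what the API may return for some posts, not something shown in the logs above.

```python
BASE = "https://v.redd.it/ypr3fhcnzjm91"
FALLBACK = BASE + "/DASH_720.mp4?source=fallback"


def option_a(submission, url):
    # Option (a): try the nested lookup, ignore a missing key.
    try:
        url = submission["media"]["reddit_video"]["fallback_url"]
    except KeyError:
        pass
    return url


def option_b(submission, url):
    # Option (b): test each level for membership before indexing.
    if "media" in submission \
            and "reddit_video" in submission["media"] \
            and "fallback_url" in submission["media"]["reddit_video"]:
        url = submission["media"]["reddit_video"]["fallback_url"]
    return url


# Hand-written stand-ins for API responses (assumed shapes).
CASES = {
    "video": {"media": {"reddit_video": {"fallback_url": FALLBACK}}},
    "no media key": {},
    "media is None": {"media": None},
}


def outcome(func, submission):
    """Return the chosen URL, or the string 'TypeError' if func crashed."""
    try:
        return func(submission, BASE)
    except TypeError:
        return "TypeError"


results = {
    name: (outcome(option_a, sub), outcome(option_b, sub))
    for name, sub in CASES.items()
}
```

Both options behave the same on the first two cases, and both raise TypeError when "media" is None: (a) because None cannot be indexed, (b) because a membership test on None is not allowed. Catching TypeError alongside KeyError covers that case.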

@Silent-Soldier
Author

Silent-Soldier commented Nov 25, 2022

I believe @InterruptSpeed may be partially correct on this. I've been experimenting with various solutions over the last few days, focusing mainly on cookies as the cause (based on the verbose output of gallery-dl and yt-dlp run independently). With cookies removed altogether, the same behavior occurs when passing the URI to yt-dlp by itself.

The "fallback_url" appears to download correctly when passed to yt-dlp, though the audio is missing. I believe the URIs need to point to https://v.redd.it/ypr3fhcnzjm91/DASHPlaylist.mpd (higher quality) or https://v.redd.it/ypr3fhcnzjm91/HLSPlaylist.m3u8 (lower quality) instead?
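
Both playlist URLs differ from the base v.redd.it link only by a fixed suffix, so they can be derived from it. A minimal sketch, assuming every v.redd.it video exposes both playlists under these names (only this one video was checked); `playlist_urls` is a hypothetical helper:

```python
def playlist_urls(base_url):
    """Build the DASH and HLS playlist URLs for a v.redd.it base link."""
    base = base_url.rstrip("/")  # tolerate a trailing slash
    return {
        "dash": base + "/DASHPlaylist.mpd",
        "hls": base + "/HLSPlaylist.m3u8",
    }


urls = playlist_urls("https://v.redd.it/ypr3fhcnzjm91")
```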

mikf added a commit that referenced this issue Nov 27, 2022
* use fallback_url for reddit_video to fix issue 3258

* changed to dash_url to include audio

* update

- use [] instead of .get
- catch TypeErrors in case one of the elements is not a dict

Co-authored-by: InterruptSpeed <steven@docherty.ca>
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
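
The approach described in the commit message can be sketched as follows. This is an illustration, not the code from the commit: `resolve_video_url` is a hypothetical name, and the `dash_url` key is taken from the commit message and assumed to sit next to `fallback_url` under media->reddit_video.

```python
def resolve_video_url(submission):
    """Prefer the DASH playlist URL, fall back to the submission's own URL."""
    try:
        # Plain indexing raises KeyError for a missing key and TypeError
        # when an intermediate element is None or otherwise not a dict.
        return submission["media"]["reddit_video"]["dash_url"]
    except (KeyError, TypeError):
        return submission["url"]


BASE = "https://v.redd.it/ypr3fhcnzjm91"
with_dash = {
    "url": BASE,
    "media": {"reddit_video": {"dash_url": BASE + "/DASHPlaylist.mpd"}},
}
without_media = {"url": BASE, "media": None}
```

Chained `.get` calls would need a default at every level to survive a None element, whereas indexing with `[]` lets a single except clause cover both failure modes.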