[ARD] Wrong match pattern #22724

Closed
5 tasks done
olifre opened this issue Oct 15, 2019 · 7 comments

@olifre

olifre commented Oct 15, 2019

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2019.09.28
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

$ youtube-dl -v https://www.daserste.de/unterhaltung/serie/familie-dr-kleist/videos/pauline-angert-im-inteview-familie-dr-kleist100.html
[debug] System config: []
[debug] User config: ['--mark-watched', '--external-downloader', 'aria2c']
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.daserste.de/unterhaltung/serie/familie-dr-kleist/videos/pauline-angert-im-inteview-familie-dr-kleist100.html']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.09.28
[debug] Python version 3.6.9 (CPython) - Linux-5.2.14-gentoo-x86_64-Intel-R-_Core-TM-_i7-4710MQ_CPU_@_2.50GHz-with-gentoo-2.6
[debug] exe versions: ffmpeg 4.2.1, ffprobe 4.2.1, rtmpdump 2.4
[debug] Proxy map: {}
[generic] pauline-angert-im-inteview-familie-dr-kleist100: Requesting header
WARNING: Falling back on generic information extractor.
[generic] pauline-angert-im-inteview-familie-dr-kleist100: Downloading webpage
[generic] pauline-angert-im-inteview-familie-dr-kleist100: Extracting information
[download] Downloading playlist: Pauline Angert als Luisa Ewald | Familie Dr. Kleist
[generic] playlist Pauline Angert als Luisa Ewald | Familie Dr. Kleist: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'https://www.daserste.de/mediasrc/static/null.html'
[download] Destination: Pauline Angert als Luisa Ewald _ Familie Dr. Kleist-pauline-angert-im-inteview-familie-dr-kleist100.html
[debug] aria2c command line: aria2c -c --min-split-size 1M --max-connection-per-server 4 --out 'Pauline Angert als Luisa Ewald _ Familie Dr. Kleist-pauline-angert-im-inteview-familie-dr-kleist100.html.part' --header 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3735.1 Safari/537.36' --header 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Encoding: gzip, deflate' --header 'Accept-Language: en-us,en;q=0.5' --header 'Referer: https://www.daserste.de/unterhaltung/serie/familie-dr-kleist/videos/pauline-angert-im-inteview-familie-dr-kleist100.html' --header 'Cookie: ROUTEID=http12; JSESSIONID=D9EB4B8DCD77FFAE711D808470CFCAF8' --check-certificate=true --remote-time=true -- https://www.daserste.de/mediasrc/static/null.html

10/15 23:04:08 [NOTICE] Downloading 1 item(s)

10/15 23:04:08 [NOTICE] Übertragung vollständig: /tmp/foo/Pauline Angert als Luisa Ewald _ Familie Dr. Kleist-pauline-angert-im-inteview-familie-dr-kleist100.html.part

Übertragungsergebnisse:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
a2a719|OK  |       0B/s|/tmp/foo/Pauline Angert als Luisa Ewald _ Familie Dr. Kleist-pauline-angert-im-inteview-familie-dr-kleist100.html.part

Statuserläuterung:
(OK):Herunterladen vollständig.
[aria2c] Downloaded 0 bytes
[download] 100% of 0.00B in 00:00
[download] Finished downloading playlist: Pauline Angert als Luisa Ewald | Familie Dr. Kleist

Description

This downloads an empty file, even when aria2c is not used as the external downloader.
Other videos from the same page download fine, and this video plays fine in the browser.

@olifre
Author

olifre commented Oct 15, 2019

Mhm, it seems the ARD extractor is not even used. Maybe that is because the URL lacks the usual - before the numeric ID (it ends in kleist100.html), so it fails the following pattern?

_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'

@olifre
Author

olifre commented Oct 15, 2019

Relaxing the pattern to:

_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-?(?P<id>[0-9]+))\.html'

appears to make things work.
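The effect of the change can be checked outside youtube-dl with a minimal sketch like the one below. It assumes that youtube-dl tests `_VALID_URL` with `re.match`, and the third pattern (`LAZY`) is a hypothetical variant for comparison only, not something proposed in this issue. One detail worth noting: with the relaxed pattern the greedy `display_id` group swallows most of the digits, so the captured `id` is `0` rather than `100`.

```python
import re

URL = ("https://www.daserste.de/unterhaltung/serie/familie-dr-kleist/videos/"
       "pauline-angert-im-inteview-familie-dr-kleist100.html")

# Pattern currently in the ARD extractor (quoted in this issue).
ORIGINAL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
# Relaxed pattern from this issue: the hyphen before the ID is optional.
RELAXED = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-?(?P<id>[0-9]+))\.html'
# Hypothetical variant (an assumption, not from the issue): a lazy display_id
# leaves all trailing digits to the id group.
LAZY = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+?)-?(?P<id>[0-9]+))\.html'


def groups(pattern, url):
    """Return (display_id, id) if the pattern matches the URL, else None."""
    m = re.match(pattern, url)
    return None if m is None else (m.group('display_id'), m.group('id'))


original_result = groups(ORIGINAL, URL)
relaxed_result = groups(RELAXED, URL)
lazy_result = groups(LAZY, URL)

print("original:", original_result)
print("relaxed: ", relaxed_result)
print("lazy:    ", lazy_result)
```

Whether the shortened `id` matters depends on how the extractor uses that group, which this sketch does not cover.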

@olifre changed the title from "[ard] Some video URLs fail" to "[ARD] Wrong match pattern" on Oct 17, 2019
@scrouthtv

Your pattern does not work for me.

@olifre
Author

olifre commented Jun 6, 2020

@scrouthtv Can you elaborate on what "does not work" means?

It still works perfectly well here with the example URL I provided, using version 2020.05.29 with only the expression patched, on Python 3.7. It's hard to reproduce a "does not work" without any details whatsoever.

@scrouthtv

I'm trying
youtube-dl https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet
and get

[ARDBetaMediathek] Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ: Downloading JSON metadata
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 11, in <module>
    load_entry_point('youtube-dl==2020.6.6', 'console_scripts', 'youtube-dl')()
  File "/usr/lib/python3.8/site-packages/youtube_dl/__init__.py", line 474, in main
    _real_main(argv)
  File "/usr/lib/python3.8/site-packages/youtube_dl/__init__.py", line 464, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 2018, in download
    res = self.extract_info(
  File "/usr/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 797, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3.8/site-packages/youtube_dl/extractor/common.py", line 530, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.8/site-packages/youtube_dl/extractor/ard.py", line 395, in _real_extract
    title = player_page['title']
TypeError: 'NoneType' object is not subscriptable

@olifre
Author

olifre commented Jun 6, 2020

@scrouthtv Then it's clear: your error concerns a completely different expression. This issue is about the ARD info extractor:

class ARDIE(InfoExtractor):
    _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'

Your problem is with the ARDBetaMediathek extractor, which is defined here:

class ARDBetaMediathekIE(InfoExtractor):
    _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'

Likely, that regex also needs some kind of adaptation, but I think that warrants a new issue.
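Which of the two patterns a given URL falls under can be checked with a short sketch like this one, again assuming the patterns are applied with `re.match`. For the ardmediathek.de URL from the traceback, the ARD pattern does not match and the ARDBetaMediathek pattern does, which is consistent with the `[ARDBetaMediathek]` prefix in the log and suggests the `TypeError` arises after URL matching, during extraction.

```python
import re

# Both patterns are copied from the extractor classes quoted in this issue.
ARD = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
ARD_BETA = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'

URL = ("https://www.ardmediathek.de/ard/player/"
       "Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ"
       "/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet")

ard_match = re.match(ARD, URL)        # None: the host is not daserste.de
beta_match = re.match(ARD_BETA, URL)  # a match object with both named groups

print("ARD matches:             ", ard_match is not None)
print("ARDBetaMediathek matches:", beta_match is not None)
if beta_match:
    print("video_id:  ", beta_match.group('video_id'))
    print("display_id:", beta_match.group('display_id'))
```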

@scrouthtv

Welp, sorry. Those line numbers differ from mine for some reason; I wasn't sure which extractor you were talking about, and this was the only open issue about ARD, so I just guessed.

Never mind me then

github-actions bot added a commit to hellopony/youtube-dl that referenced this issue Mar 24, 2021
* https://github.com/ytdl-org/youtube-dl:
  [ard] improve clip id extraction(ytdl-org#22724)(closes ytdl-org#28528)
  release 2021.03.25
  [ChangeLog] Actualize [ci skip]
  [zoom] Add new extractor(closes ytdl-org#16597, closes ytdl-org#27002, closes ytdl-org#28531)
  [extractor] escape forgotten dot for hostnames in regular expression (ytdl-org#28530)
  [bbc] fix BBC IPlayer Episodes/Group extraction(closes ytdl-org#28360)
  [youtube] Fix default value for youtube_include_dash_manifest (closes ytdl-org#28523)
leshasmlesha pushed a commit to leshasmlesha/youtube-dl that referenced this issue Apr 3, 2021
@dstftw dstftw closed this as completed in d495292 May 29, 2021