Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent character substitution in file name and in --exec option #32871

Closed
1 task
tansy opened this issue Jul 21, 2024 · 11 comments
Closed
1 task

Inconsistent character substitution in file name and in --exec option #32871

tansy opened this issue Jul 21, 2024 · 11 comments

Comments

@tansy
Copy link

tansy commented Jul 21, 2024

Checklist

  • I'm reporting a broken site support issue
  • [ x ] I've verified that I'm running youtube-dl version 2024.07.11
  • [ x ] I've checked that all provided URLs are alive and playable in a browser
  • [ x ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • [ x ] I've searched the bugtracker for similar bug reports including closed ones
  • [ x ] I've read bugs section in FAQ

Verbose log

+ youtube-dl -v -f 242 --exec 'f={}; echo "mplayer \""$f"\"" > "$f.sh"' 'https://www.youtube.com/watch?v=b0zxIfJJLAY'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-f', u'242', u'--exec', u'f={}; echo "mplayer \\""$f"\\"" > "$f.sh"', u'https://www.youtube.com/watch?v=b0zxIfJJLAY']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2024.07.11 [16f5bbc46] (single file build)
[debug] ** This version was built from the latest master code at https://github.com/ytdl-org/youtube-dl.
[debug] ** For support, visit the main site.
[debug] Python 2.7.3 (CPython i686 32bit) - OpenSSL 1.0.2h  3 May 2016 - glibc 2.0
[debug] exe versions: none
[debug] Proxy map: {}
[youtube] b0zxIfJJLAY: Downloading webpage
[debug] [youtube] Decrypted nsig pvICy1U6Lntu9fCD => 0Q9hB5mXdi8nMQ
[debug] [youtube] Decrypted nsig ByDm0fiqpTeHEFH3 => q6MTr3f0ifxKuA
[debug] Invoking downloader on u'https://rr2---sn-u2oxu-3ufd.googlevideo.com/videoplayback?...
[dashsegments] Total fragments: 2
[download] Destination: Hello, Assembly!  Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm
[download] 100% of 14.22MiB in 00:14
[exec] Executing command: f='Hello, Assembly!  Retrocoding the World'"'"'s Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm'; echo "mplayer \""$f"\"" > "$f.sh"

$ sh -x Hello,\ Assembly\!\ \ Retrocoding\ the\ World\'s\ Smallest\ Windows\ App\ in\ x86\ ASM-b0zxIfJJLAY.webm.sh
+ mplayer 'Hello, Assembly! Retrocoding the World'\''s Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm'

Playing Hello, Assembly! Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm.
File not found: 'Hello, Assembly! Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm'
Failed to open Hello, Assembly! Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm.


Exiting... (End of file)

$ ls -1 Hello*
Hello, Assembly!  Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm
Hello, Assembly!  Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm.sh

Description

I use `--exec option to create a script that does something with a downloaded file. When there is some character substitution, like ':' to ' - ', consecutive spaces appear in file name string and not string passed by `--exec', resulting in wrong script.

In the example above title "Hello, Assembly! Retrocoding the World's Smallest Windows App in x86 ASM" result in file name "Hello, Assembly! Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm"

"Hello, Assembly! Retrocoding the World's Smallest Windows App in x86 ASM"
"Hello, Assembly!  Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm"

Similar with this title: (@D82C7JS_2iw)

"Docker vs VM: What's the Difference, and Why You Care!"
"Docker vs VM -  What's the Difference, and Why You Care!-D82C7JS_2iw.webm"

Or one from here: uRlxN0_zVHo.

@dirkf
Copy link
Contributor

dirkf commented Jul 21, 2024

Is this a new problem, or did you only just start to use your script?

@tansy
Copy link
Author

tansy commented Jul 21, 2024

Hard to say. I use it for few weeks but didn't pay attention to this. What's going on I realised writing the report. Now, older releases don't work at all but what I digged is that v2024.04.08 tries to make file name with two spaces, even if cannot download it.

$ ./youtube-dl-2024.04.08 -v -f 242 --exec 'f={}; echo "mplayer \""$f"\"" > "$f.sh"' 'https://www.youtube.com/watch?v=b0zxIfJJLAY'

[dashsegments] Total fragments: 2
[download] Destination: Hello, Assembly!  Retrocoding the World's Smallest Windows App in x86 ASM-b0zxIfJJLAY.webm
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 1 (attempt 1 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 1 (attempt 2 of 10)...

and v2024.02.03 doesn't even get to that point.

So it seems to be at least as old as since april.

@aiur-adept
Copy link
Contributor

For context, I'm using the video https://www.youtube.com/watch?v=uRlxN0_zVHo (title Understanding the Linux Backdoor: Implications for Open Source [When Penguins Cry])

It seems the video title is coming from a call to _extract_yt_initial_variable() in youtube.py, which reads info from the webpage via a regex. (link to source)

It reads the key videoDetails from the json, which already contains the double-space:

'Understanding the Linux Backdoor: Implications for Open Source [When Penguins Cry]'

So, the double-space is coming from the webpage itself, not any processing done by youtube-dl.

Then, the filename for output is generated by YoutubeDL.prepare_filename (link to source) which replaces the : with a -.

I'm guessing there are just a certain portion of youtube videos that have extra spaces inserted in their page data for the video title.

Perhaps the youtube API returns the proper filename while this page data is not to be relied on?

@aiur-adept
Copy link
Contributor

SO, to circle back on this, grabbing the video title from page data on youtube sometimes gives a title that includes an extra space. However, on further testing I found that grabbing from the API only (i removed the first branch of the if here and tested with only API query) still results in a title with an extra space.

@tansy youtube-dl is going to output what it's going to output, to put it succinctly. I suggest you simply make use of the output file that youtube_dl gives to determine the output filename. You can use grep on the output or embed the program and intercept output by setting a logger object . Here's an example of embedding to grab the output filename:

from __future__ import unicode_literals
import youtube_dl


class MyLogger(object):
    def debug(self, msg):
        pass

    def warning(self, msg):
        pass

    def error(self, msg):
        print(msg)


def my_hook(d):
    if d['status'] == 'finished':
        print(f'output filename is {d["filename"]}')


ydl_opts = {
    'logger': MyLogger(),
    'progress_hooks': [my_hook],
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])

@aiur-adept
Copy link
Contributor

(the hook will trigger twice, once for the mp4 and once for the m4a, so you should disambiguate by using a regex test to grab the mp4 filename only)

@dirkf
Copy link
Contributor

dirkf commented Jul 30, 2024

Still, it's clearly wrong if the magic filename marker {} in --exec ... is not being replaced by the actual filename. More investigation needed. Perhaps PR #32830 broke something.

@aiur-adept
Copy link
Contributor

aiur-adept commented Jul 31, 2024

The reason this is happening is because the shell collapses multiple spaces depending on the quoting. You have an "extra" layer of quoting that's not needed.

What you have:

youtube-dl -v -f 242 --exec 'f={}; echo "mplayer \""$f"\"" > "$f.sh"' 'https://www.youtube.com/watch?v=b0zxIfJJLAY'

What you should use:

youtube-dl -v -f 242 --exec 'f={}; echo "mplayer \"$f\"" > "$f.sh"' 'https://www.youtube.com/watch?v=b0zxIfJJLAY'

(we have \"$f\" instead of \""$f"\")

@aiur-adept
Copy link
Contributor

@tansy i just tested the solution in my recent comment and it generated the .sh file, and it worked. Let us know how it goes on your end.

@tansy
Copy link
Author

tansy commented Jul 31, 2024

Thank you for your help. Just tested it and it works.
I didn't know I can overdo quotes, after all they worked (almost) all the time. It came from bad experiences with `!'in bash and the problem with $var as opposed to "$var" when file name contains spaces.

@aiur-adept
Copy link
Contributor

@tansy if you look carefully and count quotes, its ultimately because $f was naked, unquoted, in between two quotegroups

@dirkf
Copy link
Contributor

dirkf commented Aug 1, 2024

I think we're all good, then. Thx @aiur-adept.

@dirkf dirkf closed this as completed Aug 1, 2024
@dirkf dirkf added the not-a-bug label Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants