Download fails for video titles with '?' character #265

mohamedusama · 2024-10-04T19:24:32Z

Describe the bug
I am trying to download WAV audio files of songs from album (playlist) links. pytubefix appears to fail specifically for songs with a question mark in the title: out of 1425 songs, only 15 fail, and all 15 have a question mark in the video title.

code that was used that resulted in the bug

import os
import pandas as pd
import re
from pytubefix import Playlist, YouTube
from pydub import AudioSegment


# Function to sanitize directory names
def sanitize_directory_name(name):
    # Replace invalid characters with underscores
    if name is None:  # Check if name is None
        return "Unnamed"  # Return a default name if None
    return re.sub(r'[<>:"/\\|?*]', '_', name)

# Function to extract song links from a YouTube playlist (album link)
def get_songs_from_album(album_link):
    try:
        playlist = Playlist(album_link, use_oauth=True, allow_oauth_cache=True)
        return [(video.title, video.watch_url) for video in playlist.videos]
    except Exception as e:
        print(f"Error fetching playlist: {e}\n")
        return []

# Function to download only the audio of a song, convert it to .wav, and save it
def download_song(song_title, song_link, folder_path, failed_downloads):
    try:
        wav_file_path = os.path.join(folder_path, f"{song_title}.wav")
        
        # Check if the .wav file already exists
        if os.path.exists(wav_file_path):
            print(f"File already exists: {wav_file_path}")
            return True  # Return True for successful handling (not downloading)
        
        yt = YouTube(song_link, use_oauth=True, allow_oauth_cache=True)
        audio_stream = yt.streams.filter(only_audio=True).first()  # Get only the audio stream
        
        # Temporary path to download the original audio file (likely .webm or .mp4)
        temp_audio_path = os.path.join(folder_path, f"{song_title}.webm")
        
        # Download the audio
        audio_stream.download(output_path=folder_path, filename=f"{song_title}.webm")
        print(f"Downloaded: {song_title} to {temp_audio_path}")
        
        # Convert the downloaded audio to .wav using pydub
        audio = AudioSegment.from_file(temp_audio_path)  # pydub can handle various formats
        audio.export(wav_file_path, format="wav")  # Export as .wav
        
        # Remove the temporary file after conversion
        os.remove(temp_audio_path)
        
        #print(f"Converted and saved: {song_title} to {wav_file_path}")
        return True  # Return True for a successful download
    except Exception as e:
        print(f"Error downloading {song_title}: {e}\n")
        failed_downloads.append((song_title, song_link))  # Append failed download info
        return False  # Return False for failed download

# Main function to process up to 5 albums and download songs as .wav files
def process_albums(df, base_directory="MusicDownloads"):
    expanded_data = []
    failed_downloads = []  # List to store failed downloads
    successful_downloads = 0  # Counter for successful downloads
    
    # Ensure the base directory exists
    os.makedirs(base_directory, exist_ok=True)

    for idx, row in df.iterrows():
        artist = sanitize_directory_name(row['Artists'])
        
        
        for album_num in range(1, 6):  # Loop through albums 1 to 5
            album_title = row.get(f'album {album_num} title')
            album_link = row.get(f'album {album_num} link')
            
            if pd.notna(album_link):  # Check if the album link exists
                # Fetch songs from the album (playlist)
                print('\n', artist, album_title, album_link)
                songs = get_songs_from_album(album_link)
                
                # Create folder path using the naming convention
                folder_name = f"{artist} - {album_title} - Album {album_num}"
                folder_name = sanitize_directory_name(folder_name)
                folder_path = os.path.join(base_directory, folder_name)
                
                # Create the directory if it doesn't exist
                os.makedirs(folder_path, exist_ok=True)
                
                # Download each song and store the information
                for song_title, song_link in songs:
                    if download_song(song_title, song_link, folder_path, failed_downloads):
                        successful_downloads += 1
                    expanded_data.append({
                        'Artist': artist,
                        'Album Name': album_title,
                        'Album Number': album_num,
                        'Song Title': song_title,
                        'Song Link': song_link
                    })
    
    # Create the expanded DataFrame
    expanded_df = pd.DataFrame(expanded_data)

    # Generate file paths for each song
    expanded_df['File Path'] = expanded_df.apply(
        lambda row: os.path.join(
            base_directory, 
            f"{sanitize_directory_name(row['Artist'])} - {sanitize_directory_name(row['Album Name'])} - Album {row['Album Number']}", 
            f"{row['Song Title']}.wav"
        ), 
        axis=1
    )

    return expanded_df, failed_downloads, successful_downloads

# Load singer information from CSV
singers_info = pd.read_csv('Singer project data sheet.csv')
singers_info['Artists'] = singers_info['Artists'].fillna(method='ffill')  # Corrected forward fill

# Process the albums and download songs as wav files
songs, failed_downloads, successful_downloads = process_albums(singers_info)

# Show the expanded DataFrame with song information
print(songs)

total_downloads = successful_downloads + len(failed_downloads)
print(f"\nSummary of Downloads:")
print(f"Total Attempts: {total_downloads}")
print(f"Successful Downloads: {successful_downloads}")
print(f"Failed Downloads: {len(failed_downloads)}")

# Print the list of failed downloads
if failed_downloads:
    print("\nFailed Downloads:")
    for song_title, song_link in failed_downloads:
        print(f"Song: {song_title}, Link: {song_link}")
else:
    print("\nAll downloads were successful!")

Expected behavior
I expected all files to download without any failures.

Screenshots
pytubefix_output.txt

Desktop (please complete the following information):

OS: Windows 10 Version 22H2 (OS Build 19045.4651)
Python Version 3.7.6
Pytubefix Version 7.2.2

Additional context
Might it have anything to do with regex?

JuanBindez · 2024-10-04T19:50:59Z

Try this:

from pytubefix import YouTube
from pytubefix.cli import on_progress

url = "url"

yt = YouTube(url, on_progress_callback = on_progress)
print(yt.title)

ys = yt.streams.get_audio_only()
ys.download(mp3=True, remove_problematic_character="?")

JuanBindez · 2024-10-04T19:52:00Z

https://pytubefix.readthedocs.io/en/latest/user/problematic_characters.html

JuanBindez · 2024-10-04T19:54:46Z

It's not a problem with the library, but rather with your operating system, which does not accept "?" in a file name when saving.

jhanley-com · 2024-10-07T01:38:27Z

Not a Bug.

Suggestions:

Do not post big blobs of code. Reduce your code to the absolute minimum required to reproduce the problem, with no other features. There are several benefits: 1) more people will attempt to analyze your code; 2) it might be suitable as a test case; 3) the issue gets resolved more quickly.
Do not require downloading anything to see the problem (pytubefix_output.txt). Instead, copy and paste the text into the issue.
Provide the YouTube URL that generates the problem. Most of us have already written code, so we can then verify whether our independent code reproduces a similar problem.
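
If the failure really is the operating system rejecting "?", a reduced reproduction may not need pytubefix, pandas, or pydub at all. The following is only a sketch under that assumption: `can_create_file` is a hypothetical helper name, and the example title is made up. It simply tries to create an empty file with the given name in a temporary directory and reports whether the OS allowed it.

```python
import os
import tempfile


def can_create_file(filename):
    """Return True if the OS allows creating a file with this name.

    Hypothetical helper for illustration; not part of pytube or pytubefix.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        try:
            # Create an empty file; an invalid name raises OSError here.
            with open(path, "wb") as handle:
                handle.write(b"")
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # The second name is a made-up title containing a question mark.
    for name in ("plain_title.webm", "What is love?.webm"):
        status = "ok" if can_create_file(name) else "rejected by OS"
        print(f"{name!r}: {status}")
```

On Windows the name containing "?" is expected to be rejected, while on most Linux and macOS file systems both names should be accepted, which would be consistent with the failure depending on the OS rather than on the download itself.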

Solution:

The PyTube library provides helpers to create OS safe file names:

https://pytube.io/en/latest/api.html#stream-object

pytube.helpers.safe_filename()

The Stream class also provides default_filename, an OS file-system-compatible filename.

Modify your code to use one or the other.
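
If neither library helper fits, for example because the script needs the same name for the temporary .webm file and the final .wav file, the title can be sanitized once and reused. This is a minimal sketch that reuses the regex from the original script's `sanitize_directory_name`; `safe_song_stem` and `song_paths` are hypothetical names, not part of pytube or pytubefix.

```python
import os
import re

# Characters replaced in file names (same set as the original script's regex).
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def safe_song_stem(title, default="Unnamed"):
    """Return a version of a video title that is safe to use in a file name."""
    if not title:
        return default
    return _INVALID_CHARS.sub("_", title)


def song_paths(folder_path, title):
    """Build the temporary .webm path and the final .wav path from one safe stem."""
    stem = safe_song_stem(title)
    return (
        os.path.join(folder_path, f"{stem}.webm"),
        os.path.join(folder_path, f"{stem}.wav"),
    )


if __name__ == "__main__":
    # "Who Are You?" is a made-up example title.
    temp_audio_path, wav_file_path = song_paths("MusicDownloads", "Who Are You?")
    print(temp_audio_path)
    print(wav_file_path)
```

In `download_song`, the same stem would also be passed as the `filename` argument of `audio_stream.download(...)`, so that the downloaded file, the conversion input, and the cleanup step all refer to one path.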

mohamedusama added the "bug" label on Oct 4, 2024.
