Download fails for video titles with '?' character #265

Open
mohamedusama opened this issue Oct 4, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@mohamedusama


Describe the bug
I am trying to download WAV audio files of songs from album links. pytubefix seems to fail specifically for songs with a question mark in the title. Out of 1425 songs, only 15 fail, and all of them have a question mark in the video title.


code that was used that resulted in the bug

import os
import pandas as pd
import re
from pytubefix import Playlist, YouTube
from pydub import AudioSegment


# Function to sanitize directory names
def sanitize_directory_name(name):
    # Replace invalid characters with underscores
    if name is None:  # Check if name is None
        return "Unnamed"  # Return a default name if None
    return re.sub(r'[<>:"/\\|?*]', '_', name)

# Function to extract song links from a YouTube playlist (album link)
def get_songs_from_album(album_link):
    try:
        playlist = Playlist(album_link, use_oauth=True, allow_oauth_cache=True)
        return [(video.title, video.watch_url) for video in playlist.videos]
    except Exception as e:
        print(f"Error fetching playlist: {e}\n")
        return []

# Function to download only the audio of a song, convert it to .wav, and save it
def download_song(song_title, song_link, folder_path, failed_downloads):
    try:
        wav_file_path = os.path.join(folder_path, f"{song_title}.wav")
        
        # Check if the .wav file already exists
        if os.path.exists(wav_file_path):
            print(f"File already exists: {wav_file_path}")
            return True  # Return True for successful handling (not downloading)
        
        yt = YouTube(song_link, use_oauth=True, allow_oauth_cache=True)
        audio_stream = yt.streams.filter(only_audio=True).first()  # Get only the audio stream
        
        # Temporary path to download the original audio file (likely .webm or .mp4)
        temp_audio_path = os.path.join(folder_path, f"{song_title}.webm")
        
        # Download the audio
        audio_stream.download(output_path=folder_path, filename=f"{song_title}.webm")
        print(f"Downloaded: {song_title} to {temp_audio_path}")
        
        # Convert the downloaded audio to .wav using pydub
        audio = AudioSegment.from_file(temp_audio_path)  # pydub can handle various formats
        audio.export(wav_file_path, format="wav")  # Export as .wav
        
        # Remove the temporary file after conversion
        os.remove(temp_audio_path)
        
        #print(f"Converted and saved: {song_title} to {wav_file_path}")
        return True  # Return True for a successful download
    except Exception as e:
        print(f"Error downloading {song_title}: {e}\n")
        failed_downloads.append((song_title, song_link))  # Append failed download info
        return False  # Return False for failed download

# Main function to process up to 5 albums and download songs as .wav files
def process_albums(df, base_directory="MusicDownloads"):
    expanded_data = []
    failed_downloads = []  # List to store failed downloads
    successful_downloads = 0  # Counter for successful downloads
    
    # Ensure the base directory exists
    os.makedirs(base_directory, exist_ok=True)

    for idx, row in df.iterrows():
        artist = sanitize_directory_name(row['Artists'])
        
        
        for album_num in range(1, 6):  # Loop through albums 1 to 5
            album_title = row.get(f'album {album_num} title')
            album_link = row.get(f'album {album_num} link')
            
            if pd.notna(album_link):  # Check if the album link exists
                # Fetch songs from the album (playlist)
                print('\n', artist, album_title, album_link)
                songs = get_songs_from_album(album_link)
                
                # Create folder path using the naming convention
                folder_name = f"{artist} - {album_title} - Album {album_num}"
                folder_name = sanitize_directory_name(folder_name)
                folder_path = os.path.join(base_directory, folder_name)
                
                # Create the directory if it doesn't exist
                os.makedirs(folder_path, exist_ok=True)
                
                # Download each song and store the information
                for song_title, song_link in songs:
                    if download_song(song_title, song_link, folder_path, failed_downloads):
                        successful_downloads += 1
                    expanded_data.append({
                        'Artist': artist,
                        'Album Name': album_title,
                        'Album Number': album_num,
                        'Song Title': song_title,
                        'Song Link': song_link
                    })
    
    # Create the expanded DataFrame
    expanded_df = pd.DataFrame(expanded_data)

    # Generate file paths for each song
    expanded_df['File Path'] = expanded_df.apply(
        lambda row: os.path.join(
            base_directory, 
            f"{sanitize_directory_name(row['Artist'])} - {sanitize_directory_name(row['Album Name'])} - Album {row['Album Number']}", 
            f"{row['Song Title']}.wav"
        ), 
        axis=1
    )

    return expanded_df, failed_downloads, successful_downloads

# Load singer information from CSV
singers_info = pd.read_csv('Singer project data sheet.csv')
singers_info['Artists'] = singers_info['Artists'].fillna(method='ffill')  # Corrected forward fill

# Process the albums and download songs as wav files
songs, failed_downloads, successful_downloads = process_albums(singers_info)

# Show the expanded DataFrame with song information
print(songs)

total_downloads = successful_downloads + len(failed_downloads)
print(f"\nSummary of Downloads:")
print(f"Total Attempts: {total_downloads}")
print(f"Successful Downloads: {successful_downloads}")
print(f"Failed Downloads: {len(failed_downloads)}")

# Print the list of failed downloads
if failed_downloads:
    print("\nFailed Downloads:")
    for song_title, song_link in failed_downloads:
        print(f"Song: {song_title}, Link: {song_link}")
else:
    print("\nAll downloads were successful!")


Expected behavior
I expected all files to download without any failures.


Screenshots
pytubefix_output.txt


Desktop (please complete the following information):

  • OS: Windows 10 Version 22H2 (OS Build 19045.4651)
  • Python Version 3.7.6
  • Pytubefix Version 7.2.2

Additional context
Might it have anything to do with regex?

@mohamedusama mohamedusama added the bug Something isn't working label Oct 4, 2024
@JuanBindez
Owner

Try this:

from pytubefix import YouTube
from pytubefix.cli import on_progress

url = "url"

yt = YouTube(url, on_progress_callback = on_progress)
print(yt.title)

ys = yt.streams.get_audio_only()
ys.download(mp3=True, remove_problematic_character="?")

@JuanBindez
Owner

It's not a problem with the library, but rather with your operating system, which does not accept "?" in a file name when saving.
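
For context, Windows rejects the characters < > : " / \ | ? * in file names, which would explain why only titles containing "?" fail here. A minimal sketch of a workaround follows, reusing the character class from the sanitize_directory_name regex in the report. The helper name safe_song_filename and the sample titles are hypothetical, not part of pytubefix.

```python
import os
import re

# Characters that Windows does not allow in file names
# (same character class as sanitize_directory_name in the report).
INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def safe_song_filename(title, extension):
    """Hypothetical helper: build a file name that is valid on Windows."""
    cleaned = INVALID_CHARS.sub("_", title).strip()
    return f"{cleaned}.{extension}"


# Sample titles are made up for illustration.
for title in ["Where Is My Mind?", "Plain Title"]:
    print(os.path.join("MusicDownloads", safe_song_filename(title, "webm")))
```

In download_song, the same cleaned name would need to be used for both the temporary .webm path and the final .wav path, so that the existence check, the download, and the conversion all refer to the same file.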

@jhanley-com

jhanley-com commented Oct 7, 2024

Not a Bug.

Suggestions:

  1. Do not post big blobs of code. Reduce your code to the absolute minimum required to reproduce the problem, with no other features. There are several benefits: 1) more people will attempt to analyze your code; 2) it might be suitable as a test case; 3) the issue is resolved more quickly.
  2. Do not require downloading anything to see the problem (pytubefix_output.txt). Instead copy and paste the text into the issue.
  3. Provide the YouTube URL that triggers the problem. Most of us have already written code, so we can then verify that our independent code reproduces a similar problem.

Solution:

The PyTube library provides helpers to create OS safe file names:

https://pytube.io/en/latest/api.html#stream-object

pytube.helpers.safe_filename()

The Stream class also provides default_filename, an OS file system compatible filename.

Modify your code to use one or the other.
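
A rough sketch of the first option. safe_filename is the helper named in the pytube documentation linked above; whether the installed pytubefix version exposes the same function at pytubefix.helpers is an assumption, so the sketch falls back to a hypothetical local stand-in if neither import succeeds. The sample title is made up.

```python
import re

try:
    # Assumption: pytubefix keeps the helper from the pytube API.
    from pytubefix.helpers import safe_filename
except ImportError:
    try:
        from pytube.helpers import safe_filename
    except ImportError:
        def safe_filename(s, max_length=255):
            """Hypothetical stand-in: drop characters Windows rejects."""
            return re.sub(r'[<>:"/\\|?*]', "", s)[:max_length]

title = "Who Are You?"  # made-up title for illustration
filename = f"{safe_filename(title)}.webm"
print(filename)
```

Passing the result as filename= to download() and reusing the same cleaned title for the .wav path keeps both names consistent.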

Projects
Status: waiting