How to correctly generate npy files from a midi file #3

aihobbyist opened this issue Sep 28, 2018 · 1 comment

Comments

@aihobbyist

I'm trying to train and test a model. The first step is to generate npy files (NumPy arrays) from MIDI files, since the dataset currently uploaded only includes individual MIDI files. I'm trying the following code (from Testfile.py):

import math
import random
import os
import shutil
from dataprocessing import pretty_midi_to_piano_roll
import matplotlib.pyplot as plt
import numpy as np
import pretty_midi
from pypianoroll import Multitrack, Track
import librosa.display
from utils import *

def get_bar_piano_roll(piano_roll):
    if piano_roll.shape[0] % 64 != 0:
        if LAST_BAR_MODE == 'fill':
            piano_roll = np.concatenate((piano_roll, np.zeros((64 - piano_roll.shape[0] % 64, 128))), axis=0)
        elif LAST_BAR_MODE == 'remove':
            piano_roll = np.delete(piano_roll,  np.s_[-int(piano_roll.shape[0] % 64):], axis=0)
    piano_roll = piano_roll.reshape(-1, 64, 128)
    return piano_roll

LAST_BAR_MODE = 'remove'

# convert midi files to npy - midi files will be in the ./Classic/ directory

l = [f for f in os.listdir('./Classic/')]
count = 0
count2 = 0
for i in range(len(l)):
    try:
        multitrack = Multitrack(beat_resolution=4, name=os.path.splitext(l[i])[0])
        x = pretty_midi.PrettyMIDI(os.path.join('./Classic/', l[i]))
        multitrack.parse_pretty_midi(x)
        category_list = {'Piano': [], 'Drums': []}
        program_dict = {'Piano': 0, 'Drums': 0}
        for idx, track in enumerate(multitrack.tracks):
            if track.is_drum:
                category_list['Drums'].append(idx)
            else:
                category_list['Piano'].append(idx)
        tracks = []
        merged = multitrack[category_list['Piano']].get_merged_pianoroll()
        # NOTE: this overwrites the piano-only merge above with a merge of all tracks.
        merged = multitrack.get_merged_pianoroll()
        tracks = [(Track(merged, program=0, is_drum=False, name=os.path.splitext(l[i])[0]))]
        mt = Multitrack(None, tracks, multitrack.tempo, multitrack.downbeat, multitrack.beat_resolution, multitrack.name)
        pr = get_bar_piano_roll(merged)
        pr_clip = pr[:, :, 24:108]
        if int(pr_clip.shape[0] % 4) != 0:
            pr_clip = np.delete(pr_clip, np.s_[-int(pr_clip.shape[0] % 4):], axis=0)
        pr_re = pr_clip.reshape(-1, 64, 84, 1)
        print(pr_re.shape)

        # pr_re.shape will be something like (4, 64, 84, 1) or (8, 64, 84, 1) etc

        for j in range(len(pr_re)):
            # this would save each part of pr_re, with each part being of shape (64, 84, 1)
            np.save(os.path.join('./datasets/Classic/train', 'classic_piano_train_' + str(count2) + '.npy'), pr_re[j])
            count2 += 1
        count += 1
        print(str(count))
    except Exception as e:
        print('Wrong', l[i], e)
        continue

Per the readme, this is incorrect: training data should be arrays of shape (1, 64, 84, 1), whereas the code above saves arrays of shape (64, 84, 1). How should MIDI files be properly converted to NumPy arrays of shape (1, 64, 84, 1)?
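For reference, one way to get the leading axis is to slice with `j:j + 1` instead of indexing with `j`. The sketch below is only an illustration on a synthetic piano roll, not the repository's preprocessing: the helper name `to_training_phrases` and the zero-filled `demo_roll` are made up for this example, and it assumes the only requirement is the (1, 64, 84, 1) shape with pitches 24 to 107 kept, as in the code above.

```python
import numpy as np


def to_training_phrases(piano_roll, steps=64, low=24, high=108):
    """Split a (time, 128) piano roll into arrays of shape (1, steps, high - low, 1).

    Hypothetical helper for illustration; it mirrors the 'remove' mode above
    by dropping any incomplete tail shorter than `steps` time steps.
    """
    usable = piano_roll.shape[0] - piano_roll.shape[0] % steps
    clipped = piano_roll[:usable, low:high]
    phrases = clipped.reshape(-1, steps, high - low, 1)
    # Slicing with j:j + 1 (rather than indexing with j) keeps the leading axis.
    return [phrases[j:j + 1] for j in range(len(phrases))]


# Synthetic stand-in for a merged piano roll: 200 time steps, 128 pitches.
demo_roll = np.zeros((200, 128), dtype=bool)
demo_phrases = to_training_phrases(demo_roll)
print(len(demo_phrases), demo_phrases[0].shape)
```

In the loop above, the equivalent change would be to save `pr_re[j:j + 1]` or `np.expand_dims(pr_re[j], axis=0)` instead of `pr_re[j]`. Whether the training script expects anything beyond the shape is not something this sketch can confirm.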

@sumuzhao
Owner

Hello, the preprocessing code is a bit messy. We'll clean it up and update it later.
For now, please try the npy files we provide. The right preprocessing method really depends on what you want, so you can modify the code to meet your requirements.
