WIP: RWC POP melody parser #12

Open
wants to merge 6 commits into
base: master
101 changes: 101 additions & 0 deletions parsers/data/rwcmelodypop/metadata.csv
@@ -0,0 +1,101 @@
Piece No.,Cat. Suffix,Tr. No.,Title,Artist (Vocal),Singer Information,Length,Tempo,Live Instruments Used,Drum Information
No. 1,M01,Tr. 01,Eien no replica,Kazuo Nishi,Male,3:29,135,Gt,Drum sequences
No. 2,M01,Tr. 02,Magic in your eyes,Hiromi Yoshii,Female,3:42,100,Gt & Bs,Drum sequences
No. 3,M01,Tr. 03,HORO,MIT,Vocal group (1 male + 1 female),3:15,111,,Drum sequences
No. 4,M01,Tr. 04,Spice of Life,Hisayoshi Kazato,Male,4:02,86,Gt,Drum sequences
No. 5,M01,Tr. 05,Koino Ver.2.4,Eves,Vocal group (4 female),3:48,135,Gt,Drum sequences
No. 6,M01,Tr. 06,Funky Life,Oriken,Male,3:26,120,Gt,Drum sequences
No. 7,M01,Tr. 07,PROLOGUE,Tomomi Ogata,Female,4:58,122,Gt,Drum sequences
No. 8,M01,Tr. 08,Jinsei konnamono,fevers,Vocal group (2 female),3:12,127,Gt & Bs,Drum sequences
No. 9,M01,Tr. 09,Doukoku,Kazuo Nishi,Male,4:37,70,Gt & Bs & Dr,Live drums
No. 10,M01,Tr. 10,Getting Over,Brakes,Vocal group (3 female),3:35,125,Gt,Drum sequences
No. 11,M01,Tr. 11,Ienai,Hisayoshi Kazato,Male,4:27,90,Gt,Drum sequences
No. 12,M01,Tr. 12,KAGE-ROU,Kazuo Nishi,Male,3:24,120,Gt & Bs,Drum sequences
No. 13,M01,Tr. 13,Catch ball,Konbu,Female,3:39,103,Gt,Drum sequences
No. 14,M01,Tr. 14,Karehairo no Twilight,Rin,Female,3:54,88,Gt & Bs,Drum sequences
No. 15,M01,Tr. 15,old fashioned,Katsuyuki Ozawa,Male,2:42,132,Gt,Drum sequences
No. 16,M01,Tr. 16,Game of Love,Hiromi Yoshii,Female,4:22,122,Gt & Bs,Drum loops
No. 17,M02,Tr. 01,Anata to aete,Hiromi Yoshii,Female,4:01,97,Gt,Drum sequences
No. 18,M02,Tr. 02,True Heart,Tomomi Ogata,Female,4:14,112,Gt,Drum sequences
No. 19,M02,Tr. 03,COOL Motion,Hisayoshi Kazato,Male,4:49,130,Gt,Drum sequences
No. 20,M02,Tr. 04,Tokimeki no syunkan,Eri Ichikawa,Female,4:10,134,,Drum sequences
No. 21,M02,Tr. 05,Feeling In My Heart,Rin,Female,4:28,98,Gt,Drum sequences
No. 22,M02,Tr. 06,Koi ni ochiru jikan ni kansuru kousatsu,Kazuo Nishi,Male,3:29,135,Gt,Drum sequences
No. 23,M02,Tr. 07,SHAKE,MAPS,Vocal group (2 male),3:21,132,Gt,Drum sequences
No. 24,M02,Tr. 08,it's all right,Hisayoshi Kazato,Male,4:00,130,,Drum sequences
No. 25,M02,Tr. 09,tell me,Tomomi Ogata,Female,4:16,103,Gt,Drum sequences
No. 26,M02,Tr. 10,aozora sanpo michi,Tomoko Nitta,Female,3:27,158,Gt & Bs,Drum sequences
No. 27,M02,Tr. 11,stay,Shingo Katsuta,Male,5:18,124,Gt,Drum sequences
No. 28,M02,Tr. 12,Fly away,Tomomi Ogata,Female,4:10,109,Gt,Drum sequences
No. 29,M02,Tr. 13,One Two STEP,Kazuo Nishi,Male,3:35,103,Gt,Drum sequences
No. 30,M02,Tr. 14,syounen no omoi,Mitsuru Tanimoto,Male,3:16,104,,Drum sequences
No. 31,M02,Tr. 15,Moving Round and Round,Yuuichi Nagayama,Male,4:10,129,,Drum loops
No. 32,M02,Tr. 16,what could I do for you,Masaki Kuehara,Male,4:12,125,Gt,Drum sequences
No. 33,M03,Tr. 01,DREAM MAGIC,Hiromi Yoshii,Female,4:47,108,Gt & Bs,Drum sequences
No. 34,M03,Tr. 02,Hitoyo no yume,Hiromi Yoshii,Female,3:27,93,Gt & Bs & Dr,Live drums
No. 35,M03,Tr. 03,Midarana kami no moushigo,Hiromi Yoshii,Female,3:13,170,Gt & Bs & Dr,Live drums
No. 36,M03,Tr. 04,over and over,Kazuo Nishi,Male,5:16,135,Gt & Bs & Dr,Live drums
No. 37,M03,Tr. 05,Replica,Yoshinori Hatae,Male,3:59,184,Gt & Bs & Dr,Live drums
No. 38,M03,Tr. 06,1999,Kousuke Morimoto,Male,4:35,125,Gt & Bs & Dr,Live drums
No. 39,M03,Tr. 07,SPUL,Kousuke Morimoto,Male,4:48,73,Gt & Bs & HH,Drum sequences
No. 40,M03,Tr. 08,promise,Kazuo Nishi,Male,3:46,122,Gt & Bs,Drum sequences
No. 41,M03,Tr. 09,Non Stop Driving,Katsuyuki Ozawa,Male,2:50,200,Gt,Drum sequences
No. 42,M03,Tr. 10,Fly to the moon,Kousuke Morimoto,Male,4:08,125,Gt & Bs,Drum sequences
No. 43,M03,Tr. 11,Centimeter no kodoku,Kazuo Nishi,Male,3:24,163,Gt,Drum sequences
No. 44,M03,Tr. 12,REAL na 5 hun,Kousuke Morimoto,Male,4:06,124,Gt & Bs,Drum sequences
No. 45,M03,Tr. 13,Hajimari,Kousuke Morimoto,Male,3:42,77,Gt & Bs & Dr,Live drums
No. 46,M03,Tr. 14,Senro wa tsuzukuyo,Kousuke Morimoto,Male,3:19,168,Gt,Drum sequences
No. 47,M03,Tr. 15,Deaetakara,Satoshi Kumasaka,Male,3:30,94,Gt × 2 & Bs & Dr & Pf,Live drums
No. 48,M03,Tr. 16,Syodoubutsu,Hiroshi Sekiya,Male,4:29,86,Gt,Drum sequences
No. 49,M04,Tr. 01,Sekai no mikata,Hiroshi Sekiya,Male,4:35,100,Gt,Drum sequences
No. 50,M04,Tr. 02,Mrs. Maril,Rin,Female,3:15,114,Gt & Bs,Drum sequences
No. 51,M04,Tr. 03,Modoranai natsu,Hiroshi Sekiya,Male,6:07,104,Gt × 2 & Bs & Dr & Pf,Live drums
No. 52,M04,Tr. 04,Haru ga kurukara,Tomomi Ogata,Female,3:45,140,Gt,Drum sequences
No. 53,M04,Tr. 05,Ashita wa,Rin,Female,3:39,132,Gt & Bs,Drum sequences
No. 54,M04,Tr. 06,Harukana omoi,Rin,Female,3:42,125,Gt & Bs,Drum sequences
No. 55,M04,Tr. 07,First Love,Akiko Kaburagi,Female,4:09,74,Gt × 2 & Bs & Dr,Live drums
No. 56,M04,Tr. 08,I've got a mail,Masashi Hashimoto,Male,5:22,74,Gt & Bs,Drum loops
No. 57,M04,Tr. 09,Stay with me,Masashi Hashimoto,Male,4:27,70,,Drum sequences
No. 58,M04,Tr. 10,Silver shoes,Rin,Female,3:45,118,,Drum loops
No. 59,M04,Tr. 11,Tenshi no utatane,Rin,Female,3:25,98,Gt & Tp,Drum loops
No. 60,M04,Tr. 12,Kumorizora,Yuzu Iijima,Female,4:03,148,Gt × 2 & Bs,Drum loops
No. 61,M04,Tr. 13,FOR YOU,Kazuo Nishi,Male,4:43,121,Gt,Drum sequences
No. 62,M04,Tr. 14,Be with me Now,Rin,Female,3:11,81,Gt,Drum sequences
No. 63,M04,Tr. 15,Power of mind,Reiko Sato,Female,4:11,126,Gt & Bs & Dr,Live drums
No. 64,M04,Tr. 16,So Long,Kousuke Morimoto,Male,4:52,100,Gt & Bs,Drum sequences
No. 65,M05,Tr. 01,Tanabata,Makiko Hattori,Female,3:52,76,Gt & Bs & Dr,Live drums
No. 66,M05,Tr. 02,Mousugu natsu ga kuru,M & Y,Vocal group (2 female),5:20,76,Gt & HH & Pf,Drum sequences
No. 67,M05,Tr. 03,Tokei no hayasa wa,Makiko Hattori,Female,4:10,88,Gt & Bs & Dr,Live drums
No. 68,M05,Tr. 04,Nichiyoubi,Makiko Hattori,Female,4:54,92,Gt & Bs & Dr,Live drums
No. 69,M05,Tr. 05,Gin no sora,Hiromi Yoshii,Female,6:00,62,Gt,Drum sequences
No. 70,M05,Tr. 06,Miageta sora wa,Tamako Matsuzaka,Female,4:06,104,Gt & Bs & Dr & Pf,Live drums
No. 71,M05,Tr. 07,Tsuki no youni,Hiromi Yoshii,Female,4:46,70,Gt & Pf,Without drums
No. 72,M05,Tr. 08,Heart to Hurt,Kousuke Morimoto,Male,3:21,76,Pf & Vc,Without drums
No. 73,M05,Tr. 09,Miss Maria,Kazuo Nishi,Male,3:16,144,Pf & Vc,Without drums
No. 74,M05,Tr. 10,Kimi no iro,Kousuke Morimoto,Male,3:14,94,Gt,Without drums
No. 75,M05,Tr. 11,Toui machi e,Hiromi Yoshii,Female,3:21,108,Pf,Without drums
No. 76,M05,Tr. 12,Chikai,Kousuke Morimoto,Male,3:50,70,Gt,Without drums
No. 77,M05,Tr. 13,Aishiteru,Makiko Hattori,Female,3:56,120,Gt & Pf,Without drums
No. 78,M05,Tr. 14,Kumo,Masaki Kuehara,Male,4:21,75,Gt,Without drums
No. 79,M05,Tr. 15,Together,Tomomi Ogata,Female,4:28,92,,Without drums
No. 80,M05,Tr. 16,Sagashimono,Tomomi Ogata,Female,3:39,80,,Without drums
No. 81,M06,Tr. 01,How Deep Is Your Love?,Donna Burke,Female [English],3:50,90,Gt,Drum sequences
No. 82,M06,Tr. 02,Once in a life time,Shinya Iguchi,Male [English],5:37,88,Gt & Dr,Live drums
No. 83,M06,Tr. 03,Doing That Thing,Jeff Manning,Male [English],3:37,140,Gt,Drum sequences
No. 84,M06,Tr. 04,Someday,Shinya Iguchi,Male [English],4:40,138,Gt & Dr,Live drums
No. 85,M06,Tr. 05,Waiting for the moment,Jeff Manning,Male [English],3:30,98,Gt,Drum sequences
No. 86,M06,Tr. 06,Angel Baby,Betty,Female [English],4:17,80,Gt,Drum sequences
No. 87,M06,Tr. 07,I think of you,Jeff Manning,Male [English],4:51,90,Gt,Drum sequences
No. 88,M06,Tr. 08,Woman Like You,Shinya Iguchi,Male [English],4:09,120,Gt & Dr,Live drums
No. 89,M06,Tr. 09,Life Is What You Make It To Be,Donna Burke,Female [English],3:43,134,Gt,Drum sequences
No. 90,M06,Tr. 10,Don't Say Good bye,Shinya Iguchi,Male [English],4:37,127,Gt & Dr,Live drums
No. 91,M07,Tr. 01,Change Of Heart,Donna Burke,Female [English],3:43,124,Gt,Drum sequences
No. 92,M07,Tr. 02,I'll be there for you,Betty,Female [English],3:40,134,,Drum sequences
No. 93,M07,Tr. 03,Sweet Dreams,Donna Burke,Female [English],4:18,90,Gt,Drum sequences
No. 94,M07,Tr. 04,Life,Betty,Female [English],3:43,78,Gt,Drum sequences
No. 95,M07,Tr. 05,Feel,Jeff Manning,Male [English],3:52,161,Gt & Bs & Dr,Live drums
No. 96,M07,Tr. 06,Weekend,Betty,Female [English],4:40,130,,Drum loops
No. 97,M07,Tr. 07,Don't Lie To Me,Donna Burke,Female [English],4:03,91,Gt,Drum sequences
No. 98,M07,Tr. 08,31 BLUES,Jeff Manning,Male [English],3:31,107,Gt & Bs & Dr,Live drums
No. 99,M07,Tr. 09,Once and For All,Shinya Iguchi,Male [English],5:17,73,Gt,Without drums
No. 100,M07,Tr. 10,No Regrets,Shinya Iguchi,Male [English],4:53,80,Gt,Drum loops
197 changes: 197 additions & 0 deletions parsers/rwcpopmelody_parser.py
@@ -0,0 +1,197 @@
#!/usr/bin/env python
"""
Translates the RWC-POP SMF (Standard MIDI File) annotations to a set of
JAMS files, keeping only the melody track from each MIDI file.

The original dataset is described online at the following URL:
https://staff.aist.go.jp/m.goto/RWC-MDB/rwc-mdb-p.html

To parse the entire dataset, you just need to provide the path to the folder
containing the SMF files, with filenames in the form RM-P[0-9]{3}*.MID from
RM-P001*.MID to RM-P100*.MID

Example:
./rwcpopmelody_parser.py ~/AIST.RWC-MDB-P-2001.SMF_SYNC -o ~/RWCPOP_melody_jams/

"""

__author__ = "J. Salamon"
__copyright__ = "Copyright 2016, Music and Audio Research Lab (MARL)"
__license__ = "GPL"
__version__ = "1.0"
__email__ = "justin.salamon@nyu.edu"

import argparse
import logging
import os
import time
import midi
import pretty_midi
import pandas as pd

import jams


def fill_file_metadata(jam, metadata, n_track):
    """Fills the global metadata into the JAMS jam."""
    jam.file_metadata.artist = metadata['Artist (Vocal)'][n_track]
    jam.file_metadata.title = metadata['Title'][n_track]

    d_str = metadata['Length'][n_track]
    jam.file_metadata.duration = (float(d_str.split(":")[0]) * 60 +
Contributor

this is pretty dirty. Since we're already using pandas here, we can do something like this instead:

In [4]: pd.Timedelta('0:2:34').total_seconds()
Out[4]: 154.0

(only problem is that you'll have to pad in hours, but i generally prefer that to reinventing second-parsing.)
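
A minimal sketch of the suggestion above, assuming every `Length` value in metadata.csv is an `m:ss` string. The helper name `length_to_seconds` is hypothetical and not part of this PR.

```python
import pandas as pd


def length_to_seconds(length):
    """Convert an 'm:ss' string such as '3:29' to seconds.

    A zero hours field is prepended before the string is handed to pandas,
    which is the padding step mentioned in the comment above.
    """
    return pd.Timedelta('0:' + length).total_seconds()


print(length_to_seconds('3:29'))
```

With this, the duration assignment would reduce to a single call on `metadata['Length'][n_track]`.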

                                  float(d_str.split(":")[1]))

    # Store remaining RWC metadata in the sandbox
    sandbox_dict = {
Contributor

seems a little overwrought; why not just dump all metadata into the sandbox?

Or copy the dict, pop out the ones you don't want, and push the rest?
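
One way the copy-and-pop variant could look, as a sketch only. It assumes the metadata row is available as a plain dict (with pandas, something like `metadata.iloc[n_track].to_dict()` would presumably provide one); `split_metadata` and its `promoted` argument are hypothetical names.

```python
def split_metadata(row, promoted=('Title', 'Artist (Vocal)', 'Length')):
    """Split one metadata row into (promoted, sandbox) dicts.

    `promoted` lists the columns that get dedicated JAMS fields; every
    other column ends up in the sandbox dict untouched.
    """
    sandbox = dict(row)  # copy, so the caller's row is not modified
    top = {key: sandbox.pop(key) for key in promoted}
    return top, sandbox


row = {
    'Piece No.': 'No. 1', 'Cat. Suffix': 'M01', 'Tr. No.': 'Tr. 01',
    'Title': 'Eien no replica', 'Artist (Vocal)': 'Kazuo Nishi',
    'Singer Information': 'Male', 'Length': '3:29', 'Tempo': 135,
    'Live Instruments Used': 'Gt', 'Drum Information': 'Drum sequences',
}
top, sandbox = split_metadata(row)
print(sorted(sandbox))
```

New columns added to the CSV later would then reach the sandbox without touching the parser.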

        'Piece No.': metadata['Piece No.'][n_track],
        'Cat. Suffix': metadata['Cat. Suffix'][n_track],
        'Tr. No.': metadata['Tr. No.'][n_track],
        'Singer Information': metadata['Singer Information'][n_track],
        'Tempo': metadata['Tempo'][n_track],
        'Live Instruments Used': metadata['Live Instruments Used'][n_track],
        'Drum Information': metadata['Drum Information'][n_track]}

    jam.sandbox.update(**sandbox_dict)


def fill_annotation_metadata(annot):
    """Fills the annotation metadata."""
    annot.annotation_metadata.corpus = "RWC Music Database: Popular Music"
    annot.annotation_metadata.version = "1.0"
    annot.annotation_metadata.annotation_tools = ""
    annot.annotation_metadata.annotation_rules = ""
    annot.annotation_metadata.validation = ""
    annot.annotation_metadata.data_source = ""
    annot.annotation_metadata.curator = jams.Curator(name="Masataka Goto",
                                                     email="m.goto@aist.go.jp")
    annot.annotation_metadata.annotator = {}


def create_jams(smf_file, out_file, metadata):
    """
    Creates a JAMS file from an RWC POP smf file (RM-P*.MID).
    Note: only the notes of the MELODY track are kept!
    """

    # Load midi file
    m = midi.read_midifile(smf_file)

    # This will store the relevant MIDI tracks
    melody = []

    # Track 0 is metadata that we need
    melody.append(m[0])

    # Collect track text (convert to lower case and remove spaces)
    track_text = ["".join(e.text.lower().split()) for t in m
                  for e in t if isinstance(e, midi.TrackNameEvent)]

    assert len(track_text) == len(m)

    # Find the melody track: it should START with "melo", or if that's not
Contributor

Is this a standard convention? Or would it make sense to abstract this out and let a user specify a regular expression to search for instead? I like '/melo|voca/' more than hard-wired logic here.
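
A sketch of how the lookup might be abstracted, assuming the track names have already been collected as strings. `find_melody_track` and its `patterns` parameter are hypothetical. Note that a single alternation such as `melo|voca` would pick whichever matching track comes first in the file, whereas the current code prefers "melo" over "voca"; an ordered tuple of patterns keeps that priority.

```python
import re


def find_melody_track(track_names, patterns=(r'^melo', r'^voca')):
    """Return the index of the melody track.

    Names are lower-cased and stripped of whitespace, as in the parser.
    Patterns are tried in order, so an earlier pattern wins even if a
    track matching a later pattern appears first in the file.
    """
    cleaned = ["".join(name.lower().split()) for name in track_names]
    for pattern in patterns:
        regex = re.compile(pattern)
        for index, name in enumerate(cleaned):
            if regex.search(name):
                return index
    raise ValueError("no track name matches {}".format(list(patterns)))


print(find_melody_track(['Conductor', 'Vocal', 'MELODY 1']))
```

A user-supplied pattern could then be passed straight through from a command-line option.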

    # there then "voca" (for vocal).
    track_text = [t[:4] for t in track_text]
    if 'melo' in track_text:
        index = track_text.index('melo')
    elif 'voca' in track_text:
        index = track_text.index('voca')
    else:
        print("ABORTING TRACK: couldn't find melody for: %s" % smf_file)
Contributor

switch away from old-style string formatting please. :)

"ABORTING .... {}".format(smf_file).

        return 0
Contributor

It's better to raise exceptions than to have specialized return values.
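
A sketch combining this suggestion with the `str.format` one, under the assumption that the caller wants to skip a bad file and carry on. `MelodyTrackNotFoundError`, `pick_melody_index` and `convert_all` are hypothetical names, and the input mapping stands in for the real MIDI parsing.

```python
import logging


class MelodyTrackNotFoundError(Exception):
    """Raised when no melody track can be located in an SMF file."""


def pick_melody_index(track_prefixes, smf_file):
    """Return the index of the 'melo' track, falling back to 'voca'."""
    for prefix in ('melo', 'voca'):
        if prefix in track_prefixes:
            return track_prefixes.index(prefix)
    raise MelodyTrackNotFoundError(
        "couldn't find melody for: {}".format(smf_file))


def convert_all(prefixes_by_file):
    """Map each file to its melody index, logging and skipping failures."""
    found = {}
    for smf_file, prefixes in prefixes_by_file.items():
        try:
            found[smf_file] = pick_melody_index(prefixes, smf_file)
        except MelodyTrackNotFoundError as exc:
            logging.warning("Skipping track: %s", exc)
    return found


result = convert_all({'RM-P001.MID': ['cond', 'melo'],
                      'RM-P002.MID': ['cond', 'bass']})
print(result)
```

The decision to skip or abort then lives in the caller rather than in a magic return value.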


    melody.append(m[index])

    # Create new midi pattern with just these tracks
    m = midi.Pattern(tracks=melody, resolution=m.resolution,
                     format=m.format, tick_relative=m.tick_relative)

    # Write temporary midi file
    temp_midi = out_file.replace(".jams", ".temp.mid")
Contributor

Can we improve this hack? Preferably via one of the following:

  • Use a properly secure temporary filename
  • Skip tempfiles altogether and use a StringIO/BytesIO. I'm not sure if midi and pretty_midi support that (but they should).

Finally, if any of those asserts below fail, then this will leave temp files on disk. A better pattern here would be to wrap this all in a try block, and put the tempfile cleanup in a finally clause. That way, even if the asserts fail, the cleanup still executes.
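
A sketch of the first option plus the try/finally cleanup, using only the standard library. `temporary_midi_path` is a hypothetical helper; whether `midi` and `pretty_midi` accept in-memory buffers is left open here, as in the comment.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_midi_path():
    """Yield a securely created temp file path and always remove it."""
    fd, path = tempfile.mkstemp(suffix='.mid')
    os.close(fd)  # only the path is needed; the writer reopens the file
    try:
        yield path
    finally:
        # Runs even if the body raises, e.g. on a failed assert.
        if os.path.exists(path):
            os.remove(path)


# Intended use inside create_jams (not executed here):
#     with temporary_midi_path() as temp_midi:
#         midi.write_midifile(temp_midi, m)
#         pm = pretty_midi.PrettyMIDI(temp_midi)
```

Everything that reads the temporary file would need to sit inside the `with` block.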

    midi.write_midifile(temp_midi, m)

    # Load temp midi file using pretty_midi
    pm = pretty_midi.PrettyMIDI(temp_midi)

    assert len(pm.instruments) == 2
    assert pm.instruments[0].notes == []
    assert len(pm.instruments[1].notes) > 0

    # Get the melody notes
    notes = pm.instruments[1].notes

    # Create jam
    jam = jams.JAMS()

    # Create annotation
    midi_ann = jams.Annotation('pitch_midi')

    # Add notes to the annotation
    for note in notes:
        midi_ann.append(time=note.start, duration=(note.end-note.start),
                        value=note.pitch, confidence=1.)

    # Fill annotation metadata
    fill_annotation_metadata(midi_ann)

    # Add annotation to jam
    jam.annotations.append(midi_ann)

    # Fill file metadata
    n_track = int(os.path.basename(smf_file)[4:7]) - 1
Contributor

this seems magical. What is going on here? Why are characters 2--5 special?
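
For context, the module docstring says filenames have the form RM-P[0-9]{3}*.MID, so the slice is pulling out the three-digit piece number. A sketch that states this explicitly, assuming that naming scheme holds for every file; `piece_index` is a hypothetical name.

```python
import os
import re

# Filenames are expected to look like RM-P001*.MID ... RM-P100*.MID.
_PIECE_RE = re.compile(r'^RM-P(\d{3})')


def piece_index(smf_file):
    """Return the zero-based metadata row for an RWC POP SMF filename."""
    match = _PIECE_RE.match(os.path.basename(smf_file))
    if match is None:
        raise ValueError("unexpected SMF filename: {}".format(smf_file))
    return int(match.group(1)) - 1


print(piece_index('/data/RM-P001.SMF_SYNC.MID'))
```

An unexpected filename then fails loudly instead of indexing the wrong metadata row.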

    fill_file_metadata(jam, metadata, n_track)

    # Save JAMS
    jam.save(out_file)

    # Remove temporary midi file
    os.remove(temp_midi)


def process_folder(smf_dir, out_dir):
    """Converts the original SMF annotations into the JAMS format (keeping only
    the notes of the melody track), and saves them in the out_dir folder."""

    # Collect all SMF annotations.
    smf_files = jams.util.find_with_extension(smf_dir, '.MID', depth=1)

    # Get metadata
    metadata = pd.read_csv('data/rwcmelodypop/metadata.csv')

    for smf in smf_files:

        jams_file = (
            os.path.join(out_dir,
                         os.path.basename(smf).replace('.MID', '.jams')))
Contributor

This will do strange things when you have inputs that have .MID multiple times. It would be better to splitext and join the desired extension.
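
A sketch of the splitext approach, with `jams_path_for` as a hypothetical helper name. Only the final extension is swapped, so an earlier `.MID` inside the name is left alone.

```python
import os


def jams_path_for(smf_file, out_dir):
    """Build the output .jams path for an input SMF file."""
    stem, _ext = os.path.splitext(os.path.basename(smf_file))
    return os.path.join(out_dir, stem + '.jams')


print(jams_path_for('/in/RM-P001.MID.SYNC.MID', 'out'))
```

With `str.replace`, the same input would have produced `RM-P001.jams.SYNC.jams`.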

        jams.util.smkdirs(os.path.split(jams_file)[0])
Contributor

Easier to read as os.path.dirname(jams_file)

        # Create a JAMS file for this track
        create_jams(smf, jams_file, metadata)


def main():
    """Main function to convert the dataset into JAMS."""
    parser = argparse.ArgumentParser(
        description="Converts the RWC POP dataset to the JAMS format, keeping "
        "only the notes of the melody track.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("smf_dir",
                        action="store",
                        help="RWC POP SMF folder")
    parser.add_argument("-o",
                        action="store",
                        dest="out_dir",
                        default="RWCPOP_melody_jams",
                        help="Output JAMS folder")
    args = parser.parse_args()
    start_time = time.time()

    # Setup the logger
    logging.basicConfig(format='%(asctime)s: %(message)s', level=logging.INFO)

    # Run the parser
    process_folder(args.smf_dir, args.out_dir)

    # Done!
    logging.info("Done! Took %.2f seconds.", time.time() - start_time)

if __name__ == '__main__':
    main()