Jointist is a joint-training framework capable of:
- Instrument Recognition
- Multi-Instrument Transcription
- Music Source Separation
Demo: https://jointist.github.io/Demo/
Paper: https://arxiv.org/abs/2302.00286
This code was developed using the Docker image `nvidia/cuda:10.2-devel-ubuntu18.04` and Python 3.8.10.
To set up the environment for Jointist, install the dependencies:

```
pip install -r requirements.txt
```

If you get `OSError: sndfile library not found`, you need to install `libsndfile1`:

```
apt install libsndfile1
```
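As a quick sanity check of the installation, you can confirm that PyTorch and torchaudio import correctly and that a GPU is visible (this assumes both packages are pulled in by `requirements.txt`):

```python
# Quick post-install check (assumes torch and torchaudio are installed via requirements.txt).
import torch
import torchaudio

print("torch:", torch.__version__, "| torchaudio:", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
```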
The pretrained model weights can be downloaded from Dropbox. Put them under the `weights` folder after downloading.
The example songs for inference are included in this repo as `songs.zip`. Unzipping it with the following command creates a new folder called `songs`:

```
unzip songs.zip
```
The following script detects the instruments in each song and transcribes the detected instruments:

```
python pred_jointist.py audio_path=songs audio_ext=mp3 gpus=[0]
```

It first runs an instrument recognition model, and the predicted instruments are then used as conditions for the transcription model.
If you have multiple GPUs, the `gpus` argument controls which GPU to use. For example, to use GPU 2, set `gpus=[2]`.
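If you are unsure which GPU indices are available, PyTorch can list them:

```python
# List the CUDA devices visible to PyTorch, to know which index to pass via gpus=[...].
import torch

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```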
`audio_path` specifies the path to the input audio files. If your audio files are not in `.mp3` format, change the `audio_ext` argument to the audio format of your songs. Since we use `torchaudio.load` to load audio files, you can use any audio format as long as it is supported by `torchaudio.load`.
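If you are unsure whether your format is supported, a quick check is to load one file with `torchaudio.load` directly (the file path below is a placeholder):

```python
# Check that torchaudio can decode your audio format.
# "songs/example.mp3" is a placeholder path; substitute one of your own files.
import torchaudio

waveform, sample_rate = torchaudio.load("songs/example.mp3")
print(waveform.shape, sample_rate)
```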
The output MIDI files will be stored inside the `outputs/YYYY-MM-DD/HH-MM-SS/MIDI_output` folder.
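To inspect what was transcribed, you can open one of the generated MIDI files with a MIDI library such as `pretty_midi` (not necessarily a Jointist dependency; the output path below is a placeholder):

```python
# Sketch: list the transcribed instruments in an output MIDI file (assumes pretty_midi is installed).
import pretty_midi

midi = pretty_midi.PrettyMIDI("outputs/2022-01-01/00-00-00/MIDI_output/song.mid")  # placeholder path
for inst in midi.instruments:
    print(pretty_midi.program_to_instrument_name(inst.program), "-", len(inst.notes), "notes")
```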
Model weights can be changed under `checkpoint` in `End2End/config/jointist_inference.yaml`. `transcription1000.ckpt` is the model trained only on the transcription task; `tseparation.ckpt` is the model jointly trained on both the transcription and source separation tasks.
The following inference script performs instrument detection, transcription, and source separation:

```
python pred_jointist_ss.py audio_path=songs audio_ext=mp3 gpus=[0]
```

Same as above, the output MIDI files will be stored inside the `outputs/YYYY-MM-DD/HH-MM-SS/MIDI_output` folder.
Model weights can be changed under `checkpoint` in `End2End/config/jointist_ss_inference.yaml`. `tseparation.ckpt` is the checkpoint with a better transcription F1 score and source separation SDR after training both tasks end-to-end.
Implementation details for Jointist are available here.

To run the transcription inference:

```
python pred_transcription.py datamodule=wild
```
Currently supported `datamodule` options:
- wild
- h5
- slakh
The configurations such as `path` and `audio_ext` for each datamodule can be modified inside `End2End/config/datamodule/xxx.yaml`.
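The command-line overrides above (e.g. `datamodule=wild`) follow Hydra/OmegaConf conventions, so one way to see what a datamodule config contains is to load it with OmegaConf; treat the package choice and the exact filename below as assumptions:

```python
# Sketch: print the contents of a datamodule config.
# Assumes the configs are plain YAML readable by OmegaConf; "wild.yaml" is a guessed filename.
from omegaconf import OmegaConf

cfg = OmegaConf.load("End2End/config/datamodule/wild.yaml")
print(OmegaConf.to_yaml(cfg))
```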
To train the instrument recognition (detection) model:

```
python train_detection.py detection=CombinedModel_NewCLSv2 datamodule=slakh epoch=50 gpus=4 every_n_epochs=2
```
- `detection`: controls the model type
- `detection/backbone`: controls which CNN backbone to use
- `datamodule`: controls which dataset to use (openmic2018/slakh). It affects the instrument mappings.

Please refer to `End2End/config/detection_config.yaml` for more configuration parameters.
To train the transcription model:

```
python train_transcription.py transcription.backend.acoustic.type=CNN8Dropout_Wide inst_sampler.mode=imbalance inst_sampler.samples=2 inst_sampler.neg_samples=2 inst_sampler.temp=0.5 inst_sampler.audio_noise=0 gpus=[0] batch_size=2
```
- `transcription.backend.acoustic.type`: controls the model type
- `inst_sampler.mode=imbalance`: controls which sampling mode to use
- `inst_sampler.samples`: controls how many positive samples are mined for training
- `inst_sampler.neg_samples`: controls how many negative samples are mined for training
- `inst_sampler.temp`: sampling temperature, only effective when using imbalance sampling
- `inst_sampler.audio_noise`: controls whether random noise is added to the audio during training (see the illustrative sketch after this list)
- `gpus`: controls which GPUs to use. `[0]` means using cuda:0; `[2]` means using cuda:2; `[0,1,2,3]` means using four GPUs (cuda:0-3)
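Below is a purely illustrative sketch of the kind of additive-noise augmentation that `inst_sampler.audio_noise` toggles; the actual Jointist implementation may differ in noise type and scale:

```python
# Illustrative only: additive Gaussian noise on a waveform tensor.
# Jointist's actual augmentation behind inst_sampler.audio_noise may differ.
import torch

def add_audio_noise(waveform: torch.Tensor, noise_level: float = 0.01) -> torch.Tensor:
    """Return the waveform with zero-mean Gaussian noise added."""
    return waveform + noise_level * torch.randn_like(waveform)
```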
Please refer to `End2End/config/transcription_config.yaml` for more configuration parameters.
To train the full Jointist framework:

```
python train_jointist.py
```