Code for the paper "Impact of time and note duration tokenizations on deep learning symbolic music modeling" (ISMIR 2023).
In this work, we analyze common tokenization methods and experiment with time and note duration representations. We compare how these two design choices affect performance on several tasks: composer classification, emotion classification, music generation, and sequence representation.
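To make "time and note duration representations" concrete, here is a minimal toy sketch of two ways the same notes can be serialized: with explicit duration tokens, or with note-off tokens. It is only an illustration under stated assumptions and is not the tokenizer used in this repository. The token names (`TimeShift_*`, `Pitch_*`, `Duration_*`, `NoteOn_*`, `NoteOff_*`), the `(onset, pitch, duration)` note format, and both function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical note format: (onset, pitch, duration), in arbitrary time steps.
Note = Tuple[int, int, int]


def tokenize_with_duration(notes: List[Note]) -> List[str]:
    """Serialize notes with explicit Duration tokens (toy illustration)."""
    tokens: List[str] = []
    current_time = 0
    for onset, pitch, duration in sorted(notes):
        # Move time forward only when the next note starts later.
        if onset > current_time:
            tokens.append(f"TimeShift_{onset - current_time}")
            current_time = onset
        tokens.append(f"Pitch_{pitch}")
        tokens.append(f"Duration_{duration}")
    return tokens


def tokenize_with_note_off(notes: List[Note]) -> List[str]:
    """Serialize notes with NoteOn/NoteOff tokens (toy illustration)."""
    events = []
    for onset, pitch, duration in notes:
        # The second field orders NoteOff (0) before NoteOn (1) at equal times.
        events.append((onset, 1, pitch, "NoteOn"))
        events.append((onset + duration, 0, pitch, "NoteOff"))
    events.sort()

    tokens: List[str] = []
    current_time = 0
    for time, _, pitch, kind in events:
        if time > current_time:
            tokens.append(f"TimeShift_{time - current_time}")
            current_time = time
        tokens.append(f"{kind}_{pitch}")
    return tokens


notes = [(0, 60, 4), (2, 64, 2), (4, 67, 4)]
duration_tokens = tokenize_with_duration(notes)
note_off_tokens = tokenize_with_note_off(notes)
print(duration_tokens)
print(note_off_tokens)
```

In the first form, a note's length is stated as soon as the note appears. In the second, it is implied by the distance between a `NoteOn` token and its matching `NoteOff` token.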
- `pip install -r requirements` to install the requirements;
- `sh scripts/download_datasets.sh` to download the POP909 and EMOPIA datasets;
- Download the GiantMIDI dataset and put it in `data/`;
- `python scripts/tokenize_datasets.py` to tokenize data and learn BPE;
- `python exp_generation.py` to train generative models and generate results;
- `python exp_pretrain.py` to pretrain classification and contrastive models;
- `python exp_cla_finetune.py` to train classification models and test them;
- `python exp_contrastive.py` to train contrastive models and test them.
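
The commands above can also be chained from a small driver script. The sketch below is a convenience wrapper, not part of the repository: `STEPS` and `run_pipeline` are hypothetical names, and it assumes the commands are meant to run in the listed order from the repository root. The GiantMIDI dataset still has to be downloaded manually into `data/` before the tokenization step.

```python
import subprocess
from typing import List

# Commands copied from the steps above, in the order they are listed.
# Assumption: they are run from the repository root.
STEPS: List[List[str]] = [
    ["pip", "install", "-r", "requirements"],
    ["sh", "scripts/download_datasets.sh"],
    # GiantMIDI must be placed in data/ manually before the next step.
    ["python", "scripts/tokenize_datasets.py"],
    ["python", "exp_generation.py"],
    ["python", "exp_pretrain.py"],
    ["python", "exp_cla_finetune.py"],
    ["python", "exp_contrastive.py"],
]


def run_pipeline(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    handled: List[str] = []
    for cmd in steps:
        line = " ".join(cmd)
        print(line)
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
        handled.append(line)
    return handled


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run_pipeline(STEPS, dry_run=True)
```

Calling `run_pipeline(STEPS, dry_run=False)` would execute the steps for real. Keeping the dry run as the default avoids launching long training jobs by accident.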