This repository contains code and data related to the paper "Repetition Structure Inference with Formal Prototypes".
The code for extracting the source data is written in Python and can be found in `generate-data/`. The dependencies are listed there in `requirements.txt`. The inference code is written in Julia and can be found in `inference/`.
Dependencies are listed in a Julia project file, so you can install them by running Julia in that directory, entering package mode by hitting `]`, and running:

```
(Julia) pkg> activate .
(inference) pkg> instantiate
```
The inference code uses a solver, which by default is set to the commercial solver Gurobi. To run the code as-is, you can get an academic license from Gurobi. Alternatively, you can use a different solver supported by the JuMP framework, but that will require some changes to the code in `optimize_grammar.jl` (see the comments there) and will likely be slower.

The plot notebook requires IJulia, a Jupyter kernel for Julia.
You can run all of the following steps or only some of them, as the intermediate results are already stored in the repository:
- Extract the data from the source collection:
  ```
  $ cd generate-data
  $ python extract-essen.py ../data/essen.tsv
  ```
- Run the experiments:
  ```
  $ cd inference
  $ julia -L experiments.jl
  ```
  - compute minimal grammars for Essen melodies:
    ```
    julia> run_long_experiment(3)
    ```
    The above line runs on the 3 shortest tunes. If you want to run the full experiment as in the paper, use `298` instead of `3` (this will take several days). The code will save the ruleset for every piece in `data/melodies/rulesets/` and the inferred grammar in `data/melodies/grammars/`.
  - run Monte-Carlo minimization:
    ```
    julia> run_monte_carlo()
    ```
    This will take the same melodies as processed in the previous step (based on the saved results in `data/melodies/grammars/`), estimate a minimal grammar by sampling random grammars and picking the smallest one, and write the resulting grammar to `data/melodies/mc_grammars/`. By default, 10,000 grammars are sampled, but `run_monte_carlo()` takes a parameter to control this.
- Generate the plots:
  Run the notebook `inference/plots.ipynb`. The resulting plots are stored in `plots/`.
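The Monte-Carlo minimization above follows a simple generic pattern: draw random candidates and keep the smallest one seen. The repository's actual implementation is in Julia; the sketch below is a language-agnostic illustration in Python, where the sampler and size measure are hypothetical stand-ins for the grammar sampler and grammar size:

```python
import random

def monte_carlo_minimize(sample, size, n_samples=10_000, seed=0):
    """Draw n_samples random candidates and return the smallest one.

    `sample` and `size` are hypothetical stand-ins for the repository's
    grammar sampler and grammar size measure.
    """
    rng = random.Random(seed)
    best = sample(rng)
    for _ in range(n_samples - 1):
        candidate = sample(rng)
        if size(candidate) < size(best):
            best = candidate
    return best

# Toy usage: a "grammar" is a list of rules, its size the number of rules.
toy_sample = lambda rng: ["rule"] * rng.randint(1, 20)
smallest = monte_carlo_minimize(toy_sample, len, n_samples=1000)
```

With enough samples, the estimate approaches the true minimum, but unlike the exact optimization step there is no guarantee of reaching it.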
The code is not designed as a library; it is rather specialized to the experiments in the paper. If you would like to reuse some of the code, take a look at `run_examples()` and `run_long_experiment()` in `inference/experiments.jl` for some inspiration. If you are interested in packaging the functionality into a library, please get in touch!
The repo contains the following code and data:

- `generate-data/`: code for extracting the datasets (Python)
  - `extract-essen.py`: converting the Essen Folksong Collection to tsv
  - `extract-mtc.py`: converting the Meertens Tune Collection to tsv (not used in the paper)
  - `requirements.txt`: Python dependencies
- `inference/`: code for grammar inference (Julia)
  - `Manifest.toml`, `Project.toml`: Julia dependencies
  - `parser.jl`: parsing sequences into rulesets
  - `optimize_grammar.jl`: inferring minimal grammars from rulesets
  - `experiments.jl`: the experiments described in the paper
  - `plotting.jl`: helper code for plotting
  - `plots.ipynb`: notebook for generating the plots and analyses shown in the paper
- `data/`: data and results
  - `essen.tsv`: Essen Folksong Collection as note lists
  - `mtc.tsv`: Meertens Tune Collection melodies as note lists (not used)
  - `[grammar_]{xxy,xxyx,example1,example2}.json`: rulesets and grammars for example pieces
  - `melodies/`: computed rulesets and grammars for short tunes from the Essen corpus
    - `rulesets/`: computed rulesets
    - `grammars/`: optimal grammars
    - `mc_grammars/`: Monte-Carlo minimized grammars
- `plots/`: plots for the paper
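The note-list tsv files can be inspected from Python with the standard library alone. The sketch below only assumes a tab-separated file with a header row; the column names in the toy example are made up, so check the actual files for the real layout:

```python
import csv
import io

def read_note_list(fileobj):
    """Read a tab-separated note list with a header row into a list of dicts."""
    return list(csv.DictReader(fileobj, delimiter="\t"))

# Toy stand-in for data/essen.tsv (column names are hypothetical):
toy = io.StringIO("id\tonset\tpitch\ntune1\t0\t60\ntune1\t1\t62\n")
rows = read_note_list(toy)
print(rows[0])  # {'id': 'tune1', 'onset': '0', 'pitch': '60'}
```

For real use, open one of the tsv files instead of the `io.StringIO` stand-in, e.g. `with open("data/essen.tsv") as f: rows = read_note_list(f)`.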