Operon is a modern C++ framework for symbolic regression that uses genetic programming to explore a hypothesis space of possible mathematical expressions in order to find the best-fitting model for a given regression target. Its main purpose is to help develop accurate and interpretable white-box models in the area of system identification. More in-depth documentation available at https://operongp.readthedocs.io/.
Broadly speaking, genetic programming (GP) is said to evolve a population of "computer programs" ― AST-like structures encoding behavior for a given problem domain ― following the principles of natural selection. It repeatedly combines random program parts keeping only the best results ― the "fittest". Here, the biological concept of fitness is defined as a measure of a program's ability to solve a certain task.
In symbolic regression, the programs represent mathematical expressions typically encoded as expression trees. Fitness is usually defined as goodness of fit between the dependent variable and the prediction of a tree-encoded model. Iterative selection of best-scoring models followed by random recombination leads naturally to a self-improving process that is able to uncover patterns in the data:
The project requires CMake and a C++17 compliant compiler (C++20 if you're on the cpp20
branch). The recommended way to build Operon is via either nix or vcpkg.
Check out https://github.com/heal-research/operon/blob/master/BUILDING.md for detailed build instructions and how to enable/disable certain features.
First, you have to install nix and enable flakes. For a portable install, see nix-portable.
To create a development shell:
nix develop github:heal-research/operon --no-write-lock-file
To build Operon (a symlink to the nix store called result
will be created).
nix build github:heal-research/operon --no-write-lock-file
Select the build generator appropriate for your system and point CMake to the vcpkg.cmake
toolchain file
cmake -S . -B build -G "Visual Studio 16 2019" -A x64 \
-DCMAKE_TOOLCHAIN_FILE=..\vcpkg\scripts\buildsystems\vcpkg.cmake \
-DVCPKG_OVERLAY_PORTS=.\ports
The file CMakePresets.json
contains some presets that you may find useful. For using clang-cl
instead of cl
, pass -TClangCL
to the above (official documentation).
Python bindings for the Operon library are available as a separate project: PyOperon, which also includes a scikit-learn compatible regressor.
If you find Operon useful you can cite our work as:
@inproceedings{10.1145/3377929.3398099,
author = {Burlacu, Bogdan and Kronberger, Gabriel and Kommenda, Michael},
title = {Operon C++: An Efficient Genetic Programming Framework for Symbolic Regression},
year = {2020},
isbn = {9781450371278},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3377929.3398099},
doi = {10.1145/3377929.3398099},
booktitle = {Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion},
pages = {1562–1570},
numpages = {9},
keywords = {symbolic regression, genetic programming, C++},
location = {Canc\'{u}n, Mexico},
series = {GECCO '20}
}
Operon was also featured in a recent survey of symbolic regression methods, where it showed good results:
@article{DBLP:journals/corr/abs-2107-14351,
author = {William G. La Cava and
Patryk Orzechowski and
Bogdan Burlacu and
Fabr{\'{\i}}cio Olivetti de Fran{\c{c}}a and
Marco Virgolin and
Ying Jin and
Michael Kommenda and
Jason H. Moore},
title = {Contemporary Symbolic Regression Methods and their Relative Performance},
journal = {CoRR},
volume = {abs/2107.14351},
year = {2021},
url = {https://arxiv.org/abs/2107.14351},
eprinttype = {arXiv},
eprint = {2107.14351},
timestamp = {Tue, 03 Aug 2021 14:53:34 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2107-14351.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}