This repository contains code for researchers, regulators or practitioners who wish to use financial-exchange message data to quantify latency arbitrage and study other aspects of speed-sensitive trading, following Matteo Aquilina, Eric Budish and Peter O’Neill, “Quantifying the High-Frequency Trading ‘Arms Race’”, Quarterly Journal of Economics, 2021 (hereafter, “ABO”). The Python code processes the user's message data, detects trading races, and outputs a race-level statistical dataset along with complementary trading data. The R code produces race summary statistics, tables and figures analogous to all reported results in ABO. We also provide a small artificial data set that can be used to understand the data structure and to test one's configuration.
This code should be used in conjunction with the detailed documentation linked below.
Version 1.1 (November 2021). Please visit https://github.com/ericbudish/HFT-Races to check for updates.
Please refer to ABO Code and Data Appendix for detailed documentation and instructions.
- Download and unzip this repository. Obtain the exchange message data and pre-process the data following Sections 2 and 3 of the ABO Code and Data Appendix. You may want to use the artificial dataset provided here to test your configuration first.
- Decide on the set of race detection parameters as instructed in Section 4 of the appendix. You can use the sample race detection parameters provided here for testing.
- Make sure Python 3 is installed on your MacOS/Linux system. For Windows users, we recommend using the Windows Subsystem for Linux. Please refer to Section 5 of the appendix for a complete guide on computational environment setup. Install Pandas and Numpy using the following command if you have not.
pip install -r requirements.txt # Install the required Python packages
- Specify the parameters and file paths in
process_msg_data_main.py
and run the script in the console using the command below. To test your configuration, set the file paths accordingly to use the artificial test data provided here. Please refer to Section 6 of the ABO Code and Data Appendix.
nohup python3 path/to/code/process_msg_data_main.py > path/to/main/log/run.log 2>&1&
- Specify the parameters and file paths in
race_detection_main.py
and run the script in the console using the command below. To test your configuration, set the file paths accordingly to use the artificial test data provided here. Please refer to Section 7 of the ABO Code and Data Appendix.
nohup python3 path/to/code/race_detection_main.py > path/to/main/log/run.log 2>&1&
- Use
log_monitor.py
to make sure all data have been processed. Follow the instructions inlog_monitor.py
and execute the code interactively or in a console.
python3 path/to/code/log_monitor.py
- Specify the parameters and file paths in
MainResults.R
andSensitivity.R
and source the scripts. If you have multiple sets of race detection parameters, you may need to repeat this step multiple times for different sets of race detection results.
Rscript /path/to/MainResults.R
Rscript /path/to/RCode.R
We would be grateful for feedback or comments on the code, and are especially eager to hear from early users. Please address comments, questions, and any other feedback to eric.budish@chicagobooth.edu and hft.races.code.package@gmail.com.
Code credits: Jiahao Chen, Natalia Drozdoff, Matthew O'Keefe, Jaume Vives, Zizhe Xia, Matteo Aquilina, Eric Budish, Peter O'Neill
Documentation credits: Jiahao Chen, Zizhe Xia, Matteo Aquilina, Eric Budish, Peter O'Neill
The Python code and documentation are licensed under the BSD 3-Clause license. License is available here.
The R code is licensed under the GNU General Public License version 3 due to the use of GNU GPL licensed R packages. License is available here