
tsqsim

Time Series Quick Simulator - able to perform time series analysis and to set up validation experiments. Its plotting capabilities are somewhat limited, but its run-time speed is highly optimized, so the simulator serves more as a stress-tester of your models, challenging their robustness, than as a pattern-discovery tool. The assumption is that some preliminary research has already been done, using scripting languages like Python, R or Weka, where patterns are easy to eyeball. Teaching a machine to detect these found patterns automatically under rigorous conditions is where tsqsim shows its true potential.

Background

Time Series Analysis (TSA) deals with finding patterns in time series data, which can be just about anything that changes across time and depends on values from previous discrete time points. The patterns can be used to build a model of the data, which can in turn be used to predict future data points up to a certain confidence, which decreases gradually as the requested prediction horizon expands.
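
As a minimal illustration (in Python, with arbitrary example coefficients unrelated to this project), consider an AR(1) process, where each point depends on the previous one and forecast uncertainty grows with the horizon:

# A minimal, self-contained illustration; the coefficients are arbitrary example values,
# unrelated to tsqsim's models.
import numpy as np

rng = np.random.default_rng(42)
phi, c, n = 0.8, 0.1, 200            # AR(1) coefficient, constant, number of points
x = np.zeros(n)
for t in range(1, n):
    x[t] = c + phi * x[t - 1] + rng.normal(scale=0.5)   # next value depends on the previous one, plus noise

# A one-step-ahead forecast is the deterministic part of the recursion; the further
# ahead we forecast, the more the accumulated noise widens the uncertainty.
print(c + phi * x[-1])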

To better understand what the project does, or TSA in general, I recommend watching this playlist.

The dangers of modeling

As with virtually every model for every task, it's much easier to optimize a model to work great on the data known at the moment of optimization (in-sample) than to make it generic enough to perform even just satisfactorily on new, yet unknown data (out-of-sample). The problem is known as overfitting and is, or at least should be, the most important concern of every data scientist. In order to mitigate the problem and help you create generic models, the project employs Walk Forward Optimization (or Validation), which simulates the lack of future data at the exact discrete moment of trying to find the optimal parameters, as well as Monte Carlo simulation of alternative scenarios, generated randomly, yet still based on the original data. The combination of these two methods serves as the ultimate stress test.

Remember that the observed history is just a single story line out of the many possibilities that could have happened. What we observe is simply what happened to collapse into one story line, based on the probabilities attached to the set of interacting events in question, including those with low probability but impact so high that they can break the (almost) agreed story line. If something dramatic could have happened, but was avoided only through sheer luck, it will happen in the future in an altered constellation, given enough time. You'd best be ready by fortifying your model (through simulation) against these dramatic events and their alterations. Please refer to "Fooled by Randomness" by Nassim Nicholas Taleb if you are interested in learning more about such concepts.
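
Below is a minimal, conceptual sketch of the walk-forward idea in Python. It is not tsqsim's implementation; the window lengths and the optimize/evaluate callables are placeholders. Parameters are fitted only on data available up to a given point and then judged on the immediately following, previously unseen window:

# Conceptual sketch of Walk Forward Optimization, not tsqsim's actual implementation.
# Window lengths and the optimize/evaluate callables are placeholders.
def walk_forward(data, optimize, evaluate, train_len=500, test_len=100):
    scores = []
    start = 0
    while start + train_len + test_len <= len(data):
        in_sample = data[start : start + train_len]                     # only "past" data is visible here
        out_of_sample = data[start + train_len : start + train_len + test_len]
        params = optimize(in_sample)                                    # fit parameters on the past
        scores.append(evaluate(params, out_of_sample))                  # judge them on unseen "future" data
        start += test_len                                               # walk the window forward
    return scores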

Requirements

  • The console simulator should compile and run fast enough on almost any OS where a POSIX C++ compiler is available.
  • About 1.5 GB of RAM is expected for the initial data serialization step.
  • Depending on the granularity of your data, a corresponding amount of storage space is needed for the textual (CSV) input, as well as for the serialized binary data. Both types of data are stored compressed, though, and are decompressed on the fly into memory as they are needed.

Supported Operating Systems: Debian stable, Debian buster, Ubuntu 21.04, Ubuntu 20.04, Mac OSX 11, Mac OSX 10.15 and Windows.

Tracked features per OS: CI, gcc, clang, UT, wx, Qt, Py, R.

Glossary of the optional components:

  • UT - unit tests
  • wx - the wxWidgets-based configurator (wxConf)
  • Qt - the Qt-based data viewer (tsqsim-qt)
  • Py - the Python backends
  • R - the R script bindings

Quickstart

In case these instructions become outdated, please refer to the steps of the CI.

Preparation

Please run the scripts below. They are meant to be non-interactive and will require root permissions (via sudo). When in doubt, please view their contents with cat for an objective assessment of what they do.

git clone --recursive https://github.com/mj-xmr/tsqsim.git # Clone this repo (assuming it's not a fork)
cd tsqsim		# Enter the cloned repo's dir
./util/prep-env.sh	# Prepare the environment - downloads example data and creates useful symlinks
./util/deps-pull.sh	# Download the maintained dependencies
./util/deps-build.sh	# Build and install the unmanaged dependencies (uses sudo for installation)

Building & running

./ci-default --run-demo	# Build and optionally run the demo
./ci-default --no-qt # Don't build the optional QT module
./ci-default -h 	# See all build options

The executables will be available under build/*/bin, where * depends on the choices you've just made above.

Rebuilding new versions

cd tsqsim             # Enter the cloned repo's dir
git fetch             # Get new tags
git checkout master   # Checkout master or a specific tag
git pull
git submodule update --remote; git submodule sync && git submodule update # Update the submodules
rm build/* -fr        # Cleaning the build dir is sometimes needed, if the directory structure has changed
./util/prep-env.sh
./util/deps-pull.sh
./util/deps-build.sh

Controlling the application

To learn all the app's options and additional information, from within the target build directory (build/*/bin) execute:

./tsqsim --help

Some of the options can be modified more conveniently through the wxConf application, accessible from the same directory (TODO: add more help):

./wxConf

Changes made in wxConf are picked up by all of the remaining applications immediately after any change is performed in wxConf, without any need to confirm the changes.

Command line example

For example, to override the default discrete period and the ending year, the following can be executed:

./tsqsim --per h12 --max-year 2016  # Simulator
./tsqsim-qt --min-year 2015 --max-year 2016 --per h12   # QT data viewer

Any alterations performed via the CLI override the changes made in the wxConf app. In the case of the QT app, however, the CLI options overwrite the configuration permanently, which is a system limitation. The configuration can nevertheless be regenerated at any time by wxConf.

Controlling the WX configurator

Besides the usual application usage, please note that it's very beneficial to use the mouse scroll wheel on the selection controls, like Months or Years, which eases their operation.

Controlling the QT data viewer

  • Mouse right click reloads the data. Useful after the configuration has been changed via wxConf
  • Mouse scroll zooms in and out
  • Mouse drag with the left button moves the viewpoint within the same dataset
  • Left/right cursor keys move the viewpoint left/right, loading a new dataset
  • Up/down cursor keys scale up/down
  • The Control key resets the state of the app completely and returns to the initial view

Modifying the transformation script

The path of the TS transformation script, as well as the currently available transformations, can be obtained via ./tsqsim --help. The script can modify the chain of transformations used by the executable, without the need to recompile it.
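
For illustration, the script read by the console simulator in the example output further below chains the following transformations:

diff
sqrts # Nice to have before logs
logs
add 0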

Running R scripts

The tool provides bindings to R's C interface, currently via PredictorRBaseline and PredictorRCustom. Their corresponding R scripts can be found in the directory static/scripts. Before running the wrapped R scripts, two environment variables need to be exported in the shell that is supposed to run the predictors. Under Linux:

export R_HOME=/usr/lib/R
export LD_LIBRARY_PATH=$R_HOME/lib
./tsqsim

and under OSX:

export R_HOME=/Library/Frameworks/R.framework/Resources
export LD_LIBRARY_PATH=$R_HOME/lib
./tsqsim

Python backends

tsqsim is able to wrap 3rd-party Python TSA frameworks. You may either write your own wrapper or use the ones already available.

Available python backends

The following Python backends are currently available:

Name          Installation                Script
statsmodels   pip install statsmodels     scripts/py_statsmodels.py
darts         pip install darts           scripts/py_darts.py

Extending python backends

TODO: Wrap predict, convert from a dataframe and return a time series.
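
Until that is documented, here is a minimal sketch of what such a wrapper could look like, assuming a predict(df, horizon) entry point that receives a pandas DataFrame and returns a forecast series. The function name, its signature and the "close" column are illustrative assumptions, not tsqsim's actual interface:

# Illustrative sketch only: the entry-point name, its signature and the "close" column
# are assumptions, not tsqsim's actual wrapper interface.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def predict(df: pd.DataFrame, horizon: int = 1) -> pd.Series:
    """Fit a simple ARIMA baseline on the 'close' column and forecast `horizon` steps ahead."""
    series = df["close"].astype(float)         # turn the dataframe column into a plain series
    fitted = ARIMA(series, order=(1, 1, 1)).fit()
    return fitted.forecast(steps=horizon)      # a pandas Series holding the forecast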

Development

For the development use case, it's recommended to turn on certain optimizations that reduce the recompilation and linking time when changing the source code often. The optimizations are: dynamic linking (shared), precompiled headers (pch) and (optionally) icecc, a networked parallel compiler wrapper.

A command that leverages these optimizations could look like the following:

./ci-default --shared --pch --compiler icecc -j 30

Here, icecc is available only after setting it up in your LAN, and 30 is the number of cores that you want to use through icecc. If you declare more than are available, the icecc scheduler will throttle down your choice automatically. To spare yourself typing, I recommend adding the following aliases to your shell:

echo "alias tsqdev='./ci-default --shared --pch --compiler icecc -j 30'" >> ~/.bash_aliases
echo "alias tsqdev-dbg='tsqdev --debug'" >> ~/.bash_aliases
bash    # To reload the aliases and make them available in the current shell

Now you're able to use the aliases from the source directory via:

tsqdev

or, in case you need to debug:

tsqdev-dbg

Acknowledgments

The project uses a lot of code that I have written over the previous 10+ years. I'm now decoupling it and making it reusable for generic purposes.

Example outputs

QT data viewer

[screenshot]

Python-based QT app's alternative

  • Upper window: The interactive QT data viewer
  • Lower window: A very portable Python alternative, useful where QT is unavailable

[screenshot]

Python-based (Partial) AutoCorrelation Function (ACF & PACF) plots

ACF of the original series:

[plot]

ACF of the first difference of the series, exhibiting a statistically significant inverse correlation at lag 1:

[plot]

... and so does the Partial AutoCorrelation Function (PACF):

[plot]

Seasonal decomposition of the daily bars exhibits a strong seasonal pattern over the week:

[plot]
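
Plots like the ones above can be reproduced with standard statsmodels calls. A minimal sketch follows; the CSV path, the column name and the weekly period of 7 are illustrative assumptions, and this is not necessarily the project's own plotting script:

# A minimal way to produce such plots with statsmodels; not necessarily the project's own script.
# The CSV path and column name are illustrative.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose

series = pd.read_csv("eurusd_daily.csv", index_col=0, parse_dates=True)["close"]

plot_acf(series, lags=30)                     # ACF of the original series
plot_acf(series.diff().dropna(), lags=30)     # ACF of the first difference
plot_pacf(series.diff().dropna(), lags=30)    # PACF of the first difference

seasonal_decompose(series, period=7).plot()   # weekly seasonality of daily bars (period=7 is an assumption)
plt.show()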

wx Configurator

[screenshot]

Console simulator

Reading script file: 'tsqsim-script.txt'
Script line: 'diff'
Script line: 'sqrts # Nice to have before logs'
Script line: 'logs'
Script line: 'add 0'

  Dickey-Fuller GLS Test Results
  ====================================
  Statistic                     -2.049
  P-value                        0.065
  Optimal Lags                       3
  Criterion                        AIC
  Trend                       constant
  ------------------------------------

  Test Hypothesis
  ------------------------------------
  H0: The process contains a unit root
  H1: The process is weakly stationary

  Critical Values
  ---------------
   1%      -2.754
   5%      -2.143
  10%      -1.838

  Test Conclusion
  ---------------
  We can reject H0 at the 10% significance level

Closes & transformation
                                                                             
   1.14 +----------------------------------------------------------------+   
        |       +       +       +        +       +       +       +****** |   
  1.135 |-+                                             * *       *    +-|   
   1.13 |-+      ****                                   ****     *     +-|   
        |      * * **                                  *   **    *       |   
  1.125 |-*   ***    ***                             ***     *   *     +-|   
        | ****       *  *    *      *** *     **     **      ** *        |   
   1.12 |**              *  * * ****   * *   *  *   *          *       +-|   
  1.115 |-+              * **  *         *****   * **                  +-|   
        |       +       + *     +        +       **      +       +       |   
   1.11 +----------------------------------------------------------------+   
        0       10      20      30       40      50      60      70      80  
                                                                             
                                                                             
    2 +------------------------------------------------------------------+   
  1.5 |*+*    +        +    *  +*  **   * ** ** +*     **  *     +   **+-|   
    1 |*+*    *  * *  ***   **  *  * * ** ** * ***   * ** * * *    * * +-|   
      |* *    *  * *  * *   **  *  * * ** ** *   *   * ** * * *    * *   |   
  0.5 |-**    *  * *  * *   **  *  * * ** ** *   *   * ** * * *    * * +-|   
    0 |-**    ** * ** * ** * * * **  * ** * **    *  *** ** * **   **  +-|   
 -0.5 |-* *  *** *****  ** * * * **   * **  **    * * ** *  ** *  * *  +-|   
   -1 |-* *  **** ****  **** * * **   * **  **    * * ** *  ** *  * *  +-|   
      | * ** ** * ** *  * ** * * **   * **  **    *** ** *  ** **** *    |   
 -1.5 |-+ ****+ * ** * +*    **+ *    * **      + * *   +*  **   ** *  +-|   
   -2 +------------------------------------------------------------------+   
      0       10       20      30       40      50      60       70      80  


EURUSD-d - Stats
  Mean            StdDev          URT           Samples
--------------------------------------------------------
-0.150           1.369          -1.251          25
 0.069           1.549          -5.128          25
-0.108           1.443          -4.204          27
--------------------------------------------------------
 0.219           106.6%         -2.049          77

StatsMedianSplit::Stats size = 12.5596
Stationarity score = -2.04925
2019.04 - 2019.06