Time Series Quick Simulator - performs time series analysis and sets up validation experiments.
With its somewhat limited plotting capabilities, but highly optimized runtime speed, the simulator serves more as a stress-tester of your models, challenging their robustness, than as a pattern discovery tool. The assumption is that some preliminary research has already been done with tools like Python, R or Weka, where patterns are easy to eyeball. Teaching a machine to detect these patterns automatically, under rigorous conditions, is where tsqsim shows its true potential.
Time Series Analysis (TSA) deals with finding patterns in time series data, which can be almost anything that changes over time and depends on values from previous discrete time points. The patterns can be used to build a model of the data, which can in turn be used to predict future data points up to a certain confidence - one that decreases gradually as the requested prediction horizon expands.
In order to better understand what the project does, or TSA in general, I recommend watching this playlist.
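To make the idea concrete, here is a minimal sketch (not part of tsqsim) that fits a simple autoregressive model with statsmodels - one of the Python backends mentioned further below - and forecasts a few steps ahead. The synthetic data and the model order are purely illustrative assumptions:

```python
# Minimal sketch: fit an AR(1)-style model and forecast ahead.
# Assumptions: statsmodels is installed; 'y' is a synthetic series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
# Synthetic AR(1) process: y_t = 0.7 * y_{t-1} + noise
y = [0.0]
for _ in range(199):
    y.append(0.7 * y[-1] + rng.normal())

res = ARIMA(np.asarray(y), order=(1, 0, 0)).fit()
forecast = res.forecast(steps=5)  # confidence degrades as the horizon grows
print(forecast)
```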
As with nearly every model for every task, it's much easier to optimize a model to work great on the data known at the moment of optimization (in-sample) than to make it generic and work even just satisfactorily on new, yet unknown data (out-of-sample). This problem is known as overfitting, and it is - or at least should be - the most important concern of every data scientist. To mitigate the problem and help you create generic models, the project employs Walk Forward Optimization (or Validation), which simulates the lack of future data at the exact discrete moment of searching for the optimal parameters, as well as a Monte Carlo simulation of alternative scenarios, generated randomly, yet still based on the original data. The combination of these two methods serves as the ultimate stress test.

Remember that the observed history is just a single story line out of the many possibilities that could have happened. What we observe is simply what happened to collapse into one story line, based on the probabilities attached to the set of interacting events in question - including events with low probability, but an impact so high that they can break the (almost) agreed story line. If something dramatic could have happened, but was avoided only by sheer luck, it will happen in the future in an altered constellation, given enough time. You'd best be ready by fortifying your model (through simulation) against these dramatic events and their alterations. Please refer to "Fooled by Randomness" by Nassim Nicholas Taleb if you are interested in such concepts.
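The following is a minimal, self-contained Python sketch of the two ideas combined - rolling walk-forward splits plus a simple permutation-based resampling of returns as a Monte Carlo stand-in. It is not tsqsim's actual implementation; the window sizes, function names and scenario generator are all illustrative assumptions:

```python
# Illustrative sketch of Walk Forward Optimization combined with a
# Monte Carlo resampling of returns. tsqsim implements its own,
# more elaborate versions of both techniques.
import numpy as np

def walk_forward_splits(n, train_len, test_len):
    """Yield (train_idx, test_idx) windows that only ever look backwards."""
    start = 0
    while start + train_len + test_len <= n:
        train = np.arange(start, start + train_len)
        test = np.arange(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len  # roll the window forward

def monte_carlo_paths(returns, n_paths, rng):
    """Alternative scenarios: shuffle observed returns into new story lines."""
    return [rng.permutation(returns) for _ in range(n_paths)]

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(size=500)) + 100.0
returns = np.diff(prices)

for train, test in walk_forward_splits(len(returns), train_len=100, test_len=25):
    # Optimize parameters on 'train' only, then evaluate once on 'test'.
    for path in monte_carlo_paths(returns[train], n_paths=10, rng=rng):
        pass  # stress-test the chosen parameters on each alternative path
```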
- The console simulator should compile and run fast enough on almost any OS where a POSIX C++ compiler is available.
- About 1.5 GB of RAM is expected for the initial data serialization step.
- Depending on the granularity of your data, a corresponding amount of storage space is needed for the textual (CSV) input, as well as for the serialized binary data. Both types of data are stored compressed, though, and are decompressed into memory on the fly, as they are needed.
Supported Operating Systems and features:
OS \ Feature | CI | gcc | clang | UT | wx | Qt | Py | R |
---|---|---|---|---|---|---|---|---|
Debian stable | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Debian buster | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Ubuntu 21.04 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Ubuntu 20.04 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Mac OSX 11 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Mac OSX 10.15 | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
Windows | ✓ | ✓ | ✓ | ✓ | | | | |
Glossary:
- CI = Continuous Integration
- gcc & clang = C/C++ compilers
- UT = Unit Tests
- wx = wxWidgets-based configuration application
- Qt = Qt application (data viewer)
- Py = Python alternative to the Qt app
- R = bindings to the "R" statistical framework
Optional components:
- UT
- wx
- Qt
- Py
- R
In case these instructions become outdated, please refer to the steps of the CI.
Please run the scripts below. They are meant to be non-interactive and will require root permissions (via `sudo`). When in doubt, please view their contents with `cat` for an objective assessment of their functionality.
git clone --recursive https://github.com/mj-xmr/tsqsim.git # Clone this repo (assuming it's not a fork)
cd tsqsim # Enter the cloned repo's dir
./util/prep-env.sh # Prepare the environment - downloads example data and creates useful symlinks
./util/deps-pull.sh # Download the maintained dependencies
./util/deps-build.sh # Build and install the unmanaged dependencies (uses sudo for installation)
./ci-default --run-demo # Build and optionally run the demo
./ci-default --no-qt # Don't build the optional QT module
./ci-default -h # See all build options
The executables will be available under `build/*/bin`, where `*` depends on the choices you've just made above.
cd tsqsim # Enter the cloned repo's dir
git fetch # Get new tags
git checkout master # Checkout master or a specific tag
git pull
git submodule update --remote; git submodule sync && git submodule update # Update the submodules
rm -fr build/* # Sometimes it might be needed to clean the build dir, if the directory structure changes
./util/prep-env.sh
./util/deps-pull.sh
./util/deps-build.sh
To learn all of the app's options and additional information, execute the following from within the target build directory (`build/*/bin`):
./tsqsim --help
Some of the options can be modified more conveniently through the `wxConf` application, accessible from the same directory (TODO: add more help):
./wxConf
The changes made in `wxConf` are picked up by all of the remaining applications immediately after any change is performed in `wxConf`, without any need to confirm the changes.
For example, to override the default discrete period and the ending year, the following can be executed:
./tsqsim --per h12 --max-year 2016 # Simulator
./tsqsim-qt --min-year 2015 --max-year 2016 --per h12 # QT data viewer
Any alterations performed via the CLI override the changes made in the `wxConf` app. In the case of the Qt app, though, the CLI options overwrite the configuration permanently, which is a system limitation. The configuration can, however, be regenerated at any time by `wxConf`.
Besides the usual application usage, please note that it's very beneficial to use the mouse scroll wheel on the selection controls, like Months or Years, which eases their operation.
- Mouse right click reloads the data. Useful after the configuration has changed via `wxConf`
- Mouse scroll zooms in and out
- Mouse drag with the left button moves the viewpoint within the same dataset
- Left/right cursor keys move the viewpoint left/right, loading a new dataset
- Up/down cursor keys scale up/down
- The Control key resets the state of the app completely and returns to the initial view
The TS transformation script's path, as well as its currently available transformations, can be obtained via `./tsqsim --help`. The script can modify the chain of transformations used by the executable, without the need for its recompilation.
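For illustration only, such a script could contain the same chain of transformations that is echoed in the sample output further below; the exact set of supported transformation names is printed by `./tsqsim --help`:

```
diff
sqrts # Nice to have before logs
logs
add 0
```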
The tool delivers bindings to R's C interface, currently via `PredictorRBaseline` and `PredictorRCustom`. Their corresponding R scripts can be found in the directory `static/scripts`.
Before running the wrapped R scripts, two environment variables need to be exported in the shell that is supposed to run the predictors. Under Linux:
export R_HOME=/usr/lib/R
export LD_LIBRARY_PATH=$R_HOME/lib
./tsqsim
and under OSX:
export R_HOME=/Library/Frameworks/R.framework/Resources
export LD_LIBRARY_PATH=$R_HOME/lib
./tsqsim
`tsqsim` is able to wrap 3rd-party Python TSA frameworks. You may either write your own wrapper or use one of the already available ones.
The following Python backends are currently available:
Name | installation | script |
---|---|---|
statsmodels | pip install statsmodels | scripts/py_statsmodels.py |
darts | pip install darts | scripts/py_darts.py |
TODO! Wrap predict, convert from the dataframe and return a time series.
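If you write your own wrapper, it might be shaped roughly like the sketch below. The entry-point name and signature are assumptions made for illustration; consult the bundled scripts (e.g. `scripts/py_statsmodels.py`) for the interface tsqsim actually calls:

```python
# Hypothetical wrapper sketch -- the function name 'predict' and its
# signature are assumptions, not tsqsim's confirmed interface.
from statsmodels.tsa.arima.model import ARIMA

def predict(values, horizon=1):
    """Fit a small ARIMA model on 'values' and return 'horizon' forecasts."""
    model = ARIMA(list(values), order=(1, 1, 0))
    fitted = model.fit()
    return list(fitted.forecast(steps=horizon))
```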
For the development use case, it's recommended to turn on certain optimizations that reduce the recompilation and linking time when changing the source code often. The optimizations are: dynamic linking (shared), precompiled headers (pch) and (optionally) icecc, a networked parallel compiler wrapper.
A command that leverages these optimizations could look like the following:
./ci-default --shared --pch --compiler icecc -j 30
Here, `icecc` is available only after setting it up in your LAN, and `30` is the number of cores that you want to use through icecc. If you declare more than are available, the icecc scheduler will throttle down your choice automatically.
To spare yourself some typing, I recommend adding the following aliases to your shell:
echo "alias tsqdev='./ci-default --shared --pch --compiler icecc -j 30'" >> ~/.bash_aliases
echo "alias tsqdev-dbg='tsqdev --debug'" >> ~/.bash_aliases
bash # To reload the aliases and make them available in the current shell
Now you're able to use the aliases from the source directory via:
tsqdev
or, in case you need to debug:
tsqdev-dbg
The project uses a lot of code that I have written over the previous 10+ years. I'm now decoupling it and making it reusable for generic purposes.
- Upper window: The interactive Qt data viewer
- Lower window: A very portable Python alternative, useful where Qt is unavailable
ACF of the original series:
ACF of the first difference of the series, exhibiting a statistically significant inverse correlation at lag 1:
... and so does the Partial AutoCorrelation Function (PACF):
Seasonal decomposition of the daily bars exhibits a strong seasonal pattern over the week:
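To reproduce plots of this kind outside of tsqsim, a rough sketch using pandas and statsmodels follows; the file name `closes.csv` and its `close` column are assumptions about how your data is stored:

```python
# Sketch: ACF/PACF of the first difference plus a weekly seasonal
# decomposition of daily bars. The data source is an assumption.
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose

closes = pd.read_csv("closes.csv")["close"]
diff = closes.diff().dropna()

plot_acf(diff, lags=20)   # the inverse correlation at lag 1 shows up here
plot_pacf(diff, lags=20)
seasonal_decompose(closes, period=7).plot()  # weekly pattern on daily bars
plt.show()
```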
Reading script file: 'tsqsim-script.txt'
Script line: 'diff'
Script line: 'sqrts # Nice to have before logs'
Script line: 'logs'
Script line: 'add 0'
Dickey-Fuller GLS Test Results
====================================
Statistic -2.049
P-value 0.065
Optimal Lags 3
Criterion AIC
Trend constant
------------------------------------
Test Hypothesis
------------------------------------
H0: The process contains a unit root
H1: The process is weakly stationary
Critical Values
---------------
1% -2.754
5% -2.143
10% -1.838
Test Conclusion
---------------
We can reject H0 at the 10% significance level
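The summary above closely resembles the output of the DF-GLS test from the Python `arch` package, so a quick cross-check from Python is possible. A sketch, with placeholder data standing in for the transformed series under study:

```python
# Cross-check sketch: run a DF-GLS unit-root test with the 'arch' package.
# 'series' is placeholder data; substitute your transformed series.
import numpy as np
from arch.unitroot import DFGLS

rng = np.random.default_rng(1)
series = rng.normal(size=200).cumsum()

test = DFGLS(series)
print(test.summary())  # statistic, p-value, lags and critical values
```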
Closes & transformation
1.14 +----------------------------------------------------------------+
| + + + + + + +****** |
1.135 |-+ * * * +-|
1.13 |-+ **** **** * +-|
| * * ** * ** * |
1.125 |-* *** *** *** * * +-|
| **** * * * *** * ** ** ** * |
1.12 |** * * * **** * * * * * * +-|
1.115 |-+ * ** * ***** * ** +-|
| + + * + + ** + + |
1.11 +----------------------------------------------------------------+
0 10 20 30 40 50 60 70 80
2 +------------------------------------------------------------------+
1.5 |*+* + + * +* ** * ** ** +* ** * + **+-|
1 |*+* * * * *** ** * * * ** ** * *** * ** * * * * * +-|
|* * * * * * * ** * * * ** ** * * * ** * * * * * |
0.5 |-** * * * * * ** * * * ** ** * * * ** * * * * * +-|
0 |-** ** * ** * ** * * * ** * ** * ** * *** ** * ** ** +-|
-0.5 |-* * *** ***** ** * * * ** * ** ** * * ** * ** * * * +-|
-1 |-* * **** **** **** * * ** * ** ** * * ** * ** * * * +-|
| * ** ** * ** * * ** * * ** * ** ** *** ** * ** **** * |
-1.5 |-+ ****+ * ** * +* **+ * * ** + * * +* ** ** * +-|
-2 +------------------------------------------------------------------+
0 10 20 30 40 50 60 70 80
EURUSD-d - Stats
Mean StdDev URT Samples
--------------------------------------------------------
-0.150 1.369 -1.251 25
0.069 1.549 -5.128 25
-0.108 1.443 -4.204 27
--------------------------------------------------------
0.219 106.6% -2.049 77
StatsMedianSplit::Stats size = 12.5596
Stationarity score = -2.04925
2019.04 - 2019.06