https://www.kaggle.com/datasets/paultimothymooney/stock-market-data
Place data under directory ./data/stock_market_data/
- The optimal trading frequency is based purely on historical records from 1999 to 2022. No other features or attributes are taken into consideration.
- We assume no inflation in our base model. This is a simplification, because theoretically a marginal profit of 10% tomorrow is different from a marginal profit of 10% in 10 years. Taking the inflation rate into consideration is a next step for future work.
- Recommendations are provided for traders in general, not for buyers of a specific stock.
- Minimum frequency is one day
- Each trading event is i.i.d
A typical application of a stock market dataset is time series forecasting based on historical records. While it is possible to use models such as ARIMA and RNNs to predict stock prices over a time window and then suggest a trading frequency, the stock market is highly unpredictable, and seasonality analysis is usually insufficient. We take a completely different path to solve this problem.
First of all, suppose a trader buys a stock at time
Since traders can trade at any time
Histograms of the marginal profits of 10 randomly selected stock profiles, generated for an N-day trading frequency.
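Histograms like these could be produced from a price series roughly as follows. This is a minimal sketch, not the project's code: it assumes the marginal profit of an N-day trade is the relative price change between buying on one day and selling N days later, and it uses synthetic prices. The function name `marginal_profits` and the bin count are hypothetical.

```python
import numpy as np

def marginal_profits(prices, n_days):
    """Relative profit of buying on each day and selling n_days later.

    Assumption (not stated in the text): profit = (sell - buy) / buy.
    """
    prices = np.asarray(prices, dtype=float)
    if n_days < 1 or n_days >= len(prices):
        raise ValueError("n_days must be between 1 and len(prices) - 1")
    buy = prices[:-n_days]   # price on the day of purchase
    sell = prices[n_days:]   # price n_days later
    return (sell - buy) / buy

# Synthetic closing prices, for illustration only.
rng = np.random.default_rng(0)
prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0, 0.01, size=500))

# One profit sample per possible buy day, then a histogram of those samples.
profits = marginal_profits(prices, n_days=5)
counts, bin_edges = np.histogram(profits, bins=30)
```

Every possible buy day contributes one sample, which matches the assumption that trading events are i.i.d. and that the minimum frequency is one day.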
Here are examples of the marginal profit distribution for a specific stock.
nasdaq/AAL | sp500/HQ |
---|---|
We can see clearly that with different size of
To model this, one of the best options is a Gaussian Mixture Model (GMM), which uses Expectation-Maximization to find maximum-likelihood estimates for a mixture of n Gaussian components. Another reason to choose GMM is that we will be fitting it on a range of
To make this process automated, we use
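The sketch below shows one way to fit a GMM to marginal-profit samples without picking the number of components by hand. It assumes scikit-learn's `GaussianMixture` and assumes that the component count is chosen by the lowest BIC; neither is confirmed as the project's actual setup. The helper name `fit_gmm_by_bic`, the `max_components` limit, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_by_bic(samples, max_components=5, seed=0):
    """Fit 1..max_components Gaussians with EM and keep the lowest-BIC model."""
    X = np.asarray(samples, dtype=float).reshape(-1, 1)
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        model = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bic = model.bic(X)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model

# Synthetic bimodal "marginal profits", for illustration only.
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(-0.05, 0.01, 400),
    rng.normal(0.08, 0.02, 600),
])

model = fit_gmm_by_bic(samples)

# The mean of a mixture is the weight-averaged mean of its components.
expected_profit = float(model.weights_ @ model.means_.ravel())
```

The expected profit read off the fitted mixture gives a single number per trading frequency, which is what makes different frequencies comparable.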
- Run in a virtual environment with Python 3.7.16.
pip install -r requirements.txt
- Alternatively, you may create a container using the Dockerfile.
All training-related settings are in configs.yaml. To train, run
python train.py
At the end, we take the
Another thing to note is that we are essentially comparing the marginal profit of a same-day trading frequency with that of a 10-year trading frequency, which might not be a fair comparison (depending on the objective). Making the comparison fair is easy: because we treat every trade event as i.i.d., we can simply scale the N-day marginal profit by a constant value
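One possible reading of this scaling, as a sketch only: assume the constant is the number of N-day trades that fit into a common comparison horizon, so that expected profits add up across independent trades. The horizon length, the function name `scale_to_horizon`, and the additive treatment (which ignores compounding) are all assumptions made here for illustration.

```python
def scale_to_horizon(mean_profit, n_days, horizon_days):
    """Scale the expected profit of one n_days trade to a common horizon.

    Assumption: trades are i.i.d. and their profits add up, so the scaling
    constant is the number of trades in the horizon (horizon_days / n_days).
    """
    if n_days < 1 or horizon_days < n_days:
        raise ValueError("need 1 <= n_days <= horizon_days")
    return mean_profit * (horizon_days / n_days)

# Example: 1% expected profit per 5-day trade, compared over 250 trading days.
scaled = scale_to_horizon(0.01, n_days=5, horizon_days=250)
```

With this normalization a short frequency is credited for the many trades it allows within the horizon, while a long frequency is credited for only a few.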
We can push this project further into an end-to-end software product that provides real-time recommendations. New stock profiles are updated and stored in a database, and our GMM models are evaluated on the new data using the AIC and BIC metrics. If the error exceeds a certain threshold, a training request is triggered to update the models in the database.
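A retraining trigger of this kind might look like the sketch below. It assumes scikit-learn's `GaussianMixture` and compares BIC per sample on new data against a baseline recorded at training time. Dividing by the sample count is an assumption made here so that batches of different sizes are comparable. The names `bic_per_sample` and `needs_retraining` and the tolerance value are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bic_per_sample(model, samples):
    """BIC of a fitted model on the given data, divided by the sample count."""
    X = np.asarray(samples, dtype=float).reshape(-1, 1)
    return model.bic(X) / len(X)

def needs_retraining(model, baseline, new_samples, tolerance=1.0):
    """True when the fit on new data is worse than the baseline by more than tolerance."""
    return bic_per_sample(model, new_samples) > baseline + tolerance

# Synthetic marginal profits, for illustration only.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 0.02, 1000)
model = GaussianMixture(n_components=1, random_state=0).fit(train.reshape(-1, 1))
baseline = bic_per_sample(model, train)

similar = rng.normal(0.0, 0.02, 500)    # same distribution as the training data
shifted = rng.normal(0.10, 0.02, 500)   # the distribution has drifted

retrain_on_similar = needs_retraining(model, baseline, similar)
retrain_on_shifted = needs_retraining(model, baseline, shifted)
```

Data drawn from the training distribution stays within the tolerance, while strongly shifted data exceeds it and would trigger a training request.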
On the user side, users can send requests based on their interests. If they want to know about a specific set of stocks, our proposed GMM ensemble is weighted-averaged over only those stocks. If, on the other hand, they make a general inquiry, we can simply serve a preloaded recommendation.