dieko95/AlgoTrading

Algorithmic Trading with Python

This repo is based on the final project for the course "Algorithmic Trading", taught at Hult International Business School by Professor Michael Rolleigh. The goal is to backtest a trading algorithm that uses the output of a machine learning model as its trading signal. The final results are in Algorithmic Trading in Python.pdf

The corresponding tasks are divided into the following notebooks:

Contains the data science workflow. This script was used as a research environment to extract a signal from fundamental and alternative data. The signal was extracted by developing a machine learning model whose target was the returns of a particular stock (in this case Red Hat [RHT]).

Data Collection, Cleaning, and Exploration:

The GitHub repo GetOldTweets was used to scrape all the mentions of Red Hat over the previous 4 years (2014 - 2018). A total of 56,400 tweets were collected and cleaned. Sentiment analysis was performed with VaderSentimentAnalyzer. After a sentiment score was computed for each tweet, the tweets were aggregated by counting the positive, negative, and neutral tweets. Finally, the aggregated tweets were inner joined by timestamp with the fundamental data frame. The two data frames consisted of the following:

Fundamental Data:

  • Diluted EPS
  • EBIT Margin
  • EBITDA Margin
  • Revenue % Growth
  • EBITDA % Growth
  • EBIT % Growth
  • NetIncome % Growth
  • Net Debt to Equity

Alternative Data:

  • Favourites
  • Retweets
  • Compound
  • Positive
  • Neutral
  • Negative
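
The aggregation and inner join described above can be sketched in plain Python. This is only an illustration, not the project's code: the tweets, scores, and fundamental values are made-up placeholders, the join is assumed to happen at daily granularity, and the ±0.05 cutoffs on the compound score are the thresholds conventionally used with VADER rather than values confirmed by this repo. The function names (`label`, `aggregate_by_day`, `inner_join`) are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical per-tweet compound scores (placeholder data): (tweet date, compound).
tweets = [
    (date(2018, 4, 2), 0.62),
    (date(2018, 4, 2), -0.31),
    (date(2018, 4, 2), 0.01),
    (date(2018, 4, 3), 0.40),
    (date(2018, 4, 5), -0.75),
]

# Hypothetical fundamental rows keyed by date (placeholder values).
fundamentals = {
    date(2018, 4, 2): {"diluted_eps": 0.5},
    date(2018, 4, 3): {"diluted_eps": 0.5},
    date(2018, 4, 4): {"diluted_eps": 0.5},
}

SENTIMENTS = ("positive", "negative", "neutral")


def label(compound, threshold=0.05):
    """Map a compound score to a sentiment label (assumed +/-0.05 cutoffs)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def aggregate_by_day(scored_tweets):
    """Count positive, negative, and neutral tweets for each day."""
    counts = defaultdict(Counter)
    for day, compound in scored_tweets:
        counts[day][label(compound)] += 1
    # Emit every sentiment column, using 0 when a day has no such tweets.
    return {day: {s: c[s] for s in SENTIMENTS} for day, c in counts.items()}


def inner_join(left, right):
    """Keep only the dates present in both mappings and merge their columns."""
    return {
        day: {**left[day], **right[day]}
        for day in sorted(left.keys() & right.keys())
    }


daily_sentiment = aggregate_by_day(tweets)
features = inner_join(fundamentals, daily_sentiment)
```

Because the join is an inner join, a day with fundamentals but no tweets (2018-04-04 here) and a day with tweets but no fundamentals (2018-04-05) are both dropped from `features`.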

Machine Learning:

The data was split into a training set and a test set (85% | 15%). The target was the stock returns, and the features were the fundamental and alternative data listed above. Once the target and features were split, a baseline model was created: a simple decision tree with no constraint on its maximum depth, which therefore overfit heavily. Next, 5-fold cross-validation was used to assess how overfit this model was. Finally, a cross-validated grid search was used to find the best hyperparameters, and an optimized decision tree was built. The same process was applied to a random forest, with the addition of feature selection, and then repeated with gradient boosting and neural networks in order to compare the different machine learning models.
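
As a rough illustration of the 85% | 15% split and the 5-fold cross-validation, here is a minimal standard-library sketch. It assumes the split is chronological (the most recent 15% of observations form the test set) and that the folds are contiguous and unshuffled; the README does not state either detail, and the function names are hypothetical. The actual notebooks most likely relied on a machine learning library for this step.

```python
def chronological_split(rows, test_fraction=0.15):
    """Split time-ordered rows so the most recent fraction becomes the test set."""
    cut = len(rows) - round(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


rows = list(range(100))  # stand-in for 100 time-ordered observations
train_rows, test_rows = chronological_split(rows)
folds = list(k_fold_indices(len(train_rows)))
```

Each fold serves once as the validation set while the model is fit on the other four; a large gap between the training score and the average validation score is the sign of overfitting that the unconstrained tree is expected to show.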

The baseline model (hugely overfit decision tree) was used for the backtesting in order to exemplify the dangers of overfitting in trading.

Backtesting consists of defining a trading strategy and simulating what would have happened had it been applied over a past time frame. The time frame used in this script ranged from 2014-01-02 to 2018-12-30. `bt` was used as the backtesting library for this project. Three different strategies were implemented:

1. Baseline Strategy

  • If the predicted return is greater than 2%, buy stocks with all the available funds (e.g. $1,000,000 in stocks).
  • If the predicted return is greater than 1%, buy stocks with half the available funds (e.g. $500,000 in stocks).
  • If the predicted return is less than -2%, sell all the stocks held (e.g. $1,000,000 in stocks).
  • If the predicted return is less than -1%, sell half the stocks held (e.g. $500,000 in stocks).
2. Baseline Strategy + Limiting the change of position by a max of 10% a day
  • The limit on the change of position was implemented because it is not plausible to assume that we would be able to buy or sell 100% of our stocks in just 1 day.
3. Baseline Strategy + Limiting the change of position by a max of 10% a day + 1 Day Lag
  • The 1 Day Lag was incorporated because the library `bt` uses the closing price in its backtesting, so a signal that depends on a day's data cannot realistically be traded at that same day's close.

This notebook backtests the previously mentioned strategies only in the test data time range (2018-04-01 to 2018-12-30).
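
The three strategies can be sketched as plain Python position rules. This is not the `bt` implementation used in the notebooks, only an illustration under several assumptions: positions are expressed as a weight between 0 and 1 of the portfolio held in the stock, "buy half" is read as raising the weight to at least 0.5, "sell half" as halving the current weight, the fourth rule is read as "less than -1%", and the portfolio starts in cash. All function names and the sample predictions are hypothetical.

```python
def target_weight(predicted_return, current_weight):
    """Baseline rule: map a predicted return to the desired stock weight."""
    if predicted_return > 0.02:
        return 1.0                       # buy with all available funds
    if predicted_return > 0.01:
        return max(current_weight, 0.5)  # buy up to half (assumed reading)
    if predicted_return < -0.02:
        return 0.0                       # sell everything
    if predicted_return < -0.01:
        return current_weight / 2        # sell half (assumed reading)
    return current_weight                # otherwise hold


def cap_change(current_weight, desired_weight, max_step=0.10):
    """Move toward the desired weight by at most max_step per day."""
    step = desired_weight - current_weight
    step = max(-max_step, min(max_step, step))
    return current_weight + step


def run_weights(predicted_returns, max_step=None, lag=0):
    """Return the stock weight held at the end of each day."""
    # A lag of n days means today's trade uses the signal from n days ago.
    usable = predicted_returns[: len(predicted_returns) - lag]
    signals = [0.0] * lag + list(usable)
    weight, path = 0.0, []
    for signal in signals:
        desired = target_weight(signal, weight)
        if max_step is None:
            weight = desired
        else:
            weight = cap_change(weight, desired, max_step)
        path.append(round(weight, 10))
    return path


predictions = [0.03, 0.03, -0.015, -0.03]  # hypothetical model outputs

baseline = run_weights(predictions)                          # strategy 1
capped = run_weights(predictions, max_step=0.10)             # strategy 2
capped_lagged = run_weights(predictions, max_step=0.10, lag=1)  # strategy 3
```

With these sample predictions the baseline jumps straight to a full position, the capped variant builds it up in 10% steps, and the lagged variant follows the same path one day later, which is why the second and third strategies react far more slowly to an overfit signal.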

Considerations

Please find more information about further development of the current project in the file Limitations and Recommendations.pdf

Authors

Diego Giménez, Jorge Betancourt, Myungsung Kim, Piya Thavornwong, Xuan Lu, Yoon Hee Bae
