Deep Learning Trading Bot

This project is divided into two parts. The first part covers the development of a deep learning trading bot which trades based on past chart prices as input data, using 100 million chart points of the EUR/USD currency pair between 2018 and 2020. The second part covers the development of a trading bot which uses financial news as input. It is also based on the EUR/USD pair, but additionally includes 20k news articles from the business and politics sections of the Financial Times.

1. Deep-Neural-Network Trading Bot

The training of this trading bot is based on EUR/USD tick data from the FXCM-Historical-Data-API.
This FXCM API is one of the most accurate market data providers, with an average of three ticks per second, and is freely available (sick!). The data can be downloaded via the following link structure: https://tickdata.fxcorporate.com/{instrument}/{year}/{int_of_week_of_year}.csv.gz
The workflow for creating the DNN bot is divided into five steps, each represented by one subfolder. Each subfolder is explained below; a minimal download sketch is shown first.
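A minimal sketch of the download step, assuming the weekly archives are gzip-compressed CSV files (as the URL suffix suggests); instrument, year and week number are hypothetical example values:

```python
import gzip
import shutil

import requests

# Hypothetical example values following the link structure above.
instrument = "EURUSD"
year = 2020
week = 28

url = f"https://tickdata.fxcorporate.com/{instrument}/{year}/{week}.csv.gz"
archive_path = f"{instrument}_{year}_{week}.csv.gz"
csv_path = archive_path[:-3]  # drop the .gz suffix

# Download the weekly tick archive.
response = requests.get(url, timeout=60)
response.raise_for_status()
with open(archive_path, "wb") as f:
    f.write(response.content)

# Unpack the gzip archive into a plain CSV file.
with gzip.open(archive_path, "rb") as src, open(csv_path, "wb") as dst:
    shutil.copyfileobj(src, dst)

print(f"Saved {csv_path}")
```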

1.1 Data Preparation

First, the zipped CSV files are unpacked and converted into TXT files. The values are read and one EUR/USD price is determined for each available second (in case there are multiple prices per second). Seconds without an EUR/USD price are filled with the price from the previous second. The seconds are then divided into steps (batches of 10-250 seconds) and the average price of each step is calculated. The last second of each step (excluded from the average) forms the market price, i.e. the price which the trading simulation later receives for performance evaluation. The step averages, together with the respective market prices and timestamps, are written to text files in the 'result/' folder. A rough sketch of this per-second aggregation is shown below.
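A rough sketch of the aggregation using pandas; the column names, the step size of 60 seconds and the output file layout are assumptions made for illustration:

```python
import os

import pandas as pd

# Hypothetical column names for the unpacked tick file from step 1.1.
ticks = pd.read_csv("EURUSD_2020_28.csv", names=["timestamp", "bid", "ask"], header=0)
ticks["timestamp"] = pd.to_datetime(ticks["timestamp"])

# One price per second: take the last tick of each second, then forward-fill
# the seconds that contain no tick at all.
per_second = (
    ticks.set_index("timestamp")["bid"]
    .resample("1s")
    .last()
    .ffill()
)

step_size = 60  # seconds per step, anywhere in the 10-250 range described above
rows = []
for start in range(0, len(per_second) - step_size, step_size):
    window = per_second.iloc[start:start + step_size]
    step_average = window.iloc[:-1].mean()  # average excludes the last second
    market_price = window.iloc[-1]          # last second forms the market price
    rows.append((window.index[-1], step_average, market_price))

os.makedirs("result", exist_ok=True)
result = pd.DataFrame(rows, columns=["time", "step_average", "market_price"])
result.to_csv("result/steps_2020_28.txt", index=False)
```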

1.2 Train the Neural Network

After importing the prepared dataset, the program runs a for-loop and performs the following calculations for each timepoint in the dataset: based on a pattern determined by several parameters, the last x price points before the current timepoint are selected and serve as input for the model. In addition, the average of the next x values after the current timepoint is calculated. If this average "target price" is greater than the price at the current timepoint and also exceeds a certain "hold_threshold" (the threshold is used to select only market events with strong price movements), the sample gets the label [1.0] (= recommendation to buy), otherwise [0.1] (= recommendation to sell). A user-defined "Sequential()" model is then created and trained on these inputs and labels. A network based on LSTM (Long Short-Term Memory) layers can be found in the folder. Finally, the models are saved after each epoch and the training metrics are visualized graphically. A hedged sketch of the labeling rule and a small LSTM model follows.
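A hedged sketch of the labeling rule and a small Sequential LSTM model; the window length x, the hold_threshold value and the input file layout are placeholder assumptions:

```python
import numpy as np
from tensorflow import keras

# Placeholder assumptions: the file layout matches the sketch in 1.1 (step
# averages in the second column); x and hold_threshold are example values.
x = 50
hold_threshold = 0.00002

prices = np.loadtxt("result/steps_2020_28.txt", delimiter=",",
                    skiprows=1, usecols=1)

inputs, labels = [], []
for t in range(x, len(prices) - x):
    window = prices[t - x:t]                       # last x price points
    target_price = prices[t + 1:t + 1 + x].mean()  # average of the next x points
    if target_price > prices[t] and target_price - prices[t] > hold_threshold:
        labels.append([1.0])                       # recommendation to buy
    else:
        labels.append([0.1])                       # recommendation to sell
    inputs.append(window)

X = np.array(inputs).reshape(-1, x, 1)
y = np.array(labels)

model = keras.Sequential([
    keras.Input(shape=(x, 1)),
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=128)
model.save("dnn_model.keras")
```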

1.3 Offline Trading Simulation

The neural network which performed best can now be used for the trading bot. Similar to section 1.2, the input data is calculated again, with the addition that the market prices (4th position in the .txt files, used to determine the profit/loss of a trade) are also read in. Based on these inputs, the NN predicts the trading action (buy-long, sell-short, hold) for each point in time. The NN can buy and sell several times in a row at the current prices. If the NN is unsure and its prediction does not exceed a certain hold_threshold, the current positions are held. The more certain the NN is, the more is invested (omega). As soon as the action reverses from long to short, all positions are sold, and vice versa. At the end of each week all open positions are liquidated so that they are not held over the weekend. The development of the capital value (portfolio value) of the trading bot is tracked and displayed at the end. The hold threshold, the maximum allowed leverage, the initial capital value, the standard purchase size and the spread (trading fees) can be set as parameters. All predictions are saved in the log file, and all performance results and graphics can be viewed in the respective folders. Below you can see the training history over 20 epochs for weeks 28-35 of 2020 (overfitting is quite strong for the given network configuration; this is only for demonstration purposes and can be improved significantly). You can also see the capital balance development for a trading period of weeks 36-37 of 2020 with a spread of 0.000005, using a model trained on weeks 28-35 of 2020. A simplified sketch of the decision loop follows after the screenshots.

[Screenshot 1: training history over 20 epochs, weeks 28-35 of 2020]

[Screenshot 2: capital balance development, weeks 36-37 of 2020]
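A toy sketch of the simulation loop under simplified assumptions: the position bookkeeping, the parameter values and the file layout are illustrative and not the repository's exact implementation:

```python
import numpy as np
from tensorflow import keras

model = keras.models.load_model("dnn_model.keras")

# Hypothetical file layout: step averages and market prices in columns 2 and 3.
data = np.loadtxt("result/steps_2020_36.txt", delimiter=",", skiprows=1, usecols=(1, 2))
step_averages, market_prices = data[:, 0], data[:, 1]

x = 50                    # input window length, must match training
hold_threshold = 0.1      # uncertainty band around 0.5
initial_capital = 10_000.0
standard_size = 1_000.0   # units traded at full certainty
spread = 0.000005

capital = initial_capital
positions = []            # list of (direction, units, entry_price)

for t in range(x, len(step_averages)):
    window = step_averages[t - x:t].reshape(1, x, 1)
    market_price = market_prices[t]

    p = float(model.predict(window, verbose=0)[0, 0])
    omega = abs(p - 0.5) * 2                 # certainty scales the stake

    if p > 0.5 + hold_threshold:
        action = "long"
    elif p < 0.5 - hold_threshold:
        action = "short"
    else:
        continue                             # unsure -> hold current positions

    # On a reversal, liquidate all positions pointing the other way.
    for direction, units, entry in positions:
        if direction != action:
            sign = 1 if direction == "long" else -1
            capital += sign * units * (market_price - entry) - units * spread
    positions = [pos for pos in positions if pos[0] == action]

    positions.append((action, standard_size * omega, market_price))

print(f"Final capital: {capital:.2f}")
```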

1.4 Online Trading Using the fxcmpy REST API

The fxcmpy REST API is used to connect Python to the live forex market. At the start of the trading program, a separate thread is launched which continuously receives the current EUR/USD rates from the forex market, converts the data and saves it locally. Once a new value is received, the thread sleeps for 0.9 s, which ultimately yields roughly one price per second. By creating the text files "Break.txt" and "Stopp.txt", trading can be paused or stopped. At the beginning there is a wait of 220 seconds until the price list for feeding the NN has built up. Then, at predetermined intervals, the current input based on the most recent price points is calculated and passed to the NN for prediction. The NN makes its trading decision in the same way as in the offline trading simulation. Disconnections are caught with try-except blocks, up to 20 times. The trading bot correctly handles the time "switch" at midnight and can therefore trade through the night. Unfortunately, the FXCM server is somewhat unstable, and it often takes several attempts before the trading bot successfully connects to the forex server. Note that an active FXCM demo account including an active token must be provided. A hedged connection sketch is shown below.
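A hedged sketch of the connection and the price-collecting thread using fxcmpy; the token, the window length and the polling details are assumptions, and the prediction/order logic is only indicated by comments:

```python
import os
import threading
import time

import fxcmpy

TOKEN = "your_demo_token"      # an active FXCM demo-account token is required
con = fxcmpy.fxcmpy(access_token=TOKEN, server="demo", log_level="error")

prices = []                    # locally stored one-price-per-second list

def price_worker():
    while not os.path.exists("Stopp.txt"):
        if os.path.exists("Break.txt"):      # pause trading
            time.sleep(1)
            continue
        try:
            last = con.get_last_price("EUR/USD")
            prices.append(float(last["Bid"]))
        except Exception:
            time.sleep(1)                    # retry after a dropped request
            continue
        time.sleep(0.9)                      # roughly one price per second

threading.Thread(target=price_worker, daemon=True).start()

time.sleep(220)                              # wait until the price list has built up

while not os.path.exists("Stopp.txt"):
    window = prices[-50:]                    # most recent price points for the NN
    # The prediction and the order placement (e.g. con.open_trade /
    # con.close_all_for_symbol) would follow here, mirroring the offline logic.
    time.sleep(10)

con.close()
```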

1.5 Dimensionality Reduction using an Autoencoder

In the directory 'autoencoder/' you can find a trained autoencoder which reduces the dimensionality of the input data by nearly 90%. After scaling and normalizing the input price data, the autoencoder compresses the information contained in the array of EUR/USD prices. This allows the trading bot to extract more information from a smaller number of "compressed" price arrays. The scripts used for training, as well as some additional scripts for measurements and method testing, can be found in the respective directory. A minimal autoencoder sketch follows.
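A minimal autoencoder sketch, assuming a window of 200 scaled prices is compressed to a 20-value code (roughly the 90% reduction mentioned above); the layer sizes and the random training data are placeholders:

```python
import numpy as np
from tensorflow import keras

window_len, code_len = 200, 20     # ~90% dimensionality reduction

inputs = keras.Input(shape=(window_len,))
encoded = keras.layers.Dense(64, activation="relu")(inputs)
encoded = keras.layers.Dense(code_len, activation="relu")(encoded)
decoded = keras.layers.Dense(64, activation="relu")(encoded)
decoded = keras.layers.Dense(window_len, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)   # trained to reconstruct its input
encoder = keras.Model(inputs, encoded)       # used to compress price windows
autoencoder.compile(optimizer="adam", loss="mse")

# Scaled/normalized price windows would come from the data-preparation step;
# random data is used here only so the sketch runs end to end.
X = np.random.rand(1000, window_len).astype("float32")
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)

compressed = encoder.predict(X)    # the "compressed" price arrays
print(compressed.shape)            # (1000, 20)
```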

2. Natural-Language-Processing Trading Bot

The training of the second trading bot is based on the EUR/USD data and additionally on more than 20 thousand financial news articles from the Financial Times. The data can be downloaded via the news API using the following link structure: http://content.guardianapis.com/search?from-date=2018-01-01&to-date=2020-12-31&format=xml&page-size=20&page=2&type=article&section=politics&show-fields=headline,main&api-key=your_api_key
The workflow for creating the NLP bot is divided into three steps, each represented by one subfolder. Each subfolder is explained below.

2.1 Download News Data from Financial Times

The data is automatically downloaded in JSON format via the imported 'requests' module and a GET request with the above-mentioned link. Values such as the publication date, title and body text (and many other fields) are available for each article. The text data is in HTML format, so the HTML code is converted into plain text using the BeautifulSoup module. The final text, together with the web-publication date, is stored in a memory-optimized numpy array. The EUR/USD price data is cleaned in nearly the same way as described above for the DNN trading bot. A hedged sketch of the download step is shown below.
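A hedged sketch of the download step based on the link structure above; the JSON response layout and the requested fields are assumptions and may need adjusting for your API key:

```python
import numpy as np
import requests
from bs4 import BeautifulSoup

url = "http://content.guardianapis.com/search"
params = {
    "from-date": "2018-01-01",
    "to-date": "2020-12-31",
    "format": "json",
    "page-size": 20,
    "page": 2,
    "type": "article",
    "section": "politics",
    "show-fields": "headline,main,body",
    "api-key": "your_api_key",
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
results = response.json()["response"]["results"]

articles = []
for item in results:
    fields = item.get("fields", {})
    # Strip the HTML markup from the article body.
    plain_text = BeautifulSoup(fields.get("body", ""), "html.parser").get_text(" ")
    articles.append((item["webPublicationDate"], fields.get("headline", ""), plain_text))

# Memory-optimized storage: one object array with date, headline and text.
news = np.array(articles, dtype=object)
np.save("news_politics_page2.npy", news)
```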

2.2 Data Preparation/Allocation

In this subfolder the financial news articles are matched to the corresponding price developments. For each web-publication date of a news article, we determine the price movement that occurred right after it (the script also provides parameters which take delays into account). To do this, the EUR/USD data is imported from the text files and stored in a memory-optimized numpy array with the shape [3, 12, 31, 24, 60, 60] (year, month, day, hour, minute, second; one value per second). After that, based on a predefined pattern, 20 price points occurring after the web-publication date are stored for each news article. Together with the datetime, headline, trail text and body, these price points are stored in a final 'Allocated_business_...npy' array which is later used for training the network. A small indexing sketch is shown below.
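A small sketch of the price lookup, assuming the years 2018-2020 map to the first array index and a hypothetical sampling pattern of one price per minute after publication:

```python
from datetime import datetime, timedelta

import numpy as np

# [year, month, day, hour, minute, second] -> one float32 price per second
# (about 96 million values, roughly 385 MB).
prices = np.zeros((3, 12, 31, 24, 60, 60), dtype=np.float32)

def price_at(ts: datetime) -> float:
    """Look up the stored EUR/USD price for a single second."""
    return prices[ts.year - 2018, ts.month - 1, ts.day - 1,
                  ts.hour, ts.minute, ts.second]

def prices_after(pub_date: datetime, n_points: int = 20, step_seconds: int = 60):
    """Collect n_points prices occurring after the web-publication date."""
    return [price_at(pub_date + timedelta(seconds=i * step_seconds))
            for i in range(1, n_points + 1)]

pub = datetime(2020, 9, 1, 14, 30, 0)   # hypothetical publication date
movement = prices_after(pub)            # 20 price points stored for training
```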

2.3 Train the NLP Network

In this subsection the NLP trading bot is finally created and trained. For this, we only use financial news from the 'Allocated_business_...npy' array with strong price movements (filtered based on a threshold). The network architecture is a typical natural language processing stack: it consists of an Embedding layer, a one-dimensional convolutional layer and finally a Bidirectional Long Short-Term Memory layer. The labels are created based on whether the price movement after the web-publication date is positive or negative. The price movements are again calculated using different patterns which can be specified by several parameters. The threshold value for filtering the market movements was determined by analyzing the distribution of the labels (this was done for all thresholds, see the histogram picture). Then a word index is built (an integer is assigned to each word) and, depending on the network architecture, either a one-hot binary representation (for a Dense layer) or a list of integer indices (for an Embedding layer) is created. The average length of the input word lists is calculated and all word lists are either truncated to this uniform length or padded with zeros. The remaining part, including the trading simulation, matches the corresponding sections of the DNN trading bot. A hedged sketch of the word indexing and the model stack follows.
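A hedged sketch of the word indexing, padding, and the Embedding + Conv1D + Bidirectional LSTM stack; the texts, labels and all hyperparameters are placeholders:

```python
import numpy as np
from tensorflow import keras

# Placeholder news texts and labels (1.0 = positive movement, 0.0 = negative).
texts = ["ecb signals further rate cuts", "strong dollar after fed statement"]
labels = np.array([1.0, 0.0])

# Build a word index: one integer per word (0 is reserved for padding).
word_index, sequences = {}, []
for text in texts:
    seq = [word_index.setdefault(w, len(word_index) + 1) for w in text.lower().split()]
    sequences.append(seq)

max_len = 200                            # e.g. the average word-list length
X = np.zeros((len(sequences), max_len), dtype="int32")
for i, seq in enumerate(sequences):      # truncate or zero-pad to max_len
    X[i, :min(len(seq), max_len)] = seq[:max_len]

model = keras.Sequential([
    keras.Input(shape=(max_len,)),
    keras.layers.Embedding(input_dim=len(word_index) + 1, output_dim=64),
    keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    keras.layers.MaxPooling1D(pool_size=4),
    keras.layers.Bidirectional(keras.layers.LSTM(32)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=5, batch_size=32, verbose=0)
```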
