A Composite Sentiment Index for the Cryptocurrency Market

Sentiment Measurement & Return Predictability.

Project Description

Introduction

Objective: Master's Degree Graduation Thesis.
Abstract: Constructed a list of 9 sentiment indicators for the crypto market and combined them into a single sentiment index. VAR models and Granger-causality tests show the index to be an excellent predictor of crypto market returns.
Status: Completed

Methods Used

Sentiment Analysis (using a crypto-specific lexicon created by Chen et al., 2019)
Principal Component Analysis
Vector Autoregression Models

Dependencies

Python 3
numpy==1.18.5
pandas==1.0.5
scikit-learn==0.23.2
pytrends==4.7.3
statsmodels==0.12.0
plotly==4.9.0
nltk==3.5
beautifulsoup4==4.9.3

Interesting Results to Keep You Reading

To my knowledge, this is the first composite approach to building a sentiment index for the cryptocurrency market, i.e. combining multiple sentiment indicators into one index. The idea, following Brown & Cliff (2004), is to create an index that remains stable and useful over a long period of time.
The VAR model shows that the lagged values of the sentiment index are significantly correlated with the daily returns of the crypto market (at lags 1, 3, 4 and 5).
Granger-causality tests show that the sentiment index is an excellent predictor of cryptocurrency returns.
Over a period of 5+ years (12/2014 - 07/2020), a backtested sentiment-based trading strategy grew a portfolio to 320x its original value, compared to around 40x for simply holding the market index. Note that the crypto market grew explosively during this period, which explains these seemingly extreme returns.
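
To make the Granger-causality idea concrete, here is a minimal sketch on synthetic data. It is not the thesis code: the series, the coefficient 0.5 and the helper names (`lagged_matrix`, `rss`, `granger_f_stat`) are assumptions made for illustration. It uses plain numpy to show the mechanics of the F-test on one equation of a bivariate VAR; the repository itself lists statsmodels, which offers VAR estimation and causality tests directly.

```python
import numpy as np


def lagged_matrix(series, n_lags):
    """Columns hold series[t-1], ..., series[t-n_lags] for t = n_lags .. T-1."""
    T = len(series)
    return np.column_stack(
        [series[n_lags - k: T - k] for k in range(1, n_lags + 1)]
    )


def rss(y, X):
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f_stat(target, cause, n_lags):
    """F statistic for H0: lags of `cause` add nothing beyond lags of `target`."""
    target = np.asarray(target, dtype=float)
    cause = np.asarray(cause, dtype=float)
    y = target[n_lags:]
    const = np.ones((len(y), 1))
    own = lagged_matrix(target, n_lags)
    other = lagged_matrix(cause, n_lags)
    X_restricted = np.hstack([const, own])           # own lags only
    X_unrestricted = np.hstack([const, own, other])  # plus lags of the other series
    rss_r = rss(y, X_restricted)
    rss_u = rss(y, X_unrestricted)
    dof = len(y) - X_unrestricted.shape[1]
    return ((rss_r - rss_u) / n_lags) / (rss_u / dof)


# Synthetic example (assumption): returns depend on yesterday's sentiment.
rng = np.random.default_rng(0)
n = 500
sentiment = rng.standard_normal(n)
returns = np.zeros(n)
returns[1:] = 0.5 * sentiment[:-1] + rng.standard_normal(n - 1)

f_sent_to_ret = granger_f_stat(returns, sentiment, n_lags=5)
f_ret_to_sent = granger_f_stat(sentiment, returns, n_lags=5)
print(f"sentiment -> returns F: {f_sent_to_ret:.2f}")
print(f"returns -> sentiment F: {f_ret_to_sent:.2f}")
```

A large F statistic in one direction and a small one in the other is the pattern one would expect if sentiment leads returns. Turning the statistic into a p-value requires the F distribution with `n_lags` and `dof` degrees of freedom.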

Getting Started

How to Run

($ indicates these are terminal commands)

Clone this repo: $ git clone https://github.com/dang-trung/crypto-sentiment-index/
Create your environment (virtualenv):
$ cd crypto-sentiment-index
$ virtualenv -p python3 venv
$ source venv/bin/activate (bash) or venv\Scripts\activate (Windows)
(venv) $ pip install -e .

Or (conda):
$ conda env create -f environment.yml
$ conda activate crypto-sentiment-index
In terminal:

Get data from StockTwits and Reddit: $ python -m src.data
Process data: $ python -m src.process
Visualize: $ python -m src.visualize
Create models: $ python -m src.model

Project Structure

├─ data                      
│  ├─ 00_external            <- Contains rules for sentiment analysis & text processing
│  ├─ 01_raw                 <- Immutable text messages retrieved from StockTwits/Reddit
│  └─ 02_processed           <- Data used to develop models
│     ├─ direct              <- Direct sentiment indicators
│     ├─ indirect            <- Indirect sentiment indicators
│     ├─ crix.json           <- Target variable
│     └─ final_dataset.csv
├─ output                    <- Generated output
│  ├─ 01_figures             <- Figures
│  └─ 02_reports             <- Analysis reports
│     ├─ full_thesis.pdf     <- Final thesis
│     └─ report_chapters.pdf <- Analysis chapters (skip literature review etc.)
├─ src                       <- Source code
│  ├─ data                   <- Package of modules that retrieve raw data
│  │  ├─ __init__.py         
│  │  ├─ __main__.py         <- Run in terminal: $ python -m src.data
│  │  ├─ convert_ts.py       <- Functions to convert between different formats of time
│  │  ├─ others.py           <- Get data from other sources (Google search volume, trading volume, FT articles)
│  │  ├─ reddit.py           <- Get messages from Reddit
│  │  └─ stocktwits.py       <- Get messages from StockTwits
│  ├─ process                <- Modules used to process data
│  │  ├─ __init__.py
│  │  ├─ __main__.py         <- Run in terminal: $ python -m src.process
│  │  ├─ gather_data.py      <- Gather all processed data into data/02_processed
│  │  ├─ sentiment_score.py  <- Function to score sentiment 
│  │  └─ text_process.py     <- Function to process text data (only info relevant to sentiment analysis remains)
│  ├─ __init__.py
│  ├─ model.py               <- Train the model using processed data from data/02_processed 
│  └─ visualize.py           <- Generate figures
├─ .gitattributes            <- Prevents GitHub from misclassifying HTML figures as code
├─ .gitignore                <- Avoids uploading large data, system files, etc.
├─ LICENSE.md
├─ README.md                 
├─ environment.yml           <- Share conda environment
├─ requirements.txt          <- To reproduce analysis environment using pip
└─ setup.py                  <- Make the project pip installable with `$ pip install -e .`

Dependent Variable

Cryptocurrency market returns, computed from the market index CRIX (retrieved here; for how the index is constructed, see Trimborn & Härdle (2018) or the authors' website).
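
As a small worked example of turning index levels into returns, the sketch below computes both simple and log returns from a list of levels. The README does not state which definition the thesis uses, so both are shown; the function names and the index levels are made up for illustration and do not reflect the layout of `crix.json`.

```python
import math


def simple_returns(levels):
    """Period-over-period simple returns: P_t / P_{t-1} - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(levels, levels[1:])]


def log_returns(levels):
    """Period-over-period log returns: ln(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(levels, levels[1:])]


# Made-up index levels for three consecutive days (assumption, not CRIX data).
index_levels = [1000.0, 1050.0, 997.5]

print(simple_returns(index_levels))
print(log_returns(index_levels))
```

Log returns add up across days, which is convenient in time-series models; simple returns are easier to read as percentage changes.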

Sentiment Indicators

Sentiment scores of StockTwits messages, Reddit submissions and Reddit comments
- Computed using dictionary-based sentiment analysis with the crypto-specific lexicon by Chen et al. (2019), retrieved from the main author's personal page.
- StockTwits messages are retrieved through the StockTwits Public API; Reddit data are retrieved using the PushShift.io Reddit API.
Message volume on StockTwits, Reddit submissions and Reddit comments
Market volatility index VCRIX (retrieved here; see Kolesnikova (2018) for how the index is created)
Market trading volume (retrieved using the Nomics Public API)
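
Dictionary-based scoring, as used for the first group of indicators, can be sketched as follows. This is a simplified illustration, not the code in `sentiment_score.py`: the toy lexicon, its weights, the tokenizer and the averaging rules are all assumptions, and the real analysis relies on the lexicon by Chen et al. (2019).

```python
import re

# Hypothetical toy lexicon (term -> weight); NOT the Chen et al. (2019) lexicon.
TOY_LEXICON = {
    "moon": 1.0,
    "bullish": 0.8,
    "hodl": 0.5,
    "dump": -0.7,
    "bearish": -0.8,
    "scam": -1.0,
}


def score_message(text, lexicon):
    """Average weight of the lexicon terms found in a message (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def daily_sentiment(messages, lexicon):
    """Average message score per day; `messages` is an iterable of (day, text)."""
    by_day = {}
    for day, text in messages:
        by_day.setdefault(day, []).append(score_message(text, lexicon))
    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}


# Made-up messages for illustration.
sample = [
    ("2020-01-01", "Bitcoin to the moon, very bullish!"),
    ("2020-01-01", "Total scam."),
    ("2020-01-02", "Nothing to see here"),
]
print(daily_sentiment(sample, TOY_LEXICON))
```

Averaging per day yields one sentiment value per source and day. In this sketch, the matching message-volume indicator would simply be the number of messages per day.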

The sentiment index is simply the first principal component of these 9 indicators.
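
The first-principal-component step can be sketched as below, on synthetic data. Several details are assumptions rather than facts from the thesis: the indicators are standardized before the decomposition, the sign of the component is fixed so that the loadings sum to a positive number, and the computation uses numpy's SVD instead of scikit-learn (which the repository lists as a dependency). The function name and the data are hypothetical.

```python
import numpy as np


def first_principal_component(X):
    """Return (scores, loadings, explained_ratio) of the first PC of X.

    X has one row per day and one column per indicator. Columns are
    standardized first (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    if loadings.sum() < 0:  # the sign of a PC is arbitrary; pick one convention
        loadings = -loadings
    scores = Z @ loadings
    explained_ratio = float(S[0] ** 2 / np.sum(S ** 2))
    return scores, loadings, explained_ratio


# Synthetic data (assumption): 9 noisy indicators driven by one common factor.
rng = np.random.default_rng(42)
n_days, n_indicators = 300, 9
common = rng.standard_normal(n_days)
X = common[:, None] + 0.5 * rng.standard_normal((n_days, n_indicators))

index, loadings, explained = first_principal_component(X)
print("loadings:", np.round(loadings, 3))
print("share of variance explained:", round(explained, 3))
```

The resulting `index` is a single daily series, a weighted combination of the standardized indicators with the loadings as weights.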
