Using the Power of Big Data Tools to Analyze the Stock Market
Stocks selected:
NASDAQ: GOOGL, MSFT, ORCL, FB, AAPL, TSLA
NSE: TCS, INFY
Per-minute stock prices for the companies listed on NASDAQ are retrieved with the Alpha Vantage API. Read in detail here.
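As a rough illustration of the retrieval step, the sketch below builds a request URL for Alpha Vantage's `TIME_SERIES_INTRADAY` function at a 1-minute interval and flattens a response of that shape into rows. The helper names `intraday_url` and `parse_intraday` are hypothetical, the API key is a placeholder, and the sample payload is hand-written to mirror the response layout, not real market data. It is not necessarily how this project's collector is written.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def intraday_url(symbol, api_key, interval="1min"):
    """Build the query URL for per-minute prices (hypothetical helper)."""
    params = {
        "function": "TIME_SERIES_INTRADAY",
        "symbol": symbol,
        "interval": interval,
        "apikey": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def parse_intraday(payload, interval="1min"):
    """Turn a decoded JSON response into (timestamp, close, volume) rows,
    sorted from oldest to newest."""
    series = payload.get("Time Series ({})".format(interval), {})
    rows = []
    for timestamp in sorted(series):
        bar = series[timestamp]
        rows.append((timestamp, float(bar["4. close"]), int(bar["5. volume"])))
    return rows


# Hand-written sample in the response layout; the values are made up.
sample_payload = {
    "Time Series (1min)": {
        "2019-01-02 09:31:00": {
            "1. open": "100.0", "2. high": "100.5", "3. low": "99.8",
            "4. close": "100.2", "5. volume": "1200",
        },
        "2019-01-02 09:30:00": {
            "1. open": "99.5", "2. high": "100.1", "3. low": "99.4",
            "4. close": "100.0", "5. volume": "900",
        },
    }
}

rows = parse_intraday(sample_payload)
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `parse_intraday`; the network call is left out here so the sketch runs offline.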
For NSE, a scraper written in Python collects the latest price for each minute. Read in detail here.
However, the NSE website sometimes does not update the price for an interval of one or two minutes. These gaps are filled by data interpolation.
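The text does not say which interpolation method is used, so the following is only a minimal sketch that assumes linear interpolation between the nearest known prices. The function name `fill_missing_minutes` is hypothetical, and missing minutes are represented as `None`.

```python
def fill_missing_minutes(prices):
    """Fill None entries in a per-minute price list.

    Gaps between two known prices are linearly interpolated (an assumed
    method). Gaps at the start or end copy the nearest known price.
    """
    filled = list(prices)
    known = [i for i, price in enumerate(filled) if price is not None]
    if not known:
        return filled  # nothing to interpolate from

    # Leading and trailing gaps: reuse the nearest known value.
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]

    # Interior gaps: straight line between the surrounding known prices.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Two-minute gap in the middle of otherwise known prices (made-up values).
example = fill_missing_minutes([10.0, None, None, None, 14.0])
```

Linear interpolation is a reasonable default for gaps of one or two minutes, since prices rarely move far in that time, but other methods (for example carrying the last price forward) would fit the same interface.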
Twitter data is collected using Python with the Twitter API. Read in detail here.
Twitter data is also collected using Flume, which required modifying the code of Flume's Twitter package. Read in detail here.
The Twitter data is stored in a data lake. For this project, Cloudera's data lake is used.
The tweet text is then processed to correct spelling, using a Java library called LanguageTool.
The Stanford CoreNLP library is applied to the Twitter data for tokenization, sentence annotation, part-of-speech tagging, syntactic analysis, and sentiment analysis using Stanford's pre-trained model. This yields a sentiment value for each tweet. The product of the number of followers and the sentiment value of each tweet is then aggregated per minute. Read in detail here.
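The per-minute aggregation can be sketched as follows. This assumes the aggregation is a sum and that each tweet arrives as a `(created_at, followers, sentiment)` tuple; both are assumptions for illustration, as is the function name `weighted_sentiment_per_minute`. The sample tweets are made up.

```python
from collections import defaultdict
from datetime import datetime


def weighted_sentiment_per_minute(tweets):
    """Sum followers * sentiment for all tweets falling in the same minute.

    tweets: iterable of (created_at: datetime, followers: int, sentiment: number).
    Returns a dict keyed by the minute (seconds truncated), oldest first.
    """
    totals = defaultdict(float)
    for created_at, followers, sentiment in tweets:
        minute = created_at.replace(second=0, microsecond=0)
        totals[minute] += followers * sentiment
    return dict(sorted(totals.items()))


# Made-up tweets: two in the 09:30 minute, one in the 09:31 minute.
sample_tweets = [
    (datetime(2019, 1, 2, 9, 30, 5), 100, 3),
    (datetime(2019, 1, 2, 9, 30, 40), 50, 1),
    (datetime(2019, 1, 2, 9, 31, 10), 10, 2),
]

per_minute = weighted_sentiment_per_minute(sample_tweets)
```

Weighting by follower count lets a tweet from a widely followed account contribute more to the minute's score than one from a small account, which matches the product described above.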
Read about our progress on our blog