-
Notifications
You must be signed in to change notification settings - Fork 0
/
Google_Rajneesh.Rmd
129 lines (85 loc) · 5.45 KB
/
Google_Rajneesh.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
Twitter Sentiment Analysis Assignment
========================================================
In this assignment, we are showing that whether Twitter can predict Google stock price movement or not. For five days, 01 Dec 2014 to 05 Dec 2014, Twitter has been mined for Google tweets.
```{r eval=FALSE, echo=FALSE}
library("twitteR")
library("wordcloud")
library("tm")
library("RColorBrewer")
library("ggplot2")
my.key <-"4qIXH3iyPC2XXdgNAHAye7wcr"
my.secret <-"Pz7HVASz7jrFTsABvXVi858pjhz011QduDCrq5mVJGWbJCDIeY"
cred <- OAuthFactory$new(consumerKey=my.key, consumerSecret=my.secret, requestURL='https://api.twitter.com/oauth/request_token', accessURL='https://api.twitter.com/oauth/access_token', authURL= 'https://api.twitter.com/oauth/authorize')
save(cred, file="twitter_authentication.Rdata")
cred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
registerTwitterOAuth(cred)
tweets <- searchTwitter("@google",n=200, lang="en", since='2014-12-01', until='2014-12-02')
length(tweets)
tweets.id <- sapply(tweets, function(x) x$getId())
tweets.text <- sapply(tweets, function(x) x$getText())
tweets.screenname <- sapply(tweets, function(x) x$getScreenName())
tweets.isretweet <- sapply(tweets, function(x) x$getIsRetweet())
tweets.retweeted <- sapply(tweets, function(x) x$getRetweeted())
tweets.created <- sapply(tweets, function(x) x$getCreated())
head(tweets.text)
df <- data.frame(tweets.id, tweets.text, tweets.screenname, tweets.isretweet, tweets.retweeted, tweets.created)
names(df) <-c("id", "text", "screenname", "isretweet", "retweeted", "created")
write.table(df, file = "google.txt", append = TRUE)
pos <- scan("positive-words.txt",what="character",comment.char=";")
neg <- scan("negative-words.txt",what="character",comment.char=";")
tweets.corpus <- Corpus(VectorSource(tweets.text))
tweets.corpus <- tm_map(tweets.corpus, tolower)
tweets.corpus <- tm_map(tweets.corpus, removePunctuation)
tweets.corpus <- tm_map(tweets.corpus, function(x) removeWords(x,stopwords()))
corpus.split <- lapply(tweets.corpus,strsplit,"\\s+")
matches <- lapply(corpus.split,function(x) {
match.pos <- match(x[[1]],pos)
match.neg <- match(x[[1]],neg)
list(length(which(!is.na(match.pos))),length(which(!is.na(match.neg))))
})
match.matrix <- matrix(unlist(matches),nrow=length(matches),ncol=2,byrow=T)
simple.sentiment <- match.matrix[,1] - match.matrix[,2]
hist(simple.sentiment)
sum(simple.sentiment)
```
# Total number of tweets each day- 200
# Histograms for each day
> The sentiment histogram for day 1
![Day 1](https://raw.githubusercontent.com/rajneesh2017/histogramPics/master/Rplot01.png)
> The sentiment histogram for day 2
![Day 2](https://raw.githubusercontent.com/rajneesh2017/histogramPics/master/Rplot02.png)
> The sentiment histogram for day 3
![Day 3](https://raw.githubusercontent.com/rajneesh2017/histogramPics/master/Rplot03.png)
> The sentiment histogram for day 4
![Day 4](https://raw.githubusercontent.com/rajneesh2017/histogramPics/master/Rplot04.png)
> The sentiment histogram for day 5
![Day 5](https://raw.githubusercontent.com/rajneesh2017/histogramPics/master/Rplot05.png)
# Graphs for comparision
>The graphs comparing the changes in stock price for 5 days to the actual stock price changes:
```{r echo=FALSE,message=FALSE}
library("ggplot2")
library(RCurl)
y <- getURL("https://raw.githubusercontent.com/rajneesh2017/IT497_GroupAssign_Rajneesh_Mohit_Ekta_Tony/master/Google_RajneeshPandey/google.csv")
```
```{r echo=FALSE,message=FALSE}
x <- read.csv(text = y)
```
```{r eval=FALSE,echo=FALSE,message=FALSE,fig.width=12}
ggplot(data=x, aes(x=DAYS, y=TSS, fill=DAYS)) + geom_bar(stat="identity")
gg <- ggplot(data=x, aes(x=DAYS, y=ASP, fill=DAYS)) + geom_bar(stat="identity")
gg + coord_cartesian(ylim = c(524, 538))
```
![graph 1](https://raw.githubusercontent.com/rajneesh2017/graphs/master/unnamed-chunk-21.png)
![graph 2](https://raw.githubusercontent.com/rajneesh2017/graphs/master/unnamed-chunk-22.png)
# Table
This table shows Total sentiment score(TSS) and actual stock price(ASP) for five days
```{r echo=FALSE}
x
```
# Summary
>
Here, after doing Twitter sentiment analysis, we can clearly see that total sentiment score can predict stock price changes but it is not always the same. The total sentiment score is the sum of positive and negative sentiment score and this can give an idea about stock price change but we cannot rely on that basis. Here I have done analysis for 5 days in which it is showing the changes quite the same as in actual stock price change, but it could be coincidence. If going further in the analysis, I found that it cannot predict the stock price change always.
>
Now if we talk about the five days analysis, the bar graph for Total sentiment score (TSS) and actual stock price (ASP) shows almost the same pattern. We can see in the bar graphs and above table that for the 1st day and 2nd day, there is a big change in total sentiment score but actual stock price is almost the same, then, on 3rd day both TSS and ASP goes down, on 4th day both goes up and on 5th day both goes down. Therefore, we can say that Twitter sentiment analysis can give an idea up to an extent, but we cannot rely on that.
>
In conclusion, all I can say that Twitter sentiment analysis can give an idea to guess the actual stock price change but we cannot produce a statement like Twitter can predict stock price movement truly because the correlation between total sentiment score based on tweets and actual stock price change is not always the same.