---
title: 'Lab #6 (model)'
author: "Jerid Francom"
date: "10/4/2021"
output:
  pdf_document:
    toc: yes
  html_document:
    toc: yes
    df_print: kable
---
```{r setup, message=FALSE}
library(tidyverse) # to manipulate data and plot
library(tadr) # to download compressed files
library(rtweet) # to interface with Twitter API
```
# Direct downloads
The goal of this section is to download a compressed file and decompress its contents into the `data/original/` directory. The data to be downloaded comes from the [ACTIV-ES Corpus](https://github.com/francojc/activ-es) available on GitHub.
First, I will source the `functions/functions.R` file to load the `get_compressed_data()` function into the current R session.
```{r dd-source-functions}
source(file = "functions/functions.R") # source the get_compressed_data function
args(get_compressed_data) # view the arguments of the function
```
The `get_compressed_data()` function has three arguments, two of which are required: `url` takes the URL of the compressed file and `target_dir` takes the directory where the .zip file should be downloaded and decompressed. The third argument, `force`, defaults to `FALSE`; setting it to `TRUE` forces a re-download of the .zip file even if the data already exists.
```{r dd-download-actives-tagged}
get_compressed_data(url = "https://github.com/francojc/activ-es/raw/master/activ-es-v.02/corpus/tagged.zip", target_dir = "data/original/actives/") # download and decompress
```
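For reference, here is a minimal sketch of what `get_compressed_data()` might do internally, using only base R. This is a hypothetical reconstruction, not the actual code in `functions/functions.R`.
```{r dd-sketch-get-compressed-data}
# Hypothetical sketch of get_compressed_data(); the real implementation
# lives in functions/functions.R and may differ.
get_compressed_data_sketch <- function(url, target_dir, force = FALSE) {
  if (!dir.exists(target_dir) | force == TRUE) { # run if data is missing or force is TRUE
    temp_zip <- tempfile(fileext = ".zip") # temporary file for the download
    download.file(url = url, destfile = temp_zip) # download the .zip file
    unzip(zipfile = temp_zip, exdir = target_dir) # decompress into target_dir
    message("Data downloaded and decompressed.") # status message
  } else {
    message("Data already exists. Set 'force = TRUE' to redownload.")
  }
}
```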
Let's look at the updated project directory structure.
```{r dd-directory-tree}
fs::dir_tree(recurse = 2) # show the project directory structure (2 levels deep)
```
# API interfaces
In this section I am going to work with the rtweet package to access the Twitter API. I will stream tweets sent from within the US, compile the results into a dataset, and save that dataset to disk as a .csv file.
First, I need to load the pre-established Twitter authentication token.
```{r ai-load-authentication-token}
student_token <- read_rds(file = "student_token.rds") # read the authentication token .rds file
```
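For reference, here is a hedged sketch of how such a token might have been created and saved in an earlier session. The app name and credential strings below are placeholders, not values used in this lab.
```{r ai-create-token-sketch, eval=FALSE}
# Hypothetical sketch: creating and saving an rtweet authentication token
# (app name and credential strings are placeholders)
student_token <-
  create_token(app = "my_twitter_app", # placeholder app name
               consumer_key = "CONSUMER_KEY", # placeholder credentials
               consumer_secret = "CONSUMER_SECRET",
               access_token = "ACCESS_TOKEN",
               access_secret = "ACCESS_SECRET")
write_rds(student_token, file = "student_token.rds") # save the token for later sessions
```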
I tested the following code in the R console and it worked!
```{r ai-stream-test, eval=FALSE}
stream_test <-
  stream_tweets(lookup_coords("usa"), timeout = 10, token = student_token) # NOT RUN (tested in the R Console)
```
Here is the custom `stream_usa()` function, which wraps `rtweet::stream_tweets()` and saves the results to a .csv file.
```{r ai-stream-function}
stream_usa <- function(file, timeout = 10, token = student_token, force = FALSE) {
  # Function:
  # Stream tweets from the US and save the results to a .csv file
  if (!file.exists(file) | force == TRUE) { # run if the file does not exist or force is TRUE
    message("Getting ready to stream.") # status message
    if (!dir.exists(dirname(file))) { # check to see if the target directory exists
      dir.create(path = dirname(file), showWarnings = FALSE, recursive = TRUE) # create the necessary directory structure
      message("Directory created.") # status message
    }
    stream <- # stream results
      rtweet::stream_tweets(rtweet::lookup_coords("usa"), # stream tweets from within the US
                            timeout = timeout, # set the timeout (in seconds) for the stream
                            token = token) %>% # set the API token to use
      rtweet::lat_lng() # extract the lat and lng coordinates from the stream results
    rtweet::save_as_csv(x = stream, file_name = file) # save the stream object as a .csv file
    message("Stream file saved!") # status message
  } else { # if the file exists and force is FALSE
    message("Stream file already exists. Set 'force = TRUE' to overwrite existing data.")
  }
}
```
Run the `stream_usa()` function, setting the `timeout` argument to `300` seconds (5 minutes).
```{r ai-run-stream-usa}
stream_usa(file = "data/original/twitter/stream_twitter_usa.csv",
           timeout = 300) # stream for 5 minutes
```
Now I will check that the directory structure for the Twitter data has been created.
```{r ai-directory-structure-twitter}
fs::dir_tree(recurse = 2) # show the project directory structure (2 levels deep)
```
Read in the Twitter data that was stored in the .csv file.
```{r ai-read-twitter-stream, message=FALSE, warning=FALSE}
stream_twitter_usa <- read_csv(file = "data/original/twitter/stream_twitter_usa.csv") # Read streamed twitter data
```
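As a quick sanity check, we can look at the dimensions of the streamed data and preview a few key columns (assuming the standard rtweet column names; `lat` and `lng` are added by `rtweet::lat_lng()`).
```{r ai-inspect-twitter-stream}
dim(stream_twitter_usa) # number of tweets (rows) and variables (columns)
stream_twitter_usa %>%
  select(created_at, lang, lat, lng) %>% # key columns for plotting
  head() # preview the first few rows
```
Finally, I will plot the streamed tweets, labeling the languages by color.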
```{r ai-plot-tweets, message=FALSE}
source("functions/functions.R") # source functions.R to load plot_tweet_langs()
plot_tweet_langs(tweets = stream_twitter_usa) # plot tweets, labeling the languages by color
```
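`plot_tweet_langs()` is defined in `functions/functions.R`, which is not shown here. As a hedged sketch, a function like it could be built with ggplot2 along these lines, assuming the `lat`, `lng`, and `lang` columns from above; the actual implementation may differ.
```{r ai-plot-tweet-langs-sketch, eval=FALSE}
# Hypothetical sketch of a plot_tweet_langs()-style function; the real
# implementation is in functions/functions.R and may differ.
plot_tweet_langs_sketch <- function(tweets) {
  tweets %>%
    filter(!is.na(lat), !is.na(lng)) %>% # keep tweets with usable coordinates
    ggplot(aes(x = lng, y = lat, color = lang)) + # map coordinates and language
    geom_point(alpha = 0.5) + # one point per tweet
    labs(title = "Streamed tweets from the US by language",
         x = "Longitude", y = "Latitude", color = "Language")
}
```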
# Assessment
...