-
Notifications
You must be signed in to change notification settings - Fork 199
Conversation
Updating the table:
On some cases there is less columns when the content is parsed than when not (lookup_users, search_users)! This is due to the The 73 + 1 is the user added when parsed to know which user has liked which tweet. |
Documenting progress, about blocking problems at the moment:
|
Many changes, but still need to add a test for #574 (and didn't check for #575 but should be fixed too)
Maybe some columns can be further nested or hidden on an attribute. |
@hadley Maybe you can give a look at this PR. Probably the best way to start is on the tweet function and the new functions to parse twitter objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some great improvements overall! Questions/comments below.
R/entities_objects.R
Outdated
# <https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/entities#hashtags> | ||
hashtags <- function(x) { | ||
if (NROW(x) == 0) { | ||
return(data.frame(text = NA, indices = I(list(NA)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that a zero row input yields a one row output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the reason is to not omit anything for later easier parsing and linking between hasthags and tweets. There is also the problem when a user hasn't tweeted if the information about the tweets of that users and other users are collected it would break the tweet_data extraction see #574 for such a report.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think you always want to return a data frame here, but are you sure it should have one row and not zero rows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'm sure. What is your concern? Filling the object with mostly empty data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just generally reasoning about this function. I don't understand how all the pieces fit together but this feels like a code smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some tests and it seems that mixing NA
s and data.frames
would be correctly handled by rbind
unless there are nested data.frame
s. I'll wait a bit to merge the PR but I can't think of any workaround that won't complicate more the workflow.
Tests
``` r
l <- list(a = NA, b = data.frame(a = "b", b = "a"), c = data.frame(a = c("C", "D"), b = c("B", "C")))
do.call(rbind, l)
#> a b
#> a <NA> <NA>
#> b b a
#> c.1 C B
#> c.2 D C
l <- list(a = data.frame(a = "b", b = "a"), b = NA, c = data.frame(a = c("C", "D"), b = c("B", "C")))
do.call(rbind, l)
#> a b
#> a b a
#> b <NA> <NA>
#> c.1 C B
#> c.2 D C
l <- list(a = data.frame(a = "b", b = "a"), b = NA, c = data.frame(a = "C", b = "B"))
do.call(rbind, l)
#> a b
#> a b a
#> b <NA> <NA>
#> c C B
l <- list(a = data.frame(a = "b", b = "a"), b = data.frame(a = c("C", "D"), b = c("B", "C")), c = NA)
do.call(rbind, l)
#> a b
#> a b a
#> b.1 C B
#> b.2 D C
#> c <NA> <NA>
library("tibble")
l <- list(a = NA, b = tibble(a = "b", b = "a"), c = tibble(a = c("C", "D"), b = c("B", "C")))
do.call(rbind, l)
#> # A tibble: 4 x 2
#> a b
#> * <chr> <chr>
#> 1 <NA> <NA>
#> 2 b a
#> 3 C B
#> 4 D C
l <- list(a = tibble(a = "b", b = "a"), b = NA, c = tibble(a = c("C", "D"), b = c("B", "C")))
do.call(rbind, l)
#> # A tibble: 4 x 2
#> a b
#> * <chr> <chr>
#> 1 b a
#> 2 <NA> <NA>
#> 3 C B
#> 4 D C
l <- list(a = tibble(a = "b", b = "a"), b = NA, c = tibble(a = "C", b = "B"))
do.call(rbind, l)
#> # A tibble: 3 x 2
#> a b
#> * <chr> <chr>
#> 1 b a
#> 2 <NA> <NA>
#> 3 C B
l <- list(a = tibble(a = "b", b = "a"), b = tibble(a = c("C", "D"), b = c("B", "C")), c = NA)
do.call(rbind, l)
#> # A tibble: 4 x 2
#> a b
#> * <chr> <chr>
#> 1 b a
#> 2 C B
#> 3 D C
#> 4 <NA> <NA>
l <- list(a = NA, b = data.frame(a = I(data.frame(c = 1)), b = "a"), c = data.frame(a = I(data.frame(c = c(2, 3))), b = c("B", "C")))
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = data.frame(a = I(data.frame(c = 1)), b = "a"), b = NA, c = data.frame(a = I(data.frame(c = c(2, 3))), b = c("B", "C")))
do.call(rbind, l)
#> Warning: non-unique values when setting 'row.names': '1', '2'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = data.frame(a = I(data.frame(c = 1)), b = "a"), b = NA, c = data.frame(a = I(data.frame(c = c(2))), b = "B"))
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = data.frame(a = I(data.frame(c = 1)), b = "a"), b = data.frame(a = I(data.frame(c = c(2, 3))), b = c("B", "C")), c = NA)
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = NA, b = tibble(a = I(data.frame(c = 1)), b = "a"), c = tibble(a = I(data.frame(c = c(2, 3))), b = c("B", "C")))
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = tibble(a = I(data.frame(c = 1)), b = "a"), b = NA, c = tibble(a = I(data.frame(c = c(2, 3))), b = c("B", "C")))
do.call(rbind, l)
#> Warning: non-unique values when setting 'row.names': '1', '2'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = tibble(a = I(data.frame(c = 1)), b = "a"), b = NA, c = tibble(a = I(data.frame(c = c(2))), b = "B"))
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
l <- list(a = tibble(a = I(data.frame(c = 1)), b = "a"), b = tibble(a = I(data.frame(c = c(2, 3))), b = c("B", "C")), c = NA)
do.call(rbind, l)
#> Warning: non-unique value when setting 'row.names': '1'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
Created on 2021-07-13 by the reprex package (v2.0.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something, but I don't see you creating an zero row data frames in your tests? But I don't really understand how all of this data flows together, and there's likely to be something upstream that means my suggestion wouldn't work.
R/utils.R
Outdated
@@ -100,7 +104,7 @@ maybe_n <- function(x) { | |||
} | |||
|
|||
is_testing <- function() { | |||
identical(Sys.getenv("TESTTHAT"), "true") | |||
requireNamespace("testthat", quietly = TRUE) && identical(Sys.getenv("TESTTHAT"), "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That check shouldn't be needed since only testthat should be setting the envvar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should because later on there are some function calling testthat function inside the package (skip_*). I swapped the order though so that if the variable is not present the package won't be loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea to change is_testing()
because it's a standard helper you'd generally expect to be copied directly from testthat. Where is skip()
being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two cases where this happens; on http.R lines 309-312:
handle_rate_limit <- function(x, api, retryonratelimit = NULL, verbose = TRUE) {
if (is_testing()) {
testthat::skip("Rate limit exceeded")
}
....
and auth.R lines 273-275
no_token <- function() {
if (is_testing()) {
testthat::skip("Auth not available")
...
I could move the check outside the function, but I thought it was more accurately to leave it inside it.
Many thanks for all the comments. Will try to reply and address them soon (1 or 2 weeks probably). |
Maybe I forgot to commit this??
I'm moving this comment to #558 because it's more generally about the changes and less about implementation. |
Simplify object parsing towards #558
Steps:
tweets_to_tbl_
is doing the internal tests and transforming the data.frame from X columns to Y columns for all these functions:*Numbers from the test cases.
Probably a list of endpoints that supply each kind of object will be much useful.