Error in check_token_v2(): ! A bearer token is needed for this endpoint. #760

Closed
abuchmueller opened this issue Jan 31, 2023 · 6 comments

@abuchmueller

abuchmueller commented Jan 31, 2023

After loading the latest devel version (‘1.1.0.9001’), I'm wondering if 'painless' streaming without providing a bearer token is still possible with this package.
One of the greatest features of this package was that, using just a regular account, you could authenticate and stream tweets.
Since you deprecated stream_tweets() in favor of filtered_stream(), I'm wondering if this is even still possible.

library(rtweet)

auth_setup_default()
#> Using default authentication available.
#> Reading auth from '/Users/abuchmueller/Library/Preferences/org.R-project.R/R/rtweet/default.rds'
sample_stream(parse = F)
#> Error in `check_token_v2()`:
#> ! A bearer `token` is needed for this endpoint.

#> Backtrace:
#>     ▆
#>  1. └─rtweet::sample_stream(parse = F)
#>  2.   └─rtweet:::endpoint_v2(...)
#>  3.     ├─httr2::req_url_path_append(req_v2(token), path)
#>  4.     │ └─httr2:::check_request(req)
#>  5.     │   └─httr2:::is_request(req)
#>  6.     └─rtweet:::req_v2(token)
#>  7.       └─rtweet:::check_token_v2(token)
#>  8.         └─rlang::abort("A bearer `token` is needed for this endpoint.")

I'm confused. The documentation says token | Expert use only. Use this to override authentication for a single API call. In most cases you are better off changing the default for all calls. See auth_as() for details. suggesting that bearer tokens are optional.

Am I missing something here or are bearer tokens now always required?

Also, a minor nitpick: as long as parsing is not supported, I would not set the default of the parse parameter in filtered_stream() to TRUE.

@llrs
Member

llrs commented Jan 31, 2023

Hi Andreas, there are several points I think are worth addressing in your post.

The latest development of rtweet is in the devel branch, currently at version 1.1.0.9003. This is to make it easier for people to read documentation close to what is on CRAN and to keep the devel branch "hidden" (it is documented in the contributing file). I mention this in case you want the latest features too and not only a working auth_setup_default() function. Sorry about that!

Yes, it was cool as long as it worked, until last November, when the API stopped working. I don't have the energy and time to support old APIs when it is clear that they won't be supported in the future, so rtweet had to switch to the new API v2, which has different requirements, works differently and returns different output. See this post where I explain what I am working on and some decisions I'm facing.

The bearer token was the easiest and fastest way to implement support for the stream endpoints in API v2 with existing code. I am still struggling to authenticate a user via an app with OAuth2 as implemented by Twitter, so that users can use their own account to use the API. That feature will come, but according to this table, Twitter only allows using the bearer token for this endpoint. It might change in the future or it might not. But you will need to use the bearer token for now.
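
For readers who land here with the same error, a minimal sketch of what streaming with a bearer token might look like. It assumes rtweet 1.1.0 or later is installed and that `rtweet_app()` and `sample_stream()` accept the arguments shown (check the documentation of your installed version). The environment variable name `TWITTER_BEARER_TOKEN` and the helper `stream_sample_raw()` are made up for this example.

```r
# Hypothetical helper (not part of rtweet): stream a sample of tweets
# using a bearer token and write the raw output to a file.
stream_sample_raw <- function(bearer, seconds = 10,
                              file = tempfile(fileext = ".json")) {
  # Fail early if no usable token was supplied.
  stopifnot(is.character(bearer), length(bearer) == 1L, nzchar(bearer))
  # Assumption: rtweet_app() builds a bearer-token auth object.
  app <- rtweet::rtweet_app(bearer_token = bearer)
  # parse = FALSE because parsing of the v2 stream is not supported yet.
  rtweet::sample_stream(timeout = seconds, file = file,
                        token = app, parse = FALSE)
  invisible(file)
}

# Assumed setup: the token lives in an environment variable (name is hypothetical).
bearer <- Sys.getenv("TWITTER_BEARER_TOKEN")
if (nzchar(bearer)) {
  path <- stream_sample_raw(bearer)
  message("Raw stream written to ", path)
}
```

Passing the token explicitly keeps the example self-contained; calling auth_as() with the same object would instead make it the default for all calls.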

The documentation you quote is from an internal function. I don't see any mention of a distinction between authentication mechanisms. In rtweet all arguments for authorization codes are named token, and are used for bearer tokens or OAuth 1.0 tokens (the kind provided by auth_setup_default()). The expert usage warning is there because it could lead to surprising interactions where a user sets up one token but a different one is sent to Twitter for pagination.

The default parse = TRUE is to keep consistency with the other endpoints. At the time of release I hadn't figured out how to support parsing the output, whose content depends on the user's arguments. In the future rtweet should parse and transform the data into a nice data.frame, and that should still be the default.
Keeping TRUE as the default will make it easier for me, developers and users to handle that transition: once I implement it, their code with parse = FALSE will still work, but new code won't need to handle the data directly once parse = TRUE works without an error.
Doing it the other way around would go against the consistent behavior of rtweet and make it harder for me to later upgrade the package, as I would need to notify developers and maintainers in advance that I was about to break (again) their code. But if you have arguments against this approach please let me know.

I hope this helps clarify the problems around the token and the streaming endpoints.


I went to see the issues at Twitmo:

  • in the devel branch there is a function to make requests to Twitter's archive using the API v2. It might be ready/released next month (If I'm lucky).
  • The issue with the open connections in the streaming function (abuchmueller/Twitmo#13, "unused connection warning when using parse_stream()") is something I couldn't reproduce. Sometimes the streaming functions take too much time on my computer and I am not sure why. But if you have some feedback I'll try to fix it in rtweet.
  • The problem with the incorrect EOF could be rtweet's fault; I think I had fixed it here. But it might help if you pass pagesize = 1, as is done in parse_stream (which doesn't work for the new streaming files...)

@abuchmueller
Author

abuchmueller commented Feb 1, 2023

Hi Lluis,

thanks for taking the time to write such a thorough response and also for going through Twitmo's issues.
Since Twitmo heavily depends on rtweet and I would like to bring it into a working state again, I'm also facing some decisions. So I am very interested in which direction the development of rtweet is taking (also to avoid working on stuff that you might have already sorted out or will sort out in the near future, like parsing).

> That feature will come, but according to this table, Twitter only allows using the bearer token for this endpoint. It might change in the future or it might not. But you will need to use the bearer token for now.

I'm still not sure I get this right. There is currently no way to give users API access on behalf of a Twitter account (like in rtweet v0.7) because Twitter's V2 API doesn't allow this? I guess OAuth 1.0a User Context is what I am looking for? This would be a bummer for me because it means that every user will definitely need a bearer token from now on. If this is true, I also suspect, given the current state of Twitter with its shifting priorities, that not much will happen in that regard soon, meaning I have little hope that OAuth 1.0a User Context for V2 streaming will come.

> The documentation you quote is from an internal function

The documentation I quoted is from filtered_stream, which is not internal, is it?

> In the devel branch there is a function to make requests to Twitter's archive using the API v2. It might be ready/released next month (If I'm lucky).

This is great news for those with academic research access. Twitmo, however, was conceptualised as a package for less advanced users who don't know their way around APIs (yet), one that abstracts all of this stuff away. Getting academic access is not trivial. If this can no longer be the case in the future, I'll have to reconsider whether it makes sense to continue/pick up development again.

> The issue with the open connections in the streaming function abuchmueller/Twitmo#13 is something I couldn't reproduce. Sometimes the streaming functions take too much time on my computer and I am not sure why. But if you have some feedback I'll try to fix it in rtweet.

Don't worry about that, this is more likely related to the jsonlite package or my implementation. It's ugly and I should've just hidden the warning but harmless, I guess.

> The problem with the incorrect EOF could be rtweet's fault; I think I had fixed it here. But it might help if you pass pagesize = 1, as is done in parse_stream (which doesn't work for the new streaming files...)

This was never a big problem, since I've used that regex you had in earlier rtweet versions to throw out bad lines. I think later on, bad lines weren't even written to the json file and thrown out upon streaming if I remember correctly.
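
A rough sketch of that kind of filter, for anyone hitting the same EOF problem. This is not the regex rtweet used; it is a simple stand-in that keeps only lines that look like a complete JSON object (starting with `{` and ending with `}`), which also drops a stray trailing TRUE. The function name `keep_complete_json_lines()` is made up for this example.

```r
# Hypothetical helper: read a line-delimited JSON stream file and keep
# only the lines that look like complete JSON objects.
keep_complete_json_lines <- function(path) {
  # warn = FALSE: a truncated final line is expected here, not an error.
  lines <- trimws(readLines(path, warn = FALSE))
  # Heuristic only: the line must start with "{" and end with "}".
  # It does not validate the JSON in between.
  lines[grepl("^\\{.*\\}$", lines)]
}

# Small demonstration on a temporary file with two bad lines and a blank one.
demo_file <- tempfile(fileext = ".json")
writeLines(c('{"id":"1"}', 'TRUE', '{"id":"2"', '', '{"id":"3"}'), demo_file)
good <- keep_complete_json_lines(demo_file)
print(length(good))
```

The surviving lines can then be handed to a JSON parser one by one; a line that passes this check can still be invalid JSON, so a parser error should still be handled.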

What is currently an issue is that parsing doesn't work, because there seem to be changes in rtweet's tweets_with_users function. Did you change this function to be compatible with the V2 format? Then it would make sense that I get an out of bounds error, because the little example json I put in the package for demo purposes was streamed using v1.1.

library(Twitmo)
raw_path <- system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo")
mytweets <- load_tweets(raw_path)
#> opening file input connection.
#>  Found 167 records... Found 193 records... Imported 193 records. Simplifying...
#> closing file input connection.
#> Warning in tb$possibly_sensitive <- list(NA): Coercing LHS to a list
#> Error in x[["user"]]: subscript out of bounds

> The default parse = TRUE is to keep consistency with the other endpoints. At the time of release I hadn't figured out how to support parsing the output, whose content depends on the user's arguments. In the future rtweet should parse and transform the data into a nice data.frame, and that should still be the default.
> Keeping TRUE as the default will make it easier for me, developers and users to handle that transition: once I implement it, their code with parse = FALSE will still work, but new code won't need to handle the data directly once parse = TRUE works without an error.
> Doing it the other way around would go against the consistent behavior of rtweet and make it harder for me to later upgrade the package, as I would need to notify developers and maintainers in advance that I was about to break (again) their code.

Given the way CRAN handles downstream dependency management (forcing downstream dependencies to work with the latest upstream package version and not allowing them to lock to lower versions), code breakage of downstream dependencies is bound to happen all the time with R packages anyway. You can dribble around it but ultimately it's not your fault.
However, getting greeted with an error when trying out a new package/function using only default arguments throws new users kinda off, and it makes me not want to use the package or forget about it again. Ultimately, it's your decision.

@llrs
Member

llrs commented Feb 1, 2023

I am very happy to talk with other maintainers depending on rtweet! I hoped that the update to v1.0.0 would spark some conversations: I want to improve how rtweet handles the API but at the same time not include so much that it makes other packages redundant (I must confess I have spent almost no time learning what they do).

  1. About the authentication: There are several ways to authenticate, but for the API v2 streaming endpoints there is no way to give users API access on behalf of a Twitter account. Some endpoints of v2 still allow OAuth 1.0a User Context but I don't expect it to stick around much longer (in the API v2 documentation it appears in smaller letters and they added OAuth 2.0 support).
  2. About the documentation: My bad, I only found it on that internal page and didn't realize it was reused for the new stream endpoints... which it shouldn't have been! I'll change that. I get too used to the internals and I forget how it is for normal users.
  3. About the warning closing the connection. It might be a problem with jsonlite but as strange things happen with it I am not even 50% sure.
  4. About the EOF. I got the same error a couple of times because somehow httr2::req_stream appends TRUE at the end of the file, but I thought I had handled this.
  5. I didn't make big changes in tweets_with_users. I added a class and a few cosmetic changes to make it slightly faster (see the diff between versions). Neither of these changes triggered any problem in rtweet's tests (or its dependencies), but I might have missed something. This function is not compatible with the new output of API v2, so it can't be used for it.
  6. About the examples and default code: I'll make the examples friendly for the moment (adding parse = FALSE). I need to give enough notice to package developers downstream when I submit to CRAN and I don't want to break user code more than needed. But yes, I need to work more on how I communicate with the users and keep in mind their experience.

Let me know if you have more feedback, I'm very happy to hear it.
I also try to update the default branch one week before submission to CRAN and announce its imminent release on Twitter, hoping to avoid this kind of problem. Next time I might write a blog post and wait longer.

My general idea of rtweet is to have a lean, consistent interface to the Twitter API with sensible, rectangular data, handling authentication as easily as I can. In the future I will probably drop functions that require dependencies not core to the package (ggplot2, igraph, magick, webshot, maps?) or that are more about processing the data, but I might incorporate some functions to create/explore network relationships through the API (i.e., combining different rtweet functions, vectorizing some others, ...).

@llrs
Member

llrs commented Feb 2, 2023

Well, all of this is moot, see the new announcement: To use the API all users should pay.

@abuchmueller
Author

> Well, all of this is moot, see the new announcement: To use the API all users should pay.

Well damn, that was what I meant with "shifting priorities" at Twitter ... EM bought a cashflow negative company at an inflated price, so the free lunch policy was bound to end.
Big blow to the research community, Twitter was a great place for researchers to sample text.
Maybe it's time to look into other social networks for sampling text ...

@llrs
Member

llrs commented Mar 29, 2023

I'm about to release a version (check the master/NEWS file). I'm closing this question as I've changed the wording of the new endpoints.
Let me know if there is any trouble.

@llrs llrs closed this as completed Mar 29, 2023