Twint not fetching Beyond 22 august 2021 #1266
Comments
Same here. Since it's a scraper, I'm used to it not getting a lot of old tweets, but today I'm not getting any tweets older than the last seven days. When I try to use the since/until options, it only gets a few tweets from the same day. I'm wondering if Twint started collecting through the REST API, which is limited to the last seven days. |
Same here. twint is collecting fewer than 100 tweets. |
I'm finding the same problem |
I am also having the same issue today |
+1 |
I can get tweets prior to Aug 22, but only 1-2 pages of results, and occasionally (~60%) it will return no tweets. |
It seems that when the search query has only a few tweets, it can overcome the date limit. |
I am having the same issue today :( |
Same issue :( |
I'm having the same problem, except not just when looking for specific dates. The number of tweets I get is inconsistent and sometimes zero. I have implemented the changes committed in #684 but that has not resolved the problem. I'm not very proficient with python but it seems that these changes are still pointing to the exception unconditionally when the data returned is zero. Is there a way to change this? |
same issue |
Same issue :( |
There is a workaround, but it is limited to 20 tweets, at least for me. It can retrieve tweets from before the 22nd of August, but you have to set a small interval between 'c.Since' and 'c.Until'. e.g.: Be aware that even this fails sometimes. If you set 'c.Pandas' to True, you can check whether your dataframe is empty and, if so, run the configuration again (twint.run.Search(c)) |
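The workaround above (short Since/Until intervals, rerun when the dataframe comes back empty) could be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: `date_windows`, `fetch_with_retry` and `search_window` are hypothetical helper names, and it assumes that `twint.storage.panda.Tweets_df` holds the result of the most recent run, as it appears to in twint 2.1.x.

```python
from datetime import date, timedelta


def date_windows(since, until, days=1):
    """Yield (start, end) ISO date strings covering [since, until) in short windows."""
    start = date.fromisoformat(since)
    stop = date.fromisoformat(until)
    step = timedelta(days=days)
    while start < stop:
        end = min(start + step, stop)
        yield start.isoformat(), end.isoformat()
        start = end


def fetch_with_retry(fetch, attempts=3):
    """Call fetch() until it returns a non-empty result or the attempts run out."""
    result = fetch()
    for _ in range(attempts - 1):
        if result is not None and len(result) > 0:
            break
        result = fetch()
    return result


def search_window(query, since, until):
    """Run one twint search for a single window and return the pandas dataframe."""
    import twint  # imported lazily so the helpers above work without twint installed

    c = twint.Config()
    c.Search = query
    c.Since = since
    c.Until = until
    c.Pandas = True
    c.Hide_output = True
    twint.run.Search(c)
    # Assumption: Tweets_df reflects the run that just finished.
    return twint.storage.panda.Tweets_df


# Example (needs twint and network access), one search per day of a week:
# frames = [fetch_with_retry(lambda s=s, u=u: search_window("#gis", s, u))
#           for s, u in date_windows("2021-08-01", "2021-08-08")]
```

Retrying only helps with the intermittent empty responses described in this thread; it does not lift the roughly 20-tweet cap per window mentioned above.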
Ok guys. Just uncomment line 92 in the url.py file: ('query_source', 'typed_query'), change to This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why. |
Is there a way to delete previous comments. It's a bit messy. Here again: Just uncomment (remove the '#') line 92 in the url.py file: ('query_source', 'typed_query'), This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why. |
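For anyone unsure where url.py lives, the following is a small sketch that locates the installed module and reports whether the 'query_source' line is commented out. `find_module_file` and `lines_containing` are hypothetical helper names; the only assumption about twint is that it ships a `twint.url` module, which is the url.py referred to above. Line numbers may differ between versions, so the sketch searches by content rather than relying on line 92.

```python
import importlib.util


def find_module_file(name):
    """Return the file path of an importable module, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        return None  # the parent package is missing or failed to import
    return spec.origin if spec is not None else None


def lines_containing(path, needle):
    """Return (line_number, text, is_commented) for every line that mentions needle."""
    hits = []
    with open(path, encoding="utf-8") as handle:
        for number, text in enumerate(handle, start=1):
            if needle in text:
                stripped = text.strip()
                hits.append((number, stripped, stripped.startswith("#")))
    return hits


path = find_module_file("twint.url")
if path is None:
    print("twint does not appear to be installed in this interpreter")
else:
    print(path)
    for number, text, commented in lines_containing(path, "query_source"):
        print(number, "commented out" if commented else "active", text)
```

If the line is reported as commented out, open the printed path in any text editor and remove the leading '#'. This works the same way whether twint is used from Python or from the command line, since both use the same installed package.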
Working for me. Thanks to @klojohn |
thanks for the solution @klojohn . But it does not seem to be working for windows. |
I'm having the same issue, not able to scrape the data using since and until. |
Is the given solution not working for anyone else too? On linux |
@klojohn Great solution! Initially this is working for me on Mac OS. |
Hi all, this seems to be an issue around specific dates and/or tweets, but I can't confirm it, as the process stops at random points on each run. If I note the date where it stopped previously and then rerun the process with --until In the solution provided (to comment out line 92) I tested in a few environments:
|
@aarorauark How did you install twint? |
Thank you @JWLMSN for getting back to me. I installed with both git and pip as described in the link (https://github.com/twintproject/twint) and tried twint, but faced the same issue. Could you run this in the CLI (twint -s "American Airlines" --since "2010-01-02" --until "2010-12-06" -o "Test_file.csv" --csv), or run in Jupyter the commands from my earlier post (the Jupyter snapshot has the commands), and let me know if you are able to fetch all the tweets for the range? I have opened another issue in which twint does not return more than 20 tweets, all of them from the same day, and not even the full set for that day (#1276). |
@aarorauark I just tried a run with the parameters you mentioned and the query returns way more data beyond 2010-12-04, although I aborted the script because that would be a lot of data to pull for testing purposes.
but my guess is it would have run all the way until the specified end date. So it's pretty safe to say your specific query is not the problem. Must be something else. |
Thank you @JWLMSN for your time. From what I have noticed, Twint actually starts 2 days before the until date you specify. I collected lots of data back in March this year, in pretty big files, but somehow it is broken now. Could you please share the file? Ideally it would not take more than 10 minutes, and with a range of just a couple of months it should take only about 5. I just want to see whether (1) you are getting more than 40-odd tweets and (2) you are able to capture most of the dates, because what I am seeing is that if you do not specify "until", then for less famous companies or less viral search strings twint only fetches data for the past 15 days. You can simply run it for a single month of any year and for any company, say "Facebook" or "Amazon", that has a lot of user-generated content on Twitter. I just want to check the two points I mentioned. Again, I highly appreciate your time on this. |
I am also having the same problem. I work with Command Prompt (CMD), where I indicate my command: but it only allows me to extract the tweets until September 15th. How can I solve that? |
@klojohn's solution works for me on Mac, thank you! |
This worked for me on Windows! Thanks! |
I went to try @klojohn 's solution, but that line had already been uncommented in my version of Twint. And I'm still experiencing an issue. I'm on Linux. Did anyone else see that in their version it was already uncommented? |
I work with Command Prompt (CMD), where I indicate, for example, my command: twint -u gofundme How can I apply the solution you indicate? |
Not working for me, I am using windows |
Your solution worked for me as well. I am running twint version 2.1.21 on Python 3.9.7, which is the latest version available via pip. Now I am wondering: is there planned fix for this in the main release? Is there an actively maintained fork of twint somewhere (which preferably includes this fix)? FYI: I'm running these instructions:
|
For an equivalent project, try snscrape:
https://github.com/JustAnotherArchivist/snscrape
On Sun, 14 Nov 2021 at 13:02, 7k50 ***@***.***> wrote:
… Is there a way to delete previous comments. It's a bit messy.
Here again:
Just uncomment (remove the '#') line 92 in the url.py file:
('query_source', 'typed_query'),
This solution works for PC (Linux). It does not seem to work on Raspberry
Pi and I have no idea why.
Your solution worked for me as well.
I am running twint version *2.1.21* on Python 3.9.7, which is the latest
version available via pip.
Now I am wondering: is there planned fix for this in the main release?
I guess nothing has happened with this issue yet since twint hasn't been
updated on GitHub in a while.
Is there an actively maintained fork of twint somewhere (which preferably
includes this fix)?
If twint is no longer actively maintained, are there any alternative
software we should be aware of?
FYI: I'm running these instructions:
import twint

# Represented command: twint -u USERNAME --images -o USERNAME.csv --csv
username = "username"  # placeholder account name

c = twint.Config()
c.Username = username
c.Images = True
c.Store_csv = True
c.Output = "%s.csv" % username
twint.run.Search(c)
|
still having the same issue on Linux even after trying this solution...in my case, now twint only returns ~90 tweets about "apple" and "$aapl" for one date...
|
Where do i find/open this url.py file @klojohn? |
Thanks a lot solved for me. |
This solution worked for me, and I'm using Python IDLE on Windows. Thanks @klojohn! |
Fix not working for me, py 3.9.7 on mac |
This worked for me, thank you. Running on windows, installed with pip |
@klojohn Sir, I managed to receive tweets with a code similar to what you said, but it only gives data for a week, I think the url.py file has been changed. It wasn't exactly what you said. To be removed |
How would it be for Windows? Have you got it? I've been trying things for months, uninstalling and installing and I don't know what else to do. |
My twint version is 2.1.21. It works fine for me on Windows after using the fix posted by @klojohn. Shows all/most tweets that I wanted to see. |
Now it works great with the solution you have indicated. It is wonderful! |
@klojohn solution worked like a charm! (windows) |
Issue Template
Please use this template!
Initial Check
No similar issue found
pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
Command Ran
import twint
import nest_asyncio
nest_asyncio.apply()
config = twint.Config()
config.Search = "#gis"
config.Limit = 10000
config.Hide_output = True
config.Until = '2016-12-07'
config.Since = '2021-08-01'
config.Store_object = True
twint.run.Search(config)
# now you will have some tweets
tweets_as_objects = twint.output.tweets_list
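One thing worth ruling out in the command above: Since ('2021-08-01') is later than Until ('2016-12-07'). Whether that plays any part in the reported behaviour is not established in this thread, but a small guard like the following sketch would catch an inverted window before the search runs. `check_date_window` is a hypothetical helper, and it assumes date-only strings in YYYY-MM-DD form.

```python
from datetime import date


def check_date_window(since, until):
    """Raise ValueError unless since is strictly earlier than until (YYYY-MM-DD strings)."""
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    if start >= end:
        raise ValueError(f"Since ({since}) must be earlier than Until ({until})")
    return start, end
```

It could be called as check_date_window(config.Since, config.Until) right before twint.run.Search(config); with the values from the command above it raises a ValueError.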
Description of Issue
Environment Details