This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

Twint not fetching Beyond 22 august 2021 #1266

Open
ahmed991 opened this issue Aug 30, 2021 · 51 comments

Comments

@ahmed991

ahmed991 commented Aug 30, 2021

Issue Template

Please use this template!

Initial Check

No similar issue found

Make sure you've checked the following:

  • [] Python version is 3.8.8;
  • [] Updated Twint with pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
  • [] I have searched the issues and there are no duplicates of this issue/question/request.

Command Ran

import twint
import nest_asyncio
nest_asyncio.apply()
config = twint.Config()
config.Search = "#gis"
config.Limit=10000

config.Hide_output=True

config.Until = '2016-12-07'

config.Since = '2021-08-01'
config.Store_object = True

twint.run.Search(config)

# now you will have some tweets

tweets_as_objects = twint.output.tweets_list

Please provide the exact command ran including the username/search/code so I may reproduce the issue.

Description of Issue

Please use as much detail as possible.
t.co/i4nXyn9TC5
1429340378696716291 2021-08-22 12:11:16 +0500 Opportunities for Geographic Information Systems Technician in Pacific, MO #Pacific #GIS #GISTech Apply →: https://t.co/jHgPrMjV9r https://t.co/HVEmFTlJYJ
1429326875789127680 2021-08-22 11:17:37 +0500 <neeri_wwtd> #GIS Representation of #Covid_19 scenario for #India for 22th August 2021,prepared by @CSIR_NEERI Total Vaccination till date 58,14,89,377 (+52,23,612) Active Cases in last 24 hrs - 30,545 #CoronaVirusUpdates @pmoindia #coronavirus #StayHome #COVID19nsw #CovidVic #CovidVaccine https://t.co/sxZwbKnQ1C
1429310797809868800 2021-08-22 10:13:44 +0500 Any recommendations for free online courses for learning Python? #Python #GIS
1429296533405642752 2021-08-22 09:17:03 +0500 #OpenSource Web- #GIS Development Roadmap https://t.co/B9b2B6cXBL #APIs #SoftwareDevelopment #TechJunkieNews https://t.co/wvCa7DU4Yv
1429283432337526785 2021-08-22 08:24:59 +0500 @wormmaps If they would just stay up on latest #GIS technology trends, it would help a lot, and wouldn’t cost nearly as much!
1429281827164917760 2021-08-22 08:18:37 +0500 Y ahora sí que se generaban todas las etiquetas aunque tuviera valores nulos :) #QGIS #GIS 🗺️ https://t.co/ELrGTvup7F
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.
As we can see, it stops at 22 August 2021.

Environment Details

Using Windows, Running this in Anaconda Jupyter Notebook
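
One thing worth double-checking in a config like the one in this report is the order of the two dates: as posted, Since ('2021-08-01') is later than Until ('2016-12-07'). The sketch below is only a sanity check on the two strings; the helper name `check_date_range` is hypothetical and not part of twint.

```python
from datetime import date


def check_date_range(since: str, until: str) -> bool:
    """Return True when `since` is strictly earlier than `until`.

    Both arguments are 'YYYY-MM-DD' strings, the format used for
    config.Since and config.Until in the report above.
    """
    return date.fromisoformat(since) < date.fromisoformat(until)


# Values exactly as posted in the report: Since is later than Until.
print(check_date_range("2021-08-01", "2016-12-07"))  # False
# The same two dates swapped form a valid window.
print(check_date_range("2016-12-07", "2021-08-01"))  # True
```

Running a check like this before twint.run.Search(config) would at least rule out an inverted window as the cause of an empty or truncated result.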

@tassog

tassog commented Aug 30, 2021

Same here. Since it's a scraper, I'm used to it not getting a lot of old tweets, but today I'm not getting any tweets older than the last seven days. When I try to use the since/until options, it only gets a few tweets from the same day. I'm wondering if Twint started collecting through the REST API, which is limited to the last seven days.

@i-decrypt

Same here. Twint is collecting fewer than 100 tweets.

@wtroisey

I'm finding the same problem

@Meenu-Jain

I am also having the same issue today

@brianwarehime

+1

@minibug1021

I can get tweets prior to Aug 22, but only 1-2 pages of results, and occasionally (~60%) it will return no tweets.

@tassog

tassog commented Aug 31, 2021

I can get tweets prior to Aug 22, but only 1-2 pages of results, and occasionally (~60%) it will return no tweets.

It seems that when the search query has only a few tweets, it can overcome the date limit.

@mariodias

I am having the same issue today :(

@hcanalesmx

Same issue :(

@jformaldehydem

I'm having the same problem, and not just when looking for specific dates. The number of tweets I get is inconsistent and sometimes zero. I have applied the changes committed in #684, but that has not resolved the problem. I'm not very proficient with Python, but it seems that these changes still raise the exception unconditionally when zero results are returned. Is there a way to change this?

@razi9126

razi9126 commented Sep 1, 2021

same issue

@WENNA-HUB

Same issue :(

@dumix21

dumix21 commented Sep 3, 2021

There is a workaround, but it is limited to 20 tweets, at least for me. It retrieves tweets from before 22 August, but you have to set a small interval for 'c.Since' and 'c.Until'.

e.g.:
c.Since = '2021-03-21'
c.Until = '2021-03-22'

Be aware that even this fails sometimes. If you set 'c.Pandas' to True, you can check whether your dataframe is empty and, if so, run the search again (twint.run.Search(c)).
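
The small-interval workaround plus the "run again if empty" check could be sketched roughly as follows. This is only an illustration under assumptions: `daily_windows` and `fetch_with_retry` are hypothetical helper names, and the actual twint call is passed in as a `fetch` function rather than hard-coded, so the helpers themselves do not depend on twint.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, Sequence, Tuple


def daily_windows(start: str, end: str) -> Iterator[Tuple[str, str]]:
    """Yield (since, until) pairs covering [start, end) one day at a time."""
    day = date.fromisoformat(start)
    last = date.fromisoformat(end)
    while day < last:
        nxt = day + timedelta(days=1)
        yield day.isoformat(), nxt.isoformat()
        day = nxt


def fetch_with_retry(
    fetch: Callable[[str, str], Sequence],
    since: str,
    until: str,
    max_attempts: int = 3,
) -> Sequence:
    """Call fetch(since, until) until the result is non-empty or attempts run out."""
    result: Sequence = []
    for _ in range(max_attempts):
        result = fetch(since, until)
        if len(result) > 0:
            break
    return result
```

With twint, `fetch` would presumably set c.Since and c.Until, call twint.run.Search(c) with c.Pandas = True, and return the resulting dataframe (len() works on a dataframe as well as on a list). Looping over daily_windows('2021-03-21', '2021-03-23') then gives one-day intervals like the example above.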

@klojohn

klojohn commented Sep 4, 2021

Ok guys. Just uncomment line 92 in the url.py file:

# ('query_source', 'typed_query'),

change to
('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

@klojohn

klojohn commented Sep 4, 2021

Is there a way to delete previous comments. It's a bit messy.

Here again:

Just uncomment (remove the '#') line 92 in the url.py file:

('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.
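
For anyone who prefers not to edit the file by hand, the same change can be sketched as a text transformation. This is a minimal sketch, assuming the commented-out line looks exactly like the one quoted above (single quotes, trailing comma); `uncomment_query_source` is a hypothetical helper, not part of twint, and it works on the file's text rather than on line number 92, since the line number may differ between versions.

```python
import re

# Matches a commented-out copy of the parameter line, keeping its indentation.
_COMMENTED = re.compile(
    r"^([ \t]*)#[ \t]*(\('query_source',\s*'typed_query'\),.*)$",
    re.MULTILINE,
)


def uncomment_query_source(source: str) -> str:
    """Return `source` with the leading '#' removed from the query_source line."""
    return _COMMENTED.sub(r"\1\2", source)


sample = "params = [\n    # ('query_source', 'typed_query'),\n]\n"
print(uncomment_query_source(sample))
```

To apply it, one would read url.py, pass its contents through the function, and write the result back, ideally after saving a backup copy. If the line is already uncommented, the text is returned unchanged.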

@i-decrypt

Working for me. Thanks to @klojohn

@ahmed991
Author

ahmed991 commented Sep 5, 2021

Is there a way to delete previous comments. It's a bit messy.

Here again:

Just uncomment (remove the '#') line 92 in the url.py file:

('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

Thanks for the solution, @klojohn, but it does not seem to work on Windows.

@NadiaMusafarudin

I'm having the same issue, not able to scrape the data using since and until.

@razi9126

razi9126 commented Sep 8, 2021

Is the given solution also not working for anyone else? I'm on Linux.

@sc442

sc442 commented Sep 9, 2021

@klojohn Great solution! Initially this is working for me on Mac OS.

@Slyth3

Slyth3 commented Sep 16, 2021

Hi all, this seems to be an issue around specific dates and/or tweets, but I can't confirm it, as the process stops at random points on each run.

If I note the date where it stopped previously and then rerun the process with --until
or
c.Until
the process will continue and stop at another "bad tweet".

For the solution provided (uncommenting line 92), I tested in a few environments:

  • Does not work in Windows Anaconda Jupyter (to confirm)
  • Does not work in Linux (AWS EC2) - I'm not sure what I'm doing wrong here, but it seems to work for others
  • Works in Windows Cmd line
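
The "note the stop date and rerun with --until" approach described above could be sketched like this. It is only a sketch under assumptions: it expects each collected tweet to expose a 'YYYY-MM-DD' date string (the input here is just a list of such strings), `resume_until` is a hypothetical helper, and the one-day overlap is a guess meant to avoid losing tweets from the day the run stopped.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def resume_until(datestamps: Iterable[str], overlap_days: int = 1) -> Optional[str]:
    """Return the next Until value, or None if nothing was collected.

    The value is the earliest collected date plus `overlap_days`, so the
    next run re-covers the day on which the previous run stopped.
    """
    stamps = list(datestamps)
    if not stamps:
        return None
    # ISO 'YYYY-MM-DD' strings sort chronologically, so min() is the earliest.
    earliest = date.fromisoformat(min(stamps))
    return (earliest + timedelta(days=overlap_days)).isoformat()


print(resume_until(["2021-08-24", "2021-08-22", "2021-08-23"]))  # 2021-08-23
```

Because consecutive runs overlap by a day, duplicates would need to be filtered afterwards, for example by tweet id.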

@aarorauark

Hey guys, I tried uncommenting line 92 in url.py but still had no success. I tried in Jupyter and still received only a handful of tweets, all dated 2010-12-04.
image
image

@JWLMSN

JWLMSN commented Sep 18, 2021

@aarorauark How did you install twint?
If you used pip, type "pip3 show twint" into the command line and follow the path shown under "Location". There you'll find a folder named twint and the url.py which you have to modify inside that folder.
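
As an alternative to following the "Location" path by hand, the file can also be located from Python itself. This is a minimal sketch using the standard library; `find_package_file` is a hypothetical helper, and it assumes the file sits directly inside the package folder, as described above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_file(package: str, filename: str) -> Optional[Path]:
    """Return the path of `filename` inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; look next to it.
    candidate = Path(spec.origin).parent / filename
    return candidate if candidate.exists() else None


# Prints the full path to twint's url.py, or None if twint is not installed
# in the interpreter running this code.
print(find_package_file("twint", "url.py"))
```

Running this inside the same environment as the failing script (for example, the same Jupyter kernel) also shows which copy of twint is actually being imported, which matters when twint was installed more than once.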

@aarorauark

Thank you @JWLMSN for getting back to me. I installed twint with both git and pip, as described in the README (https://github.com/twintproject/twint), and ran into the same issue either way. Could you run this in the CLI (twint -s "American Airlines" --since "2010-01-02" --until "2010-12-06" -o "Test_file.csv" --csv), or run the commands from my earlier post in Jupyter (the snapshot shows them), and let me know whether you can fetch all the tweets for that range? I have also opened another issue in which twint returns no more than 20 tweets, all from the same day, and not even the full set for that day (#1276).

@JWLMSN

JWLMSN commented Sep 18, 2021

@aarorauark I just tried a run with the parameters you mentioned and the query returns way more data beyond 2010-12-04, although I aborted the script because that would be a lot of data to pull for testing purposes.
My last couple of responses were

10414711921180672 2010-12-02 20:27:15 +0200 <farecomparedeal> Sales for winter/spring from @VirginAmerica @AmericanAir & more. It's Airfare Deals Round-Up Time  http://bit.ly/e90Ukl
10411604629790720 2010-12-02 20:14:54 +0200 <asperkourt> Asper Kourt will be flying first class on American Airlines for the next 3 months . . . k, that's not quite true,...  http://fb.me/MPtFbjZ4

but my guess is it would have run all the way until the specified end date. So it's pretty safe to say your specific query is not the problem. Must be something else.

@aarorauark

Thank you @JWLMSN for your time. From what I have noticed, Twint actually starts 2 days before the until date you specify. I collected lots of data back in March this year, in pretty big files, but somehow it is broken now. Could you please share the file? Ideally the run would take no more than 10 minutes, and with a range of just a couple of months probably only 5. I just want to see whether (1) you are getting more than 40-odd tweets and (2) you are able to capture most of the dates. What I am seeing is that if you do not specify "until", then for less famous companies or less viral search strings twint fetches data for only the past 15 days.

You can simply run for a month only of any year and for any company say "Facebook or Amazon" that has large user generated content on twitter. I just want to see two points that I have mentioned.

Again highly appreciate your time on this.

@DavidPerea

I am also having the same problem. I work with Command Prompt (CMD), where I indicate my command: twint -u gofundme

but it only allows me to extract the tweets until September 15th. How can I solve that?

@agnescameron

@klojohn's solution works for me on Mac, thank you!

@theCreativitist

Ok guys. Just uncomment line 92 in the url.py file:

# ('query_source', 'typed_query'),

change to ('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

This worked for me on Windows! Thanks!

@bensilver95

I went to try @klojohn 's solution, but that line had already been uncommented in my version of Twint. And I'm still experiencing an issue. I'm on Linux. Did anyone else see that in their version it was already uncommented?

@DavidPerea

Ok guys. Just uncomment line 92 in the url.py file:

# ('query_source', 'typed_query'),

change to ('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

I work with Command Prompt (CMD), where I indicate, for example, my command: twint -u gofundme

How can I apply the solution you indicate?

@Mega-Barrel

Ok guys. Just uncomment line 92 in the url.py file:

# ('query_source', 'typed_query'),

change to ('query_source', 'typed_query'),
This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

This worked for me on Windows! Thanks!

Not working for me, I am using windows

@7k50

7k50 commented Nov 14, 2021

Is there a way to delete previous comments. It's a bit messy.

Here again:

Just uncomment (remove the '#') line 92 in the url.py file:

('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

Your solution worked for me as well.

I am running twint version 2.1.21 on Python 3.9.7, which is the latest version available via pip.

Now I am wondering: is there a planned fix for this in the main release?
I guess nothing has happened with this issue yet, since twint hasn't been updated on GitHub in a while.

Is there an actively maintained fork of twint somewhere (preferably one that includes this fix)?
If twint is no longer actively maintained, is there any alternative software we should be aware of?

FYI: I'm running these instructions:

import twint

username = "username"  # placeholder account name

c = twint.Config()

# Represented command: twint -u USERNAME --images -o USERNAME.csv --csv
c.Username = username
c.Images = True
c.Store_csv = True
c.Output = "%s.csv" % username

twint.run.Search(c)

@hpiedcoq
Contributor

hpiedcoq commented Nov 14, 2021 via email

@christineeeeee

Still having the same issue on Linux even after trying this solution. In my case, twint now only returns ~90 tweets about "apple" and "$aapl" for one date.

Ok guys. Just uncomment line 92 in the url.py file:

# ('query_source', 'typed_query'),

change to ('query_source', 'typed_query'),
This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

This worked for me on Windows! Thanks!

Not working for me, I am using windows

@MidasHendrik

MidasHendrik commented Dec 6, 2021

Just uncomment (remove the '#') line 92 in the url.py file:
('query_source', 'typed_query'),
This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

Where do I find/open this url.py file, @klojohn?
I'm working in Google Colab
and used this for installation:
!git clone --depth=1 https://github.com/twintproject/twint.git
!cd /content/twint && pip3 install . -r requirements.txt
!pip3 uninstall aiohttp
!pip3 install aiohttp==3.7.0
import twint
import nest_asyncio
nest_asyncio.apply()

@Abdelrahmanrezk

Ok guys. Just uncomment line 92 in the url.py file:

# ('query_source', 'typed_query'),

change to ('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

Thanks a lot solved for me.

@eminekahveci

Hey guys, I tried uncommenting line 92 in url.py but still had no success. I tried in Jupyter and still received only a handful of tweets, all dated 2010-12-04. image image

Hello, like you, I want to collect tweets with certain hashtags in Jupyter Notebook, but when I run the same commands there I get an error. Did you use Anaconda version 3.6? I wonder if that's why mine doesn't work. I would be glad if you could share some information.

@eminekahveci

Hello, like you, I want to collect tweets with certain hashtags in Jupyter Notebook, but when I run the same commands there I get an error. Did you use Anaconda version 3.6? I wonder if that's why mine doesn't work. I would be glad if you could share some information.
Screen capture 11

@DenseLance

Is there a way to delete previous comments. It's a bit messy.

Here again:

Just uncomment (remove the '#') line 92 in the url.py file:

('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

This solution worked for me, and I'm using Python IDLE on Windows. Thanks @klojohn!

@2spoopy4me

Fix not working for me, py 3.9.7 on mac

@eamon-keane

@aarorauark How did you install twint? If you used pip, type "pip3 show twint" into the command line and follow the path shown under "Location". There you'll find a folder named twint and the url.py which you have to modify inside that folder.

This worked for me, thank you. Running on windows, installed with pip

@eminekahveci

@klojohn Sir, I managed to receive tweets with code similar to what you described, but it only returns a week of data. I think the url.py file has been changed; it wasn't exactly as you described. To be removed

@DavidPerea

How would it be for Windows? Have you got it? I've been trying things for months, uninstalling and installing and I don't know what else to do.

@DenseLance

@DavidPerea How would it be for Windows? Have you got it? I've been trying things for months, uninstalling and installing and I don't know what else to do.

My twint version is 2.1.21. It works fine for me on Windows after using the fix posted by @klojohn. Shows all/most tweets that I wanted to see.

@DavidPerea

@DavidPerea How would it be for Windows? Have you got it? I've been trying things for months, uninstalling and installing and I don't know what else to do.

My twint version is 2.1.21. It works fine for me on Windows after using the fix posted by @klojohn. Shows all/most tweets that I wanted to see.

Now it works great with the solution you have indicated. It is wonderful!

@Antanskas

@klojohn solution worked like a charm! (windows)
