[REQUEST] Twitter extractor option to ignore media in replies #705

Closed
KaMyKaSii opened this issue Apr 23, 2020 · 12 comments

Comments

@KaMyKaSii

I want to download only the media that the user has posted to all of his followers, ignoring any media attached to tweets that reply to a specific user. I tried the -K option to find out whether I could use a filter to ignore replies, but I didn't find a way. I hope it is possible for you to implement this option. Thank you!

@mikf
Owner

mikf commented Apr 23, 2020

Doesn't this already happen when downloading from a user's media timeline, i.e. https://twitter.com/USER/media? I'm pretty sure there aren't replies included there.

@KaMyKaSii
Author

> Doesn't this already happen when downloading from a user's media timeline, i.e. https://twitter.com/USER/media? I'm pretty sure there aren't replies included there.

It still downloads the media from replies here:

$ gallery-dl --verbose https://twitter.com/criminxly/media
[gallery-dl][debug] Version 1.13.5-dev
[gallery-dl][debug] Python 3.8.2 - Linux-4.4.141-perf+-aarch64-with-libc
[gallery-dl][debug] requests 2.22.0 - urllib3 1.25.7
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/criminxly/media'
[twitter][debug] Using TwitterMediaExtractor for 'https://twitter.com/criminxly/media'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/profiles/show/criminxly/media_timeline?include_available_features=1&include_entities=1&reset_error_state=false&lang=en HTTP/1.1" 200 7762
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EWO6gIxWkAcNtSt.jpg:orig HTTP/1.1" 503 0
[downloader.http][warning] '503 Service Not Available' for 'https://pbs.twimg.com/media/EWO6gIxWkAcNtSt.jpg:orig' (1/6)
[urllib3.connectionpool][debug] Starting new HTTPS connection (2): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EWO6gIxWkAcNtSt.jpg:orig HTTP/1.1" 200 16579
/sdcard/gallery-dl/twitter/criminxly/1253050185250492419_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EWEwLSQXkAM5wVG.jpg:orig HTTP/1.1" 200 82914
/sdcard/gallery-dl/twitter/criminxly/1252335160957325317_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EWEsXCYXkAISdNM.jpg:orig HTTP/1.1" 200 75279
/sdcard/gallery-dl/twitter/criminxly/1252331310305206273_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EWBCBp1XgAAlCNG.jpg:orig HTTP/1.1" 200 137430
/sdcard/gallery-dl/twitter/criminxly/1252073302694166528_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EVwlpnTWAAAfONM.jpg:orig HTTP/1.1" 503 0
[downloader.http][warning] '503 Service Not Available' for 'https://pbs.twimg.com/media/EVwlpnTWAAAfONM.jpg:orig' (1/6)
[urllib3.connectionpool][debug] Starting new HTTPS connection (3): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EVwlpnTWAAAfONM.jpg:orig HTTP/1.1" 503 0
[downloader.http][warning] '503 Service Not Available' for 'https://pbs.twimg.com/media/EVwlpnTWAAAfONM.jpg:orig' (2/6)
[urllib3.connectionpool][debug] Starting new HTTPS connection (4): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EVwlpnTWAAAfONM.jpg:orig HTTP/1.1" 200 218898
/sdcard/gallery-dl/twitter/criminxly/1250916229621190662_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EVvUgiXXQAY9kGx.jpg:orig HTTP/1.1" 200 520405
/sdcard/gallery-dl/twitter/criminxly/1250826994708865027_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EVruQicXgAE-5rQ.jpg:orig HTTP/1.1" 200 57715
/sdcard/gallery-dl/twitter/criminxly/1250573823847661570_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EVror2tWoAI6If0.jpg:orig HTTP/1.1" 503 0
[downloader.http][warning] '503 Service Not Available' for 'https://pbs.twimg.com/media/EVror2tWoAI6If0.jpg:orig' (1/6)
[urllib3.connectionpool][debug] Starting new HTTPS connection (5): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EVror2tWoAI6If0.jpg:orig HTTP/1.1" 503 0
[downloader.http][warning] '503 Service Not Available' for 'https://pbs.twimg.com/media/EVror2tWoAI6If0.jpg:orig' (2/6)
[urllib3.connectionpool][debug] Starting new HTTPS connection (6): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EVror2tWoAI6If0.jpg:orig HTTP/1.1" 200 269606
/sdcard/gallery-dl/twitter/criminxly/1250567714386718721_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EViYYa0XgAIMHeD.jpg:orig HTTP/1.1" 200 85810
/sdcard/gallery-dl/twitter/criminxly/1249916460190834689_1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EViXThrWkAEjJR5.jpg:orig HTTP/1.1" 200 199983
/sdcard/gallery-dl/twitter/criminxly/1249915269910331397_1.jpg
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/profiles/show/criminxly/media_timeline?include_available_features=1&include_entities=1&max_position=1249915269910331397&reset_error_state=false&lang=en HTTP/1.1" 200 104

[Attached screenshots: Screenshot_20200423-164226583_1, Screenshot_20200423-164229994_1]

@mikf
Owner

mikf commented Apr 27, 2020

There is now a reply metadata field (c4371a6) that can be used to --filter posts:
gallery-dl --filter "not reply" URL
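
To illustrate how such a filter behaves, here is a minimal sketch. It assumes the filter string is a Python expression evaluated with each post's metadata fields available as names, which is consistent with the NameError reported later in this thread. The `keep_post` helper and the sample posts are hypothetical; only the `reply` field name comes from the comment above, and this is not gallery-dl's actual code.

```python
def keep_post(expression, metadata):
    """Return True if the post passes the filter expression.

    The metadata fields are exposed as local names, so an
    expression such as "not reply" can refer to them directly.
    """
    return bool(eval(expression, {}, dict(metadata)))


# Hypothetical metadata for three posts; only "reply" is named in the thread.
posts = [
    {"tweet_id": 101, "reply": False},
    {"tweet_id": 102, "reply": True},
    {"tweet_id": 103, "reply": False},
]

# Keep only the posts for which the expression is truthy.
kept = [post["tweet_id"] for post in posts if keep_post("not reply", post)]
print(kept)
```

With this sample data, only the post flagged as a reply is dropped.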

@Hrxn
Contributor

Hrxn commented Apr 27, 2020

Should this maybe be made the default for Twitter?

@mikf
Owner

mikf commented Apr 27, 2020

Probably, and it should also be exposed as a regular option like retweets.

mikf closed this as completed in 9f638c2 Apr 29, 2020
@mikf
Owner

mikf commented Apr 29, 2020

"Regular option" done, but I'm still keeping replies enabled for the time being.
I'm going to change its default to false for the next "big" release, i.e. 1.14.0

@KaMyKaSii
Author

@mikf thanks for making the filter. Could you also consider making a filter for these videos from other accounts?
[Attached image: IMG-20200509-WA0092]

@outlaw240

Should it be off by default? While it's pretty cool that it can filter out a bunch of reaction images, I find that it has the potential to skip out on actual content from the user as well (posted as replies), and for some reason also seems to be omitting gifs/videos from threaded tweets, which is weird.

@Hrxn
Contributor

Hrxn commented May 10, 2020

[..] to skip out on actual content from the user as well (posted as replies) [..]

Wait, what? If a user replies to his own tweet on his own account, that counts as a reply as well? There is no differentiation on Twitter's side?

@outlaw240

It appears to be that way. Sometimes users use the reply function to update a tweet with more content later on, so it won't always be a threaded tweet. It does not seem to matter whether they reply to themselves or to another user. An option to filter out only the latter would be useful indeed, because 99% of the time those are just reaction images of some sort.
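
The distinction asked for here could look roughly like the following sketch. It assumes each post's metadata carries the author's name and, for replies, the name of the user being replied to. The keys `author` and `reply_to`, the helper `is_reply_to_other`, and the sample data are all hypothetical stand-ins, not fields or functions confirmed for gallery-dl at this point in the thread.

```python
def is_reply_to_other(tweet):
    """True only when the tweet replies to someone other than its author."""
    reply_to = tweet.get("reply_to")
    return reply_to is not None and reply_to != tweet["author"]


# Hypothetical sample: a plain tweet, a self-reply, and a reply to another user.
tweets = [
    {"id": 1, "author": "alice", "reply_to": None},
    {"id": 2, "author": "alice", "reply_to": "alice"},
    {"id": 3, "author": "alice", "reply_to": "bob"},
]

# Keep plain tweets and self-replies; drop replies to other users.
kept = [tweet["id"] for tweet in tweets if not is_reply_to_other(tweet)]
print(kept)
```

Self-replies survive this check, so later additions to the user's own tweets would still be downloaded.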

@jpmc

jpmc commented Jul 27, 2022

> There is now a reply metadata field (c4371a6) that can be used to --filter posts:
> gallery-dl --filter "not reply" URL.

Chiming in 2 years later: this doesn't seem to work. I tried that and got:

[twitter][error] FilterError: Evaluating filter expression failed (NameError: name 'reply' is not defined)

I also tried "not reply_to", which gave the same error, and checked the twitter extractor file to see if there was any other sort of flag for it, though I'm extremely new to the codebase and have no idea what I'm looking for.

And it still seems the default option is to download reply-tweets, even in /media paths. So I'm getting meme gif responses intermixed with the actual content I'm trying to download. :(

Edit: Apologies as well for the necro-post. But this seemed to be the most applicable issue that already well documents my problem without making an entire duplicate issue for it.

@mikf
Owner

mikf commented Jul 27, 2022

@jpmc

The metadata field name to filter replies by got changed to reply_id when switching from HTML parsing to using the REST API in 5bc1097: --filter "not reply_id"

reply_to is only defined for replies, so you either do "reply_id and reply_to", which is kind of redundant, or "locals().get('reply_to')"
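
The difference between the two fields can be reproduced with plain Python. This is a sketch under assumptions: filter expressions are evaluated with the metadata as local names, `reply_id` is always present (assumed here to be 0 for non-replies), and `reply_to` exists only on replies. The `evaluate` helper and the sample values are hypothetical, not gallery-dl code.

```python
def evaluate(expression, metadata):
    """Evaluate a filter expression with metadata fields as local names."""
    return eval(expression, {}, dict(metadata))


# Hypothetical metadata: reply_id assumed to be 0 for a non-reply.
original = {"tweet_id": 1, "reply_id": 0}
reply = {"tweet_id": 2, "reply_id": 1, "reply_to": "someone"}

# reply_id is defined on every post, so it is safe to test directly.
keeps_original = evaluate("not reply_id", original)
keeps_reply = evaluate("not reply_id", reply)

# reply_to is missing on non-replies, so naming it raises NameError.
try:
    evaluate("not reply_to", original)
    error = None
except NameError as exc:
    error = str(exc)

# Looking the field up through locals() falls back to None instead of failing.
safe_lookup = evaluate("locals().get('reply_to')", original)

print(keeps_original, keeps_reply, error, safe_lookup)
```

This is why "not reply_id" works on every post while a bare reply_to only works on posts that are already replies.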

There is also a replies config file option.
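
As a sketch of what that might look like in a config file: the option name `replies` is taken from the comment above and the value false from the earlier remark about changing its default, while the nesting under "extractor" and "twitter" is an assumption based on how gallery-dl config files are usually laid out. The snippet only builds and prints the JSON; it does not write to any real config path.

```python
import json

# Assumed layout: extractor options grouped by site name.
config = {
    "extractor": {
        "twitter": {
            "replies": False,
        }
    }
}

# Serialize to the JSON text that would go into the config file.
config_text = json.dumps(config, indent=4)
print(config_text)
```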
