[wikimedia] add 'wiki' extractor #6050
gallery_dl/extractor/wikimedia.py
    "generator": "images",
    "titles" : path,
}
self.per_page = self.config("limit", 10)
Should I set the default to a larger value, or perhaps even eliminate the option altogether? In theory, setting this to `500` could cause some issues if the wiki is hosted on low-end hardware or has poor connectivity.
Maybe set it to `50` by default and limit it to `200` so others don't go overboard with it?
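The default-plus-cap idea suggested above could be sketched roughly as follows. This is only an illustration: `clamp_per_page` is a hypothetical helper, not something from the PR, and the values `50` and `200` are simply the ones proposed in this comment.

```python
def clamp_per_page(value, default=50, maximum=200):
    """Return a page size between 1 and `maximum`.

    Hypothetical helper: falls back to `default` when the option is
    unset or cannot be interpreted as an integer.
    """
    if value is None:
        return default
    try:
        value = int(value)
    except (TypeError, ValueError):
        return default
    # Keep the result inside the closed range [1, maximum]
    return max(1, min(value, maximum))
```

With a helper like this, a user-supplied `500` would be reduced to `200`, while an unset option would fall back to `50`.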
I did some tests on Fandom and a few smaller wikis and they all tolerated `500` pretty well, so I guess I was just overthinking it.
    archive_fmt = "{sha1}"
    request_interval = (1.0, 2.0)

    def __init__(self, match):
        BaseExtractor.__init__(self, match)
        path = match.group(match.lastindex)

        if self.category == "wikimedia":
            self.category = self.root.split(".")[-2]
        elif self.category in ("fandom", "wikigg"):
            self.category = "{}-{}".format(
Should this be moved to before `BaseExtractor.__init__` is called?
`self.category` only has a value after `BaseExtractor.__init__` has been called, so this is not really an option, I think.
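The ordering constraint described here can be illustrated with a small, self-contained sketch. The `Base` and `Wiki` classes and the dict standing in for the regex match are hypothetical simplifications, not gallery-dl's actual classes; the only part taken from the diff is the `split(".")[-2]` refinement.

```python
class Base:
    """Hypothetical stand-in for a base extractor."""
    category = ""  # class-level placeholder until __init__ runs

    def __init__(self, match):
        # Assumption: the base initializer is what fills in root and category
        self.root = match["root"]
        self.category = match["category"]


class Wiki(Base):
    def __init__(self, match):
        # Checking self.category before this call would only see the
        # empty placeholder, so the refinement has to come afterwards.
        Base.__init__(self, match)
        if self.category == "wikimedia":
            self.category = self.root.split(".")[-2]
```

In this sketch, moving the `if` above `Base.__init__` would make the comparison run against the empty placeholder, so the branch would never be taken.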
Got it. It's a bit of a shame that the Fandom wikis can't be controlled individually, but I guess you could just make separate config files to achieve the same result.
This PR adds the ability to download all media files hosted on a MediaWiki instance.
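As a rough sketch of how paging through such a query might look, the snippet below follows the `continue` tokens that the MediaWiki Action API returns. It is not the PR's implementation: `iter_pages`, `fetch`, and `example_params` are hypothetical names, `fetch` is any callable that sends the parameters to a wiki's `api.php` and returns the decoded JSON, and only `"generator": "images"` and `"titles"` are taken from the diff above.

```python
def iter_pages(fetch, params):
    """Yield page entries from a MediaWiki query, following 'continue' tokens.

    `fetch` is a caller-supplied callable: params dict in, decoded JSON out.
    """
    params = dict(params)  # work on a copy; the caller's dict stays untouched
    while True:
        data = fetch(params)
        # With format=json, "pages" is a dict keyed by page ID
        yield from data.get("query", {}).get("pages", {}).values()
        cont = data.get("continue")
        if not cont:
            return
        # Merge the continuation values into the next request
        params.update(cont)


# Illustrative parameters; the title and the limit value are assumptions
example_params = {
    "action": "query",
    "format": "json",
    "generator": "images",
    "titles": "Main Page",
    "gimlimit": 50,
    "prop": "imageinfo",
    "iiprop": "url|sha1",
}
```

Passing the request function in as `fetch` keeps the loop independent of any particular HTTP client, which also makes it easy to exercise with canned responses.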