-
-
Notifications
You must be signed in to change notification settings - Fork 975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add instagram metadata: post_pageurl, post_tags #743
Conversation
Add the following metadata for instagram: - post_pageurl: json string with url of the post page - post_tags: json array with instagram tags extracted from the post description
This way, --write-tags will pick up the post tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Vrihub for working on it!
Just some minor suggestions, apart the rename from post_pageurl
to url
.
Usally all change for extractor in the first commit message line has the format:
[extractor] short description
Please change the first line of the commit message to something like:
[instagram] add 'url' and 'tags' metadata
....
For both url
and tags
please update existing test in order to test these metadata as well.
Can you please rename post_pageurl
to url
, add tests and squash all commits to one single commit and adjust the commit message as suggested?
Thanks again!
I've made the requested changes (and fixed a bug with duplicate tags). Re: the squash everything into a single commit, IIRC this can be done by the project admin when merging the pull request, right? BTW, I think I'm going to add some other useful metadata: location and tagged users, so please don't merge this yet. |
gallery_dl/extractor/instagram.py
Outdated
@@ -395,6 +394,7 @@ class InstagramImageExtractor(InstagramExtractor): | |||
def __init__(self, match): | |||
InstagramExtractor.__init__(self, match) | |||
self.shortcode = match.group(1) | |||
self._find_tags = re.compile(r'#\w+').findall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add this to the __init__()
of the base class, so that all Instagram extractors have access to it.
The Travis tests currently have lots of AttributeError: 'InstagramUserExtractor' object has no attribute '_find_tags'
messages in them. You can run those tests locally with python test/test_results.py instagram
to check if anything broke before pushing your commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, right. I was misleaded by the absence of an __init__()
in the base class InstagramExtractor
.
A tentative fix is in 1305200
I added __init__()
to the base class, and also had it call Extractor.__init__()
, since all the subcategory extractors in their __init__()
are calling InstagramExtractor.__init__()
(which in the absence of it would resolve in calling Extractor.__init__()
, am I right?)
It works for me, but please confirm that this indeed makes sense.
Or maybe it would be better to define self._find_tags()
as a static method instead?
Add the following metadata for instagram:
post_tagstags: json array with instagram tags extracted from the post description