[BUG/SITE ENHANCEMENT] Weibo truncates when additional content exist. #2601
Comments
Please use triple backticks instead of one, your logs are unreadable. |
I'm willing to settle for just the bare media content URL for now, which the PC ajax gives in the pid entry. However, it'd be great if you could get the other misc metadata as well. I think what I need can be achieved with wget, but I've got little idea how to have it recurse over the 'sinceid=' entries. |
Sorry, changed. Didn't know that was a thing. |
My issue with desktop Weibo, and why gallery-dl is using the mobile API, is its Sina Visitor System, which redirects any request from "overseas" to https://passport.weibo.com/. yt-dlp seems to be able to work around this. I'll see what I can do. |
I seem to be able to bypass this by first visiting certain profile pages, which act like a sort of "referer" and essentially grant me privileges that sometimes only logged-in users have, like the ajax thing. Not every profile works, as I've come across "blacklisted" ones that you'll never be able to view unless you've actually logged in. If you try to visit a "standard" profile, which is neither a "referer" profile nor a "blacklisted" profile, it will, as you've said, redirect you to the passport page, which makes you log in. However, if you visit a "referer" profile first and then try to visit the "standard" profile, it doesn't ask you to log in anymore. Kinda strange and rather difficult to reproduce. One issue with reproducing it is that when you try to use the "referer" in incognito mode, it doesn't work at all. My testing was done by merely clearing cookies and cache, so I'm not sure whether everything was reset to initial conditions. Trying to be a bit more rigorous, I tried on Edge, which I've never really used. I didn't have to log in to access those ajax endpoints, but had to be "referred" first, as described above. |
These are the api things to call for actual posts instead of just the user's album/gallery. These should include retweets and other stuff, which should be more general. I forgot to add this.
It's pretty much the same as before; you should be able to iterate through the since_id= values to get all of the posts. |
Weibo's extractor got updated to use the desktop API. (a069281) With I will look into supporting downloading from a user's gallery next. (#686) |
General Didn't realize Weibo had sub-albums. That'll be the next thing I'm working on, I guess. |
Thanks a ton. Must've been a lot of trouble for you. I've always wondered, do you speak/read Chinese? Is the language barrier problematic for you? I've tried asking yt-dlp to support this, but since you're also working on it, I might as well ask whether you can integrate this. Weibo has this "video gallery" page that is similar to the photo gallery. has the same get structure Example link from this user
All of the available video formats and urls of each post are listed explicitly. It would be great if they could be grabbed with youtube-dl or yt-dlp, and then saved under the metadata of their respective posts. If you have time, that is. |
Thanks for updating. However the original function
Doesn't seem to be working anymore? Is this intended? |
No, it's not. I have, again, managed to slip in a small typo that breaks things: gallery-dl/gallery_dl/extractor/weibo.py Line 203 in c2d1171
The single space character at the beginning of this string breaks things. Workaround: append
I do not. I can only speak/read German and English.
No, not really. Many important things are still in English (URLs, parameters, etc) and anything else can be translated with translate.google.com or the like.
Where did you get this from? When I click around in this profile I only get
Videos currently get grabbed without ytdl, albeit in their maximum resolution / quality. For example https://weibo.com/6077799204/LgvbpoSNW from #2367 now gets downloaded as 3840x2160 60fps video, but, at the moment, there is no option to grab it at a smaller resolution. |
I see, thank you for implementing all these changes for this site.
Sorry, I'm not able to recreate this; I've no idea how I got to ?tabtype=video. But it's different from ?tabtype=newVideo. It doesn't seem to be different, though.
The workaround works. However, retweet recognition is broken. When downloading from the user's "feed", it doesn't differentiate between the user's own posts and 'retweeted' posts. It just downloads everything into one folder as if it were all posted by the user.
is thus also broken. |
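For illustration, separating a user's own posts from retweets could look roughly like the sketch below. It is not gallery-dl's actual code. It assumes each post is a decoded JSON object and that a retweet carries a non-empty `retweeted_status` entry; that field name and the helper `split_retweets` are assumptions for this example, not something confirmed in this thread.

```python
from typing import Any, Dict, Iterable, List, Tuple

Post = Dict[str, Any]


def split_retweets(posts: Iterable[Post]) -> Tuple[List[Post], List[Post]]:
    """Partition posts into (own posts, retweets).

    Assumption: a retweet is marked by a non-empty "retweeted_status"
    entry holding the original post. Posts without it count as the
    user's own.
    """
    own: List[Post] = []
    retweets: List[Post] = []
    for post in posts:
        if post.get("retweeted_status"):
            retweets.append(post)
        else:
            own.append(post)
    return own, retweets
```

With a split like this, the two groups could be written to different folders or the retweets skipped entirely.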
I'm not sure why Weibo's mobile api truncates but it certainly does stop reliably at a certain point, unfortunately.
This is the output I get with gallery-dl
Removed some very repetitive lines "[10][DEBUG] <urllib3.connectionpool> <connectionpool.py> <C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py>" as well as the sinaimg domain for the media as it was too long to fit the 65k character limit.
As you can see, it truncates at https://m.weibo.cn/api/container/getIndex?type=uid&value=3164968121&containerid=1076033164968121&since_id=4419858450470528
I can go into it and see that the last post is https://weibo.com/0/4418191176198193 which is dated to 19-9-19 11:05
However, the PC gallery webpage has its own version of a "thing" like the above. This is what it looks like
The last one before it also truncates is https://weibo.com/ajax/profile/getImageWall?uid=3164968121&sinceid=4213105794453395_-1_20180323_-1
And the last post mid it allows me to access is https://weibo.com/0/3958298363435082 which is dated 16-3-29 09:35
Tl;dr: as of now, gallery-dl stops midway because the mobile API refuses to serve up any earlier content, whereas the PC site will. I tested this in a browser where I was not logged in, and it allowed me to recursively request the PC api-like thing without being logged in.
I'm hoping you can fix this truncation issue, and that some of this helps. Since the PC side appears to work properly, maybe an alternative extractor for Weibo could work on the PC api-thing instead of the mobile one.
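To make the idea concrete, here is a rough sketch of walking the PC endpoint by following each page's cursor. The URL and the `uid` / `sinceid` parameters are the ones shown above. Everything about the response is an assumption: that the JSON has a `data` object with a `list` of items carrying `pid`, and a `since_id` cursor for the next page that is empty or `0` at the end. The names `image_wall_url`, `iter_pids` and `fetch_json` are made up for this example.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

IMAGE_WALL = "https://weibo.com/ajax/profile/getImageWall"


def image_wall_url(uid: str, since_id: Optional[str] = None) -> str:
    """Build the gallery URL; the first page has no sinceid."""
    params = {"uid": uid}
    if since_id:
        params["sinceid"] = since_id
    return IMAGE_WALL + "?" + urlencode(params)


def iter_pids(
    uid: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield every pid by following the cursor from page to page.

    fetch_json is a caller-supplied function that takes a URL and
    returns the decoded JSON. The response layout used here
    (data.list[].pid and data.since_id) is an assumption.
    """
    since_id: Optional[str] = None
    seen = set()
    for _ in range(max_pages):
        data = fetch_json(image_wall_url(uid, since_id)).get("data") or {}
        for item in data.get("list") or []:
            pid = item.get("pid")
            if pid:
                yield pid
        next_id = str(data.get("since_id") or "")
        # Stop on a missing, zero, or repeated cursor to avoid looping forever.
        if not next_id or next_id == "0" or next_id in seen:
            return
        seen.add(next_id)
        since_id = next_id
```

Passing the fetcher in keeps the paging logic separate from the HTTP side, so whatever cookies or headers the site turns out to need can be handled in `fetch_json` alone.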