
Fix user.videos crawling issue #1141

Merged (1 commit) on Jul 29, 2024


@lhphat02 (Contributor) commented Apr 13, 2024

Issue: user.videos can't crawl more than 35 videos (even when using cursor).

The code I used for testing:

import asyncio
import os
import time

from TikTokApi import TikTokApi

ms_token = os.environ.get("ms_token")  # msToken cookie from a tiktok.com browser session


async def get_user_videos(username):
    start_time = time.time()
    row_count = 0

    async with TikTokApi() as api:
        await api.create_sessions(headless=False, ms_tokens=[ms_token], num_sessions=1, sleep_after=3)
        user = api.user(username)
        user_data = await user.info()
        post_count = user_data["userInfo"]["stats"].get("videoCount")

        # Ask for every video the profile reports having
        async for video in user.videos(count=post_count):
            url = f"https://www.tiktok.com/@{video.as_dict['author']['uniqueId']}/video/{video.id}"
            print(f"URL: {url}")
            row_count += 1

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Execution time: {elapsed_time} seconds")
    print(f"Total rows: {row_count}")
    print(f"Rows per second: {row_count / elapsed_time}")


asyncio.run(get_user_videos("sofm_official"))

Before modifying the videos method:

2024-04-13 14:20:04,220 - TikTokApi.tiktok - ERROR - Got an unexpected status code: {'log_pb': {'impr_id': '202404130720035F6460F88AF0BF0E31DB'}, 'statusCode': 10201, 'statusMsg': '', 'status_code': 10201, 'status_msg': ''}
Execution time: 11.261611223220825 seconds
Total rows: 0
Rows per second: 0.0

After modifying the videos method:

URL: https://www.tiktok.com/@sofm_official/video/6817297421245107457
URL: https://www.tiktok.com/@sofm_official/video/6815619623837289729
...
URL: https://www.tiktok.com/@sofm_official/video/6815419939957017857
URL: https://www.tiktok.com/@sofm_official/video/6815113300146228481
URL: https://www.tiktok.com/@sofm_official/video/6814374629558258945
Execution time: 14.023724794387817 seconds
Total rows: 135
Rows per second: 9.626543730666045

Please take a look.
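For context, here is a minimal, self-contained sketch of the paging pattern this patch appears to switch user.videos to: a fixed page size of 35 per request (as in the hashtag.py params quoted later in this thread) plus a cursor loop, instead of sending the full count in a single request. fake_fetch_page is a hypothetical stand-in, not TikTokApi's API; rejecting page sizes above 35 with status 10201 is modelled on the error log above, and the cursor/hasMore response fields are assumptions.

```python
import asyncio
from typing import AsyncIterator

MAX_PAGE_SIZE = 35  # assumption: largest per-request count the endpoint accepts


async def fake_fetch_page(cursor: int, page_size: int, total: int = 135) -> dict:
    """Hypothetical stand-in for one item_list request (not TikTokApi's API)."""
    if page_size > MAX_PAGE_SIZE:
        # Modelled on the 10201 status seen when count=post_count was sent.
        return {"statusCode": 10201, "itemList": []}
    end = min(cursor + page_size, total)
    items = [{"id": str(i)} for i in range(cursor, end)]
    # Assumed response shape: items plus the next cursor and a hasMore flag.
    return {"itemList": items, "cursor": end, "hasMore": end < total}


async def videos(count: int, page_size: int = MAX_PAGE_SIZE) -> AsyncIterator[dict]:
    """Request fixed-size pages and follow the cursor until count is reached."""
    cursor, found = 0, 0
    while found < count:
        resp = await fake_fetch_page(cursor, page_size)
        for item in resp.get("itemList", []):
            yield item
            found += 1
        if not resp.get("hasMore", False):
            return  # no more pages (or the request was rejected)
        cursor = resp.get("cursor", cursor)


async def collect(count: int, page_size: int = MAX_PAGE_SIZE) -> list:
    return [item async for item in videos(count, page_size)]


print(len(asyncio.run(collect(135))))                 # 135
print(len(asyncio.run(collect(135, page_size=135))))  # 0
print(len(asyncio.run(collect(5))))                   # 35
```

Asking for everything in one request yields nothing, mirroring the "Total rows: 0" run, while pages of 35 yield all 135. The last line also shows the side effect raised further down in this thread: collect(5) returns 35 items, because every page is consumed in full.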

@anarchopythonista (Contributor) commented:

Applying this patch locally fixed the issue I was experiencing with the 6.3.0 release. Thank you!

@mi01 commented Apr 29, 2024

This fix breaks the count parameter, since the function will always return a multiple of 35 videos (or fewer if the user has fewer videos). It might work if we add a break statement to this loop:

for video in resp.get("itemList", []):
    yield self.parent.video(data=video)
    found += 1

so that it becomes:

for video in resp.get("itemList", []):
    yield self.parent.video(data=video)
    found += 1
    if found == count:
        break
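One way to check the proposed break is a sketch against a hypothetical in-memory feed (FakeFeed is illustrative, not part of TikTokApi, and the cursor/hasMore fields are assumed). It also counts requests, to show that stopping mid-page returns exactly count items without fetching extra pages.

```python
import asyncio

PAGE_SIZE = 35  # fixed per-request page size, as in the patch


class FakeFeed:
    """Hypothetical paginated backend (not TikTokApi); counts requests made."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.requests = 0

    async def fetch(self, cursor: int) -> dict:
        self.requests += 1
        end = min(cursor + PAGE_SIZE, self.total)
        items = [{"id": str(i)} for i in range(cursor, end)]
        return {"itemList": items, "cursor": end, "hasMore": end < self.total}


async def videos(feed: FakeFeed, count: int):
    cursor, found = 0, 0
    while found < count:
        resp = await feed.fetch(cursor)
        for video in resp.get("itemList", []):
            yield video
            found += 1
            if found == count:  # the proposed break: stop mid-page once count is met
                break
        if not resp.get("hasMore", False):
            return
        cursor = resp.get("cursor", cursor)


async def collect(total: int, count: int) -> tuple[int, int]:
    """Return (videos yielded, requests made) for a feed of `total` videos."""
    feed = FakeFeed(total)
    got = [v async for v in videos(feed, count)]
    return len(got), feed.requests


print(asyncio.run(collect(135, 5)))   # (5, 1)
print(asyncio.run(collect(135, 40)))  # (40, 2)
print(asyncio.run(collect(20, 40)))   # (20, 1)
```

The break only leaves the inner for loop; it is the outer while condition (found < count) that then ends the generator.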

Even so, the cursor parameter remains useless and confusing for users of this function.

@davidteather self-requested a review on April 29, 2024 15:47
@jesse-moderwell commented:

> Applying this patch locally fixed the issue I was experiencing with the 6.3.0 release. Thank you!

Same!

@vagvalas commented Jun 29, 2024

Same here: I tried hashtag.py and hit a limit of about 35-40 videos.
I tried to figure out what you changed here, but I don't quite understand it completely.
In /TikTokApi/api/user.py I found:

        found = 0
        while found < count:
            params = {
                "secUid": self.sec_uid,
                "count": count,
                "cursor": cursor,
            }

but hashtag.py already contains your changed version, and it still doesn't crawl more:

/TikTokApi/api/hashtag.py

        found = 0
        while found < count:
            params = {
                "challengeID": self.id,
                "count": 35,
                "cursor": cursor,
            }

So for hashtags the multiples-of-35 approach doesn't work: it just fetches 35 videos and stops.
Am I missing something?

@davidteather changed the base branch from main to v6.4.0 on July 29, 2024 19:14
@davidteather (Owner) left a comment:

LGTM

@davidteather merged commit 4d7e0c0 into davidteather:v6.4.0 on Jul 29, 2024
@davidteather added a commit that referenced this pull request on Jul 29, 2024:
* Fix user.videos crawling issue (#1141)
* added support for statsV2 (#1143)
* Restored functionality for download bytes method (#1174)
* [PlayList] Add playlist to the user scrapy api (#1177)
* bump to 6.4.0

---------

Co-authored-by: Phat Luu Huynh <lhphat.dev@gmail.com>
Co-authored-by: ekorian <korian.edeline@gmail.com>
Co-authored-by: Ben Steel <bendavidsteel@gmail.com>
Co-authored-by: wu5bocheng <wu5bocheng@gmail.com>
@wouterdedroog commented:

> This fix is breaking the count parameter, since the function will always return multiples of 35 (or less if the number of videos is smaller). But it might work if we add an additional break statement in this loop:

I noticed this after updating to 6.4.0 from 6.3.0. Does requesting videos in increments of 35 cause any additional requests (compared with requesting just 5 videos)? @mi01

@mi01 commented Aug 6, 2024:

> I noticed this after updating to 6.4.0 from 6.3.0. Does requesting videos in increments of 35 cause any additional requests (instead of requesting just 5 videos)? @mi01

As far as I know, this causes no additional requests.
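As a back-of-the-envelope check of this answer, assume a simplified model: every page except the last is full, and the server reports no more pages after the last video. Then the number of requests depends only on how many pages it takes to reach min(count, available videos), with or without the proposed break. requests_needed is a hypothetical helper, not part of TikTokApi.

```python
import math


def requests_needed(count: int, available: int, page_size: int = 35) -> int:
    """Requests made by a cursor loop that stops once count items are found.

    Simplified model (assumption): full pages of page_size items, and no
    further pages are reported after the last available item.
    """
    if count <= 0:
        return 0  # the while loop never runs
    wanted = min(count, available)
    # At least one request is made even if the profile has no videos.
    return max(1, math.ceil(wanted / page_size))


print(requests_needed(5, 135))               # 1
print(requests_needed(5, 135, page_size=5))  # 1
print(requests_needed(135, 135))             # 4 (35 + 35 + 35 + 30)
```

Under this model, asking for 5 videos costs a single request whether each page holds 5 or 35 items.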
