
Still Getting 429 Errors After 4.9.1 Update #573

Closed
searchsolved opened this issue Apr 10, 2023 · 21 comments

Comments

@searchsolved

I'm still getting consistent 429 errors after the latest PyTrends update.

Should I be doing something different? I've read the docs and they look the same as before the PyTrends update.

Thanks.

@uzairamer

uzairamer commented Apr 10, 2023

Basically, you need to get the NID cookie. The following solution worked for me:

import requests

session = requests.Session()
session.get('https://trends.google.com')
cookies_map = session.cookies.get_dict()
nid_cookie = cookies_map['NID']

Then plug the NID cookie into the TrendReq object:

from pytrends.request import TrendReq

TrendReq(hl='en-US', tz=360, retries=3, requests_args={'headers': {'Cookie': f'NID={nid_cookie}'}})
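
For reference, here is the whole workaround end to end as a minimal sketch; the keyword and timeframe below are placeholders, not values from this thread:

import requests
from pytrends.request import TrendReq

# Fetch a fresh NID cookie from the Trends front page.
session = requests.Session()
session.get('https://trends.google.com')
nid_cookie = session.cookies.get_dict()['NID']

# Reuse that cookie for every pytrends request via requests_args.
pytrends = TrendReq(hl='en-US', tz=360, retries=3,
                    requests_args={'headers': {'Cookie': f'NID={nid_cookie}'}})
pytrends.build_payload(['python'], timeframe='today 3-m', geo='US')
print(pytrends.interest_over_time().head())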

@DumbFace

> Basically, you need to get the NID cookie. The following solution worked for me: [...]

Thanks a lot, it worked for me.

@searchsolved
Author

Thanks, that worked for me too!

@fackse

fackse commented Apr 10, 2023

The solution provided by @uzairamer works for me too. Thanks a lot @uzairamer !

@emlazzarin
Collaborator

emlazzarin commented Apr 10, 2023 via email

@Terseus
Collaborator

Terseus commented Apr 11, 2023

> Basically, you need to get the NID cookie. The following solution worked for me: [...]

I don't understand why this solution works at all; this is basically what the GetGoogleCookie method is already doing, or at least trying to do.

If someone who can work around the problem using this solution could post more information, we could fix it in the library. We need to know at least:

  • what problem you are having; how many successful downloads do you get before hitting 429 errors?
  • how you are using the TrendReq objects: do you create one object for every request, or one for a whole batch? (see the sketch after this list)
  • how you are applying this workaround: manually before every request? Once every 100 requests?
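
To make the question concrete, here is a sketch of the two usage patterns we mean; the helper name and the refresh interval of 100 are arbitrary assumptions, not recommendations:

import requests
from pytrends.request import TrendReq

def fresh_trendreq():
    # Hypothetical helper: build a TrendReq around a newly fetched NID cookie.
    session = requests.Session()
    session.get('https://trends.google.com')
    nid = session.cookies.get_dict()['NID']
    return TrendReq(hl='en-US', tz=360, retries=3,
                    requests_args={'headers': {'Cookie': f'NID={nid}'}})

batches = []  # placeholder: your chunks of up to 5 keywords each

# Pattern A: one object reused for the whole batch of requests.
pytrends = fresh_trendreq()

# Pattern B: refresh the object (and cookie) every N requests, e.g. N = 100.
for i, batch in enumerate(batches):
    if i % 100 == 0:
        pytrends = fresh_trendreq()
    pytrends.build_payload(batch, timeframe='today 3-m', geo='US')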

@whatalnk

> I don't understand why this solution works at all; this is basically what the GetGoogleCookie method is already doing, or at least trying to do.

It seems that GetGoogleCookie is not working as intended.

I accessed https://trends.google.com/trends/explore?geo=JP, which is used by GetGoogleCookie, from Google Chrome (incognito window):

  • First time: it returned an error (404 or 429)
  • Second time (after a reload): it returned a normal response

https://trends.google.com/trends?geo=JP is OK, and it redirected to https://trends.google.com/home?geo=JP.

@Terseus
Collaborator

Terseus commented Apr 11, 2023

Hi @whatalnk,

That request is expected to fail with a 429; pytrends doesn't care whether it succeeds, it's only meant to generate a valid NID cookie.

Even when the request fails, the backend returns a NID cookie that we can use for the next request(s).
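
This is easy to verify by hand; a minimal sketch, using the same /explore path that GetGoogleCookie requests:

import requests

# The request itself frequently fails with a 429...
resp = requests.get('https://trends.google.com/trends/explore/?geo=US')
print(resp.status_code)

# ...but the response may still set the NID cookie we need.
print(resp.cookies.get_dict().get('NID'))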

Thanks for the response, though.

@fackse

fackse commented Apr 11, 2023

Currently, I want to retrieve the trends for about 5,536 terms. Before the NID cookie hint was posted here, I tried it like this:

[...]
pytrends = TrendReq(hl='en-US', tz=360, retries=3)
# Set the number of terms per request (maximum 5)
terms_per_request = 5

dfs = []
# Loop through the chunks of unique names
for name_chunk in tqdm(list(chunk(name_time_ranges_tuples, terms_per_request)), desc="Fetching trends data"):
    names, time_ranges = zip(*name_chunk)
    try:
        # Build the payload with the current name chunk
        pytrends.build_payload(names, timeframe="2023-03-06 2023-04-03", geo='US')
        result = pytrends.interest_over_time()
        result = result.drop('isPartial', axis=1)
        dfs.append(result)
    except Exception as e:
        print(f"Error for {names}: {e}")
        
    time.sleep(60)
[...]

With the 5,536 terms (chunked), I got the 429 error 230 times. I didn't follow the progress live to the end, but at the beginning the error came on roughly every second call; apparently it occurred less frequently later on.

On another machine I used the following code. In this case, the abort condition (giving up after three attempts) was never reached:

import requests
import pandas as pd
from pytrends.request import TrendReq
from rich.progress import Progress  # progress bar used below

def process_chunk(chunk, index, retries=3):
    for attempt in range(retries):
        try:
            pytrends.build_payload(chunk, timeframe="2023-03-06 2023-04-03", geo='US')
            result = pytrends.interest_over_time()
            result = result.drop('isPartial', axis=1)
            return result
        except Exception as e:
            if attempt < retries - 1:
                print(f"Error processing chunk {index} (attempt {attempt + 1}): {e}. Retrying...")
            else:
                print(f"Error processing chunk {index} (final attempt {attempt + 1}): {e}. Giving up.")
                return None

session = requests.Session()
session.get('https://trends.google.com')
cookies_map = session.cookies.get_dict()
nid_cookie = cookies_map['NID']
proxy = 'http://<user>:<pw>@isp2.hydraproxy.com:9989'

trends_df = pd.DataFrame()
dfs = []

pytrends = TrendReq(hl='en-US', tz=360, retries=3, proxies=[proxy]*100000, requests_args={'headers': {'Cookie': f'NID={nid_cookie}'}})

with Progress() as progress:
    task = progress.add_task("[cyan]Processing chunks...", total=len(chunks))

    for index, chunk in enumerate(chunks):
        result = process_chunk(chunk, index)
        if result is not None:
            dfs.append(result)
        progress.update(task, advance=1)

trends_df = pd.concat(dfs)

@Terseus
Collaborator

Terseus commented Apr 11, 2023

Hi @fackse,

I see you're using proxies in the second solution, but not in the first.

The TrendReq class behaves differently depending on whether you use proxies (sketched below):

  • when using proxies, the instance will retrieve a new NID cookie for every request.
  • when not using them, the instance will reuse the same NID cookie over and over again.
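
For reference, the two configurations side by side as a sketch (the proxy URL is a placeholder):

from pytrends.request import TrendReq

# Without proxies: the NID cookie fetched at construction time is reused
# for every subsequent request.
pytrends_plain = TrendReq(hl='en-US', tz=360, retries=3)

# With proxies: the instance retrieves a new NID cookie for every request.
proxies = ['https://user:pw@proxy.example.com:8080']  # placeholder
pytrends_proxied = TrendReq(hl='en-US', tz=360, retries=3, proxies=proxies)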

Please, can you try executing the first solution with proxies, the same way you do in the second solution, and report if there's any difference in the error rate?

Thanks a lot.

@whatalnk

Thank you for your quick reply. That comment was my misunderstanding.

Well, I found some funny things.

I downloaded the source code (pytrends==4.9.1) from PyPI and checked the contents.
It seems that the change from commit f6b2d0c is not included.

In requests.py, GetGoogleCookie still uses f'{BASE_TRENDS_URL}/?geo={self.hl[-2:]}', not f'{BASE_TRENDS_URL}/explore/?geo={self.hl[-2:]}'.
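
Until a fixed release is published, one possible stopgap is to patch the cookie URL at runtime. A sketch, assuming the 4.9.1 module layout; it deliberately skips pytrends' proxy-rotation logic, so it is only a rough fit for proxy-free setups:

import requests
import pytrends.request as ptr

def _patched_get_google_cookie(self):
    # Simplified stand-in that requests /explore/, as in the unreleased fix,
    # and keeps only the NID cookie from the response.
    response = requests.get(
        f'{ptr.BASE_TRENDS_URL}/explore/?geo={self.hl[-2:]}',
        timeout=self.timeout,
        **self.requests_args,
    )
    return {k: v for k, v in response.cookies.items() if k == 'NID'}

ptr.TrendReq.GetGoogleCookie = _patched_get_google_cookie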

@fackse

fackse commented Apr 11, 2023

@Terseus: I would like to do that, but now another error occurs. No idea if it is related to this. In any case, I can't use pytrends right now.
Tested on two different machines. I am working in Jupyter Notebook; I have restarted the kernel several times and also restarted Jupyter itself.

I also took a look at the results from the last run, did some sampling, and compared the results to those on the Google Trends page. It showed that pytrends often reports NaNs where values are actually present. I don't want to post the data here publicly, but I can send you an excerpt if you want.

Edit:
Reported error was due to user stupidity 😅 Thank you @whatalnk

@whatalnk

@fackse

pytrends.build_payload("Meech", timeframe="2023-03-06 2023-04-03", geo='US')

kw_list is interpreted as M, e, e, c, h, not Meech

How about

pytrends.build_payload(["Meech"], timeframe="2023-03-06 2023-04-03", geo='US')

@fackse

fackse commented Apr 11, 2023

> @fackse
>
> pytrends.build_payload("Meech", timeframe="2023-03-06 2023-04-03", geo='US')
>
> kw_list is interpreted as M, e, e, c, h, not Meech
>
> How about
>
> pytrends.build_payload(["Meech"], timeframe="2023-03-06 2023-04-03", geo='US')

You're absolutely right, my bad! 🤦‍♂️

@Terseus
Collaborator

Terseus commented Apr 11, 2023

Hi @whatalnk,

> I downloaded the source code (pytrends==4.9.1) from PyPI and checked the contents. It seems that the change from commit f6b2d0c is not included. [...]

Thanks a lot for checking it and raising it here.

I've checked the package on pypi.org (both wheel and sdist) and you're right: version 4.9.1 doesn't contain the fix from PR #570.

Version 4.9.1 was generated from commit ed8c400dd9e0b52d878187802ad01c4f7e1b9a71, whose original branch doesn't contain the code from #570.

@emlazzarin, can you please make a 4.9.2 release from the current master branch?

In the meantime, @fackse, please install pytrends from the current master branch and retry your code. To do so you can:

  1. clone the repo on your machine, e.g. git clone https://github.com/GeneralMills/pytrends /home/fackse/pytrends.
  2. install the code directly from the repo in editable mode with pip install -e /home/fackse/pytrends.

Thank you.

@fackse

fackse commented Apr 11, 2023

It appears to be working! I installed it using "pip install git+https://github.com/GeneralMills/pytrends". I tested it with 1108 requests, each consisting of 5 keywords. To speed up the process, I used a ThreadPoolExecutor and a proxy to parallelize it. The call was made with the following code:

pytrends = TrendReq(hl='en-US', tz=360, retries=3, proxies=[proxy]*100000)
pytrends.build_payload(chunk, timeframe="2023-03-06 2023-04-03", geo='US')

During the test, I encountered 196 instances of the message "Proxy error. Changing IP". To better control the behavior after the third attempt, I set the retries parameter to 3. Only 15 times did a request fail to go through within three attempts, in which case I had to re-initialize TrendReq (again with 3 retries, as seen in the code above).
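
The parallel part looked roughly like this; a sketch, where process_chunk and chunks are as defined in my earlier comment and max_workers is an arbitrary choice (note that sharing one TrendReq across threads assumes its underlying session is thread-safe):

from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Fan the chunks out over a small thread pool; each worker calls
# process_chunk(chunk, index) and returns a DataFrame or None.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_chunk, chunks, range(len(chunks))))

trends_df = pd.concat([df for df in results if df is not None])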

@datacubed

@emlazzarin @Terseus sorry to pester, but any idea when we would get this much-needed release?

@emlazzarin
Collaborator

emlazzarin commented Apr 13, 2023 via email

@emlazzarin
Collaborator

https://pypi.org/project/pytrends/4.9.2/ has now been updated with the correct code. Thanks!

@JUSTINDSBAUTISTA

Hello,

I don't know if you will respond or not, but it's worth a try.
I am an intern, and my task is to SCRAPE Google Trends:

  • I have to get the RelatedQueries and the graph,
  • and write them to a .CSV file.

I researched for almost a week, but I haven't gotten any guidance from my workplace.
I did find the code below on the internet; it fetches the RELATED QUERIES & RELATED TOPICS (but not the graph) and saves them as a .csv file.
However, you have to copy and paste each URL from the browser's NETWORK tab.

I would be happy to learn Python.
Thank you for your help.

@JUSTINDSBAUTISTA

import httpx
import json
import pandas as pd

# Set the geographical location to Canada

geo_location = "CA"

# API URLs copied from the browser's Network tab; the embedded token values expire, so they must be refreshed for each session
topics_url = f"https://trends.google.com/trends/api/widgetdata/relatedsearches?hl=en-US&tz=240&req=%7B%22restriction%22:%7B%22geo%22:%7B%22country%22:%22CA%22%7D,%22time%22:%222024-09-12T18%5C%5C:05%5C%5C:57+2024-09-13T18%5C%5C:05%5C%5C:57%22,%22originalTimeRangeForExploreUrl%22:%22now+1-d%22,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22programming%22%7D%5D%7D%7D,%22keywordType%22:%22ENTITY%22,%22metric%22:%5B%22TOP%22,%22RISING%22%5D,%22trendinessSettings%22:%7B%22compareTime%22:%222024-09-11T18%5C%5C:05%5C%5C:57+2024-09-12T18%5C%5C:05%5C%5C:57%22%7D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22CM%22,%22category%22:0%7D,%22language%22:%22en%22,%22userCountryCode%22:%22CA%22,%22userConfig%22:%7B%22userType%22:%22USER_TYPE_LEGIT_USER%22%7D%7D&token=APP6_UEAAAAAZuXQhcOGUPuW6AsOipJBkyUfkQjfuYgk"

queries_url = f"https://trends.google.com/trends/api/widgetdata/relatedsearches?hl=en-US&tz=240&req=%7B%22restriction%22:%7B%22geo%22:%7B%22country%22:%22CA%22%7D,%22time%22:%222024-09-12T18%5C%5C:05%5C%5C:57+2024-09-13T18%5C%5C:05%5C%5C:57%22,%22originalTimeRangeForExploreUrl%22:%22now+1-d%22,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22programming%22%7D%5D%7D%7D,%22keywordType%22:%22QUERY%22,%22metric%22:%5B%22TOP%22,%22RISING%22%5D,%22trendinessSettings%22:%7B%22compareTime%22:%222024-09-11T18%5C%5C:05%5C%5C:57+2024-09-12T18%5C%5C:05%5C%5C:57%22%7D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22CM%22,%22category%22:0%7D,%22language%22:%22en%22,%22userCountryCode%22:%22CA%22,%22userConfig%22:%7B%22userType%22:%22USER_TYPE_LEGIT_USER%22%7D%7D&token=APP6_UEAAAAAZuXQhdFr9kwhcYtatVcsQ2f0ELPYNdUo"

# Get the data from the API URLs
topics_response = httpx.get(url=topics_url)
queries_response = httpx.get(url=queries_url)

# Remove the extra symbols and add the data into JSON objects
topics_data = json.loads(topics_response.text.replace(")]}',", ""))
queries_data = json.loads(queries_response.text.replace(")]}',", ""))

result = []

# Parse the topics data and append it to the result list
for topic in topics_data["default"]["rankedList"][1]["rankedKeyword"]:
    topic_object = {
        "Title": topic["topic"]["title"],
        "Search Volume": topic["value"],
        "Link": "https://trends.google.com/" + topic["link"],
        "Geo Location": geo_location,
        "Type": "search_topic",
    }
    result.append(topic_object)

# Parse the queries data and append it to the result list
for query in queries_data["default"]["rankedList"][1]["rankedKeyword"]:
    query_object = {
        "Title": query["query"],
        "Search Volume": query["value"],
        "Link": "https://trends.google.com/" + query["link"],
        "Geo Location": geo_location,
        "Type": "search_query",
    }
    result.append(query_object)

print(result)

# Create a Pandas dataframe and save the data into CSV
df = pd.DataFrame(result)
df.to_csv("keywords.csv", index=False)
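
For the graph that this script doesn't cover, pytrends' interest_over_time returns the time-series data behind the Trends chart. A minimal sketch; the keyword and timeframe are placeholders:

from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=240)
pytrends.build_payload(['programming'], timeframe='now 1-d', geo='CA')

# The data behind the Trends graph, as a DataFrame indexed by timestamp.
pytrends.interest_over_time().to_csv('graph.csv')

# Related queries for the same payload; entries may be None when there is no data.
related = pytrends.related_queries()
top_queries = related['programming']['top']
if top_queries is not None:
    top_queries.to_csv('related_queries_top.csv', index=False)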
