updating scraping.js to fix HTTP 406 error while running the test suite #99

Merged: 2 commits, Aug 20, 2024

Conversation

Jacobojijo
This fixes issue #95.

Solution

I modified scraping.js to send more browser-like User-Agent and Accept headers. Here are the key changes:

  • Added constants for user agent and accept header:

    const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';
    const acceptHeader = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';

  • Created a getWithHeaders function to make requests with these headers:

    function getWithHeaders(url) {
        return preq.get({
            uri: url,
            headers: {
                'User-Agent': userAgent,
                'Accept': acceptHeader
            }
        });
    }

  • Updated all preq.get() calls to use getWithHeaders() instead.

  • Modified the meta() function calls to include the headers:

    return meta({
        uri: url,
        headers: {
            'User-Agent': userAgent,
            'Accept': acceptHeader
        }
    })
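The same header pair appears in both getWithHeaders() and the meta() calls above. As a minimal sketch, assuming both preq.get() and meta() accept the `{ uri, headers }` options shape shown in this PR, the options could be built in one place. The name `buildRequestOptions` and its `extraHeaders` parameter are hypothetical and are not part of scraping.js.

```javascript
// Hypothetical refactor sketch: only userAgent and acceptHeader come from
// the PR; buildRequestOptions and extraHeaders are illustrative names.
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';
const acceptHeader = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';

// Build the options object once so every caller sends the same headers.
// extraHeaders is optional; its keys are merged in case-sensitively, so
// 'Accept' overrides the default but 'accept' would add a second key.
function buildRequestOptions(url, extraHeaders = {}) {
    return {
        uri: url,
        headers: Object.assign({
            'User-Agent': userAgent,
            'Accept': acceptHeader
        }, extraHeaders)
    };
}

// Example: inspect the options that would be passed to a request function.
const options = buildRequestOptions('https://example.org/');
console.log(options.uri);
console.log(Object.keys(options.headers));
```

With such a helper, getWithHeaders(url) could return `preq.get(buildRequestOptions(url))` and the meta() calls could take `meta(buildRequestOptions(url))`, keeping the two code paths from drifting apart.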

This should make the test suite more reliable, especially against websites that reject requests lacking browser-like headers (for example with an HTTP 406 response).
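For background, HTTP 406 (Not Acceptable) is the status a server may return when it cannot produce a representation matching the request's Accept header. The sketch below is a simplified model of that check, not the behavior of any particular site, and the PR does not state which header actually triggered the error. The function names `parseAccept`, `matches`, and `statusFor` are hypothetical.

```javascript
// Illustrative only: a simplified model of server-side content negotiation.
// Real servers differ; here q-values are parsed but only q=0 is special.

// Split an Accept header into media ranges, dropping those with q=0.
function parseAccept(header) {
    return header.toLowerCase().split(',').map((part) => {
        const [type, ...params] = part.split(';').map((s) => s.trim());
        const qParam = params.find((p) => p.startsWith('q='));
        const q = qParam ? parseFloat(qParam.slice(2)) : 1;
        return { type, q };
    }).filter((entry) => entry.type && entry.q > 0);
}

// Does a media range such as 'text/*' or '*/*' cover a concrete media type?
function matches(range, mediaType) {
    if (range === '*/*') {
        return true;
    }
    const [rangeType, rangeSub] = range.split('/');
    const [mediaMain, mediaSub] = mediaType.split('/');
    return rangeType === mediaMain && (rangeSub === '*' || rangeSub === mediaSub);
}

// Status a strict server might send when it can only serve `served`.
function statusFor(acceptHeader, served = 'text/html') {
    const ranges = parseAccept(acceptHeader);
    return ranges.some((r) => matches(r.type, served)) ? 200 : 406;
}

const browserAccept = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
console.log(statusFor(browserAccept));      // 200
console.log(statusFor('application/json')); // 406
```

Under this model, the Accept value added in this PR is accepted by an HTML-only server, while a request that asks only for a type the server cannot produce is refused.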


Jacobojijo commented Jul 19, 2024

@mvolz, could you review this PR?

@Jacobojijo

@mvolz, I realized the issue was that package.json was out of sync with package-lock.json: an update to package.json had not been reflected in package-lock.json.
I have fixed this, and you can now test the PR.

@Jacobojijo

@mvolz, is this PR ready to merge?

@mvolz mvolz merged commit f85f073 into wikimedia:master Aug 20, 2024
3 checks passed

mvolz commented Aug 20, 2024

Thanks!
