feat: impit-based HttpClient implementation #2787

Open. barjin wants to merge 11 commits into master.

barjin
Contributor

@barjin barjin commented Jan 2, 2025

Adds a new package under the @crawlee scope that exports an HttpClient implementation based on retch-http.

The implementation is still rough around the edges and is mostly based on https://github.com/apify-projects/node-curl-impersonate/blob/master/src/http-client.ts. Like the curl-impersonate client, this client doesn't yet offer native support for streamed responses.
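To illustrate what the lack of native streaming can mean in practice, here is a minimal sketch of one common workaround: download the whole body first, then expose it through a Node.js `Readable` so that stream-style consumers keep working. This is not taken from the PR or from the curl-impersonate client. The `BufferedResponse` type and the `toEmulatedStream` and `readAll` helpers are hypothetical names used only for illustration.

```typescript
import { Readable } from 'node:stream';

// Hypothetical shape of a fully buffered response; not the PR's actual types.
interface BufferedResponse {
    statusCode: number;
    headers: Record<string, string>;
    body: Buffer;
}

// Wrap an already-downloaded body in a Readable so stream consumers still work.
// Nothing is streamed from the network here: the whole body is in memory first.
function toEmulatedStream(response: BufferedResponse): Readable {
    // Wrapping the Buffer in an array makes Readable.from emit it as a single chunk.
    return Readable.from([response.body]);
}

// Collect a stream back into one Buffer (only used to demonstrate the round trip).
async function readAll(stream: Readable): Promise<Buffer> {
    const chunks: Buffer[] = [];
    for await (const chunk of stream) {
        chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    }
    return Buffer.concat(chunks);
}
```

The trade-off of this approach is memory: a large response is held in full before the consumer sees the first byte, which is why native streaming support would still be worth adding later.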

Related to #2756

@barjin barjin added the adhoc Ad-hoc unplanned task added during the sprint. label Jan 2, 2025
@barjin barjin self-assigned this Jan 2, 2025
@github-actions github-actions bot added this to the 105th sprint - Tooling team milestone Jan 2, 2025
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Jan 2, 2025

@barjin barjin marked this pull request as ready for review January 3, 2025 13:46
@barjin barjin requested review from janbuchar and B4nan January 3, 2025 13:46
@barjin
Contributor Author

barjin commented Jan 3, 2025

Example usage:

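import { CheerioCrawler, ProxyConfiguration } from 'crawlee';
// RetchHttpClient and Browser are assumed to come from the new package added in this PR;
// the import path is omitted here because the package name is not settled yet.

const crawler = new CheerioCrawler({
    // Log the parsed JSON body of each response.
    requestHandler: async ({ json }) => {
        console.log(json);
    },
    // Use the retch-http-based client, configured for Firefox.
    httpClient: new RetchHttpClient({
        browser: Browser.Firefox,
    }),
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: ['http://auto:------@proxy.apify.com:8000'],
    }),
});

// api.ipify.org returns the caller's public IP address as JSON.
await crawler.run([
    {
        url: 'https://api.ipify.org/?format=json',
    },
]);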

@barjin
Contributor Author

barjin commented Jan 3, 2025

Aside from the code review - I'm unsure how the new package will be released. Is the process fully automatic? Do we need to mention this package in some config file? What should we name the package?

@B4nan
Member

B4nan commented Jan 3, 2025

It's all automated. As long as the package is part of the workspace, it should just work.

@B4nan B4nan changed the title feat: retch-http-based HttpClient implementation feat: impit-http-based HttpClient implementation Jan 16, 2025
@B4nan B4nan changed the title feat: impit-http-based HttpClient implementation feat: impit-based HttpClient implementation Jan 16, 2025