feat(core): add `crawler.exportData()` helper #2166
Once released, this will be used in the homepage example to simplify it:
import { PlaywrightCrawler } from 'crawlee';
// PlaywrightCrawler crawls the web using a headless browser controlled by the Playwright library.
const crawler = new PlaywrightCrawler({
// ...
});
// Add first URL to the queue and start the crawl.
await crawler.run(['https://crawlee.dev']);
-// Export the whole dataset to a single file in `./storage/key_value_stores/result.csv`.
-const dataset = await crawler.getDataset();
-await dataset.exportToCSV('result');
+// Export the whole dataset to a single file in `./result.csv`.
+await crawler.exportData('./result.csv');
// Or work with the data directly.
const data = await crawler.getData();
console.table(data.items);
Tests on Node 16 were broken after the yarn v4 upgrade, so I fixed that in this PR too.
Retrieves all the data from the default crawler `Dataset` and exports them to the specified format. Supported formats are currently 'json' and 'csv', and will be inferred from the `path` automatically.

```ts
const crawler = new BasicCrawler({ ... });
crawler.pushData({ ... });
await crawler.exportData('./data.csv');
```
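To illustrate what "inferred from the `path`" could mean in practice, here is a minimal sketch. It is not the code from this PR: `inferExportFormat` and `serializeItems` are hypothetical helpers made up for this example, and the case-insensitive extension match and the CSV quoting rules are assumptions, not confirmed Crawlee behavior.

```ts
// Hypothetical helpers for illustration only; not part of the Crawlee API.
type ExportFormat = 'json' | 'csv';

// Pick the export format from the file extension of `path`.
function inferExportFormat(path: string): ExportFormat {
    // Look only at the last path segment, so dots in directory names are ignored.
    const fileName = path.split(/[\\/]/).pop() ?? '';
    const dot = fileName.lastIndexOf('.');
    // Assumption: extensions are matched case-insensitively.
    const ext = dot > 0 ? fileName.slice(dot + 1).toLowerCase() : '';
    if (ext === 'json' || ext === 'csv') return ext;
    throw new Error(`Unsupported export format for path "${path}"; expected .json or .csv`);
}

// Serialize flat records to the chosen format.
function serializeItems(items: Record<string, unknown>[], format: ExportFormat): string {
    if (format === 'json') return JSON.stringify(items, null, 2);

    // CSV: collect the union of keys in first-seen order to form the header row.
    const columns: string[] = [];
    for (const item of items) {
        for (const key of Object.keys(item)) {
            if (columns.indexOf(key) === -1) columns.push(key);
        }
    }
    // Quote a cell only when it contains a comma, a quote, or a line break.
    const escapeCell = (value: unknown): string => {
        const text = value === undefined || value === null ? '' : String(value);
        return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
    };
    const rows = items.map((item) => columns.map((column) => escapeCell(item[column])).join(','));
    return [columns.join(','), ...rows].join('\n');
}
```

With helpers like these, a call such as `serializeItems(items, inferExportFormat('./data.csv'))` would produce CSV text, while a path with any other extension would fail early with an error instead of silently choosing a format.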
@vladfrangu any idea why that test keeps failing in the CI? It works locally just fine for me. I thought it was something with the relative paths and even tried a 500ms wait, but it still fails the same way.
Huh, it works locally? I'll take a look.
The sync version was introduced for easier chaining, but with `crawler.exportData()` we don't really need it.
LGTM, this seems pretty helpful. Are there any review guidelines I should follow?
Not really, did you use some previously? Happy for suggestions, we were also discussing some time ago with @vdusek that we could have a PR template with some checklist (or a link to some guidelines page). Some things I usually look for:
Thanks, that seems reasonable. Every place I worked at did something pretty close to this 🙂
Yeah, I was thinking of just a simple
Based on this assignment, LLM provides the following 🙂:
## Description
[Provide a brief description of the problem or feature this pull request addresses.]
## Solution
[Explain how this pull request solves the problem or implements the feature.]
## Issue
[Link to the related issue (e.g., `Closes #123` or `Fixes #456`).]
## Testing
[Describe how the changes have been tested. Include any relevant test cases or steps.]
## Release Steps
[Outline the steps required to release these changes. Include any deployment or configuration steps if applicable.]
## Review Guidance
[Offer tips and guidance for reviewers on how to approach and assess the changes. Explain any specific areas or considerations to focus on.]
## Checklist
- [ ] Code has been reviewed
- [ ] Tests pass successfully
- [ ] Documentation has been updated (if necessary)
- [ ] Issue is closed (if applicable)
- [ ] All discussions are resolved
## Additional Information
[Any additional information, context, or screenshots that might be helpful in reviewing this pull request.]
Of course, only if the sections make sense in the context of the PR; I'm not saying all PRs have to contain them.