This is an Apify Actor, built on version 3 of the Apify SDK (which is based on Crawlee), for crawling web content. It extracts the body, title, and other metadata from each page it crawls. It also handles PDF files by downloading the raw data and saving it to the dataset as base64-encoded data.
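The PDF handling described above can be illustrated with a minimal sketch. It is not the Actor's actual code: the `PdfRecord` shape, its field names, and the helpers `isPdf` and `toPdfRecord` are all hypothetical, chosen only to show how raw PDF bytes might be turned into a base64-encoded dataset record.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical record shape; the field names are assumptions,
// not the schema this Actor actually writes to its dataset.
interface PdfRecord {
  url: string;
  contentType: string;
  encoding: "base64";
  content: string;
}

// Treat a response as a PDF if its MIME type says so or its URL path ends in ".pdf".
function isPdf(url: string, contentType: string = ""): boolean {
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return mime === "application/pdf" || new URL(url).pathname.toLowerCase().endsWith(".pdf");
}

// Wrap the raw bytes of a downloaded PDF in a JSON-safe record.
function toPdfRecord(url: string, body: Uint8Array): PdfRecord {
  return {
    url,
    contentType: "application/pdf",
    encoding: "base64",
    content: Buffer.from(body).toString("base64"),
  };
}

// Example: the four bytes "%PDF" that begin every PDF file.
const sample = toPdfRecord("https://example.com/doc.pdf", new Uint8Array([0x25, 0x50, 0x44, 0x46]));
console.log(sample.content);
```

Base64 keeps binary data intact inside a JSON dataset item, at the cost of roughly a one-third size increase. In an Actor, a record like this would presumably be handed to the dataset with something like Crawlee's `Dataset.pushData`, and a consumer would recover the original bytes with `Buffer.from(record.content, "base64")`.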
Apify Crawler for Fixie Corpus ingestion.
fixie-ai/apify-fixie-crawler