Releases: spider-rs/spider
Spider v1.9.0
What's Changed
You can now gather the content of all your pages across TLDs and subdomains in a single crawl.
-- Example
extern crate spider;

use spider::website::Website;

fn main() {
    let mut website: Website = Website::new("https://rsseau.fr");
    // Allow links on subdomains of the start host.
    website.configuration.subdomains = true;
    // Allow all TLDs for the domain (e.g. rsseau.com as well as rsseau.fr).
    website.configuration.tld = true;
    website.crawl();

    for page in website.get_pages() {
        println!("- {}", page.get_url());
    }
}
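To make the two flags concrete, here is a minimal, std-only sketch of the kind of host check a crawler could apply when deciding whether a discovered link is in scope. The function `in_scope` is hypothetical and is not spider's implementation; it assumes the last dot-separated label is the TLD, so multi-part suffixes such as `co.uk` are not handled.

```rust
/// Hypothetical scope check for illustration; not spider's implementation.
/// `base` is the start URL's host, `candidate` the host of a discovered link.
fn in_scope(base: &str, candidate: &str, subdomains: bool, tld: bool) -> bool {
    if candidate == base {
        return true;
    }
    // Subdomains: accept hosts ending in ".<base>", e.g. blog.rsseau.fr.
    if subdomains && candidate.ends_with(&format!(".{}", base)) {
        return true;
    }
    if tld {
        // Assumption: the last dot-separated label is the TLD
        // (multi-part suffixes such as co.uk are not handled).
        let strip_tld = |host: &str| host.rsplit_once('.').map(|(name, _)| name.to_string());
        if let (Some(b), Some(c)) = (strip_tld(base), strip_tld(candidate)) {
            // Same name under another TLD, e.g. rsseau.com.
            if b == c {
                return true;
            }
            // Both flags on: subdomains under another TLD, e.g. blog.rsseau.com.
            if subdomains && c.ends_with(&format!(".{}", b)) {
                return true;
            }
        }
    }
    false
}

fn main() {
    let base = "rsseau.fr";
    for host in ["rsseau.fr", "blog.rsseau.fr", "rsseau.com", "blog.rsseau.com", "example.com"] {
        println!("{}: {}", host, in_scope(base, host, true, true));
    }
}
```

Checking exact string suffixes with a leading dot keeps a host like `evilrsseau.fr` out of scope, which a plain `ends_with(base)` test would let through.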
Full Changelog: v1.8.0...v1.9.0
v1.8.0
What's Changed
- feat(time): add page duration uptime tracking behind the "time" feature flag.
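A rough, std-only sketch of per-page duration tracking: time the fetch with `std::time::Instant` and store the elapsed `Duration` next to the page. `TimedPage` and `fetch_timed` are hypothetical names, not the types the "time" feature adds to spider, and the closure passed as `fetch` stands in for a real HTTP request.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical page record with a timing field; spider's real `Page` differs.
struct TimedPage {
    url: String,
    html: String,
    duration: Duration,
}

/// Runs `fetch` for `url` and records how long it took.
/// `fetch` is a hypothetical stand-in for the real HTTP request.
fn fetch_timed<F: FnOnce(&str) -> String>(url: &str, fetch: F) -> TimedPage {
    let start = Instant::now();
    let html = fetch(url);
    TimedPage {
        url: url.to_string(),
        html,
        duration: start.elapsed(),
    }
}

fn main() {
    let page = fetch_timed("https://rsseau.fr", |u| {
        thread::sleep(Duration::from_millis(20)); // simulated network latency
        format!("<html>{}</html>", u)
    });
    println!("{} took {:?} ({} bytes)", page.url, page.duration, page.html.len());
}
```

`Instant` is monotonic, so the measured duration is unaffected by wall-clock adjustments during a crawl.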
Full Changelog: v1.7.22...v1.8.0
Spider v1.7.22
What's Changed
-- Other Changes
Update the docs on respect robots.txt handling and add missing Rust docs.
Full Changelog: v1.7.19...v1.7.22
Spider v1.7.8
What's Changed
- chore(concurrency): add simultaneous multithreading detection by @j-mendez in #45
- perf(concurrency): increase default concurrency limit by @j-mendez in #46
- feat(cli): add comma separated list ability blacklist
- chore(cli): fix rust verbose log output
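Two of the items above can be sketched with the standard library alone: `default_concurrency` derives a limit from `std::thread::available_parallelism`, and `parse_blacklist` splits a comma-separated CLI value into entries. Both functions and the per-core multiplier are hypothetical illustrations, not spider's actual code or CLI flags.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical default: scale a per-core budget by the hardware threads
/// the OS reports. The multiplier is illustrative, not spider's value.
fn default_concurrency(per_core: usize) -> usize {
    let cores = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1); // fall back to one core if detection fails
    cores * per_core
}

/// Splits a hypothetical comma-separated CLI value into trimmed entries,
/// skipping empty pieces such as a trailing comma.
fn parse_blacklist(arg: &str) -> Vec<String> {
    arg.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    println!("concurrency limit: {}", default_concurrency(4));
    println!("{:?}", parse_blacklist("https://rsseau.fr/a, https://rsseau.fr/b,"));
}
```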
--
About 0.5 s shaved off the benchmark compared with Spider v1.6.1.
Full Changelog: v1.7.3...v.1.7.7
Full Changelog: v.1.7.7...v.1.7.8
Crawl sync option
- Ability to crawl links synchronously, one at a time.
extern crate spider;

use spider::website::Website;

fn main() {
    // Crawl links one by one instead of concurrently.
    let mut website: Website = Website::new("https://choosealicense.com");
    website.crawl_sync();
}
Or via the CLI:
spider -d https://rsseau.fr crawl -s
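Crawling "in sync" means links are visited one at a time rather than concurrently. The sketch below shows that pattern as a breadth-first traversal with a queue and a visited set; the in-memory `graph` is a hypothetical stand-in for fetching and parsing real pages, and this `crawl_sync` function is not spider's method of the same name.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical in-memory link graph standing in for fetching and parsing pages.
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Visits pages strictly one at a time: each page is "fetched" and its links
/// queued before the next URL is taken. Not spider's `crawl_sync` itself.
fn crawl_sync<'a>(graph: &Graph<'a>, start: &'a str) -> Vec<&'a str> {
    let mut visited: HashSet<&'a str> = HashSet::new();
    let mut queue = VecDeque::from([start]);
    let mut order = Vec::new();
    while let Some(url) = queue.pop_front() {
        if !visited.insert(url) {
            continue; // already crawled
        }
        order.push(url);
        for &link in graph.get(url).map(Vec::as_slice).unwrap_or(&[]) {
            if !visited.contains(link) {
                queue.push_back(link);
            }
        }
    }
    order
}

fn main() {
    let graph: Graph = HashMap::from([
        ("/", vec!["/a", "/b"]),
        ("/a", vec!["/b", "/c"]),
        ("/b", vec!["/"]),
        ("/c", vec![]),
    ]);
    println!("{:?}", crawl_sync(&graph, "/")); // prints ["/", "/a", "/b", "/c"]
}
```

Processing one page at a time trades throughput for a deterministic visit order and a gentler load on the target site.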
What's Changed
- chore(log): add crate log default logger by @j-mendez in #42
- feat(delay): add non blocking delay scheduling by @j-mendez in #43
Full Changelog: v1.6.1...v1.7.3
Spider v1.6.1
Performance Tuned
Crawler speed has been cranked up a notch, and it is now the fastest open-source spider crawler available. See the benchmarks in the CI action for results. If you know of any alternative crawlers, feel free to open an issue so we can add them to the benchmark comparisons.
What's Changed
- test(bench): add self task execution of bench by @j-mendez in #40
- perf(links): filter dup links after async batch
- chore(delay): fix crawl delay thread groups
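Filtering duplicate links after an async batch can be sketched as running every page's link list from the batch through one shared `HashSet`, so each URL is queued only once. `dedup_batch` is a hypothetical helper, not spider's code.

```rust
use std::collections::HashSet;

/// Hypothetical helper: merges the link lists produced by one batch of
/// concurrently parsed pages, keeping only URLs not seen before.
fn dedup_batch(seen: &mut HashSet<String>, batch: Vec<Vec<String>>) -> Vec<String> {
    let mut fresh = Vec::new();
    for link in batch.into_iter().flatten() {
        // `insert` returns false when the URL was already present.
        if seen.insert(link.clone()) {
            fresh.push(link);
        }
    }
    fresh
}

fn main() {
    let mut seen: HashSet<String> = HashSet::from(["/".to_string()]);
    let batch = vec![
        vec!["/a".to_string(), "/b".to_string()],
        vec!["/b".to_string(), "/".to_string()],
        vec!["/c".to_string()],
    ];
    println!("{:?}", dedup_batch(&mut seen, batch)); // prints ["/a", "/b", "/c"]
}
```

Deduplicating once per batch, rather than inside each concurrent task, avoids sharing a locked set between tasks while they run.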
Full Changelog: v1.6.0...v1.6.1
Perf increased after commit 053eea4.
Benchmarks against crolly and node-crawler (the cases are nearly identical in implementation).
Crawl sync api fix
This release fixes the thread handling of async tasks that use the client established on the main thread.
Crawl speed is drastically improved now that the client pool is no longer incorrectly released between threads.
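The general pattern behind this kind of fix, building one client on the main thread and sharing it with workers instead of recreating or releasing it per thread, can be sketched with `Arc`. `Client` below is a hypothetical stand-in that only counts requests; a real HTTP client would own the connection pool. The release concerns async tasks, while this sketch uses OS threads for simplicity.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical client stand-in; a real HTTP client would own a connection pool.
struct Client {
    requests: AtomicUsize,
}

impl Client {
    fn get(&self, url: &str) -> String {
        self.requests.fetch_add(1, Ordering::Relaxed);
        format!("<html>{}</html>", url)
    }
}

/// Builds one client on the calling (main) thread and shares it with every
/// worker through `Arc`, so no worker creates or tears down its own pool.
fn crawl_with_shared_client(urls: Vec<String>) -> (Vec<String>, usize) {
    let client = Arc::new(Client { requests: AtomicUsize::new(0) });
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            let client = Arc::clone(&client); // cheap handle to the same client
            thread::spawn(move || client.get(&url))
        })
        .collect();
    let pages: Vec<String> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    // All workers have been joined, so every increment is visible here.
    let total = client.requests.load(Ordering::Relaxed);
    (pages, total)
}

fn main() {
    let (pages, total) = crawl_with_shared_client(vec!["/a".into(), "/b".into()]);
    println!("{} pages, {} requests through one client", pages.len(), total);
}
```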
What's Changed
- chore(log): add log util by @j-mendez in #31
- feat(regex): add optional regex black listing by @j-mendez in #36
- perf(crawl): improve crawl link exclusion by @j-mendez in #37
- perf(parsing): add async parallel page handling by @j-mendez in #38
- perf(client): fix blocking and async mixture by @j-mendez in #39
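Parallel page handling can be approximated with scoped threads: each page's HTML is parsed on its own thread and the results are collected in input order. `extract_links` is a deliberately naive, hypothetical `href` scanner, and spider's real implementation is async; this sketch only illustrates the fan-out and join shape.

```rust
use std::thread;

/// Deliberately naive, hypothetical link extractor: collects the values of
/// `href="..."` attributes. A real crawler would use an HTML parser.
fn extract_links(html: &str) -> Vec<String> {
    html.split("href=\"")
        .skip(1) // text before the first href is not a link
        .filter_map(|rest| rest.split('"').next())
        .map(String::from)
        .collect()
}

/// Parses each page on its own scoped thread; results keep the input order.
fn parse_pages_parallel(pages: &[&str]) -> Vec<Vec<String>> {
    thread::scope(|s| {
        let handles: Vec<_> = pages
            .iter()
            .map(|&html| s.spawn(move || extract_links(html)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let pages = ["<a href=\"/a\">A</a>", "<a href=\"/b\"></a><a href=\"/c\"></a>"];
    println!("{:?}", parse_pages_parallel(&pages)); // prints [["/a"], ["/b", "/c"]]
}
```

Scoped threads can borrow the page bodies directly, so no cloning or `Arc` is needed for data that outlives the scope.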
Full Changelog: v1.5.1...v1.6.0