v1.60.13

@j-mendez j-mendez released this 12 Dec 16:42
· 511 commits to main since this release

What's Changed

This release brings a new feature flag (smart), performance improvements, and fixes.

  • feat(smart): add feat flag smart for smart mode. Requests default to HTTP until JavaScript rendering is needed
  • perf(crawl): add clone external checking
  • chore(chrome): fix chrome connection socket keep alive on remote connections
  • feat(chrome_store_page): add feat flag chrome_store_page and screenshot helper
  • chore(decentralize): fix glob build
  • feat(redirect): add transparent top redirect handling

Smart Mode

Smart mode brings the best of both worlds when crawling. It starts with plain HTTP requests and switches to Chrome only when a page requires JavaScript rendering.
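
The HTTP-first, Chrome-as-fallback idea can be sketched as below. This is only an illustration of the decision, not the crate's actual implementation: `FetchMode`, `needs_js_rendering`, `choose_mode`, and the 20-character threshold are all hypothetical names and values chosen for the example.

```rust
/// Which fetch strategy a page ends up needing (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum FetchMode {
    Http,
    Chrome,
}

/// Hypothetical heuristic: assume a page needs JavaScript rendering when the
/// raw HTML references scripts but carries almost no text outside of tags.
fn needs_js_rendering(html: &str) -> bool {
    let lower = html.to_lowercase();
    let has_script = lower.contains("<script");

    // Count non-whitespace characters outside of tags as a rough proxy for
    // visible text. Inline script bodies are counted too, so this is crude.
    let mut in_tag = false;
    let mut text_len = 0usize;
    for c in lower.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            _ if !in_tag && !c.is_whitespace() => text_len += 1,
            _ => {}
        }
    }

    // The threshold of 20 characters is an arbitrary assumption.
    has_script && text_len < 20
}

/// Stay on plain HTTP unless the fetched HTML looks like an empty JS shell.
fn choose_mode(html: &str) -> FetchMode {
    if needs_js_rendering(html) {
        FetchMode::Chrome
    } else {
        FetchMode::Http
    }
}

fn main() {
    let static_page =
        "<html><body><h1>Hello</h1><p>Plenty of server-rendered text here.</p></body></html>";
    let spa_shell =
        "<html><body><div id=\"root\"></div><script src=\"app.js\"></script></body></html>";

    println!("static page -> {:?}", choose_mode(static_page));
    println!("spa shell   -> {:?}", choose_mode(spa_shell));
}
```

The point of the design is that the cheap HTTP response is always fetched first, and the slower Chrome render is only paid for on pages that appear to need it.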

Screenshots

With the chrome_store_page feature flag enabled, you can take a screenshot manually by calling screenshot on each page received from a subscription.

extern crate spider;

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website: Website = Website::new("https://rsseau.fr");
    let mut rx2 = website.subscribe(16).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx2.recv().await {
            println!("Screenshotting: {:?}", page.get_url());
            let full_page = false;
            let omit_background = true;
            page.screenshot(full_page, omit_background).await;
            // Output is stored in ./storage/ by default. Set the
            // SCREENSHOT_DIRECTORY environment variable to change the path.
        }
    });

    website.crawl().await;

    println!("Links found {:?}", website.get_links().len());
}

Full Changelog: v1.50.20...v1.60.13