feat(crawl): add subdomain and tld crawling #59

j-mendez · 2022-06-24T16:55:50Z

add subdomain crawling ability
add tld crawling ability

Together, these options allow gathering every page related to a website's bare host name, across any TLD extension or subdomain, without sacrificing crawl speed.

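use spider::website::Website;

fn main() {
    let mut website: Website = Website::new("https://a11ywatch.com");
    // Also follow links on subdomains of the starting host.
    website.configuration.subdomains = true;
    // Also follow links on the same host name under other TLD extensions.
    website.configuration.tld = true;
    website.crawl();
}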
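To make the effect of the two flags concrete, here is a minimal, standalone sketch of how a host-scoping check could behave. It is not the crate's implementation: `split_host` and `in_scope` are hypothetical helpers, and the TLD is naively taken to be the last dot-separated label, so multi-part suffixes such as `co.uk` are not handled.

```rust
/// Splits a host into (subdomain prefix, bare name, tld).
/// Hypothetical helper: treats only the last label as the TLD.
fn split_host(host: &str) -> Option<(&str, &str, &str)> {
    let (rest, tld) = host.rsplit_once('.')?;
    match rest.rsplit_once('.') {
        Some((sub, name)) => Some((sub, name, tld)),
        None => Some(("", rest, tld)),
    }
}

/// Returns true when `candidate` should be crawled for a crawl that
/// started at `base`, given the two flags (assumed semantics).
fn in_scope(base: &str, candidate: &str, subdomains: bool, tld: bool) -> bool {
    match (split_host(base), split_host(candidate)) {
        (Some((b_sub, b_name, b_tld)), Some((c_sub, c_name, c_tld))) => {
            // The bare host name must always match.
            b_name == c_name
                // Subdomains may differ only when `subdomains` is enabled.
                && (subdomains || b_sub == c_sub)
                // The TLD may differ only when `tld` is enabled.
                && (tld || b_tld == c_tld)
        }
        _ => false,
    }
}

fn main() {
    let base = "a11ywatch.com";
    for candidate in ["docs.a11ywatch.com", "a11ywatch.net", "example.com"] {
        println!(
            "{} -> subdomains only: {}, tld only: {}",
            candidate,
            in_scope(base, candidate, true, false),
            in_scope(base, candidate, false, true)
        );
    }
}
```

A production check would more likely consult the Public Suffix List than split on the last dot.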

--
Example output for validation, since the current test cases and examples do not use subdomains.

Before: 25 links on the domain a11ywatch.com

After: 50+ links on the domain a11ywatch.com

--

This PR combines two features into one: subdomain crawling and TLD ignoring. It might make sense to move tld to a separate variable and option, since anyone can own a TLD variant that is not attached to the exact hostname. For example, you can own myspace.com while someone else owns myspace.net. You can combine this with blacklist url to ignore certain TLD extensions.
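As an illustration of that combination, the sketch below filters discovered links against a blacklist so that a TLD variant owned by someone else is skipped. It is a standalone example, not the crate's code: `is_blacklisted` and `keep_allowed` are hypothetical helpers, and prefix matching is an assumption about how blacklist entries are compared.

```rust
/// Returns true when `url` starts with any blacklist entry.
/// Prefix matching is an assumption made for this sketch.
fn is_blacklisted(url: &str, blacklist: &[&str]) -> bool {
    blacklist.iter().any(|entry| url.starts_with(*entry))
}

/// Keeps only the links that are not covered by the blacklist.
fn keep_allowed<'a>(links: &[&'a str], blacklist: &[&str]) -> Vec<&'a str> {
    links
        .iter()
        .copied()
        .filter(|link| !is_blacklisted(link, blacklist))
        .collect()
}

fn main() {
    // With TLD crawling enabled, myspace.net would otherwise be in scope.
    let blacklist = ["https://myspace.net"];
    let discovered = [
        "https://myspace.com/about",
        "https://myspace.net/login",
        "https://blog.myspace.com/",
    ];
    for link in keep_allowed(&discovered, &blacklist) {
        println!("{}", link);
    }
}
```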

j-mendez force-pushed the feat/crawl-subdomains branch 2 times, most recently from 8e79647 to 017baa1 Compare June 24, 2022 17:09

feat(crawl): add subdomain crawling with tld ignore

a72d1e7

j-mendez force-pushed the feat/crawl-subdomains branch from 017baa1 to a72d1e7 Compare June 24, 2022 17:09

j-mendez changed the title ~~feat(crawl): add subdomain crawling with tld ignore~~ feat(crawl): add subdomain and tld crawling Jun 24, 2022

j-mendez requested a review from madeindjs June 24, 2022 17:44

j-mendez force-pushed the feat/crawl-subdomains branch from 23251a2 to 9664a0a Compare June 24, 2022 18:15

chore(tld): add tld config seperate

6dec196

j-mendez force-pushed the feat/crawl-subdomains branch from 9664a0a to 6dec196 Compare June 24, 2022 18:17

j-mendez merged commit 760b3ce into master Jun 24, 2022

j-mendez deleted the feat/crawl-subdomains branch June 24, 2022 18:33
