A web crawler CLI for small sites. It counts how many internal links exist on a website.
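As a rough illustration of what "internal link" means here, the sketch below treats a link as internal when it resolves to the same host as the starting page. This is a minimal sketch, not the repository's actual code; the isInternal function name is illustrative.

```go
package main

import (
	"fmt"
	"net/url"
)

// isInternal reports whether link points to the same host as base.
// Illustrative only; the real crawler's logic may differ.
func isInternal(base, link string) (bool, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return false, err
	}
	linkURL, err := url.Parse(link)
	if err != nil {
		return false, err
	}
	// Relative links (no host) resolve against the base host, so they are internal.
	if linkURL.Host == "" {
		return true, nil
	}
	return linkURL.Host == baseURL.Host, nil
}

func main() {
	internal, _ := isInternal("https://example.com/start", "/about")
	fmt.Println(internal) // true
}
```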
Clone this repository:
git clone https://github.com/RealNai/go-web-crawler.git
cd go-web-crawler
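If a prebuilt crawler binary is not included, you will likely need to build one first. Assuming a standard Go toolchain and that the module root builds to a single binary:
go build -o crawler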
Then run:
./crawler <website> <max-go-routine> <max-page>
Example:
./crawler https://en.wikipedia.org/wiki/Main_Page 5 50
max-go-routine is the maximum number of concurrent goroutines that can run at once.
max-page is the maximum number of pages to crawl. Large sites like Wikipedia would take a very long time to crawl in full, so this cap lets the program stop early.
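For context, here is a minimal sketch of how these two limits are commonly enforced in crawlers of this kind: a buffered channel acting as a semaphore caps concurrent goroutines, and a mutex-guarded set of visited pages caps the total page count. The type and field names are illustrative assumptions, not the repository's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// crawler enforces the two CLI limits: max-go-routine via a buffered
// channel used as a semaphore, and max-page via a mutex-guarded set.
type crawler struct {
	mu       sync.Mutex
	wg       sync.WaitGroup
	sem      chan struct{} // capacity = max-go-routine
	visited  map[string]bool
	maxPages int
}

// tryVisit records a page and reports whether crawling may continue.
func (c *crawler) tryVisit(url string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.visited) >= c.maxPages || c.visited[url] {
		return false
	}
	c.visited[url] = true
	return true
}

func (c *crawler) crawl(url string) {
	defer c.wg.Done()
	c.sem <- struct{}{}        // block while max-go-routine workers are busy
	defer func() { <-c.sem }() // release the slot when done

	if !c.tryVisit(url) {
		return
	}
	// A real crawler would fetch the page here, extract links, and
	// spawn a goroutine per link: c.wg.Add(1); go c.crawl(link)
	fmt.Println("crawling:", url)
}

func main() {
	c := &crawler{
		sem:      make(chan struct{}, 5), // max-go-routine = 5
		visited:  map[string]bool{},
		maxPages: 50, // max-page = 50
	}
	c.wg.Add(1)
	go c.crawl("https://en.wikipedia.org/wiki/Main_Page")
	c.wg.Wait()
}
```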
A guided project from https://www.boot.dev/