Welcome to my web crawler project! This project involved setting up a Node.js environment, normalizing URLs, extracting URLs from HTML, and recursively crawling websites to gather data. Additionally, I integrated Jest for test-driven development, providing a solid foundation for reliable and maintainable code.
Key features, several of which are sketched in code below:
- Normalize URLs: Ensures consistency in URL format.
- Extract URLs from HTML: Parses HTML content to find and extract URLs.
- Recursive Crawling: Crawls web pages recursively to gather data.
- Generating a Report: Generates a report on the status of the crawled pages.
- Test-Driven Development: Uses Jest for testing to ensure code reliability.
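To make the feature list concrete, here is a minimal sketch of what the URL normalization step can look like. The function name `normalizeURL` and the exact rules (dropping the protocol and any trailing slash) are illustrative assumptions, not necessarily what this repo implements:

```js
// Sketch: normalize a URL so that variants like
// https://EXAMPLE.com/path/ and http://example.com/path
// map to the same key. Uses Node's built-in WHATWG URL class.
function normalizeURL(urlString) {
  const urlObj = new URL(urlString); // hostname is lowercased automatically
  let fullPath = `${urlObj.hostname}${urlObj.pathname}`;
  // Treat /path and /path/ as the same page
  if (fullPath.endsWith('/')) {
    fullPath = fullPath.slice(0, -1);
  }
  return fullPath;
}

// normalizeURL('https://example.com/path/') -> 'example.com/path'
```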
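URL extraction can be sketched with an HTML parser such as jsdom; whether this project actually depends on jsdom, and the helper name `getURLsFromHTML`, are assumptions here:

```js
const { JSDOM } = require('jsdom'); // assumed dependency for parsing HTML

// Sketch: collect the href of every <a> tag, resolving relative
// links (e.g. /about) against the page's base URL.
function getURLsFromHTML(htmlBody, baseURL) {
  const urls = [];
  const dom = new JSDOM(htmlBody);
  for (const anchor of dom.window.document.querySelectorAll('a')) {
    const href = anchor.getAttribute('href');
    if (!href) continue;
    try {
      urls.push(new URL(href, baseURL).href);
    } catch {
      // Skip hrefs that aren't valid URLs
    }
  }
  return urls;
}
```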
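The recursive crawl and the report can then be built on top of those two helpers. This sketch assumes Node 18+ (for the built-in `fetch`) and a `pages` object that counts how many times each normalized URL is seen; the actual crawler may differ:

```js
// Sketch: recursively crawl pages on the same host as baseURL,
// counting internal links per normalized URL. Reuses the
// normalizeURL and getURLsFromHTML helpers sketched above.
async function crawlPage(baseURL, currentURL, pages = {}) {
  // Stay on the starting site
  if (new URL(baseURL).hostname !== new URL(currentURL).hostname) {
    return pages;
  }
  const key = normalizeURL(currentURL);
  if (pages[key] !== undefined) {
    pages[key]++; // already visited: just bump the count
    return pages;
  }
  pages[key] = 1;
  try {
    const resp = await fetch(currentURL); // built into Node 18+
    const contentType = resp.headers.get('content-type') || '';
    if (resp.status >= 400 || !contentType.includes('text/html')) {
      return pages; // skip errors and non-HTML responses
    }
    const html = await resp.text();
    for (const nextURL of getURLsFromHTML(html, baseURL)) {
      pages = await crawlPage(baseURL, nextURL, pages);
    }
  } catch (err) {
    console.log(`failed to fetch ${currentURL}: ${err.message}`);
  }
  return pages;
}

// Sketch: print pages sorted by how often they were linked to
function printReport(pages) {
  console.log('=== crawl report ===');
  for (const [url, count] of Object.entries(pages).sort((a, b) => b[1] - a[1])) {
    console.log(`Found ${count} internal links to ${url}`);
  }
}
```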
Tech stack:
- Node.js: Backend runtime environment.
- JavaScript: Programming language for writing the crawler logic.
- Fetch API: For making HTTP requests to web pages.
- Jest: Testing framework for JavaScript.
To run this project locally, follow these steps:
- Clone the repository:
  ```bash
  git clone https://github.com/WhisperNet/webCrawler-node.js.git
  ```
- Navigate to the project directory:
  ```bash
  cd webCrawler-node.js
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Run the crawler:
  ```bash
  npm run start https://example.com
  ```
- Run tests:
  ```bash
  npm run test
  ```
Throughout this project, I gained valuable insights into:
- URL Normalization: Ensuring consistency and correctness in URL formats.
- HTML Parsing: Extracting useful information from HTML content.
- Recursive Algorithms: Implementing recursive logic for web crawling.
- Test-Driven Development: Writing tests with Jest to ensure code reliability and maintainability (a sample test is sketched below).
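As an example of the Jest side, a unit test for the normalization step might look like the following; the module path `./crawl.js` and the expected outputs are assumptions based on the `normalizeURL` sketch above:

```js
const { normalizeURL } = require('./crawl.js'); // assumed module path

// Jest provides test() and expect() as globals in *.test.js files
test('normalizeURL strips the protocol and trailing slash', () => {
  expect(normalizeURL('https://example.com/path/')).toBe('example.com/path');
  expect(normalizeURL('http://EXAMPLE.com/path')).toBe('example.com/path');
});
```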