WebDiver - Website Crawler

WebDiver is a Python-based tool designed for extracting and analyzing comprehensive information from websites. It caters specifically to the needs of the Open Source Intelligence (OSINT) community, providing robust capabilities for web data extraction, link analysis, metadata retrieval, and more.

Features

Data Extraction

WebDiver systematically retrieves and analyzes data from web pages, including titles, descriptions, internal links, and external links. This capability is crucial for understanding the structure and content of target websites.

Link Analysis

The tool identifies relationships between different web pages through internal and external links. This analysis provides insights into information flow, dependencies within a site, and potential connections to other web entities.

Metadata Retrieval

WebDiver extracts metadata such as titles, descriptions, and other meta tags from web pages. This information helps in understanding a website's focus, content, and possibly its ownership or creator details.

Email Extraction

It automatically finds and extracts email addresses embedded within web pages. This feature aids in discovering contact information or conducting further investigations related to email communications.

Error Handling and Logging

WebDiver implements robust error handling mechanisms to manage encountered issues during the crawling process. Errors are logged comprehensively, providing transparency and facilitating troubleshooting.

IP Information Retrieval

The tool fetches detailed IP information for target URLs, including ASN (Autonomous System Number), ASN CIDR (Classless Inter-Domain Routing), country, city, latitude, and longitude. This feature enhances the understanding of server locations and network infrastructure associated with a website.

Asynchronous Processing

Utilizing asyncio, WebDiver performs asynchronous web crawling and data extraction. This approach optimizes performance by allowing concurrent operations, improving speed and efficiency in data retrieval tasks.

Progress Monitoring

The integration of tqdm provides a visual progress bar during the extraction of external links. This feature offers real-time feedback on task completion, enhancing user experience and task management.

Usage Instructions

Prerequisites:
- Ensure Python 3 is installed on your system. If not, download it from python.org and follow the installation instructions.
Running the Script:
- Navigate to the directory containing webdiver.py using a terminal or command prompt.
- Execute the script by running:
```
git clone https://github.com/AnonCatalyst/WebDiver && cd WebDiver
python3 install.py
python3 webdiver.py
```
  Replace python3 with python if python3 command is not recognized on your system.
On-screen Instructions:
- Enter the target URL (Enter target URL:) when prompted. This specifies the starting point for the website crawling process.
- WebDiver initiates crawling from the specified URL, extracting comprehensive information and displaying progress using the progress bar.

Notes

Customize the script to meet specific requirements or enhancements needed for your OSINT investigations or web analysis tasks.
Ensure internet connectivity during script execution to fetch web pages and extract information effectively.

WebDiver enhances the ability to gather, analyze, and interpret web-based data, making it an invaluable tool for researchers, investigators, cybersecurity professionals, and businesses interested in understanding online activities and trends.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
src		src
LICENSE		LICENSE
README.md		README.md
install.py		install.py
requirements.txt		requirements.txt
webdiver.py		webdiver.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WebDiver - Website Crawler

Features

Data Extraction

Link Analysis

Metadata Retrieval

Email Extraction

Error Handling and Logging

IP Information Retrieval

Asynchronous Processing

Progress Monitoring

Usage Instructions

Notes

About

Releases

Packages

Contributors 2

Languages

License

AnonCatalyst/WebDiver

Folders and files

Latest commit

History

Repository files navigation

WebDiver - Website Crawler

Features

Data Extraction

Link Analysis

Metadata Retrieval

Email Extraction

Error Handling and Logging

IP Information Retrieval

Asynchronous Processing

Progress Monitoring

Usage Instructions

Notes

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages