
# 🌐 Web Crawler

This script is a 🕷️ web crawler designed to scan and analyze subdomains. It crawls each target site and records the URLs, JavaScript/JSON/XML files, images, and HTML input fields it finds.

## Usage

To use the script, follow these steps:

1. Ensure you have the necessary libraries installed (an install note follows this list):
   - `threading`
   - `requests`
   - `BeautifulSoup` from `bs4`
   - `urllib.parse`
   - `argparse`
   - `os`
   - `time`
   - `curses`
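
Only `requests` and `BeautifulSoup` are third-party packages; the rest ship with Python's standard library. A minimal install sketch, assuming the usual PyPI package names:

```
pip3 install requests beautifulsoup4
```
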

2. Clone the repository and navigate to the script's directory:
```
git clone https://github.com/MalikShoaib678/deep-sea-crawler.git
cd deep-sea-crawler
```
3. Provide input by creating a file with the target subdomains. For example:
```
echo 'testphp.vulnweb.com' > target.txt
```

4. Execute the command below to start the scan:
```
python3 deep-sea-crawler.py --file target.txt --max_threads 40 --max_depth 3
```
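
Because the script builds its command line with `argparse` (listed above), the standard help flag should print the full set of supported options; the flags shown above are just the ones used in this example:

```
python3 deep-sea-crawler.py --help
```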

## Results

After the scan is completed, the script generates an output directory (default name: `result`) where the results are stored. The following files are created:

- 🗄️ `hidden-input-fields.txt`: Contains hidden input fields found in the crawled pages.
- 🖼️ `image_files.txt`: Lists the image files discovered during the crawling process.
- 📜 `jsfiles.txt`: Lists the JavaScript files found.
- 📋 `jsons.txt`: Lists the JSON files encountered.
- 🔐 `password-input-fields.txt`: Contains password input fields found in the scanned pages.
- 📥 `submit-input-fields.txt`: Lists submit input fields discovered.
- ✏️ `text-input-fields.txt`: Contains text input fields found.
- 🔗 `urls.txt`: Lists the URLs encountered during the crawling process.
- 📄 `xmls.txt`: Lists the XML files encountered.

The script also provides a summary of the scan, displaying the count of URLs, JavaScript files, other files, and images discovered.

To get the line count of each result file, navigate to the output directory (default: `result`) and use the following command:

```
wc -l *
```

## Note

This script is intended for professional use and should be used responsibly. Ensure that you have the necessary permissions before scanning any website. Note that this README is a brief overview of the script's functionality, not an exhaustive guide.

Feel free to contribute to this open-source project by submitting bug reports, feature requests, or pull requests. Your feedback is highly appreciated!

Enjoy exploring the depths of the web with the Web Crawler! 🌊🔍

