m-ayush-2004/Web-Crawlers

Web-Crawlers


🕸️ Web Crawlers and Automation Projects

Welcome to my repository! This collection features web crawlers and automation tools I've developed. The projects use Python libraries such as Selenium and Beautiful Soup for tasks like data extraction, site reading, text-to-audio conversion, and natural language processing.

📑 Table of Contents

  • Project Overview
  • Features
  • Libraries and Tools Used
  • Project Sections:
    • Site Reader and Learners
    • Data Extraction Bots
    • Audio Book Conversion Bots
    • NLP Learning Chat Bots
  • Installation
  • Usage

🔍 Project Overview

This repository contains all the major and minor web crawlers I've created. These projects are designed to interact with websites, extract data, convert text to speech, and even provide chatbot functionality. They are built using powerful Python libraries to automate tasks and simplify the user experience.

✨ Features

  • Automated Web Crawling: Interact with dynamic content using Selenium and parse HTML with Beautiful Soup.
  • Text to Speech Conversion: Convert text to speech offline using pyttsx3, which drives the platform's speech engine (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux).
  • Speech to Text Conversion: Facilitate voice interaction with bots and web crawlers.
  • Natural Language Processing: Use NLP for chatbots that respond based on crawled data.
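As a rough illustration of the dynamic-content crawling mentioned in the features above, the sketch below loads a page in headless Chrome and waits for an element before reading the HTML. The function names `chrome_arguments` and `fetch_rendered_html`, the window size, and the timeout are assumptions made for this example, not code taken from the repository. It also assumes Selenium 4 and a local Chrome install.

```python
def chrome_arguments(headless=True, window_size=(1280, 800)):
    """Build the Chrome command-line flags used by the sketch below."""
    width, height = window_size
    args = [f"--window-size={width},{height}"]
    if headless:
        # "--headless=new" selects the headless mode of recent Chrome releases.
        args.append("--headless=new")
    return args


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load `url` in Chrome, wait for `wait_css` to appear, return the HTML."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until JavaScript has rendered the element we care about.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

The returned HTML string can then be handed to a parser such as Beautiful Soup. Waiting on a CSS selector rather than sleeping for a fixed time tends to be more reliable on JavaScript-heavy pages.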

🛠️ Libraries and Tools Used

  • 🔗 Selenium: A browser automation tool that allows interaction with dynamic web content, perfect for handling JavaScript-heavy websites.

  • 🍲 Beautiful Soup: A Python library for parsing HTML and XML documents, making it easy to extract data from web pages.

  • 🔊 pyttsx3: An offline text-to-speech library for Python that works with the platform's speech engine: SAPI5 on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux. It turns text data into audio content.

  • 🌐 urllib.request: The Python standard-library module for fetching URLs. It performs HTTP requests and handles responses, which makes it useful for downloading content from the web.

  • 🤖 NLP Retrieval Based Chat Bots: Retrieval-based chatbots that use natural language processing to answer from data collected by the web crawlers, giving a more interactive way to engage with the extracted information.
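For the URL-fetching step described in the list above, a minimal sketch using the standard library's `urllib.request` might look like the following. The function name `fetch_text`, the default timeout, and the `User-Agent` string are illustrative assumptions, not part of the repository.

```python
import urllib.request


def fetch_text(url, timeout=10, user_agent="web-crawlers-example/0.1"):
    """Download `url` and return the body decoded to a string."""
    # Some sites reject requests without a User-Agent header,
    # so the sketch sends a (hypothetical) one explicitly.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the response, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call such as `fetch_text("https://example.com")` returns the page HTML as a string. This only works for pages that do not need JavaScript to render. Pages that do are the case Selenium covers.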

🗂️ Project Sections

📖 Site Reader and Learners

These tools are designed to read content from educational and informational websites, making it easier for users to consume information through either text or audio.
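One way to picture the "reading" step is as pulling the visible text out of a page while skipping scripts and styles. The sketch below does this with the standard library's `html.parser` as a dependency-free stand-in for Beautiful Soup. The class and function names are assumptions made for illustration, not the repository's code.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring the contents of non-visible tags."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        # Join with single spaces so inline tags do not break sentences.
        return " ".join(self._chunks)


def readable_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting string can be shown to the user as plain text or passed on to a text-to-speech step.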

📊 Data Extraction Bots

Bots created to extract specific data from websites, such as product information, news articles, or any other content that needs to be scraped and analyzed.
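As a hedged example of structured extraction, the sketch below pulls product names and prices out of a small HTML fragment. The markup (`div.product` containing `span.name` and `span.price`) is entirely hypothetical, and the standard library's `html.parser` is used here in place of Beautiful Soup so the example runs without extra installs.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Extract name/price pairs from flat, hypothetical product markup."""

    FIELDS = {"name", "price"}

    def __init__(self):
        super().__init__()
        self.products = []
        self._current = None  # dict for the product being read
        self._field = None    # field whose text we expect next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product" in classes:
            self._current = {}
        elif self._current is not None:
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        # Assumes each product is a <div> with no nested <div> inside it.
        if tag == "div" and self._current is not None:
            self.products.append(self._current)
            self._current = None


def extract_products(html):
    """Return a list of {'name': ..., 'price': ...} dicts."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products
```

With Beautiful Soup the same idea is usually shorter, for instance by selecting `div.product` elements with `soup.select(...)` and reading each child's text. The selectors would need to match the real site being scraped.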

🎧 Audio Book Conversion Bots

These bots convert textual content into audio books, using text-to-speech libraries like pyttsx3, providing an easy way to consume content on the go.
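A minimal sketch of the text-to-audio step, assuming pyttsx3 is installed and a speech engine is available on the machine: the text is split into sentence-aligned chunks, and each chunk is queued to its own audio file. The function names, the chunk size, the speaking rate, and the `.wav` file naming are assumptions for illustration. The actual audio format depends on the platform's speech driver.

```python
import re
from pathlib import Path


def split_into_chunks(text, max_chars=500):
    """Group sentences into chunks of at most roughly `max_chars` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def save_audiobook(text, output_dir, stem="part", rate=160, max_chars=500):
    """Write one audio file per chunk and return the list of file paths."""
    import pyttsx3  # imported lazily; needs a working speech engine

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate in words per minute

    paths = []
    for index, chunk in enumerate(split_into_chunks(text, max_chars), start=1):
        path = out / f"{stem}_{index:03d}.wav"  # extension is an assumption
        engine.save_to_file(chunk, str(path))
        paths.append(path)
    engine.runAndWait()  # process everything queued above
    return paths
```

Splitting on sentence boundaries keeps each file short enough to regenerate on its own if a single section needs to be redone.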

🤖 NLP Learning Chat Bots

Chatbots that utilize the data gathered by web crawlers to engage in conversation with users, providing insights, answering queries, and assisting with learning.
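To illustrate what "retrieval-based" can mean here, the sketch below answers a question by returning the crawled passage whose bag-of-words vector is most similar to it (cosine similarity). This is one simple approach chosen for the example, not necessarily the method the repository's bots use. The class name `RetrievalBot`, the threshold, and the fallback message are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[token] * b[token] for token in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class RetrievalBot:
    """Reply with the stored passage most similar to the question."""

    def __init__(self, passages, fallback="Sorry, I don't know about that yet."):
        self.passages = list(passages)
        self.vectors = [Counter(tokenize(p)) for p in self.passages]
        self.fallback = fallback

    def reply(self, question, threshold=0.1):
        query = Counter(tokenize(question))
        scores = [cosine_similarity(query, vector) for vector in self.vectors]
        if not scores:
            return self.fallback
        best = max(range(len(scores)), key=scores.__getitem__)
        # Below the threshold nothing matched well enough to answer.
        return self.passages[best] if scores[best] >= threshold else self.fallback
```

In practice the passages would come from the crawlers' output. A TF-IDF weighting or sentence embeddings would likely give better matches than raw counts.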

🚀 Installation

To install the required dependencies, use the following command:

pip install -r requirements.txt
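After installing, a quick check like the one below can confirm that the main libraries are importable. The module list is an assumption based on the libraries named in this README. The actual contents of requirements.txt may differ.

```python
from importlib.util import find_spec

# Import name -> pip package name (assumed, not read from requirements.txt).
ASSUMED_MODULES = {
    "selenium": "selenium",
    "bs4": "beautifulsoup4",
    "pyttsx3": "pyttsx3",
}


def missing_packages(modules=ASSUMED_MODULES):
    """Return the pip names of modules that cannot be imported."""
    return sorted(
        package for module, package in modules.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    print("All set." if not missing else f"Missing: {', '.join(missing)}")
```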

📚 Usage

Each project section comes with its own usage instructions. Refer to the specific folder for detailed steps on how to run the scripts and what inputs are needed.
