Ai_Web_Scraper

Provide a link, and the bot will scrape the page's data and answer any queries you pass about it.

Project Overview

The AI Web Scraper extracts data from web pages. Given a URL, it retrieves the page's content, parses it, and answers user queries about the extracted data. The project combines HTML parsing with a language model so that questions can be answered interactively from the scraped content.
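The flow described above (fetch, parse, answer) can be sketched as one small function. This is a minimal illustration, not the project's actual code: `answer_from_page` and its three callable parameters are hypothetical names, and the steps are passed in as plain callables so the sketch runs without a browser or a model.

```python
from typing import Callable


def answer_from_page(
    url: str,
    question: str,
    fetch: Callable[[str], str],
    parse: Callable[[str], str],
    ask: Callable[[str, str], str],
) -> str:
    """Run the three stages in order: fetch the page, parse it, ask the model.

    All three stages are injected (hypothetical interface), so a real browser
    fetcher, HTML parser, and language model can be swapped in later.
    """
    raw_html = fetch(url)        # e.g. a browser-driven fetcher
    text = parse(raw_html)       # e.g. an HTML-to-text parser
    return ask(text, question)   # e.g. a call into a language model
```

In a real run, `fetch` would wrap the browser automation, `parse` the HTML parser, and `ask` the language model call; in a quick check, each can be a lambda.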

Features:

Data Extraction: Automates the process of extracting raw HTML from any webpage.

Data Parsing: Utilizes BeautifulSoup4 and lxml to parse the HTML content into a manageable format.

Query Handling: Leverages Ollama (running the Llama 3.1 model) and LangChain to answer queries based on the parsed data.

User Interface: Streamlit-based front end for easy interaction with the tool.
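As a rough sketch of the Data Parsing step, the function below turns raw HTML into plain text with BeautifulSoup4. The name `extract_visible_text`, the choice to drop `<script>` and `<style>` tags, and the fallback to Python's built-in parser when lxml is not installed are assumptions made for illustration, not details confirmed by this project.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def extract_visible_text(raw_html: str) -> str:
    """Return the readable text of a page, one non-empty line per text node."""
    try:
        # Prefer lxml, the parser named in this README.
        soup = BeautifulSoup(raw_html, "lxml")
    except FeatureNotFound:
        # Assumption: fall back to the standard-library parser if lxml is missing.
        soup = BeautifulSoup(raw_html, "html.parser")

    # Scripts and stylesheets carry no readable content, so remove them.
    for tag in soup(["script", "style"]):
        tag.decompose()

    # Put each text node on its own line, then drop blank lines and padding.
    text = soup.get_text(separator="\n")
    stripped = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in stripped if line)
```

Cleaning the text before it reaches the model keeps markup and inline JavaScript out of the prompt, which may matter when the page is large.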

Technologies Used

Selenium: For automating web browser interaction to scrape data.

BeautifulSoup4 and lxml: For parsing HTML and XML documents.

Ollama (Llama 3.1 model): For processing natural-language queries and generating answers.

LangChain: To connect the parsed content and user queries to the language model.

Streamlit: For creating the front end, making it interactive and user-friendly.

python-dotenv: To load environment variables from a .env file.

html5lib: A standards-compliant library for parsing and serializing HTML documents.

ChromeDriver: To let Selenium control Google Chrome.

Result

[Three screenshots, taken 2024-09-12 between 9:08 PM and 9:09 PM]