Web Scraping Labs Repository

This repository contains three interactive labs demonstrating different web scraping techniques in Python. Each lab focuses on one popular tool: Beautiful Soup, Scrapy, or Selenium. The project was completed for the 'Web analytics' course in collaboration with the students named in the notebook file.

Contents

  1. Beautiful_Soup.ipynb
    This lab introduces the basics of web scraping with Beautiful Soup, a Python library for parsing HTML and XML documents. The exercises cover extracting data from static web pages, navigating the HTML structure, and retrieving specific information such as text, links, and images.

  2. Scrapy.ipynb
    This lab uses Scrapy, a powerful and flexible web crawling framework. The notebook covers setting up a Scrapy project, writing spiders to crawl websites, and extracting data from dynamic and complex web pages. Scrapy's asynchronous design makes it well suited to large-scale scraping tasks.

  3. Selenium.ipynb
    This lab focuses on Selenium, a tool for automating web browsers. Selenium is particularly useful for scraping dynamic content generated by JavaScript. The notebook shows how to simulate browser interactions, such as clicking buttons or filling in forms, to extract data that isn't readily available in the HTML source.
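
Example

As a rough illustration of the kind of extraction the Beautiful Soup lab describes (text, links, and images from a static page), here is a minimal sketch. It is not taken from the notebooks: the inline `HTML` string and the `extract_page_data` helper are hypothetical, and the page is embedded as a string so the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for a downloaded static page.
HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Products</h1>
    <a href="/item/1">First item</a>
    <a href="/item/2">Second item</a>
    <img src="/img/logo.png" alt="Logo">
  </body>
</html>
"""


def extract_page_data(html):
    """Return the title, first heading, link targets, and image sources."""
    # "html.parser" is Python's built-in parser, so no extra dependency is needed.
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.title.get_text(strip=True),
        "heading": soup.find("h1").get_text(strip=True),
        # href=True / src=True skip tags that lack the attribute.
        "links": [a["href"] for a in soup.find_all("a", href=True)],
        "images": [img["src"] for img in soup.find_all("img", src=True)],
    }


if __name__ == "__main__":
    print(extract_page_data(HTML))
```

For a real page, the HTML string would typically come from an HTTP client such as `requests`; the parsing calls stay the same.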