This repository contains three interactive labs that demonstrate different web scraping techniques in Python. Each lab focuses on a popular tool: Scrapy, Beautiful Soup, or Selenium. The project was completed for the course 'Web Analytics' in collaboration with the students named in the notebook file.
Beautiful_Soup.ipynb
This lab introduces the basics of web scraping with Beautiful Soup, a Python library for parsing HTML and XML documents. The exercises cover extracting data from static web pages, navigating the HTML structure, and retrieving specific information such as text, links, and images.
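As a rough illustration of the kind of extraction described above, the sketch below parses a small HTML snippet and pulls out a heading, link text, link targets, and image sources. The HTML, the `item` class, and the `extract` function are hypothetical and not taken from the notebook; on a real static page the HTML would typically be downloaded first (for example with `requests.get(url).text`).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded static page.
HTML = """
<html>
  <body>
    <h1>Example catalogue</h1>
    <div class="item">
      <a href="/books/1">First book</a>
      <img src="/img/1.png" alt="Cover 1">
    </div>
    <div class="item">
      <a href="/books/2">Second book</a>
      <img src="/img/2.png" alt="Cover 2">
    </div>
  </body>
</html>
"""


def extract(html: str) -> dict:
    """Return the page title plus text, link and image for each item block."""
    # "html.parser" ships with Python, so no extra parser needs to be installed.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1").get_text(strip=True)
    items = []
    # Navigate the tree: every <div class="item"> holds one link and one image.
    for div in soup.find_all("div", class_="item"):
        link = div.find("a")
        img = div.find("img")
        items.append({
            "text": link.get_text(strip=True),  # visible link text
            "href": link["href"],               # link target attribute
            "image": img["src"],                # image source attribute
        })
    return {"title": title, "items": items}
```

Calling `extract(HTML)` returns the title `"Example catalogue"` and one dictionary per item block. Keeping the parsing in a function that takes a string makes it easy to test offline before pointing it at live pages.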
Scrapy.ipynb
This lab uses Scrapy, a powerful and flexible web crawling framework. The notebook covers setting up a Scrapy project, writing spiders that crawl websites, and extracting data from dynamic and complex web pages. Because Scrapy processes requests asynchronously, it is well suited to large-scale scraping tasks.
Selenium.ipynb
This lab focuses on Selenium, a tool used for automating web browsers. Selenium is particularly useful for scraping dynamic content generated by JavaScript. This notebook shows how to simulate browser interactions, such as clicking buttons or filling forms, to extract data that isn't readily available in the HTML source.