Web Scraping Labs Repository

This repository contains three interactive labs that demonstrate different web scraping techniques in Python. Each lab focuses on one popular tool: Scrapy, Beautiful Soup, or Selenium. The project was completed for the 'Web Analytics' course in collaboration with the students named in the notebook file.

Beautiful_Soup.ipynb
This lab introduces the basics of web scraping with Beautiful Soup, a Python library for parsing HTML and XML documents. The exercises cover extracting data from static web pages, navigating the HTML structure, and retrieving specific information such as text, links, and images.
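As a rough illustration of the kind of extraction described above, the following minimal sketch parses a small inline page. The HTML snippet, the `extract_page_data` helper, and the `intro` class name are assumptions made for this example and are not taken from the notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical static page, inlined so the example runs without network access.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Products</h1>
    <p class="intro">Two items are listed below.</p>
    <a href="/item/1">First item</a>
    <a href="/item/2">Second item</a>
    <img src="/img/logo.png" alt="Logo">
  </body>
</html>
"""


def extract_page_data(html):
    """Return the title, intro text, link targets, and image sources of a page."""
    # "html.parser" is the parser bundled with the Python standard library.
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.title.string,
        "intro": soup.find("p", class_="intro").get_text(strip=True),
        "links": [a["href"] for a in soup.find_all("a", href=True)],
        "images": [img["src"] for img in soup.find_all("img", src=True)],
    }


page_data = extract_page_data(SAMPLE_HTML)
```

For a live page, the same helper could be given the body of an HTTP response instead of `SAMPLE_HTML`.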
Scrapy.ipynb
This lab uses Scrapy, a powerful and flexible web crawling framework. The notebook covers setting up a Scrapy project, writing spiders to crawl websites, and extracting data from dynamic and complex web pages. Because Scrapy issues requests asynchronously, it is well suited to large-scale scraping tasks.
Selenium.ipynb
This lab focuses on Selenium, a tool for automating web browsers. Selenium is particularly useful for scraping dynamic content generated by JavaScript. The notebook shows how to simulate browser interactions, such as clicking buttons or filling in forms, to extract data that isn't readily available in the HTML source.

mariamagro/Web_Scraping

Folders and files

Latest commit

History

Repository files navigation

Web Scraping Labs Repository

Contents

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages