mariamagro/Web_Scraping

Web Scraping Labs Repository

This repository contains three interactive labs that demonstrate different web scraping techniques in Python. Each lab focuses on a different popular tool: Scrapy, Beautiful Soup, and Selenium. The project was completed for the 'Web analytics' course in collaboration with the students named in the notebook file.

Contents

  1. Beautiful_Soup.ipynb
    This lab introduces the basics of web scraping with Beautiful Soup, a Python library for parsing HTML and XML documents. The exercises cover extracting data from static web pages, navigating the HTML structure, and retrieving specific information such as text, links, and images.

  2. Scrapy.ipynb
    This lab uses Scrapy, a powerful and flexible web crawling framework. The notebook covers setting up a Scrapy project, writing spiders to crawl websites, and extracting data from complex, multi-page sites. Because Scrapy issues requests asynchronously (it is built on the Twisted networking engine), it is well suited to large-scale scraping tasks.

  3. Selenium.ipynb
    This lab focuses on Selenium, a tool for automating web browsers. Selenium is particularly useful for scraping dynamic content generated by JavaScript. The notebook shows how to simulate browser interactions, such as clicking buttons or filling in forms, to extract data that is not present in the page's initial HTML source.
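
The following is a minimal Beautiful Soup sketch of the kind of task described in the first lab; it is not taken from the notebook. It parses a small inline HTML string (a made-up sample, so it runs without network access), navigates from the heading to its sibling paragraph, and collects absolute link and image URLs. The sample markup, the base URL `https://example.com/`, and the function name `extract_page_data` are all hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical static page used only for illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Course catalogue</h1>
  <p class="intro">Static page used for practice.</p>
  <a href="/labs/bs4">Beautiful Soup lab</a>
  <a href="https://scrapy.org">Scrapy</a>
  <img src="img/logo.png" alt="Logo">
</body></html>
"""


def extract_page_data(html, base_url):
    """Return the heading, intro text, links and image URLs of a static page."""
    soup = BeautifulSoup(html, "html.parser")

    # Retrieve text: the first <h1> on the page.
    heading = soup.find("h1")
    title = heading.get_text(strip=True) if heading else None

    # Navigate the tree: the <p> that follows the heading at the same level.
    intro_tag = heading.find_next_sibling("p") if heading else None
    intro = intro_tag.get_text(strip=True) if intro_tag else None

    # Retrieve links and images, resolving relative URLs against base_url.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    images = [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]

    return {"title": title, "intro": intro, "links": links, "images": images}


data = extract_page_data(SAMPLE_HTML, "https://example.com/")
print(data)
```

For a live page, the HTML string would typically come from an HTTP client (for example `requests.get(url).text`) instead of a literal.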
