This repository contains Jupyter Notebooks for web scraping, transforming and loading flight data from 2 online travel companies.

Lacerdash/WebScrapping-Flight-Data

Web Scraping Flight Data

This is a personal project that extracts flight prices from two websites, Decolar.com and Passagens Promo, and compares them. The data is extracted, transformed, and loaded so that prices from the two sites can be compared side by side. The project was inspired by a real business need I encountered, and it also lets me practice the web scraping skills I have been studying on a real-world problem.

Project Description

The project consists of 3 main .ipynb files:

Additionally, the repository includes the following files:

  • Fligh Data.xlsx: This file contains the final output data.
  • Dim_iata.xlsx: This file contains the list of airport IATA codes that the .ipynb files use.
  • search_parameters.xlsx: This file contains randomly generated search parameters that the .ipynb files use as input.
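
As an illustration of how randomly generated search parameters could be built from a list of IATA codes, here is a minimal sketch. It is not the code used to create search_parameters.xlsx: the function name, the field names (`origin`, `destination`, `departure_date`) and the sample airport codes are all assumptions chosen for the example.

```python
import random
from datetime import date, timedelta


def random_search_parameters(iata_codes, n, start, max_days_ahead=90, seed=None):
    """Return n dicts, each with a random origin, destination and departure date.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    rng = random.Random(seed)  # seeded generator so runs can be repeated
    rows = []
    for _ in range(n):
        # sample() picks two distinct codes, so origin never equals destination
        origin, destination = rng.sample(iata_codes, 2)
        departure = start + timedelta(days=rng.randint(1, max_days_ahead))
        rows.append(
            {
                "origin": origin,
                "destination": destination,
                "departure_date": departure.isoformat(),
            }
        )
    return rows


# Example airport codes; in the project the full list lives in Dim_iata.xlsx
codes = ["GRU", "GIG", "BSB", "SSA", "REC"]
params = random_search_parameters(codes, n=5, start=date(2023, 1, 1), seed=42)
for row in params:
    print(row)
```

The resulting list of dicts can be passed directly to `pandas.DataFrame` if a tabular form is preferred.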

Requirements

This project requires the following dependencies:

  • Python 3
  • Requests
  • Beautiful Soup 4
  • Pandas
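
To show how these dependencies typically fit together, here is a minimal sketch: Requests downloads a page, Beautiful Soup parses the HTML, and Pandas holds the result as a table. The HTML structure, the CSS classes (`flight`, `airline`, `price`) and the function names are invented for the example and do not reflect the real markup of Decolar.com or Passagens Promo, which the notebooks handle in their own way.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=30):
    """Download a page and return its HTML text (raises on HTTP errors)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_flights(html):
    """Parse flight cards from HTML into a DataFrame.

    The selectors below are assumptions matching SAMPLE_HTML only.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.flight"):
        rows.append(
            {
                "airline": card.select_one(".airline").get_text(strip=True),
                "price": float(card.select_one(".price").get_text(strip=True)),
            }
        )
    return pd.DataFrame(rows)


# Made-up markup standing in for a downloaded page, so the sketch runs offline
SAMPLE_HTML = """
<div class="flight"><span class="airline">Airline A</span><span class="price">450.00</span></div>
<div class="flight"><span class="airline">Airline B</span><span class="price">512.90</span></div>
"""

flights = parse_flights(SAMPLE_HTML)
print(flights)
```

With a real page, `parse_flights(fetch_html(url))` would follow the same pattern once the selectors are adapted to that site's markup.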

Usage

To run this project, follow these steps:

  1. Clone the repository to your local machine:

    git clone https://github.com/Lacerdash/WebScrapping-Flight-Data.git
  2. Navigate to the repository directory:

    cd WebScrapping-Flight-Data
  3. Open the WebScrappingPassagens.ipynb file in a Jupyter notebook environment or your preferred IDE, and run the cells to execute the code.

  4. The output files will be saved in the output directory.
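
Once data from both websites is available, the prices can be compared by joining the two tables on the search parameters. The sketch below shows one way to do this with Pandas. The column names and the price values are made up for illustration and are not taken from the project's output files.

```python
import pandas as pd

# Made-up illustration data; real values would come from the scraped output
decolar = pd.DataFrame(
    {
        "origin": ["GRU", "GRU"],
        "destination": ["GIG", "SSA"],
        "departure_date": ["2023-03-01", "2023-03-05"],
        "price": [450.0, 800.0],
    }
)
passagens_promo = pd.DataFrame(
    {
        "origin": ["GRU", "GRU", "BSB"],
        "destination": ["GIG", "SSA", "REC"],
        "departure_date": ["2023-03-01", "2023-03-05", "2023-03-10"],
        "price": [430.0, 820.0, 600.0],
    }
)

# Inner join keeps only the searches that returned a price on both sites
comparison = decolar.merge(
    passagens_promo,
    on=["origin", "destination", "departure_date"],
    suffixes=("_decolar", "_passagens_promo"),
)

# Positive difference means Decolar is more expensive for that search
comparison["price_diff"] = (
    comparison["price_decolar"] - comparison["price_passagens_promo"]
)
comparison["cheaper_site"] = comparison["price_diff"].apply(
    lambda d: "Decolar" if d < 0 else "Passagens Promo" if d > 0 else "tie"
)
print(comparison)
```

An inner join is used here so that every row has a price from both sites; a search that only one site answered is dropped rather than compared against a missing value.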
