data pipelining project at Ironhack Barcelona - Jan 2023

germanortola/airbnb-bcn

Data Pipelining Project

Airbnb Barcelona

Data Analytics Bootcamp @ Ironhack

Disclaimer:

This project's results are intended to be part of a larger research effort. Please refer to the following PowerPoint presentation: https://github.com/germanortola/airbnb-bcn/blob/main/output/Airbnb_Barcelona_data_pipelining.pptx

Note for Week 3 deliverables: Please refer to the Data Visualization.ipynb file for a working and "clean" version of all produced code: https://github.com/germanortola/airbnb-bcn/blob/main/notebooks/Data%20Visualization.ipynb

First: Choosing a subject and asking questions

The main goal of this project is to answer relevant questions about the status of Airbnb accommodation in Barcelona in 2022.

This project uses several Python libraries to clean, process and transform dataframes, and it also downloads data through an API. Once the data is ready, it is visualized to provide insights that answer the questions driving this analysis.

What was the situation of Airbnb in Barcelona as of 2022?

What is the price range?

How many locations are in the city?

How are these locations distributed?

What do guests say about Airbnb in Barcelona?

How much of the accommodation offer does Airbnb represent?

Second: Collecting the data

The main database for this project was downloaded from Inside Airbnb:

http://insideairbnb.com/data-requests

Inside Airbnb is a project centered on studying and providing data on topics such as:

regulations to protect housing

impact on housing

impact on residential communities

touristification and overtourism

gentrification

unethical tech companies

Third: Data Cleaning

Several columns were cleaned, although the database was solid and very well organized. The cleaning steps were:

Checking the ratio of missing data

Dropping null values

Checking and converting data types

Checking for duplicates

Tools: built-in Python methods and the pandas library.
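The four cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small `listings` dataframe, its column names and its values are hypothetical stand-ins for the Inside Airbnb data, and the price format (a string with a currency sign and thousands separator) is an assumption.

```python
import pandas as pd

# Hypothetical sample standing in for the Inside Airbnb listings table.
# Column names and values are assumptions made for illustration only.
listings = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "neighbourhood": ["Eixample", "Gràcia", "Gràcia", None, "Sants"],
    "price": ["$120.00", "$85.00", "$85.00", "$1,200.00", "$60.00"],
})

# 1. Ratio of missing values per column (0.0 = complete, 1.0 = empty)
missing_ratio = listings.isna().mean()

# 2. Drop rows that contain null values
cleaned = listings.dropna()

# 3. Check and convert data types: strip "$" and "," from price, then cast to float
cleaned = cleaned.assign(
    price=cleaned["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# 4. Count fully duplicated rows, then remove them
n_duplicates = int(cleaned.duplicated().sum())
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(missing_ratio)
print(f"duplicates removed: {n_duplicates}")
print(cleaned.dtypes)
```

Dropping every row with a null is the simplest option; on the real data it may be preferable to drop only on the columns the analysis actually uses, for example with `dropna(subset=[...])`.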

Fourth: Data Processing

The data was transformed by extracting new subsets. Pivot tables were created, aggregating with basic statistical methods: count, sum, mean, median, max and min.

The pandas library was the main tool here.
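As a rough sketch of this kind of aggregation, the example below builds a pivot table with the six statistics named above. The sample dataframe and the choice of `neighbourhood` as the grouping column and `price` as the aggregated value are assumptions for illustration; the project's notebooks may group differently.

```python
import pandas as pd

# Hypothetical, already-cleaned sample; names and values are illustrative only.
listings = pd.DataFrame({
    "neighbourhood": ["Eixample", "Eixample", "Gràcia", "Gràcia", "Sants"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Private room"],
    "price": [150.0, 60.0, 110.0, 130.0, 45.0],
})

# Extract a subset: entire homes only
entire_homes = listings[listings["room_type"] == "Entire home/apt"]

# Pivot table: one row per neighbourhood, one column per statistic.
# Passing a list to aggfunc yields MultiIndex columns: (statistic, "price").
price_by_area = pd.pivot_table(
    listings,
    index="neighbourhood",
    values="price",
    aggfunc=["count", "sum", "mean", "median", "max", "min"],
)

print(entire_homes)
print(price_by_area)
```

A single statistic can then be read with a tuple key, for example `price_by_area[("median", "price")]`.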

Fifth: Data Analysis

The analysis was performed by comparing values from different attributes. Focusing on the original questions was key to staying on track.

Tests were run in Python code to check the results.

Sixth: Data Visualization

Visualization was the final step, aimed at a thorough interpretation of the relationships in the data.

Plotly, Seaborn and Matplotlib libraries were the tools for this task.
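A minimal sketch of one such chart, using Matplotlib only, might look like the following. The sample data, the column names and the choice of a horizontal bar chart of median price per neighbourhood are assumptions; the actual figures are in the Data Visualization notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; names and values are illustrative only.
listings = pd.DataFrame({
    "neighbourhood": ["Eixample", "Eixample", "Gràcia", "Gràcia", "Sants"],
    "price": [150.0, 60.0, 110.0, 130.0, 45.0],
})

# Median price per neighbourhood, sorted so the bars read from low to high
median_price = listings.groupby("neighbourhood")["price"].median().sort_values()

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(median_price.index, median_price.values)
ax.set_xlabel("Median nightly price")
ax.set_title("Median price by neighbourhood (sample data)")
fig.tight_layout()
```

Calling `fig.savefig("median_price.png")` (a hypothetical file name) would write the chart to disk; in a notebook the figure is displayed inline instead.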

to be continued...
