As a consultant working for a real estate start-up, you have collected Airbnb listing data from various sources to investigate the short-term rental market in New York. This assignment focuses on analyzing this data to provide insights to your client. The goal is to use exploratory data analysis (EDA) techniques to extract meaningful trends, perform data transformations, and visualize findings effectively.
- airbnb_price.csv: Contains data on Airbnb listing prices and locations.
  - listing_id: Unique identifier of the listing.
  - price: Nightly listing price in USD.
  - nbhood_full: Name of the borough and neighborhood where the listing is located.
- airbnb_room_type.xlsx: Contains data on listing descriptions and room types.
  - listing_id: Unique identifier of the listing.
  - description: Listing description.
  - room_type: Type of room (shared room, private room, entire home/apartment).
- airbnb_last_review.tsv: Contains data on Airbnb host names and review dates.
  - listing_id: Unique identifier of the listing.
  - host_name: Name of the listing host.
  - last_review: Date when the listing was last reviewed.
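All three files share listing_id, so a natural first step is to load them and join on that key. Below is a minimal pandas sketch. The small DataFrames and their values are hypothetical stand-ins for the real files, and the read calls are left as comments. Reading the .xlsx file with pd.read_excel assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd

# Hypothetical stand-ins for the three files. In the notebook they would be loaded with:
#   price = pd.read_csv("airbnb_price.csv")
#   room_type = pd.read_excel("airbnb_room_type.xlsx")   # needs an engine such as openpyxl
#   reviews = pd.read_csv("airbnb_last_review.tsv", sep="\t")
price = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "price": [150.0, 80.0, 300.0],
    "nbhood_full": ["Brooklyn, Williamsburg", "Queens, Astoria", "Manhattan, SoHo"],
})
room_type = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "description": ["Sunny loft", "Cozy room", "Luxury apartment"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt"],
})
reviews = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "host_name": ["Ana", "Ben", "Cho"],
    "last_review": ["2019-01-05", "2019-06-20", "2019-07-01"],
})

# Inner joins on the shared key keep only listings that appear in all three files.
listings = price.merge(room_type, on="listing_id").merge(reviews, on="listing_id")
print(listings.shape)  # (3, 7)
```

Inner joins are one choice among several. Using how="outer" with indicator=True in merge would reveal listings missing from one of the files, which is worth checking before the analysis.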
This assignment is open-ended, meaning you can refer to internet resources, documentation, and tutorials for assistance. However, for each solution, you must include 2 to 3 sentences of justification or reasoning explaining:
- How you arrived at the solution.
- What steps you took to achieve the solution.
Additionally:
- Your submission file should include all steps in the process, not just the final answer. Each question should have a complete code workflow.
- Visualization questions require well-labeled and aesthetically pleasing charts.
- Dates of Reviews
  - What are the dates of the earliest and most recent reviews?
  - Store these values as two separate variables, earliest_review and most_recent_review.
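A minimal pandas sketch for this step, using hypothetical last_review strings. Converting with pd.to_datetime before taking min and max matters: on raw strings the comparison would be alphabetical, not chronological. If the real date strings use an unusual layout, passing an explicit format= argument is the safer choice.

```python
import pandas as pd

# Hypothetical review dates; only last_review matters here.
reviews = pd.DataFrame({
    "listing_id": [1, 2, 3, 4],
    "last_review": ["2019-01-05", "2019-06-20", "2018-12-31", "2019-07-01"],
})

# Parse strings into datetimes so min/max compare chronologically.
reviews["last_review"] = pd.to_datetime(reviews["last_review"])

earliest_review = reviews["last_review"].min()
most_recent_review = reviews["last_review"].max()
print(earliest_review.date(), most_recent_review.date())  # 2018-12-31 2019-07-01
```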
- Private Room Listings
  - How many of the listings are private rooms?
  - Save this count in a variable called nb_private_rooms.
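A hedged sketch of the count. The labels are normalized with str.strip and str.lower first because the raw room_type values may not be cased consistently. That is an assumption; if the labels are already clean, a plain equality check gives the same result. The toy values below are hypothetical.

```python
import pandas as pd

# Hypothetical room_type values, deliberately inconsistent in case and spacing.
room_type = pd.DataFrame({
    "listing_id": [1, 2, 3, 4, 5],
    "room_type": ["Private room", "private room", "Entire home/apt",
                  " PRIVATE ROOM", "Shared room"],
})

# Normalize labels so spelling variants count as the same category.
normalized = room_type["room_type"].str.strip().str.lower()
nb_private_rooms = int((normalized == "private room").sum())
print(nb_private_rooms)  # 3
```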
- Average Price Calculation
  - What is the average listing price?
  - Round it to two decimal places and store it in a variable called avg_price.
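A small sketch, assuming the price column might be read as text. It coerces the column with pd.to_numeric before averaging. Entries that cannot be parsed become NaN and are ignored by mean(), so checking how many NaNs appear is a sensible follow-up. The prices are hypothetical.

```python
import pandas as pd

# Hypothetical prices, some stored as text to show the numeric conversion.
price = pd.DataFrame({"listing_id": [1, 2, 3, 4],
                      "price": ["150", 80, 300.25, "95"]})

# Coerce to numbers; unparseable entries become NaN and are skipped by mean().
price["price"] = pd.to_numeric(price["price"], errors="coerce")
avg_price = round(price["price"].mean(), 2)
print(avg_price)  # 156.31
```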
- Summary Table
  - Combine the calculated values into a new DataFrame called review_dates with the following columns (in order): first_reviewed, last_reviewed, nb_private_rooms, and avg_price, where first_reviewed holds earliest_review and last_reviewed holds most_recent_review. The DataFrame should contain only one row of values.
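One way to build the one-row table, sketched with hypothetical values standing in for the results of the previous steps. A dict of single-element lists produces exactly one row, and the column order follows the dict's key order.

```python
import pandas as pd

# Hypothetical values carried over from the earlier steps.
earliest_review = pd.Timestamp("2018-12-31")
most_recent_review = pd.Timestamp("2019-07-01")
nb_private_rooms = 3
avg_price = 156.31

# Single-element lists give one row; columns appear in the order listed.
review_dates = pd.DataFrame({
    "first_reviewed": [earliest_review],
    "last_reviewed": [most_recent_review],
    "nb_private_rooms": [nb_private_rooms],
    "avg_price": [avg_price],
})
print(review_dates)
```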
- Neighborhood Trends
  - Which neighborhoods have the highest and lowest average listing prices?
  - Create a DataFrame with the columns neighborhood, average_price, and number_of_listings for the top 5 most expensive neighborhoods.
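A sketch using groupby with named aggregation, which produces the requested column names in a single step. It assumes nbhood_full is the neighborhood key; the listings and prices are hypothetical. nsmallest covers the "lowest" half of the question.

```python
import pandas as pd

# Hypothetical listings; nbhood_full combines borough and neighborhood.
price = pd.DataFrame({
    "price": [200, 250, 90, 60, 400, 350, 120, 70, 180],
    "nbhood_full": ["Manhattan, SoHo", "Manhattan, SoHo", "Queens, Astoria",
                    "Queens, Astoria", "Manhattan, Tribeca", "Manhattan, Tribeca",
                    "Brooklyn, Williamsburg", "Bronx, Fordham", "Brooklyn, DUMBO"],
})

# Named aggregation: new_column=(source_column, function).
by_nbhood = (
    price.groupby("nbhood_full")
    .agg(average_price=("price", "mean"), number_of_listings=("price", "size"))
    .reset_index()
    .rename(columns={"nbhood_full": "neighborhood"})
    .sort_values("average_price", ascending=False)
)

top5 = by_nbhood.head(5).reset_index(drop=True)        # most expensive
lowest5 = by_nbhood.nsmallest(5, "average_price")      # cheapest
print(top5)
```

With real data, neighborhoods backed by only one or two listings can dominate either end of the ranking, so filtering on number_of_listings before ranking may be worth justifying in the write-up.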
- Word Analysis in Descriptions
  - Find the 10 most frequently used words in the description column, excluding stopwords such as "the" and "and".
  - Use pandas.Series.str.split and explore the Counter class from the collections module.
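A sketch combining Series.str.split with collections.Counter. The stopword set is a small hand-written assumption; a library list such as NLTK's is a common alternative. The descriptions are hypothetical. Lowercasing and stripping non-letters first keeps "Cozy" and "cozy," from being counted as different words.

```python
from collections import Counter

import pandas as pd

# Hypothetical descriptions, including a missing value.
descriptions = pd.Series([
    "Cozy room in the heart of Brooklyn",
    "Sunny loft with a view of the park",
    "Cozy studio near the park and subway",
    None,
])

# Hand-written stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "and", "a", "an", "in", "of", "with", "near", "to", "for", "is"}

# Lowercase, replace non-letters with spaces, split into lists, flatten one word per row.
words = (
    descriptions.dropna()
    .str.lower()
    .str.replace(r"[^a-z\s]", " ", regex=True)
    .str.split()
    .explode()
    .dropna()
)
counts = Counter(w for w in words if w not in STOPWORDS)
top_10 = counts.most_common(10)
print(top_10[:2])  # [('cozy', 2), ('park', 2)]
```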
- Room Type Comparison
  - Compare the average prices for each room_type (shared rooms, private rooms, entire homes/apartments).
  - Create a bar chart visualizing the differences.
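A sketch of the comparison with pandas and matplotlib. It computes the mean price per room type, then draws a labeled bar chart with each bar's value written above it. The room-type labels and prices are hypothetical. The matplotlib.use("Agg") line only lets the snippet run outside a notebook and should be dropped in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical listings.
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Shared room",
                  "Entire home/apt", "Private room", "Shared room"],
    "price": [220, 90, 50, 180, 70, 40],
})

avg_by_type = listings.groupby("room_type")["price"].mean().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(avg_by_type.index, avg_by_type.values, color=["#4c72b0", "#55a868", "#c44e52"])
# Categorical bars sit at x = 0, 1, 2, so each value label is placed at its bar's index.
for i, value in enumerate(avg_by_type.values):
    ax.text(i, value, f"{value:.0f} USD", ha="center", va="bottom")
ax.set_title("Average nightly price by room type")
ax.set_xlabel("Room type")
ax.set_ylabel("Average price (USD)")
fig.tight_layout()
```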
- Trend Over Time
  - Analyze the number of reviews over time for all listings.
  - Plot a line graph showing the trend of reviews per month over the years. (Hint: Use pandas.to_datetime and groupby.)
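A sketch following the hint. Because the files described above hold only each listing's last_review date, this counts last reviews per month; treating that as "reviews over time" is an assumption worth stating in the write-up. The dates are hypothetical. Months without any review are filled with 0 so the line does not silently skip them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical last_review dates.
reviews = pd.DataFrame({"last_review": [
    "2018-11-03", "2018-11-20", "2019-01-15", "2019-01-28", "2019-01-30", "2019-03-02",
]})
reviews["last_review"] = pd.to_datetime(reviews["last_review"])

# dt.to_period("M") maps each date to its year-month, so groupby counts reviews per month.
monthly = reviews.groupby(reviews["last_review"].dt.to_period("M")).size()

# Reindex over every month in the range so empty months show up as 0.
all_months = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(all_months, fill_value=0)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monthly.index.to_timestamp(), monthly.values, marker="o")
ax.set_title("Reviews per month")
ax.set_xlabel("Month")
ax.set_ylabel("Number of reviews")
fig.autofmt_xdate()
fig.tight_layout()
```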
- Exploring Unique Matplotlib Functions
  - Create a scatter plot with a regression line showing the relationship between price and the length of the description.
  - Use matplotlib.axes.Axes.annotate to highlight outliers in the graph. (Note: Students should explore this function independently.)
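A sketch under stated assumptions. The regression line is an ordinary least-squares fit from np.polyfit. "Outlier" here uses a hypothetical rule (price more than two standard deviations above the mean); other rules, such as the IQR fence, would work equally well. The listings are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical listings; the last one has an extreme price.
listings = pd.DataFrame({
    "price": [80, 95, 120, 150, 160, 200, 900],
    "description": [
        "Cozy room",
        "Quiet room near subway",
        "Bright studio with a view",
        "Spacious one-bedroom apartment in Astoria",
        "Renovated apartment close to Central Park",
        "Large loft with rooftop access and a full kitchen",
        "Penthouse",
    ],
})
listings["desc_length"] = listings["description"].str.len()

# Least-squares line: polyfit with deg=1 returns [slope, intercept].
slope, intercept = np.polyfit(listings["desc_length"], listings["price"], deg=1)

# Hypothetical outlier rule: more than 2 standard deviations above the mean price.
threshold = listings["price"].mean() + 2 * listings["price"].std()
outliers = listings[listings["price"] > threshold]

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(listings["desc_length"], listings["price"], alpha=0.7, label="Listings")
x_line = np.linspace(listings["desc_length"].min(), listings["desc_length"].max(), 100)
ax.plot(x_line, slope * x_line + intercept, color="red", label="Linear fit")

# annotate() places a text label offset from each outlier with an arrow pointing at it.
for _, row in outliers.iterrows():
    ax.annotate(
        f"{row['price']:.0f} USD",
        xy=(row["desc_length"], row["price"]),
        xytext=(15, -20),
        textcoords="offset points",
        arrowprops={"arrowstyle": "->"},
    )

ax.set_title("Price vs. description length")
ax.set_xlabel("Description length (characters)")
ax.set_ylabel("Price (USD)")
ax.legend()
fig.tight_layout()
```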
- Exploring Unique Seaborn Functions
  - Generate a strip plot of prices grouped by room_type, using the hue parameter to distinguish neighborhoods.
  - Students should explore the seaborn.stripplot function.
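A minimal seaborn sketch with hypothetical data. hue colors the points by nbhood_full, dodge=True places each neighborhood's points side by side within a room type, and sns.move_legend moves the legend outside the axes so it does not hide points.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical listings.
listings = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 3 + ["Shared room"] * 3,
    "price": [250, 180, 140, 95, 80, 65, 55, 40, 35],
    "nbhood_full": ["Manhattan, SoHo", "Brooklyn, Williamsburg", "Queens, Astoria"] * 3,
})

fig, ax = plt.subplots(figsize=(8, 5))
# hue: color by neighborhood; dodge: separate hue levels; jitter: spread overlapping points.
sns.stripplot(data=listings, x="room_type", y="price", hue="nbhood_full",
              dodge=True, jitter=0.2, alpha=0.8, ax=ax)
ax.set_title("Nightly price by room type and neighborhood")
ax.set_xlabel("Room type")
ax.set_ylabel("Price (USD)")
sns.move_legend(ax, "upper left", bbox_to_anchor=(1.02, 1), title="Neighborhood")
fig.tight_layout()
```

With the full dataset, nbhood_full probably has many levels. Deriving the borough, for example with listings["nbhood_full"].str.split(", ").str[0] (assuming the "Borough, Neighborhood" layout), may give a more readable legend.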
- Bar Chart
  - Create a bar chart showing the count of listings for each room type.
  - Add proper labels, titles, and a legend.
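A sketch with hypothetical room types. A single-series bar chart has nothing to put in a legend, so this version draws one bar per room type with its own label. Each room type then gets a legend entry, as the task asks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical room_type column.
room_types = pd.Series(["Entire home/apt"] * 4 + ["Private room"] * 3 + ["Shared room"])
counts = room_types.value_counts()  # sorted from most to least common

fig, ax = plt.subplots(figsize=(7, 4))
colors = ["#4c72b0", "#dd8452", "#55a868"]
# One bar call per room type so each gets its own legend entry.
for (room, n), color in zip(counts.items(), colors):
    ax.bar(room, n, color=color, label=room)
ax.set_title("Number of listings by room type")
ax.set_xlabel("Room type")
ax.set_ylabel("Number of listings")
ax.legend(title="Room type")
fig.tight_layout()
```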
- Heatmap
  - Generate a heatmap to show the correlation (if any) between listing price and the frequency of reviews.
  - Use the sns.heatmap function.
- Pie Chart
  - Create a pie chart to visualize the proportion of room types available.
  - Use a custom color palette and annotate the chart with percentages.
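A sketch with hypothetical room types. The palette is an arbitrary set of hex colors. The autopct format string writes each wedge's share as a percentage with one decimal place.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical room_type column.
room_types = pd.Series(["Entire home/apt"] * 4 + ["Private room"] * 3 + ["Shared room"])
counts = room_types.value_counts()

# Custom palette (arbitrary hex colors, one per room type).
palette = ["#264653", "#e9c46a", "#e76f51"]

fig, ax = plt.subplots(figsize=(6, 6))
wedges, texts, autotexts = ax.pie(
    counts.values,
    labels=counts.index,
    colors=palette[: len(counts)],
    autopct="%1.1f%%",  # percentage label inside each wedge
    startangle=90,
    wedgeprops={"edgecolor": "white"},
)
ax.set_title("Share of listings by room type")
ax.axis("equal")  # keep the pie circular
fig.tight_layout()
```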
- Histogram
  - Plot a histogram showing the distribution of listing prices.
  - Use bins to group prices in increments of $50.
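A sketch that builds the $50 bin edges explicitly with np.arange rather than passing a bin count. This guarantees every bin spans exactly 50 USD and the edges land on round numbers. The prices are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical nightly prices.
prices = pd.Series([45, 60, 99, 120, 149, 150, 210, 260, 310, 480])

# Edges every 50 USD from 0 up past the maximum price.
edges = np.arange(0, prices.max() + 50, 50)

fig, ax = plt.subplots(figsize=(8, 4))
counts, bins, patches = ax.hist(prices, bins=edges, edgecolor="black")
ax.set_xticks(edges)
ax.set_title("Distribution of nightly listing prices (50 USD bins)")
ax.set_xlabel("Price (USD)")
ax.set_ylabel("Number of listings")
fig.tight_layout()
```

Real price data usually has a long right tail. Limiting the x-axis, or noting how many listings fall beyond it, may make the bulk of the distribution easier to read.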
- Violin Plot
  - Create a violin plot to compare price distributions across neighborhoods.
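A sketch using matplotlib's Axes.violinplot. seaborn.violinplot is an equally valid choice; this version collects one array of prices per neighborhood and passes the list. The neighborhoods and prices are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical listings.
listings = pd.DataFrame({
    "nbhood_full": ["Manhattan, SoHo"] * 4 + ["Queens, Astoria"] * 4
                   + ["Brooklyn, Williamsburg"] * 4,
    "price": [200, 240, 260, 310, 70, 85, 95, 110, 120, 150, 160, 175],
})

# One array of prices per neighborhood (groupby sorts the names alphabetically).
names, data = zip(*[(name, group.to_numpy())
                    for name, group in listings.groupby("nbhood_full")["price"]])

fig, ax = plt.subplots(figsize=(8, 5))
parts = ax.violinplot(data, showmedians=True)
# Violins are drawn at x = 1..n by default, so the tick labels go there.
ax.set_xticks(range(1, len(names) + 1))
ax.set_xticklabels(names, rotation=20, ha="right")
ax.set_title("Price distribution by neighborhood")
ax.set_xlabel("Neighborhood")
ax.set_ylabel("Price (USD)")
fig.tight_layout()
```

With the full dataset, plotting every neighborhood may give an unreadable chart. Restricting it to, say, the neighborhoods with the most listings is one reasonable choice to explain in the accompanying sentences.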
- Your notebook should include:
  - Step-by-step solutions for each question.
  - Explanations for each solution in 2-3 sentences.
  - Properly labeled graphs and visualizations.
- Ensure all code cells are executed and the outputs are visible.
- Submit the .ipynb file with the name format EDA_Project1_<YourName>.ipynb.
- Outlier Detection
  - Identify listings with unusually high prices (outliers) using the interquartile range (IQR) method.
  - Highlight these listings in a scatter plot.
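A sketch of the IQR method on hypothetical prices. Q1 and Q3 come from Series.quantile (pandas' default linear interpolation). Listings above the conventional upper fence Q3 + 1.5 × IQR are flagged. Only the upper fence is used because the task asks about unusually high prices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical listings.
prices = pd.DataFrame({
    "listing_id": range(1, 11),
    "price": [60, 75, 80, 90, 100, 110, 120, 130, 150, 900],
})

# Quartiles and interquartile range.
q1, q3 = prices["price"].quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
prices["is_outlier"] = prices["price"] > upper_fence

typical = prices[~prices["is_outlier"]]
outliers = prices[prices["is_outlier"]]

fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(typical["listing_id"], typical["price"], label="Typical listing")
ax.scatter(outliers["listing_id"], outliers["price"], color="red", label="Outlier")
ax.axhline(upper_fence, color="gray", linestyle="--",
           label=f"Upper fence ({upper_fence:.0f} USD)")
ax.set_title("Listings with unusually high prices (IQR method)")
ax.set_xlabel("Listing ID")
ax.set_ylabel("Price (USD)")
ax.legend()
fig.tight_layout()
```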
- Interactive Visualization
  - Explore plotly or altair to create an interactive visualization of price trends by neighborhood.
  - Include tooltips to display additional information such as room_type and description.
Rock 'n' roll, everyone! Time's ticking...