Skip to content

Latest commit

 

History

History
114 lines (75 loc) · 3.29 KB

A. Data Exploration and Cleansing.md

File metadata and controls

114 lines (75 loc) · 3.29 KB

📚 Case Study - Online Hotel Web Search: A. Data Exploration and Cleansing

Using Python - Google Colab vs Power BI

IMPORT Library

#import lib
#import thư viện
import pandas as pd
import numpy as np
import matplotlib as plt
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import dates
import datetime
print('Completed import lib')

IMPORT DATASET

1.Import csv file and check info

df =  pd.read_csv('/content/marketplace_data_2019.csv')
df.info()

image

2.Checking Null Values

#Checking null Values
df.isnull().sum()

image

---> There is no Null Values

3. Unpivot Dataframe

#Unpivot dataframe 
df_unpivot = pd.wide_to_long(df,
                             stubnames =['clicks', 'cost','bookings','booking_rev'],  #select columns
                             i = ['date','ttt_group'], #
                             j = 'Advertiser', 
                             sep ='_', 
                             suffix ='\w+')
print(df_unpivot.head())

image

4. Transforming MultiIndex to Column"

#Transforming MultiIndex to Columns
df = df_unpivot.reset_index()
print(df.head())
df.info()

image

5. Save file "

#Save file 
df.to_csv('Final_mindx.csv')

IMPORT CLEAN FILE TO POWER BI

1. Transform Data

Use First rows as Header, Change type of click,cost,bookings,booking_rev,month column to Number type. Remove First Column (order column)

  • Source data : image

  • Promoted header, Change type & Replace value : image

  • Change data type and remove the first column (order column)
    image

2.Add new measure dax - Conversion rate, % Booking, Cost/Booking, Profit/Cost, Rev_Per_booking

  • Conversion rate : calculated by dividing the number of bookings by the number of clicks image

  • % Booking : Calculate the percent of booking monthly image

  • Cost/Booking: The Average Cost per booking image

  • Profit/Cost : The profit make from cost image

  • Rev_Per_booking : The Revenue per booking image