Not so long ago, Berlin was an affordable place to live. When I moved to Berlin, my uncle was paying around 580 euros for a one-bedroom apartment in the Mitte neighborhood. From there, you could walk through history, art, and culture without even taking a bus. Things have changed since then. Berliners now have to compete with thousands of other applicants to find a place, and they also have to pay hefty rents to secure what they want.
I want to investigate this and answer some questions about the market. More importantly, I decided to build a machine learning model that predicts rent prices in Berlin from different features. Since there are not many public datasets on Berlin housing prices, I used a Kaggle dataset. It contains apartment rental offers in Germany, collected from immobilienscout24 on three dates: 2018-09-22, 2019-05-10, and 2019-10-08. I kept only the Berlin listings and worked on those exclusively.
The Berlin dataset is available on my GitHub.
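The Berlin filter described above can be sketched with pandas as follows. This is a minimal illustration, not the exact code used in the project: the helper name `filter_berlin` is hypothetical, it assumes the federal state is stored in `regio1` with the value "Berlin", and the toy rows are made up so the snippet runs without the Kaggle file.

```python
import pandas as pd


def filter_berlin(listings: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose federal state (regio1) is Berlin.

    Hypothetical helper; assumes regio1 holds the state name as text.
    """
    # Normalise whitespace and case so " berlin" and "Berlin" both match.
    mask = listings["regio1"].str.strip().str.lower() == "berlin"
    return listings.loc[mask].reset_index(drop=True)


# Toy rows standing in for the Kaggle file (all values are made up).
toy = pd.DataFrame(
    {
        "regio1": ["Berlin", "Bayern", "Berlin", "Sachsen"],
        "baseRent": [850.0, 1200.0, 640.0, 410.0],
    }
)

berlin = filter_berlin(toy)
print(len(berlin))
```

With the real data, the same function would be applied to the frame returned by `pd.read_csv` on the downloaded file.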
- Gathered Berlin listings from immobilienscout24, then validated and cleaned the data
- Performed an exploratory data analysis (plus AutoEDA) and answered different questions about the Berlin real estate market
- Built machine learning models to predict rent prices in Berlin using regression techniques
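To give a feel for the modelling step, here is a minimal regression sketch with scikit-learn. It is an illustration under stated assumptions rather than the project's actual model: the data is synthetic, the choice of `livingSpace` and `noRooms` as features and `totalRent` as target is an assumption, and plain linear regression stands in for whichever regression techniques were finally used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned Berlin listings (values are made up).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame(
    {
        "livingSpace": rng.uniform(25, 120, n),  # sqm
        "noRooms": rng.integers(1, 5, n),        # 1 to 4 rooms
    }
)
# Invented relationship plus noise, only so the model has something to learn.
toy["totalRent"] = (
    12.0 * toy["livingSpace"] + 50.0 * toy["noRooms"] + rng.normal(0, 40, n)
)

X = toy[["livingSpace", "noRooms"]]
y = toy["totalRent"]

# Hold out a quarter of the rows to estimate out-of-sample error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE on held-out toy data: {mae:.1f} euros")
```

Mean absolute error is used here because it reads directly in euros; other metrics such as RMSE or R² would work with the same split.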
pandas
numpy
matplotlib
seaborn
scikit-learn
missingno
AutoEDA by:
pandas_profiling
AutoViz
The variables of this dataset are:
- regio1: bundesland
- serviceCharge: ancillary costs such as electricity or internet in €
- heatingType: type of heating
- telekomTvOffer: is pay TV included, and if so which offer
- telekomHybridUploadSpeed: how fast is the hybrid internet upload speed
- newlyConst: is the building newly constructed
- balcony: if a listing has a balcony
- picturecount: how many pictures were uploaded to the listing
- pricetrend: price trend as calculated by Immoscout
- telekomUploadSpeed: how fast is the internet upload speed
- totalRent: total rent (usually a sum of base rent, service charge and heating cost)
- yearConstructed: construction year
- scoutId: immoscout Id
- noParkSpaces: number of parking spaces
- firingTypes: main energy sources, separated by colon
- hasKitchen: has a kitchen
- geo_bln: bundesland (state), same as regio1
- cellar: has a cellar
- yearConstructedRange: binned construction year, 1 to 9
- baseRent: base rent without electricity and heating
- houseNumber: house number
- livingSpace: living space in sqm
- geo_krs: district, above ZIP code
- condition: condition of the flat
- interiorQual: interior quality
- petsAllowed: are pets allowed, can be yes, no or negotiable
- street: street name
- streetPlain: street name (plain, different formatting)
- lift: is elevator available
- baseRentRange: binned base rent, 1 to 9
- typeOfFlat: type of flat
- geo_plz: ZIP code
- noRooms: number of rooms
- thermalChar: energy need in kWh/(m^2a), defines the energy efficiency class
- floor: which floor is the flat on
- numberOfFloors: number of floors in the building
- noRoomsRange: binned number of rooms, 1 to 5
- garden: has a garden
- livingSpaceRange: binned living space, 1 to 7
- regio2: District or Kreis, same as geo_krs
- regio3: City/town
- description: free text description of the listing
- facilities: free text description about available facilities
- heatingCosts: monthly heating costs in €
- energyEfficiencyClass: energy efficiency class (based on binned thermalChar, deprecated since Feb 2020)
- lastRefurbish: year of last renovation
- electricityBasePrice: monthly base price for electricity in € (deprecated since Feb 2020)
- electricityKwhPrice: electricity price per kWh (deprecated since Feb 2020)
- date: time of scraping
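Because totalRent is described as usually being the sum of base rent, service charge and heating cost, one possible validation step is to flag listings where that does not hold. The sketch below is only an illustration: the toy rows are made up, the `rentGap` column and the 1 euro tolerance are hypothetical choices, and treating a missing `heatingCosts` value as 0 is an assumption that may not suit the real data.

```python
import pandas as pd

# Toy rows using the column names from the list above (values are made up).
toy = pd.DataFrame(
    {
        "baseRent": [700.0, 950.0, 500.0],
        "serviceCharge": [120.0, 180.0, 90.0],
        "heatingCosts": [60.0, None, 45.0],
        "totalRent": [880.0, 1130.0, 700.0],
    }
)

# Sum the components; missing values count as 0 (an assumption).
parts = toy[["baseRent", "serviceCharge", "heatingCosts"]].fillna(0).sum(axis=1)

# Positive gap: totalRent is higher than its listed components explain.
toy["rentGap"] = toy["totalRent"] - parts

# Hypothetical tolerance of 1 euro for rounding differences.
consistent = toy["rentGap"].abs() <= 1.0

# Show only the listings whose total does not match the sum of its parts.
print(toy.loc[~consistent, ["totalRent", "rentGap"]])
```

Rows flagged this way are candidates for a closer look during cleaning, not necessarily errors, since the description only says the sum "usually" holds.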