- To scrape new listings, install Scrapy:
$ pip install scrapy
- To follow my analysis in the Jupyter Notebook, install the following packages (a short usage sketch follows this list):
- Tabula to extract information from PDFs:
$ pip install tabula-py
- FuzzyWuzzy to match strings:
$ pip install fuzzywuzzy
- Plotly for visualizations in the Jupyter Notebook:
$ pip install plotly
- Chart Studio if you want to export your Plotly visualizations:
$ pip install chart_studio
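Each of these packages is used roughly as follows. This is a minimal sketch with placeholder file names and example data, not code taken from the notebook:

import pandas as pd
import plotly.express as px
import tabula
from fuzzywuzzy import fuzz

# tabula-py extracts tables from a PDF into a list of pandas DataFrames
# ("rent_cap_tables.pdf" is a placeholder file name)
tables = tabula.read_pdf("rent_cap_tables.pdf", pages="all")

# fuzzywuzzy scores string similarity from 0 to 100, handy for matching
# landlord names that are spelled slightly differently across listings
score = fuzz.token_sort_ratio("Deutsche Wohnen SE", "Deutsche Wohnen")
print(score)

# plotly renders interactive charts inline in a Jupyter Notebook
df = pd.DataFrame({"district": ["Mitte", "Pankow"], "cold_rent": [950, 780]})
px.bar(df, x="district", y="cold_rent").show()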
To stop the ever-increasing cost of housing, the Berlin state government passed a controversial law that caps rents. The new law comes into effect on February 23.
As my next data science side project, I decided to analyse current online listings on ImmobilienScout24 to see whether landlords already respect the new rent cap. In particular, I wanted to answer:
- How many listings are priced above the allowed rent cap?
- How much more per month do tenants pay in total than they would have to under the new rent cap?
- How much would the average cold rent decrease under the new law?
- What is the distribution of the excess rent under the new law?
- How would the average cold rent price change per district?
- Which big real estate firms are charging the most excess rent?
To run the spider/web crawler:
- From the base folder berlin_rental_prices, change into the subfolder berlin_rental_prices:
$ cd berlin_rental_prices
- Run the spider in your terminal:
$ scrapy crawl immo_scraper -o your_file_name.csv
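For orientation, here is a minimal sketch of what a Scrapy spider like immo_scraper can look like. The start URL and CSS selectors below are illustrative assumptions, not the project's actual spider:

import scrapy

class ImmoSpider(scrapy.Spider):
    # Name used by "scrapy crawl immo_scraper"
    name = "immo_scraper"
    # Placeholder search URL; the real spider may start elsewhere
    start_urls = ["https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten"]

    def parse(self, response):
        # Yield one item per listing on the results page (selectors are hypothetical)
        for listing in response.css("article.result-list-entry"):
            yield {
                "title": listing.css("h5::text").get(),
                "cold_rent": listing.css("dd::text").get(),
            }
        # Follow the pagination link, if any (selector is hypothetical)
        next_page = response.css("a.next-page::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)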
To run the Jupyter Notebook:
- Navigate to the following folder: berlin_rental_prices -> berlin_rental_prices -> berlin_rental_prices
- Open the data_analysis.ipynb Jupyter Notebook
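If you only want the gist of the excess-rent calculation, here is a rough sketch in pandas. The column names cold_rent and rent_cap are hypothetical; the notebook derives the applicable cap from the Tabula-extracted tables:

import pandas as pd

# Load the scraped listings (file and column names are illustrative)
df = pd.read_csv("your_file_name.csv")

# Excess rent: how far a listing's cold rent exceeds its legal cap
df["excess_rent"] = (df["cold_rent"] - df["rent_cap"]).clip(lower=0)

over_cap = df[df["excess_rent"] > 0]
print(f"Listings above the cap: {len(over_cap)} of {len(df)}")
print(f"Total monthly excess rent: {df['excess_rent'].sum():.2f} EUR")
print(f"Average cold-rent decrease under the cap: {df['excess_rent'].mean():.2f} EUR")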
The main findings of the analysis can be found in the post available here.
Please credit the author for the data. Otherwise, feel free to use the code here as you like!