GitHub - Santhin/real-estate: Real estate crawler with ML on scraped data

🧐 About

The project was created for "SKNS Warsztaty z Pythona" (Python workshops).
It consists of a crawler that scrapes real estate data from Gumtree and Jupyter notebooks that apply ML to the scraped data.

🏁 Getting Started

To clone the repository, run:

git clone https://github.com/Santhin/real-estate

To run the crawler locally:

cd real-estate/crawler
pip install -r requirements.txt
python app.py

Project structure

.
├── crawler
│   ├── app.py
│   ├── aps_asyncio.py
│   ├── gumtree
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── middlewares.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── gumtree_crawler.py
│   │       ├── __init__.py
│   │       └── stack.py
│   ├── install_asyncio.py
│   ├── Procfile
│   ├── requirements.txt
│   └── scrapy.cfg
├── LICENSE
├── ml
│   ├── features
│   │   ├── rankingcen.xlsx
│   │   ├── Ranking\ Dzielnic\ 2020\ Warszawa.pdf
│   │   ├── ranking_dzielnic_warszawy_pod_wzgledem_atrakcyjnosci_warunkow_zycia_2017.pdf
│   │   ├── ranking_otodom.csv
│   │   ├── ranking.txt
│   │   └── ranking.xlsx
│   ├── notebooks
│   │   ├── ML\ endgame\ floydhub.ipynb
│   │   ├── ML\ endgame.ipynb
│   │   ├── NLP\ eda\ etc.ipynb
│   │   ├── Pipeline\ mongoRaw\ to\ clean\ before\ EDA.ipynb
│   │   └── real\ EDA.ipynb
│   └── pictures
│       ├── images.png
│       ├── ml_map.png
│       ├── simple-house-exterior-white-background_1308-50195.jpg
│       ├── unnamed.jpg
│       └── white-house-background-check-democratic-party-republican-party-house-png.jpg
└── README.md

7 directories, 32 files

🚀 Deployment

The crawler was deployed on Heroku, where Advanced Python Scheduler (APScheduler) triggered it every 15 minutes.
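
A scheduled crawl of this kind could look roughly like the sketch below. It is not the contents of `app.py` or `aps_asyncio.py`. It assumes the APScheduler 3.x API (`BlockingScheduler`, `add_job` with an `interval` trigger) and assumes that the spider is named `gumtree_crawler`, which is a guess based on the file name. The helper names `crawl_command`, `run_crawl` and `main` are hypothetical.

```python
import subprocess
import sys

# Interval taken from the deployment description above.
CRAWL_INTERVAL_MINUTES = 15
# Assumption: the spider name matches the file spiders/gumtree_crawler.py.
SPIDER_NAME = "gumtree_crawler"


def crawl_command(spider_name: str) -> list[str]:
    """Build the command line that starts one Scrapy crawl."""
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]


def run_crawl() -> int:
    """Run a single crawl in a subprocess and return its exit code."""
    # A fresh process per run avoids restarting Twisted's reactor in-process.
    return subprocess.run(crawl_command(SPIDER_NAME), check=False).returncode


def main() -> None:
    """Start a blocking scheduler that triggers run_crawl at a fixed interval."""
    # Imported here so the helpers above work without APScheduler installed.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    scheduler.add_job(run_crawl, "interval", minutes=CRAWL_INTERVAL_MINUTES)
    scheduler.start()  # blocks until the process is stopped
```

Calling `main()` from the directory that contains `scrapy.cfg` would start the loop. On Heroku, a process like this would typically be declared in the `Procfile`.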

⛏️ Built Using

Scrapy - Crawler
MongoDB - Database
Heroku - Deployment
FloydHub - Model training

🛠️ Todo

Add requirements.txt to the ml folder
