A machine learning model that predicts sale prices for apartments based on ads at www.zapimoveis.com.br.
The dataset was provided by the Data Science Challenge by Grupo ZAP.
This project shouldn't be used in a production environment or for decision making without validating its results.
This project has no support lifecycle and is intended for learning purposes only.
Machine Learning experiments are tracked and models are saved using MLflow.
More information in docs/ml_model.md
The project has an API (powered by FastAPI) to be consumed by other applications.
More information in docs/api.md
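As a rough illustration of how another application might consume the API, the sketch below builds and sends a JSON request using only the Python standard library. The host, port, `/predict` route, and feature names are hypothetical placeholders chosen for this example; the actual endpoints and payload schema are the ones described in docs/api.md.

```python
import json
import urllib.request

# Hypothetical values: the real host, port, route and field names
# are the ones documented in docs/api.md.
API_URL = "http://localhost:8000/predict"


def build_prediction_request(features: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the apartment features."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_price(features: dict, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (the API must be running)."""
    request = build_prediction_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.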
The project has a Data App (powered by Streamlit) that facilitates interaction with the model and visualization of the documentation.
More information in docs/data_app.md
- api: API's code (powered by FastAPI).
- app: Data App's code (powered by Streamlit).
- data: Datasets (raw and processed).
- docs: Documentation files.
- ds_code: Project code and modeling notebook.
- mlruns: Machine learning experiments (powered by MLflow).
- properties: Application properties.
The project was divided into 3 parts.
Processing - ds_code/processing
- Download, extract and preprocess the datasets.
- Provide scripts to be used for all steps.
Exploratory Data Analysis (EDA) - app/eda.py
- Data visualization for the training dataset.
Available in the Data App.
Modeling - ds_code/modeling
- Refining the dataset and training the model.
Experiments are tracked on MLflow.
We carefully selected features from the training and test datasets and added geographic data from the IBGE 2010 Census.
See more information in the project documentation.
The results and answers to some business questions are available in docs/report.md and on the Data App.
All documentation about this project is in .md (Markdown) files in the docs directory.
This documentation is also available on the Data App.
All necessary packages are listed in requirements.txt.
To install them, run the command below in the project directory.
pip install -r requirements.txt
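After installing, it can be useful to confirm that every listed package is actually present in the active environment. The following is a minimal sketch, not part of the project, that reads requirements.txt and reports missing distributions. The helper names `requirement_names` and `missing_packages` are made up for this example, and the parsing is deliberately simplified: it ignores pip options, URLs, and environment markers.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text: str) -> list[str]:
    """Extract distribution names from the contents of a requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        missing = missing_packages(requirement_names(path.read_text()))
        print("Missing packages:", ", ".join(missing) if missing else "none")
    else:
        print("requirements.txt not found; run this from the project directory.")
```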