- The data used is available here. Download the CSV file and set its path either in the `config.yaml` file under `data.raw_data` or in the `DATA_PATH` environment variable.
- Update the Python environment in the `.env` file.
- Install `poetry` if it is not already installed.
- Install the dependencies using poetry: `poetry install`
- Update the config and model parameters in the `config.yaml` file.
- Add `./src` to the `PYTHONPATH`: `export PYTHONPATH="${PYTHONPATH}:./src"`
- Run `python src/main.py` or `poetry run python src/main.py`.
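
The first step above names two places for the data path: `data.raw_data` in `config.yaml` and the `DATA_PATH` environment variable. The following is a minimal sketch of how such a lookup could work. It assumes the environment variable takes precedence over the config file (the instructions do not state the order) and that the config has already been parsed into a dictionary. `resolve_data_path` is a hypothetical helper, not a function from this project.

```python
import os
from typing import Any, Mapping, Optional


def resolve_data_path(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the raw-data CSV path.

    Assumption: DATA_PATH (if set and non-empty) overrides data.raw_data
    from config.yaml. The project's real precedence may differ.
    """
    env = os.environ if env is None else env
    from_env = env.get("DATA_PATH")
    if from_env:
        return from_env
    try:
        return config["data"]["raw_data"]
    except (KeyError, TypeError) as exc:
        raise ValueError("Set DATA_PATH or data.raw_data in config.yaml") from exc


# Example with an already-parsed config (illustrative values only).
cfg = {"data": {"raw_data": "./artefacts/HousingData.csv"}}
print(resolve_data_path(cfg, env={}))
print(resolve_data_path(cfg, env={"DATA_PATH": "/tmp/other.csv"}))
```

Passing `env` explicitly keeps the function easy to test; omitting it falls back to the real process environment.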
- Build the Docker image: `docker build -t regression .`
- Bring up the dependencies using `docker compose up -d`.
- Run the container with the correct `DATA_PATH` and `MLFLOW_TRACKING_URI` as environment variables (refer to the Environment Variables table below for the complete list): `docker run -e DATA_PATH=/app/artefacts/HousingData.csv -e MLFLOW_TRACKING_URI=http://host.docker.internal:5000 -v ./artefacts:/app/artefacts --rm regression`
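
If the container is launched from a script rather than typed by hand, the `docker run` command above can be assembled from a dictionary of environment variables. This is only a sketch: `docker_run_args` is a hypothetical helper, and the image name, variables, and volume mapping are copied from the command above.

```python
import shlex
from typing import Dict, List


def docker_run_args(image: str, env: Dict[str, str], volume: str) -> List[str]:
    """Build the argument list for `docker run` (hypothetical helper)."""
    args = ["docker", "run"]
    for name, value in env.items():
        # One -e flag per environment variable.
        args += ["-e", f"{name}={value}"]
    args += ["-v", volume, "--rm", image]
    return args


args = docker_run_args(
    image="regression",
    env={
        "DATA_PATH": "/app/artefacts/HousingData.csv",
        "MLFLOW_TRACKING_URI": "http://host.docker.internal:5000",
    },
    volume="./artefacts:/app/artefacts",
)
# Print the command as a shell-safe string for inspection.
print(shlex.join(args))
# To execute it instead: subprocess.run(args, check=True)
```

Building a list instead of a single string avoids shell quoting problems when a value contains spaces.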
The following environment variables can be set to configure the training:
Variable | Default Value | Description |
---|---|---|
`DATA_PATH` | `./artefacts/HousingData.csv` | File path to the raw CSV data used for training. |
`CONFIG_PATH` | `./config.yaml` | File path to the model training and other configuration file. |
`LOG_LEVEL` | `INFO` | The logging level for the application. Valid values are `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. |
`MLFLOW_TRACKING_URI` | `http://localhost:5000` | MLflow tracking URI. Use `http://host.docker.internal:5000` if MLflow is running within a Docker container. |
`GITHUB_USERNAME` | None | GitHub username. This is needed to pull the data from the DVC repo. |
`GITHUB_PASSWORD` | None | GitHub token. This is needed to pull the data from the DVC repo. |
`DVC_REMOTE` | `s3://artifacts` | DVC remote. |
`DVC_REMOTE_NAME` | `regression-model-remote` | DVC remote name. |
`DVC_ENDPOINT_URL` | `http://minio` | The URL endpoint for the DVC storage backend. This is typically the URL of an S3-compatible service, such as MinIO, used to store and manage datasets and model files. |
`AWS_DEFAULT_REGION` | `eu-west-2` | The region of the DVC remote S3 bucket. |
`DVC_ACCESS_KEY_ID` | None | Access key ID for the DVC remote. Optional. Not needed if using IAM-based access for the DVC remote. |
`DVC_SECRET_ACCESS_KEY` | None | Secret access key for the DVC remote. Optional. Not needed if using IAM-based access for the DVC remote. |
`DEPLOY_AS_CODE` | `False` | Whether manual intervention or evaluation is needed before the trained model is registered. |
`DEPLOY_MODEL_NAME` | `house_price_prediction` | The name under which the model will be registered. This name and the alias will be used for deployment. |
`DEPLOY_MODEL_ALIAS` | `champion` | The alias to be added to the model. This alias, along with the name, will be used for deployment. |
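
As an illustration of how the table could map onto code, the sketch below reads a subset of these variables with the defaults listed above. `TrainingSettings` and `load_settings` are hypothetical names, and the accepted spellings for a true `DEPLOY_AS_CODE` value are an assumption; the project's own configuration loader may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
TRUTHY = {"true", "1", "yes"}  # assumption: accepted spellings of True


@dataclass(frozen=True)
class TrainingSettings:
    """Hypothetical container for a subset of the variables in the table."""

    data_path: str
    config_path: str
    log_level: str
    mlflow_tracking_uri: str
    deploy_as_code: bool
    deploy_model_name: str
    deploy_model_alias: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> TrainingSettings:
    """Read settings from the environment, falling back to the table defaults."""
    env = os.environ if env is None else env
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"Invalid LOG_LEVEL: {log_level!r}")
    return TrainingSettings(
        data_path=env.get("DATA_PATH", "./artefacts/HousingData.csv"),
        config_path=env.get("CONFIG_PATH", "./config.yaml"),
        log_level=log_level,
        mlflow_tracking_uri=env.get("MLFLOW_TRACKING_URI", "http://localhost:5000"),
        deploy_as_code=env.get("DEPLOY_AS_CODE", "False").strip().lower() in TRUTHY,
        deploy_model_name=env.get("DEPLOY_MODEL_NAME", "house_price_prediction"),
        deploy_model_alias=env.get("DEPLOY_MODEL_ALIAS", "champion"),
    )


# Override one variable and leave the rest at their defaults.
settings = load_settings(env={"LOG_LEVEL": "debug"})
print(settings.log_level, settings.mlflow_tracking_uri)
```

Validating `LOG_LEVEL` at load time surfaces a typo immediately instead of partway through a training run.
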
Ensure that the project requirements are already set up by following the Model training instructions.
- Ensure `pytest` is installed. `poetry install` will install it as a dev dependency.
- For integration tests, set up the dependencies (MLflow) by running `docker-compose up -d`.
- Run the tests with `poetry run pytest ./tests`.
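
Because the integration tests depend on MLflow being up, it can help to check that the tracking server accepts connections before running them. The sketch below is one possible check, not part of this project's test suite. `tracking_server_reachable` is a hypothetical helper that only tests whether a TCP connection succeeds; it does not verify that the service behind the port is actually MLflow.

```python
import os
import socket
from urllib.parse import urlparse


def tracking_server_reachable(uri: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname
    if host is None:
        return False
    # Fall back to the scheme's standard port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


uri = os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
print(f"{uri} reachable: {tracking_server_reachable(uri)}")
```

A check like this could be passed as the condition to `pytest.mark.skipif` so that integration tests are skipped, rather than failed, when the dependencies have not been started.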