Authors: Danish Karlin Isa, Nicholas Varabioff, Ximin Xu, Zuer Zhong
This project is part of the coursework for DSCI 522 Data Science Workflows, a course of the Master of Data Science program at the University of British Columbia.
This project attempts to predict the level of acceptability of cars using machine learning methods. Using a 1997 dataset, the influence of various attributes of a new car on its acceptability among customers is analysed. These attributes include:
- The buying price of the car
- Maintenance costs
- Number of doors
- Passenger capacity
- Boot size
- Safety ratings
This analysis aims to identify the key factors that determine whether a car is considered acceptable, good, or unacceptable according to standardized criteria. To achieve this, several common machine learning models were explored. The SVM RBF classifier was identified as the best-performing model, achieving a test accuracy of 0.952. The SVM RBF model demonstrated exceptional performance, with an accuarcy of 0.99 on unseen data. This makes the SVM RBF model a solid choice for this project.
The dataset used in this project is the Car Evaluation Database created by M. Bohanec and V. Rajkovic in the early 1990s. It was sourced from the UCI Machine Learning Repository and is publicly available for research and can be found in the UCI Machine Learning Repository.
The final report can be found here.
- Docker
-
If you are using Windows or Mac, make sure Docker Desktop is running.
-
Clone this GitHub repository.
-
Navigate to the root of this repository.
-
Run the following command-line command:
docker compose up
-
In the terminal, look for a URL that starts with
http://127.0.0.1:8888/lab?token=
as shown in the image below. Copy and paste that URL into your browser. This will launch Jupyter Lab. -
To reset the project to a clean slate, open a terminal in the root of the project and run the following command.
make clean
-
To run the entire analysis, open a terminal in the root of the project and run the following command.
make all
- To shut down the container and clean up the resources, type
Ctrl
+C
in the terminal where you launched the container. Then, typedocker compose rm
.
conda
(version 23.9.0 or higher)conda-lock
(version 2.5.7 or higher)- Python and packages listed in
environment.yml
-
Create a new branch.
-
Add dependency to the
environment.yaml
file. Make sure to pin the desired package version. -
Update the
conda-linux-64.yml
by running the following command-line command from the root of the project:conda-lock -k explicit --file environment.yml -p linux-64
-
Rebuild the Docker image locally to ensure it builds and runs properly.
docker build --tag env-test --platform=linux/amd64 . docker run --rm -it --platform=linux/amd64 env-test /bin/bash
-
Push the changes to GitHub. A new Docker image will be built and pushed to DockerHub automatically. It will be tagged with the SHA for the commit that changed the file.
-
Update the
docker-compose.yaml
file on your branch to use the new container image (make sure to update the tag specifically). -
Send a Pull Request and merge your branch changes into the
main
branch.
-
Launch Jupyter Lab using the same
docker compose up
command in the Running the Analysis section. -
To run the test suite, open a terminal in the root of the project and run the
pytest
command.
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.The software code contained within this repository is licensed under the MIT license. See the license filefor more information.
- Van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual. CreateSpace, Scotts Valley, CA. ISBN: 1441412697.
- McKinney, W. (2010). Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56-61). Edited by S. van der Walt and J. Millman. DOI: 10.25080/Majora-92bf1922-00a.
- Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90-95. DOI: 10.1109/MCSE.2007.55.
- Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
- Bohanec, M. (1988). Car Evaluation. UCI Machine Learning Repository. DOI: 10.24432/C5JP48.
- Harris, C. R. et al. (2020). Array programming with NumPy. Nature, 585(7825), 357-362. DOI: 10.1038/s41586-020-2649-2.
- Developers, Scikit-learn. 2024. Grid Search — Scikit-Learn Documentation. https://scikit-learn.org/stable/modules/grid_search.html.