Tiger Population Estimation

This project was built for ISI DataFest Integration 2023. It aims to estimate the number of tigers in a given dataset using machine learning algorithms. The dataset contains information about tiger sightings, including location, soil type, and other environmental factors.

To predict the number of tigers, the project uses two main techniques: supervised K-Nearest Neighbors (KNN) classification and feature selection using a logit (logistic regression) model. KNN is a supervised algorithm, not a clustering method: it requires labeled training data and predicts the label of a new observation from the labels of its k closest training observations. It is simple to apply and often performs well across a wide variety of prediction problems. The logit model, on the other hand, is suited to predicting binary outcomes, such as whether a tiger is unique or not.
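
As a rough illustration of how a KNN prediction works, the sketch below classifies a query point by majority vote among its k nearest labeled neighbors. It is not taken from this repository: the function name `knn_predict`, the toy points, and the 0/1 labels are all hypothetical and chosen only to show the voting rule.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair every training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote over the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups labeled 0 and 1.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.15, 0.15)))  # prints 0
print(knn_predict(train_points, train_labels, (0.95, 1.0)))   # prints 1
```

In practice a library implementation such as scikit-learn's `KNeighborsClassifier` is preferable, since it handles larger datasets and neighbor search efficiently.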

The project first loads the labeled training dataset and splits it into training and testing sets. It then trains the KNN model on the training set and measures its accuracy on the testing set. Once trained, the model is applied to a new dataset to predict the number of tigers.
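
The workflow described above might look roughly like the following sketch with scikit-learn. It uses synthetic data rather than the competition files, and the feature layout, the label meaning (1 = unique tiger), the 25% test split, and `n_neighbors=5` are all assumptions made for illustration, not the settings used in `estimate.py`. Counting the predicted-unique rows is likewise only one plausible way to turn per-sighting predictions into a tiger count.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in for the labeled training data (assumption):
# two numeric features, label 1 meaning "unique tiger".
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the labeled rows to estimate accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit KNN on the training split and score it on the held-out split.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
accuracy = accuracy_score(y_test, knn.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")

# Apply the trained model to new, unlabeled sightings (also synthetic here)
# and count how many are predicted to be unique.
X_new = rng.normal(size=(50, 2))
predicted_unique = int(knn.predict(X_new).sum())
print(f"predicted unique tigers: {predicted_unique}")
```

Because KNN relies on distances, features measured on very different scales usually benefit from standardization before fitting.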

To further improve the accuracy of the prediction, the project uses feature selection to identify the most useful features in the dataset. Specifically, it fits a logit model to the dataset and examines the regression summary to determine which features have a statistically significant effect on the outcome variable (i.e., whether a tiger is unique or not). These features are then used to refine the KNN model and improve its accuracy.

Overall, this project demonstrates how machine learning algorithms can be used to predict the number of tigers in a given dataset. By combining KNN classification with logit-based feature selection, it aims to predict the number of unique tigers in the dataset accurately.

Getting Started

To get started with this project, clone the repository to your local machine and install the required libraries using pip. Download the dataset from the Kaggle page here or, alternatively, use the Kaggle API*:

kaggle competitions download -c im-hard-to-spot

*You can follow the instructions in this repo to set up the API.

Prerequisites

You will need the following libraries:

pandas
scikit-learn (imported as sklearn)
statsmodels
matplotlib
seaborn

You can install them using pip:

pip install pandas scikit-learn statsmodels matplotlib seaborn

Alternatively, use the command below after cloning the repo:

pip install -r requirements.txt

Installing

To install this project, simply clone the repository to your local machine:

git clone https://github.com/atharv-naik/tiger-population-estimation.git

License

This project is licensed under the MIT license.

Acknowledgments

ISI DataFest Integration 2023

Repository: atharv-naik/tiger-population-estimation
