Possible project for Kharagpur Winter of Code 2020
- How are the topics even related to ML?
- What will the project entail?
- How to start with the project?
- What are the prerequisites for the project?
- What can you contribute to the project?
- Expectations from the project
- How much is ML and how much is statistics/econometrics?
- Who to contact?
When building models in ML, we often become so concerned with accuracy that we forget to check whether the model does what we initially set out to do. Statistics and econometrics help in understanding the data and in building better models: they inform feature engineering and make explicit the assumptions a model relies on. Running a linear regression sounds easy, but what if someone asks which assumptions you made while fitting the model? If your answer is "Umm...", then you are on track to understanding what these topics can contribute to ML (if you didn't already know).
Due to certain limitations, the project is concerned only with linear regression for the time being. This is just a very small subset of ML, but let's start with small steps.
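To make the "what assumptions did you make?" question concrete, here is a minimal sketch in plain Python. It fits a simple linear regression by ordinary least squares and then computes the Durbin-Watson statistic, one common check of the assumption that errors are not serially correlated. The data and the helper names `fit_simple_ols` and `durbin_watson` are made up for illustration and are not taken from the project's notebooks.

```python
# Illustrative sketch: simple linear regression y = a + b*x by ordinary
# least squares, followed by a Durbin-Watson check on the residuals.
# The data and function names are hypothetical, chosen for this example.

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def durbin_watson(residuals):
    """Durbin-Watson statistic; it always lies between 0 and 4."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up data: a true line 2 + 0.5*x plus a small alternating disturbance.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
y = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, noise)]

intercept, slope = fit_simple_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
dw = durbin_watson(residuals)

print(f"intercept={intercept:.3f}, slope={slope:.3f}, DW={dw:.3f}")
```

A value near 2 suggests no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation. Because the disturbance here alternates in sign, the statistic comes out above 2.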
The project aims to have a series of notebooks that will help in understanding the basic topics. The notebooks could be used to get a broad overview of the topic or to quickly revise the topic. The notebooks can be helpful in the following ways:
- You are participating in a competition and you want to run some quick checks on the data/model
- You are sitting for internship/placement and need to revise some topics fast
- You want a code snippet for a certain test and guidance on how to interpret the test results
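As an example of the last use case, the sketch below pairs a test with its interpretation. It implements the Jarque-Bera normality test in plain Python, which could be applied to regression residuals. The function name `jarque_bera` and both samples are illustrative assumptions, not code from the existing notebooks. The p-value relies on the asymptotic chi-square distribution with 2 degrees of freedom, so treat it as a rough guide for samples this small.

```python
import math


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments, using the 1/n (population) convention.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # A chi-square variable with 2 degrees of freedom has
    # survival function exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Two made-up samples: one symmetric, one with a large outlier.
symmetric = [-1.5, -1.0, -0.5, -0.5, 0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.5]
skewed = [-0.5, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 6.0]

for name, sample in [("symmetric", symmetric), ("skewed", skewed)]:
    jb, p = jarque_bera(sample)
    verdict = "reject normality" if p < 0.05 else "cannot reject normality"
    print(f"{name}: JB={jb:.3f}, p={p:.4f} -> {verdict} at the 5% level")
```

Interpretation: the null hypothesis is that the data come from a normal distribution. A p-value below the chosen significance level (5% here) is evidence against normality, while a large p-value only means the test found no evidence against it.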
- Install Jupyter Notebook (installing it with Anaconda is recommended)
- Learn how to use Jupyter Notebook and the Python libraries NumPy, pandas, and matplotlib
- Clone this repo and make a new branch
- Each .ipynb file stands on its own, so you should be able to open any of them directly in Jupyter Notebook
- Basic knowledge of at least one programming language (preferably Python)
- Basic knowledge of probability (class 12 level)
- Desire to learn statistics
Easy: Improve the existing graphs or explanations, add new ideas to 'ideas.md', or check whether the existing notebooks make sense
Intermediate: Start with a new notebook of your own
Advanced: Make a series of notebooks or explain a complicated/advanced topic
There will be a variety of issues: some easy ones to get you started and harder ones that let you contribute significantly. I'll also set down the minimum work expected for you to pass. By the mid-evals, you should have at least one new notebook ready, and by the end-evals, at least three. Each notebook should have an introduction to the topic, mathematical proofs if required, code that implements the topic from scratch, and the equivalent ready-made library code, if available.
The notebooks referred to here are Jupyter notebooks.
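As a rough illustration of the "from scratch, then library" layout described above, the sketch below computes the sample variance by hand and compares it with Python's built-in `statistics.variance`. The topic, the helper name `sample_variance`, and the data are assumptions chosen for brevity. An actual notebook would cover a regression-related topic in more depth.

```python
import statistics


def sample_variance(values):
    """Unbiased sample variance, dividing by n - 1 (Bessel's correction)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Made-up data for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

from_scratch = sample_variance(data)       # implementation from scratch
from_library = statistics.variance(data)   # ready-made library code

print(f"from scratch: {from_scratch:.4f}")
print(f"library:      {from_library:.4f}")
```

Showing both versions side by side lets a reader verify the hand-written implementation against a trusted one and see which convention (here, division by n - 1) the library uses.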
Your learning from this project will lean more towards statistics and econometrics than towards ML. These topics support ML; they do not replace a course or project based purely on machine learning.
The project was started by PetalsOnWind (Pankhuri Saxena, a fourth-year Economics student at IIT KGP). She can be reached at pankhurisaxena[dot]iitkgp[at]gmail[dot]com.