
📊 30 Days of Data Science is a daily challenge to guide you through Data Science essentials. From basics to advanced, this repo offers clear examples, practical exercises, and resources to help you master Data Science, one day at a time. Whether you're new or refining your skills, this challenge has something for you. Join the journey now! 🚀


SamarthGarge/30-Days-Of-DataScience

πŸ‘¨β€πŸ”¬ 30 Days of Data Science

| Day | Topic | Topics Covered |
|-----|-------|----------------|
| 01 | Introduction to Data Science | Setting up Python, Jupyter Notebook |
| 02 | Basics of the Language + Git Basics | Python syntax, variables, Git setup |
| 03 | Control Flow | If-else, loops |
| 04 | Functions and Modular Programming | Defining & calling functions |
| 05 | Data Structures | Lists, tuples, dictionaries |
| 06 | Data Frames and Tables | pandas DataFrame |
| 07 | Importing Data | Reading CSV, Excel, JSON files |
| 08 | Data Cleaning | Handling missing values, duplicates |
| 09 | Exploratory Data Analysis (EDA) | Descriptive statistics |
| 10 | Data Visualization Basics | matplotlib, seaborn |
| 11 | Advanced Data Visualization | Plotly, advanced matplotlib |
| 12 | SQL for Data Retrieval | sqlite3, SQLAlchemy |
| 13 | Time Series Analysis Introduction | pandas datetime, matplotlib |
| 14 | Working with APIs and JSON | requests, JSON module |
| 15 | Regular Expressions | re module |
| 16 | Statistical Concepts | SciPy, NumPy |
| 17 | Hypothesis Testing | t-test, chi-square |
| 18 | Basic Machine Learning Introduction | scikit-learn basics |
| 19 | Linear Regression | LinearRegression in scikit-learn |
| 20 | Logistic Regression | LogisticRegression in scikit-learn |
| 21 | Clustering (K-Means) | KMeans in scikit-learn |
| 22 | Decision Trees | DecisionTreeClassifier in scikit-learn |
| 23 | Handling Imbalanced Data | SMOTE, class weighting |
| 24 | Feature Engineering | Encoding, scaling, feature selection |
| 25 | Model Evaluation and Metrics | Confusion matrix, ROC-AUC |
| 26 | Advanced ML: Hyperparameter Tuning | GridSearchCV, RandomizedSearchCV |
| 27 | Natural Language Processing (NLP) | NLTK, spaCy, Hugging Face |
| 28 | Time Series Forecasting | ARIMA, Prophet |
| 29 | Working with Big Data | PySpark basics |
| 30 | Building a Data Science Pipeline | sklearn pipeline, joblib |
| 31 | Deployment on Cloud Platform | Deploy with Flask/FastAPI to AWS, Azure, or GCP |

📘 Day 1

Welcome

Congratulations on deciding to take part in the 30 Days of Data Science challenge! In this challenge, you will work through the essential concepts of data science, from foundational programming skills to data analysis, visualization, and machine learning.

Introduction

Data Science is an interdisciplinary field that uses programming, mathematics, and domain knowledge to extract insights from structured and unstructured data. Python is one of the most popular tools in data science due to its versatility, ease of use, and robust ecosystem of libraries. This challenge is designed to help you build a strong foundation in Python while applying it to practical data science tasks. The topics are distributed over 30 days, with clear explanations, real-world examples, and hands-on exercises.

This challenge is suitable for beginners as well as professionals looking to strengthen their data science skills. It may take 30 to 100 days to complete, depending on your pace.

Why Learn Data Science?

Data Science is revolutionizing industries by enabling data-driven decision-making. It combines programming, statistics, and domain expertise to solve complex problems. Python has become the go-to language in the data science community due to its simplicity and extensive library support for tasks like data cleaning, visualization, and modeling. Whether you aim to work in business analytics, artificial intelligence, or research, data science skills will open up endless possibilities.

Setting Up Your Environment

Installing Python

To start coding in Python, you need to install it on your computer. Visit the Downloads page of the official Python website to get the latest version.

  • Windows users: Download the Windows installer and run it.
  • macOS users: Download the macOS installer from the same page and run it.

To confirm the installation, open your terminal or command prompt and type the following (on macOS and Linux, the command may be python3 instead of python):

python --version

You should see the installed version, which should be Python 3.6 or above. For example:

Python 3.12.4

If the command displays the Python version, you are ready to proceed.
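
The same check can be made from inside Python. The snippet below is a minimal sketch, assuming the Python 3.6 minimum stated above; `MIN_VERSION` and `is_supported` are illustrative names and not part of the challenge materials.

```python
import sys

# Minimum version taken from the text above (an assumption for this sketch;
# raise it if the tools you install require something newer).
MIN_VERSION = (3, 6)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Report the running interpreter and whether it meets the minimum.
print("Running Python", sys.version.split()[0])
print("Meets minimum:", is_supported())
```

Save it to a file and run it with the same python command you used for the version check.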

Python Shell

Python is an interpreted language, meaning you can execute code line by line. Python comes with an interactive shell, which allows you to write and test Python commands directly. To open the shell, type the following command in your terminal:

python

Once the shell is open, you can start entering Python commands after the >>> prompt. For example, typing 2 + 3 will output 5. To exit the shell, type exit().

If you enter an invalid command, Python will provide an error message, helping you debug and learn. Debugging is the process of identifying and fixing errors in your code. You will encounter common error types such as SyntaxError, NameError, and TypeError throughout this challenge. Understanding these errors is crucial for becoming a proficient programmer.
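
To see what these three error types look like in practice, the sketch below runs a few small snippets and reports which error each one raises. The helper `error_name` is a hypothetical name introduced only for this illustration.

```python
def error_name(source):
    """Run a snippet of Python source and return the name of the error
    it raises, or None if it runs without an error."""
    try:
        # compile() reports SyntaxError; exec() reports runtime errors.
        exec(compile(source, "<example>", "exec"), {})
    except Exception as exc:
        return type(exc).__name__
    return None


snippets = [
    "print('hello'",   # closing parenthesis is missing
    "print(total)",    # the name 'total' was never defined
    "'3' + 2",         # a string and an integer cannot be added
    "2 + 3",           # valid code, no error
]

for source in snippets:
    print(f"{source!r:20} -> {error_name(source)}")
```

Typing the same snippets directly at the >>> prompt shows the full error message, which usually points to the line and the cause.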

Installing Visual Studio Code

While the Python shell is great for quick tests, real-world data science projects are easier to manage in a full code editor. For this challenge, we recommend Visual Studio Code, a popular and lightweight editor. Feel free to use another editor if you prefer.

To start, download and install Visual Studio Code. Once installed, create a folder named 30DaysOfDataScience on your computer and open it using Visual Studio Code. Inside the folder, create a new file, such as helloworld.py, to write your first Python script. This will serve as the workspace for your projects throughout the challenge.
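
A first script for helloworld.py might look like the sketch below. The `greet` function is an illustrative name chosen here, not something the challenge prescribes.

```python
# helloworld.py: a first script to confirm the editor and Python work together.


def greet(name):
    """Build a greeting for the given name."""
    return f"Hello, {name}!"


# This block runs only when the file is executed directly,
# not when it is imported from another file.
if __name__ == "__main__":
    print(greet("Data Science"))
```

Run it from the integrated terminal in Visual Studio Code with python helloworld.py.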

Exploring the Editor

Visual Studio Code offers many features to enhance productivity, including debugging tools, extensions, and an intuitive interface. Spend some time familiarizing yourself with its layout and shortcuts.

Jupyter Notebook

In addition to Visual Studio Code, another essential tool for data science is Jupyter Notebook. It is an interactive web-based environment where you can write and execute Python code, visualize data, and document your analysis all in one place. Jupyter Notebook is widely used in the data science community because it simplifies exploratory data analysis and data visualization.

Installing Jupyter Notebook

To install Jupyter Notebook, you need pip, the Python package manager. It is normally installed together with Python, so you should not need to set it up separately. Open your terminal or command prompt and type:

pip install notebook

Once the installation is complete, you can launch Jupyter Notebook by typing:

jupyter notebook

This command will open Jupyter Notebook in your default web browser. You will see an interface that allows you to create and organize notebooks in different folders.
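
If the jupyter command is not found, it helps to check whether the packages were actually installed for the interpreter you are using. The sketch below uses the standard-library importlib.metadata module, which assumes Python 3.8 or newer; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version of a package as a string,
    or None if the package is not installed for this interpreter."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Check the two packages this section relies on.
for package in ("pip", "notebook"):
    version = installed_version(package)
    print(f"{package}: {version or 'not installed'}")
```

If notebook shows as not installed, running python -m pip install notebook installs it for the same interpreter that ran the check.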

Using Jupyter Notebook

To create a new notebook:

  1. Navigate to the folder where you'd like to save your notebooks.
  2. Click New (in the top-right corner) and select Python 3 (ipykernel).

A new notebook will open where you can write Python code in individual cells. Press Shift + Enter to execute the code in a cell. You can also add explanatory text using Markdown cells to make your analysis more readable.

Here is a simple example to get started:

  1. Create a new notebook and name it Day1_Basics.ipynb.
  2. Write the following code in a cell and execute it:
# This is your first code in Jupyter Notebook
print("Hello, Data Science!")

You should see the output below the cell:

Hello, Data Science!

Installing JupyterLab (Optional)

If you'd like a more modern interface with enhanced features, you can use JupyterLab, the next-generation interface from Project Jupyter that runs the same notebooks. Install it using:

pip install jupyterlab

Launch it by typing:

jupyter lab

Integration with Visual Studio Code

If you prefer to work within Visual Studio Code but want the interactivity of Jupyter Notebook, you can install the Jupyter extension in Visual Studio Code:

  1. Open Visual Studio Code and go to the Extensions Marketplace (the square icon on the sidebar).
  2. Search for "Jupyter" and install the extension.
  3. Open a .ipynb file, or create one from the command palette (Ctrl + Shift + P, or Cmd + Shift + P on Mac) by selecting Jupyter: Create New Blank Notebook (named Create: New Jupyter Notebook in recent versions).

Now you can use Jupyter notebooks directly within Visual Studio Code!
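
A .ipynb file is plain JSON, which is why both Jupyter and Visual Studio Code can open the same notebook. The sketch below builds a minimal two-cell notebook with only the standard library. It assumes the version 4 notebook format, and the function names `build_notebook` and `save_notebook` are hypothetical. In practice you would normally create notebooks through the interface.

```python
import json
from pathlib import Path


def build_notebook(code, note):
    """Return a minimal notebook (format version 4) as a dictionary,
    with one Markdown cell followed by one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": note},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,  # the cell has not been run yet
                "outputs": [],
                "source": code,
            },
        ],
    }


def save_notebook(notebook, path):
    """Write the notebook dictionary to disk as JSON and return the path."""
    target = Path(path)
    target.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return target


notebook = build_notebook('print("Hello, Data Science!")', "# Day 1")
print(len(notebook["cells"]), "cells")

# To write the file next to your script, uncomment the line below:
# save_notebook(notebook, "Day1_Basics.ipynb")
```

Because no kernel is recorded in the metadata, the editor may ask you to choose a Python kernel the first time you open the file.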

Day 2 >>
