A collection of code snippets from the publication Daily Dose of Data Science on Substack: https://avichawla.substack.com.

Wuifi/Daily-Dose-of-Data-Science

Daily Dose of Data Science is a publication on Substack that brings together intriguing frameworks, libraries, technologies, and tips that make the life cycle of a Data Science project effortless.

This repository is a collection of all the code snippets presented in my publication. If you want to receive these tips in your mailbox daily, you can subscribe to my Substack newsletter.

Run These Code Snippets on Your Local Machine

To download the tips listed here, you can clone this repo.

git clone https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science
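After cloning, it helps to see which notebooks are available locally. At least some notebooks live in per-topic folders (for example, the Machine Learning entries point into a `Machine Learning/` directory). The snippet below is a minimal sketch under these assumptions: the clone sits in `./Daily-Dose-of-Data-Science` (the default folder name `git clone` creates), and every snippet is an `.ipynb` file. The helper `index_notebooks` is hypothetical, uses only the Python standard library, and is not part of the repository.

```python
from collections import defaultdict
from pathlib import Path


def index_notebooks(repo_root: str) -> dict[str, list[str]]:
    """Hypothetical helper: group .ipynb files in a local clone by top-level folder."""
    root = Path(repo_root)
    index: dict[str, list[str]] = defaultdict(list)
    for nb in sorted(root.rglob("*.ipynb")):
        rel = nb.relative_to(root)
        # Skip the autosave copies Jupyter writes into .ipynb_checkpoints/
        if ".ipynb_checkpoints" in rel.parts:
            continue
        # Notebooks sitting directly in the repo root are grouped under "(root)"
        folder = rel.parts[0] if len(rel.parts) > 1 else "(root)"
        index[folder].append(nb.name)
    return dict(index)


if __name__ == "__main__":
    # Assumes `git clone` was run in the current directory with the default folder name
    clone = Path("Daily-Dose-of-Data-Science")
    if clone.is_dir():
        for folder, notebooks in sorted(index_notebooks(str(clone)).items()):
            print(f"{folder}: {len(notebooks)} notebook(s)")
    else:
        print(f"No clone found at {clone.resolve()}")
```

To actually run a notebook, you still need Jupyter installed (for example, launch `jupyter notebook` from inside the clone) along with whatever third-party libraries that particular notebook imports.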

Table of Contents

  1. Pandas
  2. Jupyter Tips
  3. Python
  4. Plotting
  5. NumPy
  6. Memory Optimization
  7. Cool Tools
  8. Run-time Optimization
  9. Sklearn
  10. Debugging
  11. Missing Data
  12. ML-AI News
  13. Machine Learning
  14. Statistics
  15. Testing
  16. Terminal
  17. Documents
  18. Animations

Pandas

Title Notebook Substack Article
One-Minute Guide To Becoming a Polars-savvy Data Scientist 🔗 🔗
Avoid Using Pandas' Apply() Method At All Times 🔗 🔗
Pandas vs Polars: Run-time and Memory Comparison 🔗 🔗
A Lesser-Known Feature of the Merge Method in Pandas 🔗 🔗
A Highly Overlooked Approach To Analysing Pandas DataFrames 🔗 🔗
The Most Common Misconception About Inplace Operations in Pandas 🔗 🔗
Become A Bilingual Data Scientist With These Pandas to SQL Translations 🔗 🔗
Avoid This Costly Mistake When Indexing A DataFrame 🔗 🔗
AutoProfiler: Automatically Profile Your DataFrame As You Work 🔗 🔗
Why You Should Avoid Appending Rows To A DataFrame 🔗 🔗
Are You Sure You Are Using The Correct Pandas Terminologies? 🔗 🔗
If You Are Not Able To Code A Vectorized Approach, Try This. 🔗 🔗
Why Are We Typically Advised To Never Iterate Over A DataFrame? 🔗 🔗
PyGWalker: Analyze Pandas Dataframe in Jupyter using a Tableau-style Interface 🔗 🔗
A Simple Trick to Make The Most Out of Pivot Tables in Pandas 🔗 🔗
Never Worry About Parsing Errors Again While Reading CSV with Pandas 🔗 🔗
An Interesting and Lesser-Known Way To Create Plots Using Pandas 🔗 🔗
Generate Helpful Hints As You Write Your Pandas Code 🔗 🔗
Speed-up Parquet I/O of Pandas by 5x 🔗 🔗
Stop Using The Describe Method in Pandas. Instead, use Skimpy. 🔗 🔗
Stop Using The Describe Method in Pandas. Instead, use Summarytools. 🔗 🔗
Analyze A Pandas DataFrame Without Code 🔗 🔗
70x Faster Pandas By Changing Just One Line of Code 🔗 🔗
Reduce Memory Usage Of A Pandas DataFrame By 90% 🔗 🔗 🔗
Speed-up Pandas Apply 5x with NumPy 🔗 🔗
A Lesser-Known Feature of Apply Method In Pandas 🔗 🔗
Create Pandas DataFrame from Dataclass 🔗 🔗
Run SQL in Jupyter To Analyze A Pandas DataFrame 🔗 🔗
When You Should Not Use the head() Method In Pandas 🔗 🔗
Three Lesser-known Tips For Reading a CSV File Using Pandas 🔗 🔗
The Best File Format To Store A Pandas DataFrame 🔗 🔗 🔗
Lesser-Known Feature of the Merge Method in Pandas 🔗 🔗
The Best Way to Use Apply() in Pandas 🔗 🔗
A No-code Tool To Understand Your Data Quickly 🔗 🔗
Display Progress Bar With Apply() in Pandas 🔗 🔗
Supercharge value_counts() Method in Pandas With Sidetable 🔗 🔗
Explore CSV Data Right From The Terminal 🔗 🔗
Define the Correct DataType for Categorical Columns 🔗 🔗 🔗
Don't Create Conditional Columns in Pandas with Apply 🔗 🔗
Write Your Own Flavor Of Pandas 🔗 🔗
Create DataFrame Hassle-free By Using Clipboard 🔗 🔗
Alter the Datatype of Multiple Columns at Once 🔗 🔗
Why you should not dump DataFrames to a CSV 🔗 🔗 🔗
Why You Should Not Read CSVs with Pandas 🔗 🔗 🔗
Parallelize Pandas Apply() With Swifter 🔗 🔗
A Hidden Feature of Describe Method In Pandas 🔗 🔗
Enrich Your Notebook With Interactive Controls 🔗 🔗
Data Analysis Using No-Code Pandas In Jupyter 🔗 🔗
Create Pivot Tables, Aggregations and Plots Without Any Code 🔗 🔗 🔗
Parallelize Pandas with Pandarallel 🔗 🔗 🔗
Pretty Plotting With Pandas 🔗 🔗
How to Read Multiple CSV Files Efficiently 🔗 🔗 🔗
Configure Sklearn To Output Pandas DataFrame 🔗 🔗
Datatype For Handling Missing Valued Columns in Pandas 🔗 🔗 🔗
Vectorization Does Not Always Guarantee Better Performance 🔗 🔗

Jupyter Tips

Title Notebook Substack Article
Declutter Your Jupyter Notebook Using Interactive Controls 🔗 🔗
🚀 Jupyter Notebook + Spreadsheet + AI: All in One Place With Mito 🔗 🔗
The Coolest GitHub-Colab Integration You Would Ever See 🔗 🔗
Break the Linear Presentation of Notebooks With Stickyland 🔗 🔗
Restart Jupyter Kernel Without Losing Variables 🔗 🔗
Annotate Data With The Click Of A Button Using Pigeon 🔗 🔗
Build Elegant Web Apps Right From Jupyter Notebook with Mercury 🔗 🔗
Supercharge Your Jupyter Kernel With ipyflow 🔗 🔗
PyGWalker: Analyze Pandas Dataframe in Jupyter using a Tableau-style Interface 🔗 🔗
Draw The Data You Are Looking For In Seconds 🔗 🔗
Never Search Jupyter Notebooks Manually Again To Find Your Code 🔗 🔗
Stop Previewing Raw DataFrames. Instead, Use DataTables 🔗 🔗
Label Your Data With The Click Of A Button 🔗 🔗
The Coolest Jupyter Notebook Hack 🔗 🔗
View Documentation in Jupyter Notebook 🔗 🔗
Get Notified When Jupyter Cell Has Executed 🔗 🔗
Clear Cell Output In Jupyter Notebook During Run-time 🔗 🔗
CodeSquire: The AI Coding Assistant You Should Use Over GitHub Copilot 🔗 🔗
Find Your Code Hiding In Some Jupyter Notebook With Ease 🔗 🔗
Enrich Your Notebook With Interactive Controls 🔗 🔗
Data Analysis Using No-Code Pandas In Jupyter 🔗 🔗
Create Pivot Tables, Aggregations and Plots Without Any Code 🔗 🔗 🔗
Restart Notebook Without Losing Variables 🔗 🔗 🔗
Retrieve Previously Computed Output In Jupyter Notebook 🔗 🔗 🔗
Transfer Variables Between Jupyter Notebooks 🔗 🔗 🔗

Python

Title Notebook Substack Article
7 Elegant Usages of Underscore in Python 🔗 🔗
How To Enforce Type Hints in Python? 🔗 🔗
A Common Misconception About Deleting Objects in Python 🔗 🔗
What Makes The Join() Method Blazingly Faster Than Iteration? 🔗 🔗
A Hidden Feature of a Popular String Method in Python 🔗 🔗
Execute Python Project Directory as a Script 🔗 🔗
Improve Python Run-time Without Changing A Single Line of Code 🔗 🔗
A Lesser-Known Difference Between For-Loops and List Comprehensions 🔗 🔗
Magic Methods: An Underrated Gem of Python OOP 🔗 🔗
9 Command Line Flags To Run Python Scripts More Flexibly 🔗 🔗
Use Custom Python Objects In A Boolean Context 🔗 🔗
You Were Probably Given Incomplete Info About A Tuple's Immutability 🔗 🔗
A Counterintuitive Thing About Python Dictionaries 🔗 🔗
Probably The Fastest Way To Execute Your Python Code 🔗 🔗
A Counterintuitive Fact About Python Functions 🔗 🔗
Manipulating Mutable Objects In Python Can Get Confusing At Times 🔗 🔗
Most Python Programmers Don't Know This About Python OOP 🔗 🔗
You Can Add a List As a Dictionary's Key (Technically)! 🔗 🔗
Why Python Does Not Offer True OOP Encapsulation 🔗 🔗
Most Python Programmers Don't Know This About Python For-loops 🔗 🔗
How To Enable Function Overloading In Python 🔗 🔗
The Right Way to Roll Out Library Updates in Python 🔗 🔗
F-strings Are Much More Versatile Than You Think 🔗 🔗
A Single Line That Will Make Your Python Code Faster 🔗 🔗
Make Dot Notation More Powerful in Python 🔗 🔗
An Elegant Way To Perform Shutdown Tasks in Python 🔗 🔗
What Are Class Methods and When To Use Them? 🔗 🔗
Hide Attributes While Printing A Dataclass Object 🔗 🔗
List : Tuple :: Set : ? 🔗 🔗
Post_init: Add Attributes To A Dataclass Post Initialization 🔗 🔗
Simplify Your Functions With Partial Functions 🔗 🔗
DotMap: A Better Alternative to Python Dictionary 🔗 🔗
Prevent Wild Imports With __all__ in Python 🔗 🔗
Performance Comparison of Python 3.11 and Python 3.10 🔗 🔗
Why 256 is 256 But 257 is not 257? 🔗 🔗
Make a Class Object Behave Like a Function 🔗 🔗
Lesser-known Feature of Pickle Files 🔗 🔗
Specify Loops and Runs In %%timeit 🔗 🔗
Don't Use time.time() To Measure Execution Time 🔗 🔗
Import Your Python Package as a Module 🔗 🔗
Fine-grained Error Tracking With Python 3.11 🔗 🔗
Run Python Project Directory As A Script 🔗 🔗
Use Slotted Class To Improve Your Python Code 🔗 🔗
Using Dictionaries In Place of If-conditions 🔗 🔗
In Defense of Match-case Statements in Python 🔗 🔗

Plotting

Title Notebook Substack Article
Don't Overuse Scatter, Line and Bar Plots. Try These Four Elegant Alternatives. 🔗 🔗
Sankey Diagrams: An Underrated Gem of Data Visualization 🔗 🔗
Enrich Your Heatmaps With This Simple Trick 🔗 🔗
The Coolest Matplotlib Hack to Create Subplots Intuitively 🔗 🔗
Waterfall Charts: A Better Alternative to Line/Bar Plot 🔗 🔗 🔗
Enrich Your Confusion Matrix With A Sankey Diagram 🔗 🔗
A Simple One-Liner to Create Professional Looking Matplotlib Plots 🔗 🔗
Visualise The Change In Rank Over Time With Bump Charts 🔗 🔗
A Simple Trick That Significantly Improves The Quality of Matplotlib Plots 🔗 🔗
A Lesser-known Feature of Creating Plots with Plotly 🔗 🔗
A Little Bit Of Extra Effort Can Hugely Transform Your Basic Matplotlib Plots 🔗 🔗
Interactively Visualise A Decision Tree With A Sankey Diagram 🔗 🔗
Use Histograms With Caution. They Are Highly Misleading! 🔗 🔗
Three Simple Ways To (Instantly) Make Your Scatter Plots Clutter Free 🔗 🔗
Matplotlib Has Numerous Hidden Gems. Here's One of Them. 🔗 🔗
A Simple Trick That Will Make Heatmaps More Elegant 🔗 🔗
The Limitations Of Heatmap That Are Slowing Down Your Data Analysis 🔗 🔗
An Underrated Technique To Improve Your Data Visualizations 🔗 🔗
Who Said Matplotlib Cannot Create Interactive Plots? 🔗 🔗
Don't Create Messy Bar Plots. Instead, Try Bubble Charts! 🔗 🔗
Use Box Plots With Caution! They May Be Misleading. 🔗 🔗
An Underrated Technique To Create Better Data Plots 🔗 🔗
An Interesting and Lesser-Known Way To Create Plots Using Pandas 🔗 🔗
Style Matplotlib Plots To Make Them More Attractive 🔗 🔗
Simple One-Liners to Preview a Decision Tree Using Sklearn 🔗 🔗
Create Data Plots Right From The Terminal 🔗 🔗
Make Your Matplotlib Plots More Professional 🔗 🔗
Perfplot: Measure, Visualize and Compare Run-time With Ease 🔗 🔗
Prettify Word Clouds In Python 🔗 🔗
Calendar Map As A Richer Alternative to Line Plot 🔗 🔗
Density Plot As A Richer Alternative to Scatter Plot 🔗 🔗 🔗
Python One-Liner To Create Sketchy Hand-drawn Plots 🔗 🔗
Create a Moving Bubbles Chart in Python 🔗 🔗
Visualizing Google Search Trends of 2022 using Python 🔗 🔗
Create A Racing Bar Chart In Python 🔗 🔗
Elegantly Plot the Decision Boundary of a Classifier 🔗 🔗
Dot Plot: A Potential Alternative to Bar Plot 🔗 🔗 🔗
Hexbin Plots As A Richer Alternative to Scatter Plots 🔗 🔗 🔗
Enrich Your Notebook With Interactive Controls 🔗 🔗
Regression Plot Made Easy with Plotly 🔗 🔗
Pretty Plotting With Pandas 🔗 🔗
Polynomial Linear Regression Plot Made Easy With Seaborn 🔗 🔗
Analyse Flow Data With Sankey Diagrams 🔗 🔗

NumPy

Title Notebook Substack Article
A Major Limitation of NumPy Which Most Users Aren't Aware Of 🔗 🔗
Beware of This Unexpected Behaviour of NumPy Methods 🔗 🔗
Speedup NumPy Methods 25x With Bottleneck 🔗 🔗
Speed-up NumPy 20x with Numexpr 🔗 🔗
An Elegant Way To Perform Matrix Multiplication 🔗 🔗
Difference Between Dot and Matmul in NumPy 🔗 🔗
Don't Print NumPy Arrays! Use Lovely-NumPy Instead 🔗 🔗
Polynomial Linear Regression with NumPy 🔗 🔗

Memory Optimization

Title Notebook Substack Article
70x Faster Pandas By Changing Just One Line of Code 🔗 🔗
Reduce Memory Usage Of A Pandas DataFrame By 90% 🔗 🔗 🔗
The Best File Format To Store A Pandas DataFrame 🔗 🔗 🔗
Define the Correct DataType for Categorical Columns 🔗 🔗 🔗
Datatype For Handling Missing Valued Columns in Pandas 🔗 🔗 🔗
Save Memory with Python Generators 🔗 🔗

Cool Tools

Title Notebook Substack Article
CNN Explainer: Interactively Visualize a Convolutional Neural Network 🔗 🔗
Break the Linear Presentation of Notebooks With Stickyland 🔗 🔗
Annotate Data With The Click Of A Button Using Pigeon 🔗 🔗
Mito Just Got Supercharged With AI! 🔗 🔗
PyGWalker: Analyze Pandas Dataframe in Jupyter using a Tableau-style Interface 🔗 🔗
Supercharge Shell With Python Using Xonsh 🔗 🔗
Draw The Data You Are Looking For In Seconds 🔗 🔗
Preview Your README File Locally In GitHub Style 🔗 🔗
This GUI Tool Can Possibly Save You Hours Of Manual Work 🔗 🔗
Stop Previewing Raw DataFrames. Instead, Use DataTables. 🔗 🔗
Converting Python To LaTeX Has Possibly Never Been So Simple 🔗 🔗
Label Your Data With The Click Of A Button 🔗 🔗
Analyze A Pandas DataFrame Without Code 🔗 🔗
A No-Code Online Tool To Explore and Understand Neural Networks 🔗 🔗
Speed-up NumPy 20x with Numexpr 🔗 🔗
Debugging Made Easy With PySnooper 🔗 🔗
Deep Learning Network Debugging Made Easy 🔗 🔗
CodeSquire: The AI Coding Assistant You Should Use Over GitHub Copilot 🔗 🔗
Find Unused Python Code With Ease 🔗 🔗
Enrich Your Notebook With Interactive Controls 🔗 🔗
Data Analysis Using No-Code Pandas In Jupyter 🔗 🔗
Modify Python Code During Run-Time 🔗 🔗 🔗
Modify Function During Run-Time 🔗 🔗 🔗
Importing Modules Made Easy with Pyforest 🔗 🔗
Create Pivot Tables, Aggregations and Plots Without Any Code 🔗 🔗 🔗

Run-time Optimization

Title Notebook Substack Article
Pandas vs Polars: Run-time and Memory Comparison 🔗 🔗
The Limitation of KMeans Which Is Often Overlooked by Many 🔗 🔗
Most Sklearn Users Don't Know This About Its LinearRegression Implementation 🔗 🔗
Probably The Fastest Way To Execute Your Python Code 🔗 🔗
Why Are We Typically Advised To Never Iterate Over A DataFrame? 🔗 🔗
Speed-up Parquet I/O of Pandas by 5x 🔗 🔗
A Single Line That Will Make Your Python Code Faster 🔗 🔗
Make Sklearn KMeans 20x times faster 🔗 🔗
Speed-up NumPy 20x with Numexpr 🔗 🔗
The Best File Format To Store A Pandas DataFrame 🔗 🔗 🔗
The Best Way to Use Apply() in Pandas 🔗 🔗
Don't Create Conditional Columns in Pandas with Apply 🔗 🔗
Why you should not dump DataFrames to a CSV 🔗 🔗 🔗
Parallelize Pandas Apply() With Swifter 🔗 🔗
Parallelize Pandas with Pandarallel 🔗 🔗 🔗
How to Read Multiple CSV Files Efficiently 🔗 🔗 🔗

Sklearn

Title Notebook Substack Article
Why Sklearn's Linear Regression Has No Hyperparameters? 🔗 🔗
Scikit-LLM: Integrate Sklearn API with Large Language Models 🔗 🔗
Most Sklearn Users Don't Know This About Its LinearRegression Implementation 🔗 🔗
A Lesser-Known Feature of Sklearn To Train Models on Large Datasets 🔗 🔗
Sklearn One-liner to Generate Synthetic Data 🔗 🔗
Skorch: Use Scikit-learn API on PyTorch Models 🔗 🔗
Make Sklearn KMeans 20x times faster 🔗 🔗
Build Baseline Models Effortlessly With Sklearn 🔗 🔗
Polynomial Linear Regression with NumPy 🔗 🔗
An Elegant Way to Import Metrics From Sklearn 🔗 🔗
Feature Tracking Made Simple In Sklearn Transformers 🔗 🔗
Configure Sklearn To Output Pandas DataFrame 🔗 🔗

Debugging

Title Notebook Substack Article
Debugging Made Easy With PySnooper 🔗 🔗
Don't use print() to debug your code. 🔗 🔗 🔗
Inspect Program Flow with IceCream 🔗 🔗 🔗
Lesser-known Feature of f-strings in Python 🔗 🔗

Missing Data

Title Notebook Substack Article
Handle Missing Data With Missingno 🔗 🔗
Datatype For Handling Missing Valued Columns in Pandas 🔗 🔗

ML-AI News

Title Notebook Substack Article
Now You Can Use DALL·E With OpenAI API 🔗 🔗

Machine Learning

Title Notebook Substack Article
Decision Trees ALWAYS Overfit. Here's A Lesser-Known Technique To Prevent It. 🔗 🔗
Evaluate Clustering Performance Without Ground Truth Labels 🔗 🔗
The Most Common Misconception About Continuous Probability Distributions 🔗 🔗
A Common Misconception About Feature Scaling and Standardization 🔗 🔗
Random Forest May Not Need An Explicit Validation Set For Evaluation 🔗 🔗
A Visual and Overly Simplified Guide To Bagging and Boosting 🔗 🔗
10 Most Common (and Must-Know) Loss Functions in ML 🔗 🔗
Theil-Sen Regression: The Robust Twin of Linear Regression 🔗 🔗
The Limitations Of Elbow Curve And What You Should Replace It With 🔗 🔗
21 Most Important (and Must-know) Mathematical Equations in Data Science 🔗 🔗
Try This If Your Linear Regression Model is Underperforming 🔗 🔗
The Limitation of KMeans Which Is Often Overlooked by Many 🔗 🔗
Nine Most Important Distributions in Data Science 🔗 🔗
The Limitation of Linear Regression Which is Often Overlooked By Many 🔗 🔗
A Reliable and Efficient Technique To Measure Feature Importance 🔗 🔗
Does Every ML Algorithm Rely on Gradient Descent? [🔗](https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science/blob/main/Machine%20Learning/Does%20Every%20ML%20Algorithm%20Rely%20on%20Gradient%20Descent%3F.ipynb) 🔗
Visualize The Performance Of Linear Regression With This Simple Plot 🔗 🔗
Confidence Interval and Prediction Interval Are Not The Same 🔗 🔗
The Ultimate Categorization of Performance Metrics in ML 🔗 🔗
The Most Overlooked Problem With One-Hot Encoding 🔗 🔗
9 Most Important Plots in Data Science 🔗 🔗
Is Categorical Feature Encoding Always Necessary Before Training ML Models? 🔗 🔗
The Counterintuitive Behaviour of Training Accuracy and Training Loss 🔗 🔗
A Highly Overlooked Point In The Implementation of Sigmoid Function 🔗 🔗
The Ultimate Categorization of Clustering Algorithms 🔗 🔗
A Lesser-Known Feature of Sklearn To Train Models on Large Datasets 🔗 🔗
Visualize The Performance Of Any Linear Regression Model With This Simple Plot 🔗 🔗
How To Truly Use The Train, Validation and Test Set 🔗 🔗
The Advantages and Disadvantages of PCA To Consider Before Using It 🔗 🔗
Loss Functions: An Algorithm-wise Comprehensive Summary 🔗 🔗
Is Data Normalization Always Necessary Before Training ML Models? 🔗 🔗
A Visual Guide to Stochastic, Mini-batch, and Batch Gradient Descent 🔗 🔗
The Taxonomy Of Regression Algorithms That Many Don't Bother To Remember 🔗 🔗
The Limitation of PCA Which Many Folks Often Ignore 🔗 🔗
Breathing KMeans: A Better and Faster Alternative to KMeans 🔗 🔗
How Many Dimensions Should You Reduce Your Data To When Using PCA? 🔗 🔗
A Visual Guide To Sampling Techniques in Machine Learning 🔗 🔗
A Visual and Overly Simplified Guide to PCA 🔗 🔗
The Limitation Of Euclidean Distance Which Many Often Ignore 🔗 🔗
Visualising The Impact Of Regularisation Parameter 🔗 🔗
A (Highly) Important Point to Consider Before You Use KMeans Next Time 🔗 🔗
Is Class Imbalance Always A Big Problem To Deal With? 🔗 🔗
A Visual Comparison Between Locality and Density-based Clustering 🔗 🔗
Why Don't We Call It Logistic Classification Instead? 🔗 🔗
A Typical Thing About Decision Trees Which Many Often Ignore 🔗 🔗
Always Validate Your Output Variable Before Using Linear Regression 🔗 🔗
Why Is It Important To Shuffle Your Dataset Before Training An ML Model 🔗 🔗
Why Are We Typically Advised To Set Seeds for Random Generators? 🔗 🔗
This Small Tweak Can Significantly Boost The Run-time of KMeans 🔗 🔗
Most ML Folks Often Neglect This While Using Linear Regression 🔗 🔗
Is This The Best Animated Guide To KMeans Ever? 🔗 🔗
An Effective Yet Underrated Technique To Improve Model Performance 🔗 🔗
How to Encode Categorical Features With Many Categories? 🔗 🔗
Why KMeans May Not Be The Apt Clustering Algorithm Always 🔗 🔗
Skorch: Use Scikit-learn API on PyTorch Models 🔗 🔗
A No-Code Online Tool To Explore and Understand Neural Networks 🔗 🔗
Make Sklearn KMeans 20x times faster 🔗 🔗
Deep Learning Network Debugging Made Easy 🔗 🔗
Build Baseline Models Effortlessly With Sklearn 🔗 🔗
Polynomial Linear Regression with NumPy 🔗 🔗

Statistics

Title Notebook Substack Article
Be Cautious Before Drawing Any Conclusions Using Summary Statistics 🔗 🔗
The Limitation Of Pearson Correlation Which Many Often Ignore 🔗 🔗
Pandas and NumPy Return Different Values for Standard Deviation. Why? 🔗 🔗
Why Correlation (and Other Statistics) Can Be Misleading 🔗 🔗

Testing

Title Notebook Substack Article
Generate Your Own Fake Data In Seconds 🔗 🔗

Terminal

Title Notebook Substack Article
Supercharge Shell With Python Using Xonsh 🔗 🔗
Most Command-line Users Don't Know This Cool Trick About Using Terminals 🔗 🔗
Never Refactor Your Code Manually Again. Instead, Use Sourcery! 🔗 🔗
Create Data Plots Right From The Terminal 🔗 🔗
Visualize Commit History of Git Repo With Beautiful Animations 🔗 🔗
How Would You Identify Fuzzy Duplicates In A Data With Million Records? 🔗 🔗
Automated Code Refactoring With Sourcery 🔗 🔗 🔗
Explore CSV Data Right From The Terminal 🔗 🔗

Documents

Title Document Substack Article
Daily Dose of Data Science - Full Archive 🔗 🔗
35 Hidden Python Libraries That Are Absolute Gems 🔗 🔗
40 Open-Source Tools to Supercharge Your Pandas Workflow 🔗 🔗
37 Hidden Python Libraries That Are Absolute Gems 🔗 🔗
10 Automated EDA Tools That Will Save You Hours Of (Tedious) Work 🔗 🔗
30 Python Libraries to (Hugely) Boost Your Data Science Productivity 🔗 🔗

Animations

Title Notebook Substack Video
Visualizing The Data Transformation of a Neural Network 🔗 🔗
