Collection of useful data science topics along with articles and videos.
To receive a condensed overview of these tools and additional resources, sign up for CodeCut's free PDF guide. This comprehensive 264-page document covers over 100 essential data science tools, providing you with a valuable reference for your work.
To download the code in this repo, clone it with git:
git clone https://github.com/khuyentran1401/Data-science
- MLOps
- Data Management Tools
- Testing
- Productive Tools
- Python Helper Tools
- Tools for Deployment
- Speed-up Tools
- Math Tools
- Machine Learning
- Natural Language Processing
- Computer Vision
- Time Series
- Feature Engineering
- Visualization
- Mathematical Programming
- Scraping
- Python
- Logging and Debugging
- Linear Algebra
- Data Structure
- Statistics
- Web Applications
- Share Insights
- Cool Tools
- Learning Tips
- Productive Tips
- VSCode
- Book Review
- Data Science Portfolio
Title | Article | Repository | Video |
---|---|---|---|
Stop Hard Coding in a Data Science Project - Use Configuration Files Instead | 🔗 | 🔗 | 🔗 |
Poetry: A Better Way to Manage Python Dependencies | 🔗 | 🔗 | |
Git for Data Scientists: Learn Git through Practical Examples | 🔗 | 🔗 | |
Introduction to Weights & Biases: Track and Visualize your Machine Learning Experiments in 3 Lines of Code | 🔗 | 🔗 | |
Kedro - A Python Framework for Reproducible Data Science Project | 🔗 | 🔗 | |
Orchestrate a Data Science Project in Python With Prefect | 🔗 | 🔗 | |
Orchestrate Your Data Science Project with Prefect 2.0 | 🔗 | 🔗 | 🔗 |
DagsHub: a GitHub Supplement for Data Scientists and ML Engineers | 🔗 | 🔗 | |
4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | 🔗 | 🔗 | 🔗 |
BentoML: Create an ML Powered Prediction Service in Minutes | 🔗 | 🔗 | 🔗 |
How to Structure a Data Science Project for Maintainability (with DVC) | 🔗 | 🔗 | 🔗 |
How to Structure an ML Project for Reproducibility and Maintainability (with Prefect) | 🔗 | 🔗 | |
GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model | 🔗 | 🔗 | |
Create Robust Data Pipelines with Prefect, Docker, and GitHub | 🔗 | 🔗 | |
Create a Maintainable Data Pipeline with Prefect and DVC | 🔗 | 🔗 | |
Build a Full-Stack ML Application With Pydantic And Prefect | 🔗 | 🔗 | 🔗 |
Streamline Code Updates with DVC and GitHub Actions | 🔗 | 🔗 | 🔗 |
Create Observable and Reproducible Notebooks with Hex | 🔗 | 🔗 | 🔗 |
Build Reliable Machine Learning Pipelines with Continuous Integration | 🔗 | 🔗 | 🔗 |
Automate Machine Learning Deployment with GitHub Actions | 🔗 | 🔗 | 🔗 |
How to Build a Fully Automated Data Drift Detection Pipeline | 🔗 | 🔗 | 🔗 |
Title | Article | Repository | Video |
---|---|---|---|
Introduction to DVC: Data Version Control Tool for Machine Learning Projects | 🔗 | 🔗 | 🔗 |
Great Expectations: Always Know What to Expect From Your Data | 🔗 | 🔗 | |
Validate Your pandas DataFrame with Pandera | 🔗 | 🔗 | 🔗 |
Introduction to Schema: A Python Library to Validate your Data | 🔗 | 🔗 | |
How to Create Fake Data with Faker | 🔗 | 🔗 | |
Hypothesis and Pandera: Generate Synthetic Pandas DataFrame for Testing | 🔗 | 🔗 | 🔗 |
What is dbt (data build tool) and When should you use it? | 🔗 | 🔗 | 🔗 |
Streamline dbt Model Development with Notebook-Style Workspace | 🔗 | 🔗 | 🔗 |
Title | Article | Repository | Video |
---|---|---|---|
Pytest for Data Scientists | 🔗 | 🔗 | 🔗 |
4 Lesser-Known Yet Awesome Tips for Pytest | 🔗 | 🔗 | |
DeepDiff - Recursively Find and Ignore Trivial Differences Using Python | 🔗 | 🔗 | |
Checklist - Behavioral Testing of NLP Models | 🔗 | 🔗 | |
Detect Defects in a Data Pipeline Early with Validation and Notifications | 🔗 | 🔗 | 🔗 |
Write Readable Tests for Your Machine Learning Models with Behave | 🔗 | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
3 Tools to Track and Visualize the Execution of your Python Code | 🔗 | 🔗 |
2 Tools to Automatically Reload when Python Files Change | 🔗 | 🔗 |
3 Ways to Get Notified with Python | 🔗 | 🔗 |
How to Create Reusable Command-Line | 🔗 | |
How to Strip Outputs and Execute Interactive Code in a Python Script | 🔗 | 🔗 |
Sending Slack Notifications in Python with Prefect | 🔗 | 🔗 |
Title | Article | Repository | Video |
---|---|---|---|
Pydash: A Kitchen Sink of Missing Python Utilities | 🔗 | 🔗 | |
Write Clean Python Code Using Pipes | 🔗 | 🔗 | 🔗 |
Introducing FugueSQL - SQL for Pandas, Spark, and Dask DataFrames | 🔗 | 🔗 | |
Fugue and DuckDB: Fast SQL Code in Python | 🔗 | 🔗 | |
Simplify Data Science Workflows on BigQuery with Fugue and Python | 🔗 | 🔗 | |
Title | Article | Repository |
---|---|---|
How to Effortlessly Publish your Python Package to PyPI Using Poetry | 🔗 | 🔗 |
Typer: Build Powerful CLIs in One Line of Code using Python | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
Cython - A Speed-Up Tool for your Python Function | 🔗 | 🔗 |
Train your Machine Learning Model 150x Faster with cuML | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
SymPy: Symbolic Computation in Python | 🔗 | 🔗 |
Title | Article | Repository | Video |
---|---|---|---|
How to Monitor And Log your Machine Learning Experiment Remotely with HyperDash | 🔗 | 🔗 | |
How to Efficiently Fine-Tune your Machine Learning Models | 🔗 | 🔗 | |
How to Learn Non-linear Dataset with Support Vector Machines | 🔗 | 🔗 | |
Introduction to IBM Federated Learning: A Collaborative Approach to Train ML Models on Private Data | 🔗 | 🔗 | |
3 Steps to Improve your Efficiency when Hypertuning ML Models | 🔗 | | |
human-learn: Create a Human Learning Model by Drawing | 🔗 | 🔗 | |
Patsy: Build Powerful Features with Arbitrary Python Code | 🔗 | 🔗 | |
SHAP: Explain Any Machine Learning Model in Python | 🔗 | 🔗 | |
Predict Movie Ratings with User-Based Collaborative Filtering | 🔗 | 🔗 | |
River: Online Machine Learning in Python | 🔗 | 🔗 | 🔗 |
Human-Learn: Rule-Based Learning as an Alternative to Machine Learning | 🔗 | 🔗 | 🔗 |
Title | Article | Repository | Video |
---|---|---|---|
Sentiment Analysis of LinkedIn Messages | 🔗 | 🔗 | |
Find Common Words in Article with Python Module Newspaper and NLTK | 🔗 | 🔗 | |
How to Tokenize Tweets with Python | 🔗 | 🔗 | |
How to Solve Analogies with Word2Vec | 🔗 | 🔗 | |
What is PyTorch | 🔗 | 🔗 | |
Convolutional Neural Network in Natural Language Processing | 🔗 | 🔗 | |
Supercharge your Python String with TextBlob | 🔗 | 🔗 | 🔗 |
pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know | 🔗 | 🔗 | |
Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | 🔗 | 🔗 | |
Build a Robust Conversational Assistant with Rasa | 🔗 | 🔗 | |
I Analyzed 2k Data Scientist and Data Engineer Jobs and This is What I Found | 🔗 | 🔗 | |
Checklist - Behavioral Testing of NLP Models | 🔗 | 🔗 | |
PRegEx: Write Human-Readable Regular Expressions in Python | 🔗 | 🔗 | 🔗 |
Texthero: Text Preprocessing, Representation, and Visualization for a pandas DataFrame | 🔗 | 🔗 | |
Title | Article | Repository |
---|---|---|
How to Create an App to Classify Dogs Using fastai and Streamlit | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
Kats: a Generalizable Framework to Analyze Time Series Data in Python | 🔗 | 🔗 |
How to Detect Seasonality, Outliers, and Changepoints in Your Time Series | 🔗 | 🔗 |
4 Tools to Automatically Extract Data from Datetime in Python | 🔗 | 🔗 |
Title | Article | Repository | Video |
---|---|---|---|
3 Ways to Extract Features from Dates with Python | 🔗 | 🔗 | |
Similarity Encoding for Dirty Categories Using dirty_cat | 🔗 | 🔗 | |
Snorkel - A Human-In-The-Loop Platform to Build Training Data | 🔗 | 🔗 | 🔗 |
Title | Article | Repository | Video |
---|---|---|---|
How to Embed Interactive Charts on your Articles and Personal Website | 🔗 | 🔗 | |
What I Learned from Scraping 15k Data Science Articles on Medium | 🔗 | 🔗 | |
How to Create Interactive Plots with Altair | 🔗 | 🔗 | |
How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool | 🔗 | 🔗 | |
I Scraped more than 1k Top Machine Learning GitHub Profiles and this is what I Found | 🔗 | 🔗 | |
Top 6 Python Libraries for Visualization: Which one to Use? | 🔗 | 🔗 | |
Introduction to Yellowbrick: A Python Library to Visualize the Prediction of your Machine Learning Model | 🔗 | 🔗 | |
Visualize Gender-Specific Tweets with Scattertext | 🔗 | 🔗 | |
Visualize Your Team's Projects Using Python Gantt Chart | 🔗 | 🔗 | |
How to Create Bindings and Conditions Between Multiple Plots Using Altair | 🔗 | 🔗 | |
How to Sketch your Data Science Ideas With Excalidraw | 🔗 | | |
Pyvis: Visualize Interactive Network Graphs in Python | 🔗 | 🔗 | 🔗 |
Build and Analyze Knowledge Graphs with Diffbot | 🔗 | | |
Observe The Friend Paradox in Facebook Data Using Python | 🔗 | 🔗 | |
What skills and backgrounds do data scientists have in common? | 🔗 | 🔗 | |
Visualize Similarities Between Companies With Graph Database | 🔗 | 🔗 | |
Visualize GitHub Social Network with PyGraphistry | 🔗 | 🔗 | |
Find the Top Bootcamps for Data Professionals From Over 5k Profiles | 🔗 | 🔗 | |
floWeaver - Turn Flow Data Into a Sankey Diagram In Python | 🔗 | 🔗 | |
atoti - Build a BI Platform in Python | 🔗 | 🔗 | |
Analyze and Visualize URLs with Network Graph | 🔗 | 🔗 | |
statannotations: Add Statistical Significance Annotations on Seaborn Plots | 🔗 | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
How to choose stocks to invest in with Python | 🔗 | 🔗 |
Maximize your Productivity with Python | 🔗 | 🔗 |
How to Find a Good Match with Python | 🔗 | 🔗 |
How to Solve a Staff Scheduling Problem with Python | 🔗 | 🔗 |
How to Find Best Locations for your Restaurants with Python | 🔗 | 🔗 |
How to Schedule Flights in Python | 🔗 | 🔗 |
How to Solve a Production Planning and Inventory Problem in Python | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
Web Scrape Movie Database with Beautiful Soup | 🔗 | 🔗 |
top-github-scraper: Scrape Top GitHub Users and Repositories Based On a Keyword in One Line of Code | 🔗 | 🔗 |
Title | Article | Repository | Video |
---|---|---|---|
6 Common Mistakes to Avoid in Data Science Code | 🔗 | 🔗 | |
5 Steps to Transform Messy Functions into Production-Ready Code | 🔗 | 🔗 | 🔗 |
Numpy Tricks for your Data Science Projects | 🔗 | 🔗 | |
Timing for Efficient Python Code | 🔗 | 🔗 | |
How to Use Lambda for Efficient Python Code | 🔗 | 🔗 | |
Python Tricks for Keeping Track of Your Data | 🔗 | 🔗 | |
Boost Your Efficiency With Specialized Dictionary Implementations in Python | 🔗 | 🔗 | |
Dictionary as an Alternative to If-Else | 🔗 | 🔗 | |
How to Use Zip to Manipulate a List of Tuples | 🔗 | 🔗 | |
Get the Most out of Your Array With These Four Numpy Methods | 🔗 | 🔗 | |
3 Python Tricks to Read, Create, and Run Multiple Files Automatically | 🔗 | 🔗 | |
How to Exclude the Outliers in Pandas DataFrame | 🔗 | 🔗 | |
Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | 🔗 | 🔗 | 🔗 |
3 Techniques to Effortlessly Import and Execute Python Modules | 🔗 | 🔗 | |
Simplify Your Functions with Functools' Partial and Singledispatch | 🔗 | 🔗 | |
Title | Article | Repository | Video |
---|---|---|---|
How to Create and View Interactive Cheatsheets on the Command-line | 🔗 | | |
Understand CSV Files from your Terminal with XSV | 🔗 | | |
Prettify your Terminal Text With Termcolor and Pyfiglet | 🔗 | 🔗 | |
Loguru: Simple as Print, Flexible as Logging | 🔗 | 🔗 | 🔗 |
Stop Using Print to Debug in Python. Use Icecream Instead | 🔗 | | |
Rich: Generate Rich and Beautiful Text in the Terminal with Python | 🔗 | 🔗 | |
Create a Beautiful Dashboard in your Terminal with Wtfutil | 🔗 | 🔗 | |
3 Tools to Monitor and Optimize your Linux System | 🔗 | | |
Ptpython: A Better Python REPL | 🔗 | 🔗 | |
fd: a Simple but Powerful Tool to Find and Execute Files on the Command Line | 🔗 | | |
Speed Up your Command-Line Navigation with These 3 Tools | 🔗 | | |
Python and Data Science Snippets on the Command Line | 🔗 | 🔗 | |
Title | Article | Repository |
---|---|---|
Can Datasets of a Dinosaur and a Circle have Identical Statistics? | 🔗 | 🔗 |
Introduction to One-Way ANOVA: A Test to Compare the Means between More than Two Groups | 🔗 | 🔗 |
Bayes' Theorem, Clearly Explained with Visualization | 🔗 | 🔗 |
Detect Change Points with Bayesian Inference and PyMC3 | 🔗 | 🔗 |
Bayesian Linear Regression with Bambi | 🔗 | 🔗 |
Earn More Salary as a Coder - Higher Degree or More Years of Experience? | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
How to Build a Matrix Module from Scratch | 🔗 | 🔗 |
Linear Algebra for Machine Learning: Solve a System of Linear Equations | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
Convex Hull: An Innovative Approach to Gift-Wrap your Data | 🔗 | 🔗 |
How to Visualize Social Network With Graph Theory | 🔗 | 🔗 |
How to Search Data with KDTree | 🔗 | 🔗 |
How to Find the Nearest Hospital with a Voronoi Diagram | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
How to Create an Interactive Startup Growth Calculator with Python | 🔗 | 🔗 |
Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | 🔗 | 🔗 |
PyWebIO: Write Interactive Web App in Script Way Using Python | 🔗 | 🔗 |
PyWebIO 1.3.0: Add Tabs, Pin Input, and Update an Input Based on Another Input | 🔗 | 🔗 |
Create an App to Deal with Boredom Using PyWebIO | 🔗 | 🔗 |
Build a Robust Workflow to Visualize Trending GitHub Repositories in Python | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
Introduction to Datapane: A Python Library to Build Interactive Reports | 🔗 | |
Datapane's New Features: Create a Beautiful Dashboard in Python in a Few Lines of Code | 🔗 | 🔗 |
Introduction to Datasette: Explore and Publish Your Data in One Line of Code | 🔗 | |
How to Share your Python Objects Across Different Environments in One Line of Code | 🔗 | 🔗 |
How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok | 🔗 | |
Introduction to Deepnote: Real-time Collaboration on Jupyter Notebook | 🔗 | |
Title | Article | Repository |
---|---|---|
Simulate Real-life Events in Python Using SimPy | 🔗 | 🔗 |
How to Create Mathematical Animations like 3Blue1Brown Using Python | 🔗 | 🔗 |
Title | Article | Repository |
---|---|---|
How to Learn Data Science when Life does not Give You a Break | 🔗 | |
How to Accelerate your Data Science Career by Putting yourself in the Right Environment | 🔗 | |
To become a Better Data Scientist, you need to Think like a Programmer | 🔗 | |
How not to be Overwhelmed with Data Science | 🔗 | |
Title | Article | Repository |
---|---|---|
How to Organize your Data Science Articles with GitHub | 🔗 | 🔗 |
5 Reasons why you should Switch from Jupyter Notebook to Scripts | 🔗 | |
7 Reasons Why you Should Start Documenting your Code | 🔗 | |
Title | Article | Repository |
---|---|---|
How to Leverage Visual Studio Code for your Data Science Projects | 🔗 | |
Top 4 Code Viewers for Data Scientist in VSCode | 🔗 | |
Incorporate the Best Practices for Python with These Top 4 VSCode Extensions | 🔗 | |
Boost Your Efficiency with Customized Code Snippets on VSCode | 🔗 | |
Top 9 Keyboard Shortcuts in VSCode for Data Scientists | 🔗 | |
Title | Article | Repository |
---|---|---|
Python Machine Learning: A Comprehensive Handbook for Machine Learning | 🔗 | |
Title | Article | Repository |
---|---|---|
How to Create an Elegant Website for your Data Science Portfolio in 10 minutes | 🔗 | |
Build an Impressive GitHub Profile in 3 Steps | 🔗 | |