Data science Python notebooks—a collection of Jupyter notebooks on machine learning, deep learning, statistical inference, data analysis and visualization.
This repo contains various Python Jupyter notebooks I have created to experiment with and learn the core libraries essential for working with data in Python, to work through exercises, assignments, and coursework, and to explore subjects I find interesting, such as machine learning and deep learning. Familiarity with Python as a language is assumed.
The essential core libraries I focus on for working with data are NumPy, Pandas, Matplotlib, PyTorch, TensorFlow, Keras, Caffe, scikit-learn, spaCy, NLTK, Gensim, and related packages.
- Data Science Notebooks
- Table of Contents
- How to Use this Repo
- About
- Software
- Deep Learning
- Projects
- DL Assignments, Exercises or Course Works
- fast.ai's Deep Learning Part 1: Practical Deep Learning for Coders 2018 (v2): Oct - Dec 2017
- fast.ai's Deep Learning Part 1: Practical Deep Learning for Coders 2019 (v3): Oct - Dec 2018
- fast.ai's Deep Learning Part 2: Cutting Edge Deep Learning for Coders 2017 (v1): Feb - Apr 2017
- fast.ai's Deep Learning Part 2: Cutting Edge Deep Learning for Coders 2018 (v2): Mar - May 2018
- Machine Learning
- Libraries or Frameworks
- Kaggle Competitions
- License
- Run the code using the Jupyter notebooks available in this repository's notebooks directory.
- Launch a live notebook server with these notebooks using Binder.
The notebooks were written and tested with Python 3.6, though other Python 3.x versions should work in nearly all cases.
See index.ipynb for an index of the notebooks available.
The code in the notebooks was tested with Python 3.6, though most (but not all) notebooks will also work correctly with other Python 3.x versions.
The packages I used to run the code in the notebooks are listed in requirements.txt (note that some of these exact version numbers may not be available on your platform: you may have to tweak them for your own use). To install the requirements using conda, run the following at the command line:
$ conda install --file requirements.txt
To create a stand-alone environment named DSN with Python 3.6 and all the required package versions, run the following:
$ conda create -n DSN python=3.6 --file requirements.txt
You can read more about using conda environments in the Managing Environments section of the conda documentation.
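Once the packages are installed, activate the environment and start Jupyter (assuming the DSN environment created above; older conda versions use `source activate DSN` instead):

$ conda activate DSN
$ jupyter notebook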
Notebook | Description |
---|---|
Deep Painterly Harmonization | Implement Deep Painterly Harmonization paper in PyTorch |
Language modelling in Malay language for downstream NLP tasks | Implement Universal Language Model Fine-tuning for Text Classification (ULMFiT) in PyTorch |
Not Hotdog AI Camera mobile app | Asia virtual study group project for fast.ai deep learning part 1, v3 course. Ship a convolutional neural network on Android/iOS with PyTorch and Android Studio/Xcode |
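For the Not Hotdog app, the general idea is to export a trained PyTorch CNN to a serialized format the mobile app can load. Below is a minimal sketch of one such export path using TorchScript tracing; the model and file names are placeholders, and the project itself may have used a different route (e.g. ONNX/Core ML):

```python
import torch
import torchvision

# Placeholder model: the app's real network is a fine-tuned binary classifier
# (hotdog / not hotdog), not an off-the-shelf ImageNet model.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()

example = torch.rand(1, 3, 224, 224)       # dummy input with the expected image shape
traced = torch.jit.trace(model, example)   # trace the forward pass into TorchScript
traced.save("not_hotdog.pt")               # the Android/iOS app loads this file at runtime
```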
Notebooks for trying out transformers and large language models.
Notebook | Description |
---|---|
Flan-UL2 20B | Code walkthrough of Flan 20B with UL2. Shows how to run it on a single A100 40GB GPU with the Hugging Face library and 8-bit inference, using CoT and zero-shot prompting (logical reasoning, story writing, common sense reasoning, speech writing), and tests long (2048-token) inputs. |
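For a rough idea of what 8-bit inference with the Hugging Face library looks like, here is a minimal sketch (not the notebook's exact code) that assumes the `transformers`, `accelerate`, and `bitsandbytes` packages and a GPU with enough memory (e.g. 1x A100 40GB):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-ul2",
    device_map="auto",   # let accelerate place layers on the available GPU(s)
    load_in_8bit=True,   # 8-bit weights so the 20B model fits in ~40GB of GPU memory
)

# Illustrative chain-of-thought style prompt (not from the notebook).
prompt = ("Answer the following question by reasoning step by step. "
          "The cafeteria had 23 apples. They used 20 for lunch and bought 6 more. "
          "How many apples do they have?")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```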
Notebook | Description |
---|---|
lesson1, lesson1-vgg, lesson1-rxt50, keras_lesson1 | Lesson 1 - Recognizing Cats and Dogs |
lesson2-image_models | Lesson 2 - Improving Your Image Classifier |
lesson3-rossman | Lesson 3 - Understanding Convolutions |
lesson4-imdb | Lesson 4 - Structured Time Series and Language Models |
lesson5-movielens | Lesson 5 - Collaborative Filtering; Inside the Training Loop |
lesson6-rnn, lesson6-sgd | Lesson 6 - Interpreting Embeddings; RNNs from Scratch |
lesson7-cifar10, lesson7-CAM | Lesson 7 - ResNets from Scratch |
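For reference, the Lesson 1 notebooks follow the classic fastai image-classification workflow. A minimal sketch of that pattern (assuming the old fastai 0.7 API used by the 2018 course, not fastai v1/v2, and a dogs-vs-cats folder layout under `PATH`) looks roughly like this:

```python
from fastai.conv_learner import *  # fastai 0.7 API used by the 2018 course

PATH = "data/dogscats/"   # assumed layout: train/ and valid/ with one subfolder per class
sz = 224                  # image size
arch = resnet34           # pretrained backbone

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)        # learning rate 0.01 for 3 epochs
```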
Deep Learning Part 1: 2019 Edition
Deep Learning Part 2: 2017 Edition
Deep Learning Part 2: 2018 Edition
- Lesson 1 - Random Forest
- Lesson 2 - Random Forest Interpretation
- Lesson 3 - Random Forest Foundations
- Lesson 4 - MNIST SGD
- Lesson 5 - Natural Language Processing (NLP)
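The random forest lessons are built around scikit-learn's `RandomForestRegressor`. A minimal sketch of the pattern (with synthetic data standing in for the course's real tabular dataset) is:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for the course's dataset.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.random((1000, 5)), columns=[f"feat_{i}" for i in range(5)])
y = X.sum(axis=1) + rng.normal(scale=0.1, size=len(X))

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

m = RandomForestRegressor(n_estimators=40, min_samples_leaf=3, n_jobs=-1)
m.fit(X_train, y_train)

print("validation R^2:", m.score(X_valid, y_valid))
# Feature importances: the starting point for the interpretation lessons.
print(sorted(zip(m.feature_importances_, X.columns), reverse=True))
```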
Notebook | Description |
---|---|
NumPy in 10 minutes | Introduction to NumPy for deep learning in 10 minutes |
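A few of the basic NumPy operations such an introduction typically touches on (a small illustrative sketch, not taken from the notebook):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)        # a 3x4 array of integers 0..11
v = np.array([1.0, 0.0, -1.0, 2.0])    # a 1-D vector

print(a.shape, a.dtype)                # (3, 4) int64
print(a + v)                           # broadcasting: v is added to every row of a
print(a.mean(axis=0))                  # column-wise mean
print(a @ v)                           # matrix-vector product
```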
WIP
Notebook | Description |
---|---|
Guide to TensorFlow Keras on TPUs MNIST | Guide to TensorFlow + Keras on TPU v2 for free on Google Colab |
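The MNIST model itself is plain Keras; a minimal sketch (not the notebook's exact code) is below. On Colab, the model would additionally be wrapped in a TPU distribution strategy (e.g. `tf.distribute.TPUStrategy` in TF 2.x, or the older TF 1.x TPU conversion helpers), which is the part the notebook walks through:

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```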
WIP
WIP
WIP
Notebook | Description |
---|---|
planet_cv | Planet: Understanding the Amazon from Space—use satellite data to track the human footprint in the Amazon rainforest |
Rossmann | Rossmann Store Sales—forecast sales using store, promotion, and competitor data |
fish | The Nature Conservancy Fisheries Monitoring—Can you detect and classify species of fish? |
This repository contains a variety of content; some developed by Cedric Chee, and some from third-parties. The third-party content is distributed under the license provided by those parties.
I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer.
The content developed by Cedric Chee is distributed under the following license:
The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.
The text content of the notebooks is released under the CC-BY-NC-ND license. Read more at Creative Commons.