Data science Python notebooks—a collection of Jupyter notebooks on machine learning, deep learning, statistical inference, data analysis and visualization.

cedrickchee/data-science-notebooks

Data Science Notebooks

This repo contains various Python Jupyter notebooks I have created to experiment with and learn the core libraries essential for working with data in Python, to work through exercises, assignments, and coursework, and to explore subjects I find interesting, such as machine learning and deep learning. Familiarity with Python as a language is assumed.

The core libraries I focus on for working with data are NumPy, Pandas, Matplotlib, PyTorch, TensorFlow, Keras, Caffe, scikit-learn, spaCy, NLTK, Gensim, and related packages.

How to Use this Repo

  • Run the code using the Jupyter notebooks available in this repository's notebooks directory.
  • Launch a live notebook server with these notebooks using Binder.

About

The notebooks were written and tested with Python 3.6; most, but not all, of them should also work with other Python 3.x versions.

See index.ipynb for an index of the available notebooks.

Software

The packages I used to run the code in the notebooks are listed in requirements.txt. Some of these exact version numbers may not be available on your platform, in which case you may have to tweak them. To install the requirements using conda, run the following at the command line:

$ conda install --file requirements.txt

To create a stand-alone environment named DSN with Python 3.6 and all the required package versions, run the following:

$ conda create -n DSN python=3.6 --file requirements.txt
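If some pinned versions cannot be resolved on your platform, one way to "tweak them" is to drop the pins and let conda pick compatible versions. The helper below is a minimal sketch and is not part of this repo. It assumes each line of requirements.txt is a conda-style spec (`name=version` or `name=version=build`, optionally prefixed with `channel::`) or a pip-style spec (`name==version`, `name>=version`); the real file may use a different layout.

```python
import re

# Matches the package name at the start of a spec such as
# "numpy=1.14.2=py36_0" (conda) or "pandas==0.22.0" (pip).
_NAME = re.compile(r"^\s*([A-Za-z0-9_.\-]+)")


def relax_pins(lines):
    """Return package names with version and build pins stripped.

    Blank lines and '#' comments are skipped, a leading 'channel::'
    prefix is removed, and the original package order is preserved.
    """
    names = []
    for line in lines:
        spec = line.split("#", 1)[0].strip()  # drop trailing comments
        if not spec:
            continue
        spec = spec.split("::")[-1]  # drop an optional channel prefix
        match = _NAME.match(spec)
        if match:
            names.append(match.group(1))
    return names
```

For example, you could write the result of `relax_pins(open("requirements.txt").read().splitlines())` to a hypothetical `requirements-loose.txt` (one name per line) and then run `$ conda create -n DSN python=3.6 --file requirements-loose.txt`. The trade-off is reproducibility: unpinned packages may resolve to newer releases whose APIs differ from what the notebooks expect.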

You can read more about using conda environments in the Managing Environments section of the conda documentation.

Deep Learning

Projects

| Notebook | Description |
| --- | --- |
| Deep Painterly Harmonization | Implement the Deep Painterly Harmonization paper in PyTorch |
| Language modelling in Malay language for downstream NLP tasks | Implement Universal Language Model Fine-tuning for Text Classification (ULMFiT) in PyTorch |
| Not Hotdog AI Camera mobile app | Asia virtual study group project for the fast.ai Deep Learning Part 1 (v3) course: ship a convolutional neural network to Android/iOS with PyTorch and Android Studio/Xcode |

Language Models

Notebooks for trying out transformer and large language models.

| Notebook | Description |
| --- | --- |
| Flan-UL2 20B | Code walkthrough of Flan-UL2 (20B) showing how to run it on a single A100 40GB GPU with the Hugging Face library and 8-bit inference. Tries chain-of-thought (CoT) and zero-shot prompting (logical reasoning, story writing, common-sense reasoning, speech writing) and tests long (2048-token) inputs. |
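Why 8-bit inference is what makes a 20B model fit on one 40GB A100 can be seen with back-of-the-envelope arithmetic. This sketch assumes about 20 billion parameters (taking the "20B" in the name at face value) and counts model weights only, ignoring activations, attention caches, and framework overhead, so real memory use is higher than these figures.

```python
# Back-of-the-envelope GPU memory needed for model weights only.
# Assumptions: ~20e9 parameters; activations and overhead are ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Memory needed to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(num_params, dtype, gpu_gb=40.0):
    """True if the weights alone leave some free memory on a `gpu_gb` GB GPU."""
    return weight_memory_gb(num_params, dtype) < gpu_gb


for dtype in ("fp32", "fp16", "int8"):
    gb = weight_memory_gb(20e9, dtype)
    print(f"{dtype}: {gb:.0f} GB of weights, fits on a 40 GB A100: {fits(20e9, dtype)}")
```

Under these assumptions, fp16 weights alone would occupy the entire 40 GB card, while int8 weights take roughly 20 GB and leave headroom for activations during generation.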

DL Assignments, Exercises or Course Works

fast.ai's Deep Learning Part 1: Practical Deep Learning for Coders 2018 (v2): Oct - Dec 2017

| Notebook | Description |
| --- | --- |
| lesson1, lesson1-vgg, lesson1-rxt50, keras_lesson1 | Lesson 1 - Recognizing Cats and Dogs |
| lesson2-image_models | Lesson 2 - Improving Your Image Classifier |
| lesson3-rossman | Lesson 3 - Understanding Convolutions |
| lesson4-imdb | Lesson 4 - Structured Time Series and Language Models |
| lesson5-movielens | Lesson 5 - Collaborative Filtering; Inside the Training Loop |
| lesson6-rnn, lesson6-sgd | Lesson 6 - Interpreting Embeddings; RNNs from Scratch |
| lesson7-cifar10, lesson7-CAM | Lesson 7 - ResNets from Scratch |

fast.ai's Deep Learning Part 1: Practical Deep Learning for Coders 2019 (v3): Oct - Dec 2018

| Notebook | Description |
| --- | --- |
| 00_notebook_tutorial.ipynb, lesson1-pets.ipynb | Lesson 1 - Image Recognition |
| lesson2-download.ipynb, lesson2-sgd.ipynb | Lesson 2 - Computer Vision: Deeper Applications |
| lesson3-planet.ipynb, lesson3-camvid.ipynb, lesson3-head-pose.ipynb, lesson3-imdb.ipynb | Lesson 3 - Multi-label, Segmentation, Image Regression, and More |
| lesson4-tabular.ipynb, lesson4-collab.ipynb | Lesson 4 - NLP, Tabular, and Collaborative Filtering |
| lesson5-sgd-mnist.ipynb | Lesson 5 - Foundations of Neural Networks |
| lesson6-rossmann.ipynb, rossman_data_clean.ipynb, lesson6-pets-more.ipynb | Lesson 6 - Foundations of Convolutional Neural Networks |
| lesson7-resnet-mnist.ipynb, lesson7-superres-gan.ipynb, lesson7-superres-imagenet.ipynb, lesson7-superres.ipynb, lesson7-wgan.ipynb, lesson7-human-numbers.ipynb | Lesson 7 - ResNets, U-Nets, GANs and RNNs |

fast.ai's Deep Learning Part 2: Cutting Edge Deep Learning for Coders 2017 (v1): Feb - Apr 2017

| Notebook | Description |
| --- | --- |
| neural-style | Lesson 8 - Artistic Style |
| imagenet-processing | Lesson 9 - Generative Models |
| neural-sr, keras-dcgan, pytorch-tutorial, wgan-pytorch | Lesson 10 - Multi-modal & GANs |
| kmeans-clustering, babi-memory-neural-net | Lesson 11 - Memory Networks |
| spelling_bee_RNN | Lesson 12 - Attentional Models |
| translate-pytorch, densenet-keras | Lesson 13 - Neural Translation |
| rossmann, tiramisu-keras | Lesson 14 - Time Series & Segmentation |

fast.ai's Deep Learning Part 2: Cutting Edge Deep Learning for Coders 2018 (v2): Mar - May 2018

| Notebook | Description |
| --- | --- |
| Pascal VOC—Object Detection | Lesson 8 - Object Detection |
| Pascal VOC—Multi Object Detection | Lesson 9 - Single Shot Multibox Detector (SSD) |
| IMDB—Language Model | Lesson 10 - Transfer Learning for NLP and NLP Classification |
| WMT15 Giga French-English—Neural Machine Translation, DeViSE (Deep Visual-Semantic Embedding Model) | Lesson 11 - Neural Translation; Multi-modal Learning |
| CIFAR-10 DarkNet, CIFAR-10 DAWNBench, Wasserstein GAN, CycleGAN | Lesson 12 - DarkNet; Generative Adversarial Networks (GANs) |
| TrainingPhase API, Neural Algorithm of Artistic Style Transfer | Lesson 13 - Image Enhancement; Style Transfer; Data Ethics |
| Super Resolution, Real-time Style Transfer Neural Net, Kaggle Carvana Image Masking, Kaggle Carvana Image Masking using U-Net, Kaggle Carvana Image Masking using U-Net Large | Lesson 14 - Super Resolution; Image Segmentation with U-Net |

Machine Learning

ML Assignments, Exercises or Course Works

Andrew Ng's "Machine Learning" class on Coursera

fast.ai's machine learning course

Libraries or Frameworks

| Notebook | Description |
| --- | --- |
| NumPy in 10 minutes | Introduction to NumPy for deep learning in 10 minutes |

WIP

| Notebook | Description |
| --- | --- |
| Guide to TensorFlow Keras on TPUs MNIST | Guide to TensorFlow + Keras on TPU v2 for free on Google Colab |

WIP

WIP

WIP

Kaggle Competitions

| Notebook | Description |
| --- | --- |
| planet_cv | Planet: Understanding the Amazon from Space—use satellite data to track the human footprint in the Amazon rainforest |
| Rossmann | Rossmann Store Sales—forecast sales using store, promotion, and competitor data |
| fish | The Nature Conservancy Fisheries Monitoring—Can you detect and classify species of fish? |

License

This repository contains a variety of content: some developed by Cedric Chee and some from third parties. The third-party content is distributed under the license provided by those parties.

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer.

The content developed by Cedric Chee is distributed under the following license:

Code

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

Text

The text content of the notebooks is released under the CC-BY-NC-ND license. Read more at Creative Commons.
