Deep Learning Compiler Study

This is the repository for the "DL Compiler" study. The goal of the study is to understand how deep learning (DL) compilers accelerate neural networks. Topics include on-device AI, DL compilers, TVM, ONNX, and compilers in general. The study is based on the paper "The Deep Learning Compiler: A Comprehensive Survey" (IEEE TPDS 2021). We also discuss related topics such as hardware architecture and software acceleration. Our materials are publicly available on GitHub and YouTube.

Presentation with Video

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

Presenter: Constant Park (sonicstage12@naver.com)
Date: February, 25, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/TVM.pdf
Video: https://youtu.be/wzy1QMci_Zs

XLA: Optimizing Compiler for Machine Learning

Presenter: Tee Jung (naey05@gmail.com, https://b.mytears.org/)
Date: March, 11, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/XLA101.pdf
Video: https://youtu.be/_3ykXQH5h2o

Efficient Execution of Quantized Deep Learning Models: A Compiler Approach

Presenter: 이제민 (leejaymin@cnu.ac.kr)
Date: March, 25, 2021
PPT: https://www.slideshare.net/leejaymin/efficient-execution-of-quantized-deep-learning-models-a-compiler-approach
Video: https://youtu.be/JV31xwqJUKI

PlaidML: Portable Deep Learning Compiler

Presenter: Seo Sanghyeon (sanxiyn@gmail.com)
Date: April, 22, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/PlaidML.pdf
Video: https://youtu.be/GJ_IYfVmPg4

AutoTVM and Auto Scheduler

Presenter: 류재훈 (jaehunryu@postech.ac.kr)
Date: April, 22, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/Auto_Opt.pdf
Video: https://youtu.be/rl8pobauUn4

MLIR: A Compiler Infrastructure for the End of Moore’s Law

Presenter: Dong-hee Na (donghee.na92@gmail.com)
Date: May, 06, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/Introduction%20to%20MLIR.pdf
Video: https://youtu.be/vZy_aHERPDY

BYOC: Bring Your Own Codegen to Deep Learning Compiler

Presenter: Hyunwoo Cho
Date: May, 20, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/BYOC.pdf
Video: https://youtu.be/q3jE7nu0EgQ

Tensor Comprehensions

Presenter: Jungju Oh
Date: June, 10, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/Tensor%20Comprehensions.pdf
Video: https://youtu.be/8MutpjppKlw

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

Presenter: Taehee Jeong
Date: June, 14, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/%5BDL%20Study%5D%20Chameleon_%20Adaptive%20Code%20Optimization%20for%20Expedited%20Deep%20Neural%20Network%20Compilation.pdf
Video: https://youtu.be/vCJpEwSnEu0

Glow: Graph Lowering Compiler Techniques for Neural Networks

Presenter: Jeongho Kim
Date: July, 1, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/Glow_%20Graph%20Lowering%20Compiler%20Techniques%20for%20Neural%20Networks.pdf
Video: https://youtu.be/wmIiPUDgzl4

Glow for NXP MCUs

Presenter: Dongshik Won
Date: July, 15, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/Glow%20for%20NXP%20MCUs.pdf
Video: https://youtu.be/6ALFNYbnnQs

TensorDIMM: Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in DL

Presenter: Constant Park
Date: August, 05, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/TensorDIMM.pdf
Video: -

ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning

Presenter: Constant Park
Date: September, 09, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/ConfuciuX.pdf
Video: https://youtu.be/XWkQQQhoBMI

Optimizing DNN Computation with Relaxed Graph Substitutions & TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions

Presenter: 류재훈 (jaehunryu@postech.ac.kr)
Date: September, 30, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/taso.pdf
Video: https://youtu.be/XZdnRYbM1g0

HAWQ-V3: Dyadic Neural Network Quantization

Presenter: 이제민 (leejaymin@cnu.ac.kr)
Date: October, 14, 2021
PPT: https://www.slideshare.net/leejaymin/hawqv3-dyadic-neural-network-quantization
Video: https://www.youtube.com/watch?v=Hxrw4cDM0Tw&list=UU03m_PqzOeNJPmZyyY2dRQw&index=2

I-BERT: Integer-only BERT Quantization

Presenter: Dongshik Won
Date: November, 11, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/I-BERT_%20Integer-only%20BERT%20Quantization.pdf
Video: https://youtu.be/--Is5DxG1wU

DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

Presenter: Taehee Jeong
Date: November, 15, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/DNNFusion.pdf
Video: https://youtu.be/P-LZ-RZIH0U

TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation

Presenter: Hyunwoo Cho
Date: December, 09, 2021
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/DLC_Study_211209_HyunwooCho.pdf
Video: https://youtu.be/snh4BZ0v6jI

AIMET: AI Model Efficiency Toolkit

Presenter: Tee Jung (naey05@gmail.com, https://b.mytears.org/)
Date: January, 06, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/AIMET.pdf
Video: -

Newton: A DRAM-maker's Accelerator-in-Memory (AiM) Architecture for ML

Presenter: Yongwon Shin (ywshin@postech.ac.kr)
Date: February, 17, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/newton.pdf
Video: https://youtu.be/2076HWa7abY

Heterogeneous Dataflow Accelerators for Multi-DNN Workloads

Presenter: Constant Park
Date: August, 08, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/HDA.pdf
Video: https://youtu.be/C_GyaR4ukP0

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

Presenter: 이제민 (leejaymin@etri.re.kr)
Date: August, 22, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/220822_MobileViTv1.pdf
Video: https://youtu.be/dVH02_O2MzQ

AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs

Presenter: 박준형 (dkdkernel@gmail.com)
Date: September, 19, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/AsyMo-%E1%84%80%E1%85%A9%E1%86%BC%E1%84%80%E1%85%A2%E1%84%8B%E1%85%AD%E1%86%BC.pptx.pdf
Video: https://youtu.be/MKYkq92Hdbk

Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization

Presenter: 류재훈 (jaehunryu@postech.ac.kr)
Date: September, 19, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/Unity.pdf
Video: https://youtu.be/YMlXaP6uHnU

Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

Presenter: 신용원 (ywshin@postech.ac.kr)
Date: September, 19, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/MENSA.pdf
Video: https://youtu.be/bnpdoZQB6Qs

Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design

Presenter: Hyunwoo Cho (hyunwoocho@sogang.ac.kr)
Date: November, 11, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/DLC_Study_111422_hwcho.pdf
Video: https://youtu.be/7-aKgDxxOXU

A Learned Performance Model for Tensor Processing Units

Presenter: 박주언 (jueonpark@postech.ac.kr)
Date: November, 19, 2022
PPT: -
Video: https://youtu.be/g-MJlRgRfto

The Deep Learning Compiler: A Comprehensive Survey

Presenter: 이태영 (managingc@gmail.com)
Date: December, 12, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/The%20Deep%20Learning%20Compiler.pdf
Video: https://youtu.be/O2TjOvYl8Ys

Distilling Bit-level Sparsity Parallelism for General Purpose Deep Learning Acceleration

Presenter: 노대철 (sheocjf1025@gmail.com)
Date: December, 26, 2022
PPT: https://github.com/ConstantPark/DL_Compiler/blob/main/221226%20-%20Distilling%20Bit-level%20Sparsity%20Parallelism%20for%20General%20Purpose%20Deep%20Learning%20Acceleration.pdf
Video: https://youtu.be/7-aKgDxxOXU

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

Presenter: 윤유경 (yugyoung@postech.ac.kr)
Date: January, 09, 2023
PPT: -
Video: https://youtu.be/LGYYRRKxCjE
