This post contains my personal notes from the course Efficient Computing of Deep Neural Networks (04835640, Summer 2025, at Peking University). The course is instructed by Professor Bei Yu of The Chinese University of Hong Kong, who teaches an identically named formal course there (CMSC5743, Fall 2024). Copyright for all relevant figures belongs to the lecturer.
Overview
In contemporary deep neural network (DNN) applications, real-time online inference places strict speed demands on many models (e.g., ~10 fps, or a latency budget of ~60 ms per frame; a rough sketch of what such a budget implies follows the list below). In this course, we discuss effective methods for accelerating inference by reducing both the number of computational operations and the memory burden. Specifically, our discussion is organized into two branches, the model level (Mo) and the implementation level (Im):
Model level
Mo1: Pruning
Mo2: Decomposition
Mo3: Quantization
Mo4: BNN (Binary Neural Networks)
Mo5: KD (Knowledge Distillation)
Mo6: NAS (Neural Architecture Search)
Implementation level
Im1: GEMM (General Matrix Multiplication)
Im2: Direct Convolution
Im3: Winograd Convolution
Im4: Sparse Convolution
Im5: CUDA
Im6: TVM
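To make the speed demand above concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not from the course) that counts the multiply-accumulate (MAC) operations of a single dense convolution layer; the layer shape is hypothetical. At billions of MACs for just one layer, it becomes clear why both model-level reductions (pruning, quantization, ...) and implementation-level optimizations (GEMM, Winograd, ...) are needed to stay within a tens-of-milliseconds budget.

```python
# Back-of-the-envelope MAC count for one standard (dense) 2D convolution layer.
# All shapes below are hypothetical, chosen to resemble a typical CNN backbone.

def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """MACs for a dense 2D convolution with a k x k kernel.

    Each of the h_out * w_out output positions, for each of the c_out output
    channels, needs a dot product over c_in * k * k input values.
    """
    return c_in * c_out * k * k * h_out * w_out

# A single 3x3 conv: 256 -> 256 channels on a 56x56 feature map.
macs = conv2d_macs(c_in=256, c_out=256, k=3, h_out=56, w_out=56)
print(f"{macs / 1e9:.2f} GMACs")  # ~1.85 GMACs for this one layer alone
```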