Efficient Computing of Deep Neural Networks
#DL 5

This post contains my personal notes from the course Efficient Computing of Deep Neural Networks (04835640, Summer 2025, Peking University). The course is taught by Professor Yu Bei of The Chinese University of Hong Kong, who offers a formal course of the same name there (CMSC5743, Fall 2024). Copyright for all relevant figures belongs to the lecturer.

Overview

In contemporary deep neural network (DNN) applications, real-time online inference imposes strict speed requirements (~10 fps, ~60 ms latency) on many models. This course discusses effective methods to accelerate inference by reducing computational operations and memory burden. Specifically, the discussion is organized into two branches, the Model level (Mo) and the Implementation level (Im):

  1. Model level

    1. Mo1: Pruning

    2. Mo2: Decomposition

    3. Mo3: Quantization

    4. Mo4: BNN (Binarized Neural Networks)

    5. Mo5: KD (Knowledge Distillation)

    6. Mo6: NAS (Neural Architecture Search)

  2. Implementation level

    1. Im1: GEMM (General Matrix Multiplication)

    2. Im2: Direct Conv

    3. Im3: Winograd

    4. Im4: Sparse Conv

    5. Im5: CUDA

    6. Im6: TVM
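
As a taste of the implementation-level branch, GEMM-based convolution (Im1) rearranges input patches with an im2col transform so the entire convolution collapses into a single matrix multiply, which highly optimized GEMM kernels can then execute. Below is a minimal NumPy sketch (stride 1, no padding; the function names are my own, not from the course):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix
    whose columns are the flattened receptive fields (stride 1, no padding)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv2d_gemm(x, weight):
    """Convolution as one GEMM: weight (OC, IC, kh, kw) times im2col(x)."""
    oc, ic, kh, kw = weight.shape
    cols = im2col(x, kh, kw)              # (IC*kh*kw, out_h*out_w)
    out = weight.reshape(oc, -1) @ cols   # the single matrix multiply
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return out.reshape(oc, out_h, out_w)
```

The memory cost is the price: im2col duplicates each input element up to kh*kw times, which is exactly the trade-off the later Direct Conv and Winograd lectures revisit.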

Author: 酱紫瑞