Systems for ML

Notes on CSE 599W: Systems for ML, taught by Tianqi Chen, Haichen Shen, and Arvind Krishnamurthy at the University of Washington.

Lecture 1: Introduction to Deep Learning

The course is not about the learning aspect of deep learning (except for the first two lectures); it focuses on the systems aspect: faster training, efficient serving, and lower memory consumption.

Lecture 3: Components Overview of Deep Learning System

Typical Deep Learning System Stack

  • User API: Programming API; Gradient Calculation (Differentiation API)
  • System Components: Computational Graph Optimization and Execution; Runtime Parallel Scheduling
  • Architecture: GPU Kernels, Optimizing Device Code; Accelerators and Hardware

Lecture 4: Backprop and Automatic Differentiation

Numerical differentiation

  • Tool to check the correctness of an implementation
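
As a concrete illustration of that use, here is a minimal central-difference gradient check in NumPy (the helper name numerical_grad and the test function are illustrative choices, not from the lecture):

```python
# Illustrative sketch (not from the course): central-difference gradient check.
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Approximate df/dx one coordinate at a time with central differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x.flat[i]
        x.flat[i] = orig + eps
        f_plus = f(x)
        x.flat[i] = orig - eps
        f_minus = f(x)
        x.flat[i] = orig                        # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * eps)
    return grad

# Check an analytic gradient: for f(x) = sum(x**2), df/dx = 2x.
x = np.random.randn(5)
analytic = 2 * x
numeric = numerical_grad(lambda v: np.sum(v ** 2), x)
print(np.max(np.abs(analytic - numeric)))       # tiny, on the order of 1e-9
```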

Backpropagation

  • Easy to understand and implement
  • Bad for memory use and schedule optimization

Automatic differentiation

  • Generates the gradient computation for the entire computation graph
  • Better for system optimization
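
A minimal sketch of the idea, assuming a toy graph with only add and multiply ops (the names Node, gradients, and evaluate are illustrative, not the course's reference implementation): walking the graph in reverse topological order emits the gradient as new graph nodes, so the gradient computation is itself a graph the system can optimize and schedule.

```python
# Illustrative sketch (not from the course): reverse-mode AD over a toy graph.
import numpy as np

class Node:
    """A computation-graph node: an op name, its input nodes, optional value."""
    def __init__(self, op, inputs, value=None):
        self.op, self.inputs, self.value = op, inputs, value
    def __add__(self, other): return Node("add", [self, other])
    def __mul__(self, other): return Node("mul", [self, other])

def leaf(value):
    return Node("leaf", [], np.asarray(value, dtype=float))

def topo_order(node, seen=None, order=None):
    """Post-order DFS: a node's inputs always come before the node itself."""
    seen = set() if seen is None else seen
    order = [] if order is None else order
    if node not in seen:
        seen.add(node)
        for i in node.inputs:
            topo_order(i, seen, order)
        order.append(node)
    return order

def gradients(output, wrt):
    """Build new graph nodes for d(output)/d(w), walking in reverse topo order."""
    adjoints = {output: leaf(1.0)}
    for node in reversed(topo_order(output)):
        g = adjoints[node]
        if node.op == "add":
            partials = [g, g]                  # d(a+b)/da = d(a+b)/db = 1
        elif node.op == "mul":
            a, b = node.inputs
            partials = [g * b, g * a]          # d(a*b)/da = b, d(a*b)/db = a
        else:
            continue                           # leaves have no inputs
        for inp, p in zip(node.inputs, partials):
            adjoints[inp] = adjoints[inp] + p if inp in adjoints else p
    return [adjoints[w] for w in wrt]

def evaluate(node):
    """Execute any (sub)graph bottom-up; gradient graphs run the same way."""
    if node.value is None:
        a, b = (evaluate(i) for i in node.inputs)
        node.value = a + b if node.op == "add" else a * b
    return node.value

# y = x1 * x2 + x2  =>  dy/dx1 = x2, dy/dx2 = x1 + 1
x1, x2 = leaf(3.0), leaf(5.0)
y = x1 * x2 + x2
dx1, dx2 = gradients(y, [x1, x2])    # the gradients are themselves graphs
print(evaluate(dx1), evaluate(dx2))  # 5.0 4.0
```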

Lecture 5: Hardware backends: GPU

Tips for high performance

  • Use existing libraries, which are highly optimized, e.g., cuBLAS and cuDNN.
  • Use nvprof or nvvp (the visual profiler) to debug performance.
  • Use a high-level language to write GPU kernels.
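
The "use existing, highly optimized libraries" tip is easy to demonstrate even on the CPU side; the sketch below (plain NumPy, arbitrary matrix sizes) contrasts a hand-written loop nest with a call that dispatches to a tuned BLAS, the CPU analogue of calling cuBLAS instead of writing your own GEMM kernel.

```python
# Illustrative sketch (not from the course): tuned library vs. naive loop nest.
import time
import numpy as np

def naive_matmul(A, B):
    """Straightforward triple loop; orders of magnitude slower than tuned BLAS."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.random.randn(128, 128)
B = np.random.randn(128, 128)

t0 = time.perf_counter(); C_naive = naive_matmul(A, B); t1 = time.perf_counter()
t2 = time.perf_counter(); C_blas = A @ B;               t3 = time.perf_counter()

print(f"naive loops: {t1 - t0:.3f}s   BLAS-backed: {t3 - t2:.6f}s")
print("max abs difference:", np.max(np.abs(C_naive - C_blas)))  # negligible
```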

Lecture 6: Optimize for hardware backends

Optimizations lead to too many variants of operators:

  • Different tiling patterns
  • Different fusion patterns
  • Different data layouts
  • Different hardware backends
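
To make the combinatorial explosion concrete, the sketch below (NumPy, with an illustrative tile size) shows a single tiling variant of matrix multiply; every choice of tile size, loop order, fusion decision, data layout, or target backend yields yet another variant of the same operator that has to be written or generated and then tuned.

```python
# Illustrative sketch (not from the course): one tiling variant of matmul.
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked matrix multiply: one point in the space of tiling variants.

    Changing tile (or tiling a different pair of loops, or transposing B's
    layout) gives another variant of the same operator with different
    locality behavior -- hence the explosion of variants per backend.
    """
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on a (tile x tile) block so operands stay cache-resident.
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, p0:p0+tile] @ B[p0:p0+tile, j0:j0+tile]
                )
    return C

A, B = np.random.randn(128, 96), np.random.randn(96, 64)
print(np.allclose(tiled_matmul(A, B, tile=32), A @ B))   # True
```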