Systems for ML

Notes on CSE 599W: Systems for ML, taught by Tianqi Chen, Haichen Shen, and Arvind Krishnamurthy at the University of Washington.

Lecture 1: Introduction to Deep Learning

The course is not about the learning aspect of deep learning (except for the first two lectures); it focuses on the systems aspect: faster training, efficient serving, and lower memory consumption.

Lecture 3: Components Overview of Deep Learning System

Typical Deep Learning System Stack

  • User API: Programming API; Gradient Calculation (Differentiation API)
  • System Components: Computational Graph Optimization and Execution; Runtime Parallel Scheduling
  • Architecture: GPU Kernels, Optimizing Device Code; Accelerators and Hardware

Lecture 4: Backprop and Automatic Differentiation

Numerical differentiation

  • Tool to check the correctness of an implementation
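
As a concrete illustration of that use, here is a minimal central-difference gradient check in NumPy (the helper name numerical_grad and the test function are illustrative choices, not from the lecture):

```python
# Illustrative sketch (not from the course): central-difference gradient check.
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Approximate df/dx one coordinate at a time with central differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x.flat[i]
        x.flat[i] = orig + eps
        f_plus = f(x)
        x.flat[i] = orig - eps
        f_minus = f(x)
        x.flat[i] = orig                        # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * eps)
    return grad

# Check an analytic gradient: for f(x) = sum(x**2), df/dx = 2x.
x = np.random.randn(5)
analytic = 2 * x
numeric = numerical_grad(lambda v: np.sum(v ** 2), x)
print(np.max(np.abs(analytic - numeric)))       # tiny, on the order of 1e-9
```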

Backpropagation

  • Easy to understand and implement
  • Bad for memory use and schedule optimization

Automatic differentiation

  • Generates the gradient computation for the entire computation graph
  • Better for system optimization
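
A minimal sketch of the idea, assuming a toy graph with only add and multiply ops (the names Node, gradients, and evaluate are illustrative, not the course's reference implementation): walking the graph in reverse topological order emits the gradient as new graph nodes, so the gradient computation is itself a graph the system can optimize and schedule.

```python
# Illustrative sketch (not from the course): reverse-mode AD over a toy graph.
import numpy as np

class Node:
    """A computation-graph node: an op name, its input nodes, optional value."""
    def __init__(self, op, inputs, value=None):
        self.op, self.inputs, self.value = op, inputs, value
    def __add__(self, other): return Node("add", [self, other])
    def __mul__(self, other): return Node("mul", [self, other])

def leaf(value):
    return Node("leaf", [], np.asarray(value, dtype=float))

def topo_order(node, seen=None, order=None):
    """Post-order DFS: a node's inputs always come before the node itself."""
    seen = set() if seen is None else seen
    order = [] if order is None else order
    if node not in seen:
        seen.add(node)
        for i in node.inputs:
            topo_order(i, seen, order)
        order.append(node)
    return order

def gradients(output, wrt):
    """Build new graph nodes for d(output)/d(w), walking in reverse topo order."""
    adjoints = {output: leaf(1.0)}
    for node in reversed(topo_order(output)):
        g = adjoints[node]
        if node.op == "add":
            partials = [g, g]                  # d(a+b)/da = d(a+b)/db = 1
        elif node.op == "mul":
            a, b = node.inputs
            partials = [g * b, g * a]          # d(a*b)/da = b, d(a*b)/db = a
        else:
            continue                           # leaves have no inputs
        for inp, p in zip(node.inputs, partials):
            adjoints[inp] = adjoints[inp] + p if inp in adjoints else p
    return [adjoints[w] for w in wrt]

def evaluate(node):
    """Execute any (sub)graph bottom-up; gradient graphs run the same way."""
    if node.value is None:
        a, b = (evaluate(i) for i in node.inputs)
        node.value = a + b if node.op == "add" else a * b
    return node.value

# y = x1 * x2 + x2  =>  dy/dx1 = x2, dy/dx2 = x1 + 1
x1, x2 = leaf(3.0), leaf(5.0)
y = x1 * x2 + x2
dx1, dx2 = gradients(y, [x1, x2])    # the gradients are themselves graphs
print(evaluate(dx1), evaluate(dx2))  # 5.0 4.0
```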

Lecture 5: Hardware backends: GPU

Tips for high performance

  • Use existing libraries, which are highly optimized, e.g., cuBLAS and cuDNN.
  • Use nvprof or nvvp (the visual profiler) to debug performance.
  • Use a high-level language to write GPU kernels.
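
The "use existing, highly optimized libraries" tip is easy to demonstrate even on the CPU side; the sketch below (plain NumPy, arbitrary matrix sizes) contrasts a hand-written loop nest with a call that dispatches to a tuned BLAS, the CPU analogue of calling cuBLAS instead of writing your own GEMM kernel.

```python
# Illustrative sketch (not from the course): tuned library vs. naive loop nest.
import time
import numpy as np

def naive_matmul(A, B):
    """Straightforward triple loop; orders of magnitude slower than tuned BLAS."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.random.randn(128, 128)
B = np.random.randn(128, 128)

t0 = time.perf_counter(); C_naive = naive_matmul(A, B); t1 = time.perf_counter()
t2 = time.perf_counter(); C_blas = A @ B;               t3 = time.perf_counter()

print(f"naive loops: {t1 - t0:.3f}s   BLAS-backed: {t3 - t2:.6f}s")
print("max abs difference:", np.max(np.abs(C_naive - C_blas)))  # negligible
```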

Lecture 6: Optimize for hardware backends

Optimizations lead to too many variants of operators:

  • Different tiling patterns
  • Different fusion patterns
  • Different data layouts
  • Different hardware backends
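
To make the combinatorial explosion concrete, the sketch below (NumPy, with an illustrative tile size) shows a single tiling variant of matrix multiply; every choice of tile size, loop order, fusion decision, data layout, or target backend yields yet another variant of the same operator that has to be written or generated and then tuned.

```python
# Illustrative sketch (not from the course): one tiling variant of matmul.
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked matrix multiply: one point in the space of tiling variants.

    Changing tile (or tiling a different pair of loops, or transposing B's
    layout) gives another variant of the same operator with different
    locality behavior -- hence the explosion of variants per backend.
    """
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on a (tile x tile) block so operands stay cache-resident.
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, p0:p0+tile] @ B[p0:p0+tile, j0:j0+tile]
                )
    return C

A, B = np.random.randn(128, 96), np.random.randn(96, 64)
print(np.allclose(tiled_matmul(A, B, tile=32), A @ B))   # True
```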