The "Black" Art & "Elusive" Science of Training Deep Neural Networks

School of Computing, University of Nebraska-Lincoln
Spring 2021 (3-week session): CSCE 496/896

Synopsis: Deep Learning, or representation learning, is an exciting branch of modern Artificial Intelligence for solving hard problems in computer vision, natural language processing, and speech recognition, to name a few. A Deep Neural Network (DNN) is a large assembly of computational units, or artificial neurons, structured hierarchically in many successive layers. DNNs discover hidden patterns in the input data by building layers of increasingly meaningful representations of it. To unleash the full potential of DNNs, one must understand effective DNN architectures, learning algorithms, and optimization strategies. Training DNNs is tricky, no less than summoning a “genie”; many consider it a “black art”. This course attempts to demystify the training of DNN models, with an emphasis on modern state-of-the-art architectures, by exploring the “black” art and “elusive” science of training DNNs.

Philosophy: I will advocate my mentor Richard Feynman's philosophy on learning: "What I cannot create, I do not understand." Your understanding of DNN training issues will be incomplete unless you can implement DNNs such as CNNs or RNNs from scratch (without using any high-level ML libraries such as Scikit-Learn, Keras, or PyTorch). Start by implementing a multi-layer perceptron (MLP) using NumPy, referring to the pseudocode in my GitHub notebook on the MLP. Then try implementing a CNN using NumPy, referring to the pseudocode in my GitHub notebook on the CNN.
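To give a flavor of the from-scratch exercise, here is a minimal sketch of an MLP trained with plain NumPy: one hidden layer, sigmoid activations, and hand-derived backpropagation, fit to the XOR problem. The layer width, learning rate, epoch count, and XOR task are illustrative choices for this sketch, not taken from the course notebooks.

```python
import numpy as np

# XOR inputs and targets: a tiny problem that a single-layer
# perceptron cannot solve but a one-hidden-layer MLP can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 1.0, (2, 8))   # input -> hidden weights
b1 = np.zeros((1, 8))
W2 = rng.normal(0.0, 1.0, (8, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)            # hidden activations
    p = sigmoid(h @ W2 + b2)            # output probabilities

    # Backward pass (sigmoid output + binary cross-entropy loss,
    # whose combined gradient at the output pre-activation is p - y)
    dz2 = p - y
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * h * (1.0 - h)  # backprop through hidden sigmoid
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Full-batch gradient-descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

preds = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(preds.ravel())  # typically recovers the XOR truth table
```

The CNN exercise extends the same pattern: the dense products `X @ W1` are replaced by convolutions, but the forward/backward/update loop is unchanged.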

Instructor
Dr. M. R. Hasan
Office Hours
See the course Canvas page

Lecture Time
Monday, Tuesday, Wednesday, Thursday, Friday: 10:00 AM - 11:30 AM via Zoom

GitHub repositories of my tutorials on Machine Learning and Deep Learning

Key repositories:

Schedule

Week Date Topic & PDF Slides Video Links
1 Jan 4
1 Jan 5
1 Jan 6 Jupyter Notebooks:
1 Jan 7
1 Jan 8 Jupyter Notebooks:
1 Jan 9
2 Jan 11
2 Jan 12
2 Jan 13
2 Jan 14
2 Jan 15
2 Jan 17
3 Jan 18 Martin Luther King Holiday
3 Jan 19 CNN Visualization: Jupyter Notebooks:
3 Jan 20
3 Jan 21 Jupyter Notebooks:


Text Resources
I will refer to the following texts in some lectures when discussing foundational concepts in Machine Learning:
  • Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
  • Pattern Recognition and Machine Learning by Christopher M. Bishop
  • Introduction to Machine Learning (3rd ed.) by Ethem Alpaydin
The following books are useful as a Python refresher/introduction:
  • Data Science from Scratch by Joel Grus (O’Reilly)
  • Python for Data Analysis (2nd Edition) by Wes McKinney (O'Reilly)
  • Python Machine Learning by Sebastian Raschka (Packt Publishing)
Statistics, Linear Algebra & Calculus
  • Advanced Engineering Mathematics (10th Ed.) by Erwin Kreyszig
  • All of Statistics: A Concise Course in Statistical Inference by Larry Wasserman
Interesting & Enlightening Texts


Deep Learning Courses Elsewhere


Collaboration Tool


Google Colab Tutorials


Python


Open Data Repositories


ML Podcasts


Journals


Conference Proceedings