Introduction to Machine Learning

School of Computing, University of Nebraska-Lincoln
Fall 2021: CSCE 478/878

Synopsis: The Introduction to Machine Learning (ML) course provides a rigorous mathematical treatment of various ML models, covering both supervised and unsupervised learning approaches. It presents the models from a probabilistic perspective, in particular the Bayesian view of statistics. The course requires implementing the ML algorithms from scratch (using vanilla Python and its scientific non-ML libraries). Students must have strong programming skills in Python as well as a background in probability & statistics, linear algebra, calculus, and algorithm complexity analysis. The assignments are programming-heavy and time-consuming.

Instructor
Dr. M. R. Hasan
Office Hours
See the course Canvas page

Lecture Time

Tuesday and Thursday: 11:00 AM - 12:15 PM in Avery Hall 119

Assignments
See the course Canvas page
Recitations
See the course Canvas page
Syllabus
See the course Canvas page
Class discussion
See the Piazza link on the course Canvas page
Teaching Assistant
See the course Canvas page

GitHub repositories of my tutorials on Machine Learning and Deep Learning



Schedule

Topic, PDF Slides, & Misc. Resources | Video Links
Note: I will assume that you have a background in Probability Theory (discrete & continuous) and Linear Algebra, so I will not go through all of the Probabilistic Reasoning and Linear Algebra slides; they are provided to refresh your memory. Only Information Theory will be presented in lecture.

[ML Background] Probabilistic Reasoning

  • Probabilistic Reasoning-1
    • Uncertainty & Probability
    • Probabilistic Reasoning (Frequentist & Bayesian)
    • Sample space and Random variable
  • Probabilistic Reasoning-2
    • Discrete Probability Theory
    • Sum & Product Rule
    • Chain rule of Probability
    • Bayes’ Rule
    • Joint and Conditional Distribution
    • Reducing The Complexity of Joint Distribution
    • Unconditional and Conditional Independence
  • Probabilistic Reasoning-3
    • Continuous Probability Theory
    • Probability Density Function
    • Expectation
    • Variance
    • Covariance & Correlation
Readings:
  • Bishop: 1.2.1, 1.2.2, 1.2.3, 1.6
  • Murphy: 2.2, 2.8
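
To make the Bayes' Rule material concrete, here is a minimal worked example in plain Python; the disease/test numbers are made up purely for illustration and are not from the lecture slides.

```python
# Bayes' Rule with illustrative (made-up) numbers:
# P(D) = 0.01 (prior), P(+|D) = 0.95 (sensitivity), P(+|not D) = 0.05 (false-positive rate)
p_d = 0.01
p_pos_given_d = 0.95
p_pos_given_not_d = 0.05

# Sum & product rules (marginalization): P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(D|+) = P(+|D)P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(D | +) = {p_d_given_pos:.3f}")   # ~0.161: far lower than the 0.95 sensitivity
```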
[ML Background] Gaussian Distribution

[ML Background] Linear Algebra for Machine Learning

  • Linear Algebra for Machine Learning-1
    • What is Linear Algebra?
    • How is Linear Algebra useful in Machine Learning?
  • Linear Algebra for Machine Learning-2
    • Mathematical Objects (Scalars, Vectors, Matrices, Tensors)
    • Measuring the Size of Vectors and Matrices (various norms)
    • Some Special Matrices (Symmetric, Identity, Diagonal, Orthogonal)
    • Inverse of a Matrix
    • Orthogonal Matrix
    • Matrix & Vector Multiplication (dot, inner & Hadamard Product)
    • Orthogonal Transformation
  • Linear Algebra for Machine Learning-3
    • Motivation for solving a system of linear equations (linear systems)
    • Method of Gauss elimination & back substitution
    • Square Matrix: Gauss-Jordan Elimination Method
    • Conditions for a unique solution of a linear system
    • Determinant
    • Singular Matrix
    • Span of columns of a matrix
    • Linear Independence of columns of a matrix
    • Basis of the columns of a matrix
    • Rank of a matrix
    • Change of bases
    • Computation of Rank: Row-echelon form
  • Linear Algebra for Machine Learning-4
    • Intuition of the Eigenvalue equation
    • Matrix eigenvalue problem
    • Computing eigenvalues & eigenvectors
    • Characteristic equation of a matrix
    • Eigenbasis
    • Matrix diagonalization
    • Eigendecomposition
  • Linear Algebra for Machine Learning-5
    • Quadratic form of a vector
    • Positive Definite & positive semi-definite matrix
    • Summary of the discussion on Linear Algebra for Machine Learning
Readings:
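
For the eigendecomposition material above, a short illustrative NumPy self-check (NumPy being one of the scientific non-ML libraries allowed in the assignments); the matrix is made up.

```python
import numpy as np

# A symmetric (hence diagonalizable, with orthogonal eigenvectors) matrix
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigh is specialized for symmetric/Hermitian matrices
eigenvalues, V = np.linalg.eigh(A)          # columns of V are eigenvectors

# Eigendecomposition: A = V diag(lambda) V^T  (V is orthogonal for symmetric A)
A_reconstructed = V @ np.diag(eigenvalues) @ V.T
print(np.allclose(A, A_reconstructed))      # True

# Eigenvalue equation check: A v = lambda v for each eigenpair
for lam, v in zip(eigenvalues, V.T):
    print(np.allclose(A @ v, lam * v))      # True
```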
Information Theory

  • Information Theory-1
    • Information Theory (Message vs. Information)
    • Entropy & Cross-Entropy
    • Relative Entropy or Kullback-Leibler (KL) Divergence
  • Information Theory-2
    • KL Divergence for Maximum Likelihood Estimation
  • Information Theory-3
    • Independence of Random Variables & Information Gain
    • KL Divergence & Information Gain
Readings:
  • Bishop: 1.2.1, 1.2.2, 1.2.3, 1.6
  • Murphy: 2.2, 2.8
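
A minimal illustrative NumPy sketch of the entropy, cross-entropy, and KL divergence definitions for discrete distributions; the distributions below are made up.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p_i log2 p_i  (in bits)."""
    p = p[p > 0]                       # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log2 q_i."""
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p) >= 0, with equality iff p == q."""
    return cross_entropy(p, q) - entropy(p)

p = np.array([0.5, 0.25, 0.25])        # "true" distribution (made up)
q = np.array([1/3, 1/3, 1/3])          # "model" distribution (made up)
print(entropy(p), kl_divergence(p, q), kl_divergence(p, p))   # last value is ~0
```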
Course Introduction

Jupyter notebook demo:
Readings:
  • Russell & Norvig: 1
  • Geron: 1

Fuel your imagination
Analogy-based Learning: K-Nearest Neighbors

Jupyter notebooks:
Readings:
  • Bishop: 2.5
  • Murphy: 1.4.1, 1.4.2, 1.4.3
  • Alpaydin: 8.1, 8.2, 8.3, 8.4
  • [Classification performance metrics] Geron: 3
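
In the spirit of the from-scratch assignments, a minimal K-Nearest Neighbors sketch using only NumPy (Euclidean distance, majority vote); the toy data is made up and this is not the course's reference implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each row of X_test by a majority vote among its k nearest
    training points (Euclidean distance)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)      # distance to every training point
        nearest = np.argsort(dists)[:k]                   # indices of the k closest
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])           # majority vote
    return np.array(preds)

# Toy usage (made-up 2-D points, two classes)
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.05, 0.1], [1.2, 0.9]]), k=3))   # [0 1]
```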

Fuel your imagination
Frequentist & Bayesian Learning

Jupyter notebooks:
Readings:
  • Bishop: 2.1
  • Murphy: 2.3.1
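
A small plain-Python illustration of the frequentist vs. Bayesian contrast via the standard Beta-Binomial coin example; the counts and prior below are made up.

```python
# Made-up coin-flip data: 7 heads out of 10 tosses
heads, tosses = 7, 10

# Frequentist: maximum likelihood estimate of P(heads)
theta_mle = heads / tosses                               # 0.7

# Bayesian: a Beta(a, b) prior is conjugate to the Bernoulli/Binomial likelihood,
# so the posterior is Beta(a + heads, b + tails) in closed form.
a, b = 2.0, 2.0                                          # a mildly informative prior
a_post, b_post = a + heads, b + (tosses - heads)
theta_posterior_mean = a_post / (a_post + b_post)        # (2 + 7) / (4 + 10) ~= 0.643

print(theta_mle, theta_posterior_mean)                   # the prior pulls the estimate toward 0.5
```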

Fuel your imagination
Statistics & Probability-based Learning: Linear Regression

Jupyter notebooks:
Readings:
  • Bishop: 1.1, 3.1.1, 3.1.2, 3.1.4
  • Murphy: 1.4.5, 1.4.7, 1.4.8, 2.4.3, 7.3, 7.3.1, 7.3.2, 7.5.1
  • Geron: 4
  • Wasserman: chapter 10
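
A minimal from-scratch-style sketch of linear regression via the normal equations (ordinary least squares) on synthetic data, using NumPy only; illustrative, not the course's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 + noise
X = rng.uniform(0, 5, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.3, size=50)

# Design matrix with a bias column of ones
Phi = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equations: w = (Phi^T Phi)^{-1} Phi^T y
# (np.linalg.lstsq solves the same least-squares problem more robustly)
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)   # should be close to [1.0, 2.0]
```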
Statistics & Probability-based Learning: Logistic Regression

Jupyter notebooks:
Readings:
  • Bishop: 4.3.2
  • Murphy: 8.1, 8.2, 8.3.1, 8.3.2
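
A minimal illustrative sketch of binary logistic regression trained by batch gradient descent on the cross-entropy loss (synthetic data, NumPy only).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Synthetic, roughly linearly separable 2-D data
X = np.vstack([rng.normal(-1.5, 1.0, size=(50, 2)),
               rng.normal(+1.5, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
Phi = np.hstack([np.ones((100, 1)), X])              # add a bias column

w = np.zeros(Phi.shape[1])
lr = 0.1
for _ in range(500):
    p = sigmoid(Phi @ w)                             # predicted P(y = 1 | x)
    grad = Phi.T @ (p - y) / len(y)                  # gradient of the mean cross-entropy
    w -= lr * grad                                   # gradient descent step

accuracy = np.mean((sigmoid(Phi @ w) > 0.5) == y)
print(w, accuracy)
```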
Neural Connection-based Learning: Nonlinear Model 1 - Multi-Layer Perceptron

Jupyter notebooks:
Readings:
  • Alpaydin: 11
  • Bishop: 5.1, 5.3, 5.5
  • Murphy: 5.1, 5.3, 5.5
  • Geron: 10
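
To give a feel for what a from-scratch multi-layer perceptron involves, here is a tiny one-hidden-layer network trained on XOR with manually derived backpropagation; the architecture and hyperparameters are arbitrary illustrative choices, not the assignment specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR is not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with 4 tanh units, sigmoid output
W1 = rng.normal(0, 1, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    p = sigmoid(h @ W2 + b2)                 # output probabilities

    # Backward pass (gradients of the mean cross-entropy loss)
    dlogits = (p - y) / len(X)               # dL/d(pre-sigmoid output)
    dW2 = h.T @ dlogits;  db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T * (1 - h**2)         # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ dh;       db1 = dh.sum(axis=0)

    # Gradient descent updates
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p.ravel(), 2))                # should approach [0, 1, 1, 0]
```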

Fuel your imagination
Analogy-based Learning: Support Vector Machine

Jupyter notebooks:
Readings:
  • Alpaydin: 10.3, 13.1, 13.2
  • Murphy: 14.5.2.2
  • Geron: chapter 5, appendix C
  • Boyd: chapter 4 & 5
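
As a simple illustration only: a Pegasos-style stochastic subgradient sketch for a linear soft-margin SVM on synthetic data. This is not the kernelized/QP formulation covered in the readings; it just shows the hinge-loss view of the margin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data with labels in {-1, +1}
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

# Pegasos: stochastic subgradient descent on the L2-regularized hinge loss
lam, T = 0.01, 2000
w, b = np.zeros(2), 0.0
for t in range(1, T + 1):
    i = rng.integers(len(y))
    eta = 1.0 / (lam * t)                       # standard Pegasos step size
    if y[i] * (X[i] @ w + b) < 1:               # the point violates the margin
        w = (1 - eta * lam) * w + eta * y[i] * X[i]
        b = b + eta * y[i]
    else:
        w = (1 - eta * lam) * w

accuracy = np.mean(np.sign(X @ w + b) == y)
print(w, b, accuracy)
```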

Fuel your imagination
Symbolic Rule-based Learning: Nonlinear Model 2 - Decision Tree

Jupyter notebooks:
Readings:
  • Bishop: 14.4
  • Murphy: 16, 16.1, 16.2.1, 16.2.2, 16.2.3, 16.2.4
  • Geron: chapter 6
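
Tying back to the Information Theory lectures, a minimal NumPy sketch of the information-gain computation a from-scratch decision tree uses to score a candidate split; the labels and feature values are made up.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, left_mask):
    """Entropy reduction from splitting `labels` into left/right by a boolean mask."""
    left, right = labels[left_mask], labels[~left_mask]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Made-up example: a feature threshold that separates the classes perfectly
y = np.array([0, 0, 0, 1, 1, 1, 1, 0])
x = np.array([1.0, 1.2, 0.8, 3.0, 2.9, 3.3, 2.7, 1.1])
print(information_gain(y, x < 2.0))   # gain of splitting at x < 2.0 (here 1 bit)
```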

Fuel your imagination
Ensemble-based Learning: Ensemble Methods for Performance Enhancement: Bagging and Random Forest

Jupyter notebooks:
Readings:
  • Bishop: 14.2
  • Geron: chapter 7
Probability-based Learning: Naive Bayes

Jupyter notebooks:
Readings:
  • Bishop: 2.1, 2.2
  • Murphy: 1.2, 1.3, 2.3.1, 2.3.2, 3.5, 3.5.1, 3.5.2, 3.5.3, 3.5.4, 3.5.5
  • Chapter 21 (Natural Language Processing) from Data Science from Scratch by Joel Grus (O’Reilly)
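
A minimal illustrative Bernoulli naive Bayes sketch with Laplace (add-one) smoothing on made-up binary features (NumPy only); not the course's reference implementation.

```python
import numpy as np

# Made-up binary feature matrix (e.g., word presence) and class labels
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]])
y = np.array([0, 0, 1, 1, 0, 1])

classes = np.unique(y)
log_prior = np.log(np.array([np.mean(y == c) for c in classes]))

# P(feature_j = 1 | class c) with Laplace (add-one) smoothing
theta = np.array([(X[y == c].sum(axis=0) + 1) / (np.sum(y == c) + 2) for c in classes])

def predict(x):
    # log P(c | x) is proportional to log P(c) + sum_j log P(x_j | c)
    # (the "naive" conditional-independence assumption)
    log_lik = np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta), axis=1)
    return classes[np.argmax(log_prior + log_lik)]

print(predict(np.array([1, 1, 0])), predict(np.array([0, 0, 1])))   # expect 0 and 1
```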
Probability-based Learning: Hidden Markov Model

Jupyter notebooks:
Readings:
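
A minimal illustrative sketch of the forward algorithm for computing the likelihood of an observation sequence under an HMM; the transition and emission numbers are made up.

```python
import numpy as np

# Made-up 2-state HMM (e.g., states = {Rainy, Sunny}, observations = {walk, shop, clean})
pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3],                 # A[i, j] = P(state_j at t+1 | state_i at t)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],            # B[i, k] = P(observation_k | state_i)
              [0.6, 0.3, 0.1]])

def forward_likelihood(obs):
    """P(observation sequence) via the forward algorithm (sums over all state paths)."""
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * B_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # alpha_{t+1}(j) = sum_i alpha_t(i) A_ij * B_j(o)
    return alpha.sum()

print(forward_likelihood([0, 2, 1]))      # likelihood of observing: walk, clean, shop
```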
Unsupervised Learning: Dimensionality Reduction

Jupyter notebooks:
Readings:
  • Alpaydin: 6.1, 6.3, 6.6, 6.8
  • Geron: chapter 8
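
A minimal illustrative PCA sketch (eigendecomposition of the covariance matrix of centered data), tying back to the linear-algebra background; the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

# 1. Center the data, 2. eigendecompose the covariance matrix,
# 3. project onto the top-k eigenvectors (the principal components).
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)          # ascending eigenvalue order
order = np.argsort(eigenvalues)[::-1]                    # sort descending by variance
components = eigenvectors[:, order[:1]]                  # keep the top principal component

Z = X_centered @ components                              # projected (dimension-reduced) data
explained = eigenvalues[order[0]] / eigenvalues.sum()
print(Z.shape, round(explained, 3))                      # (200, 1) and the variance ratio
```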
Unsupervised Learning: Clustering & Anomaly Detection

Jupyter notebooks:
Readings:
  • Bishop: 9.1, 9.1.1, 9.2, 9.3.2
  • Murphy: 11.1, 11.2, 11.3, 11.4.1, 11.4.2, 11.4.2.5, 11.4.2.6, 11.4.2.7
  • Geron: chapter 9
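
A minimal illustrative k-means (Lloyd's algorithm) sketch on synthetic data, NumPy only; a real implementation would use random or k-means++ initialization with multiple restarts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two well-separated blobs
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(4.0, 0.5, size=(50, 2))])

k = 2
centroids = X[[0, 50]].copy()            # simple initialization: one point from each half
for _ in range(100):
    # Assignment step: each point goes to its nearest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centroids, centroids):            # converged
        break
    centroids = new_centroids

print(centroids)   # should be near (0, 0) and (4, 4)
```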
Note: The following lectures were given in the Data Modeling course (CSCE 411/811), so you will find references to some course-specific artifacts.



Text Resources
  • Lecture Slides & Jupyter notebooks (thorough and extensive) should provide a detailed account of the topics.

  • Though there is no single required text for this course, my lectures will draw from the following books.

  • Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
  • Pattern Recognition and Machine Learning by Christopher M. Bishop
  • Introduction to Machine Learning (3rd ed.) by Ethem Alpaydin
For discussing practical implementation issues and hands-on insights, the following text will be used:
  • Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd Edition, 2019) by Aurélien Géron (O'Reilly)
The following books are useful as introductory texts:
  • Machine Learning by Tom Mitchell
  • Data Science from Scratch by Joel Grus (O’Reilly)
  • Python for Data Analysis (2nd Edition) by Wes McKinney (O'Reilly)
  • Python Machine Learning by Sebastian Raschka (Packt Publishing)
  • The Hundred-Page Machine Learning Book by Andriy Burkov
  • Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig
Optional Texts
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, Jerome Friedman
  • Pattern Classification by Richard O. Duda, Peter E. Hart, and David G. Stork
  • Bayesian Reasoning and Machine Learning by David Barber
  • Information Theory, Inference, and Learning Algorithms by David MacKay
  • An Introduction to Support Vector Machines and Other Kernel-based Learning Methods by Nello Cristianini, and John Shawe-Taylor
  • Boosting: Foundations and Algorithms by Robert E. Schapire and Yoav Freund
Advanced Texts
Statistics, Linear Algebra & Calculus Texts
  • Advanced Engineering Mathematics (10th Ed.) by Erwin Kreyszig
  • All of Statistics: A Concise Course in Statistical Inference by Larry Wasserman
  • Convex Optimization by Boyd and Vandenberghe
Interesting & Enlightening Texts
  • The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos
  • Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell
  • The Deep Learning Revolution by Terrence J. Sejnowski
  • Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans and Avi Goldfarb
  • Thinking, Fast and Slow by Daniel Kahneman
  • The Drunkard's Walk: How Randomness Rules Our Lives by Leonard Mlodinow
  • The Signal and the Noise: Why So Many Predictions Fail - but Some Don't by Nate Silver
  • Calculated Risks: How to Know When Numbers Deceive You by Gerd Gigerenzer
  • The Black Swan: The Impact of the Highly Improbable by Nassim Nicholas Taleb
  • Surfaces and Essences: Analogy as the Fuel and Fire of Thinking by Douglas Hofstadter and Emmanuel Sander
  • The Book of Why: The New Science of Cause and Effect by Judea Pearl and Dana Mackenzie
  • Rebooting AI: Building Artificial Intelligence We Can Trust by Gary Marcus and Ernest Davis
  • Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar


Machine Learning & Related Courses/Talks


Collaboration Tool


Google Colab Tutorials


Python


Open Data Repositories


ML Podcasts


Journals


Conferences Proceedings