Introduction to Machine Learning

School of Computing, University of Nebraska-Lincoln
Fall 2021: CSCE 478/878

Synopsis: The Introduction to Machine Learning (ML) course provides a rigorous mathematical treatment of various ML models, covering both supervised and unsupervised learning approaches. It presents the models from a probabilistic perspective, in particular the Bayesian view of statistics. The course requires implementing the ML algorithms from scratch (using vanilla Python and its scientific non-ML libraries). Students must have strong programming skills in Python as well as a background in probability & statistics, linear algebra, calculus, and algorithm complexity analysis. The assignments are programming-heavy and time-consuming.

Instructor
Dr. M. R. Hasan
Office Hours
See the course Canvas page

Lecture Time

Tuesday and Thursday: 11:00 AM - 12:15 PM in Avery Hall 119

Assignments
See the course Canvas page
Recitations
See the course Canvas page
Syllabus
See the course Canvas page
Class discussion
See the Piazza link on the course Canvas page
Teaching Assistant
See the course Canvas page

GitHub repositories of my tutorials on Machine Learning and Deep Learning



Schedule

Topic, PDF Slides, & Misc. Resources | Video Links
Note: I will assume that you have a background in Probability Theory (discrete & continuous) and Linear Algebra, so I will not go through all of the Probabilistic Reasoning and Linear Algebra slides; they are provided to refresh your memory. Only Information Theory will be presented in lecture.

[ML Background] Probabilistic Reasoning

  • Probabilistic Reasoning-1
    • Uncertainty & Probability
    • Probabilistic Reasoning (Frequentist & Bayesian)
    • Sample space and Random variable
  • Probabilistic Reasoning-2
    • Discrete Probability Theory
    • Sum & Product Rule
    • Chain rule of Probability
    • Bayes’ Rule
    • Joint and Conditional Distribution
    • Reducing The Complexity of Joint Distribution
    • Unconditional and Conditional Independence
  • Probabilistic Reasoning-3
    • Continuous Probability Theory
    • Probability Density Function
    • Expectation
    • Variance
    • Covariance & Correlation
Readings:
  • Bishop: 1.2.1, 1.2.2, 1.2.3, 1.6
  • Murphy: 2.2, 2.8
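
To make the Bayes' Rule material concrete, here is a minimal worked example in plain Python; the disease/test numbers are made up purely for illustration and are not from the lecture slides.

```python
# Bayes' Rule with illustrative (made-up) numbers:
# P(D) = 0.01 (prior), P(+|D) = 0.95 (sensitivity), P(+|not D) = 0.05 (false-positive rate)
p_d = 0.01
p_pos_given_d = 0.95
p_pos_given_not_d = 0.05

# Sum & product rules (marginalization): P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(D|+) = P(+|D)P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(D | +) = {p_d_given_pos:.3f}")   # ~0.161: far lower than the 0.95 sensitivity
```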
[ML Background] Gaussian Distribution

[ML Background] Linear Algebra for Machine Learning

  • Linear Algebra for Machine Learning-1
    • What is Linear Algebra?
    • How is Linear Algebra useful in Machine Learning?
  • Linear Algebra for Machine Learning-2
    • Mathematical Objects (Scalars, Vectors, Matrices, Tensors)
    • Measuring the Size of Vectors and Matrices (various norms)
    • Some Special Matrices (Symmetric, Identity, Diagonal, Orthogonal)
    • Inverse of a Matrix
    • Orthogonal Matrix
    • Matrix & Vector Multiplication (dot, inner & Hadamard Product)
    • Orthogonal Transformation
  • Linear Algebra for Machine Learning-3
    • Motivation for solving a system of linear equations (linear systems)
    • Method of Gauss elimination & back substitution
    • Square Matrix: Gauss-Jordan Elimination Method
    • Conditions for a unique solution of a linear system
    • Determinant
    • Singular Matrix
    • Span of columns of a matrix
    • Linear Independence of columns of a matrix
    • Basis of the columns of a matrix
    • Rank of a matrix
    • Change of bases
    • Computation of Rank: Row-echelon form
  • Linear Algebra for Machine Learning-4
    • Intuition of the Eigenvalue equation
    • Matrix eigenvalue problem
    • Computing eigenvalues & eigenvectors
    • Characteristic equation of a matrix
    • Eigenbasis
    • Matrix diagonalization
    • Eigendecomposition
  • Linear Algebra for Machine Learning-5
    • Quadratic form of a vector
    • Positive Definite & positive semi-definite matrix
    • Summary of the discussion on Linear Algebra for Machine Learning
Readings:
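
For the eigendecomposition material above, a short illustrative NumPy self-check (NumPy being one of the scientific non-ML libraries allowed in the assignments); the matrix is made up.

```python
import numpy as np

# A symmetric (hence diagonalizable, with orthogonal eigenvectors) matrix
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigh is specialized for symmetric/Hermitian matrices
eigenvalues, V = np.linalg.eigh(A)          # columns of V are eigenvectors

# Eigendecomposition: A = V diag(lambda) V^T  (V is orthogonal for symmetric A)
A_reconstructed = V @ np.diag(eigenvalues) @ V.T
print(np.allclose(A, A_reconstructed))      # True

# Eigenvalue equation check: A v = lambda v for each eigenpair
for lam, v in zip(eigenvalues, V.T):
    print(np.allclose(A @ v, lam * v))      # True
```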
Information Theory

  • Information Theory-1
    • Information Theory (Message vs. Information)
    • Entropy & Cross-Entropy
    • Relative Entropy or Kullback-Leibler (KL) Divergence
  • Information Theory-2
    • KL Divergence for Maximum Likelihood Estimation
  • Information Theory-3
    • Independence of Random Variables & Information Gain
    • KL Divergence & Information Gain
Readings:
  • Bishop: 1.2.1, 1.2.2, 1.2.3, 1.6
  • Murphy: 2.2, 2.8
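
A minimal illustrative NumPy sketch of the entropy, cross-entropy, and KL divergence definitions for discrete distributions; the distributions below are made up.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p_i log2 p_i  (in bits)."""
    p = p[p > 0]                       # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log2 q_i."""
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p) >= 0, with equality iff p == q."""
    return cross_entropy(p, q) - entropy(p)

p = np.array([0.5, 0.25, 0.25])        # "true" distribution (made up)
q = np.array([1/3, 1/3, 1/3])          # "model" distribution (made up)
print(entropy(p), kl_divergence(p, q), kl_divergence(p, p))   # last value is ~0
```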
Course Introduction

Jupyter notebook demo:
Readings:
  • Russell & Norvig: 1
  • Geron: 1

Fuel your imagination
Analogy-based Learning: K-Nearest Neighbors

Jupyter notebooks:
Readings:
  • Bishop: 2.5
  • Murphy: 1.4.1, 1.4.2, 1.4.3
  • Alpaydin: 8.1, 8.2, 8.3, 8.4
  • [Classification performance metrics] Geron: 3
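
In the spirit of the from-scratch assignments, a minimal K-Nearest Neighbors sketch using only NumPy (Euclidean distance, majority vote); the toy data is made up and this is not the course's reference implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each row of X_test by a majority vote among its k nearest
    training points (Euclidean distance)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)      # distance to every training point
        nearest = np.argsort(dists)[:k]                   # indices of the k closest
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])           # majority vote
    return np.array(preds)

# Toy usage (made-up 2-D points, two classes)
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.05, 0.1], [1.2, 0.9]]), k=3))   # [0 1]
```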

Fuel your imagination
Frequentist & Bayesian Learning

Jupyter notebooks:
Readings:
  • Bishop: 2.1
  • Murphy: 2.3.1
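
A small plain-Python illustration of the frequentist vs. Bayesian contrast via the standard Beta-Binomial coin example; the counts and prior below are made up.

```python
# Made-up coin-flip data: 7 heads out of 10 tosses
heads, tosses = 7, 10

# Frequentist: maximum likelihood estimate of P(heads)
theta_mle = heads / tosses                               # 0.7

# Bayesian: a Beta(a, b) prior is conjugate to the Bernoulli/Binomial likelihood,
# so the posterior is Beta(a + heads, b + tails) in closed form.
a, b = 2.0, 2.0                                          # a mildly informative prior
a_post, b_post = a + heads, b + (tosses - heads)
theta_posterior_mean = a_post / (a_post + b_post)        # (2 + 7) / (4 + 10) ~= 0.643

print(theta_mle, theta_posterior_mean)                   # the prior pulls the estimate toward 0.5
```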

Fuel your imagination
Statistics & Probability-based Learning: Linear Regression

Jupyter notebooks:
Readings:
  • Bishop: 1.1, 3.1.1, 3.1.2, 3.1.4
  • Murphy: 1.4.5, 1.4.7, 1.4.8, 2.4.3, 7.3, 7.3.1, 7.3.2, 7.5.1
  • Geron: 4
  • Wasserman: chapter 10
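
A minimal from-scratch-style sketch of linear regression via the normal equations (ordinary least squares) on synthetic data, using NumPy only; illustrative, not the course's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 + noise
X = rng.uniform(0, 5, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.3, size=50)

# Design matrix with a bias column of ones
Phi = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equations: w = (Phi^T Phi)^{-1} Phi^T y
# (np.linalg.lstsq solves the same least-squares problem more robustly)
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)   # should be close to [1.0, 2.0]
```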
Statistics & Probability-based Learning: Logistic Regression

Jupyter notebooks:
Readings:
  • Bishop: 4.3.2
  • Murphy: 8.1, 8.2, 8.3.1, 8.3.2
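
A minimal illustrative sketch of binary logistic regression trained by batch gradient descent on the cross-entropy loss (synthetic data, NumPy only).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Synthetic, roughly linearly separable 2-D data
X = np.vstack([rng.normal(-1.5, 1.0, size=(50, 2)),
               rng.normal(+1.5, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
Phi = np.hstack([np.ones((100, 1)), X])              # add a bias column

w = np.zeros(Phi.shape[1])
lr = 0.1
for _ in range(500):
    p = sigmoid(Phi @ w)                             # predicted P(y = 1 | x)
    grad = Phi.T @ (p - y) / len(y)                  # gradient of the mean cross-entropy
    w -= lr * grad                                   # gradient descent step

accuracy = np.mean((sigmoid(Phi @ w) > 0.5) == y)
print(w, accuracy)
```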
Neural Connection-based Learning: Nonlinear Model 1 - Multi-Layer Perceptron

Jupyter notebooks:
Readings:
  • Alpaydin: 11
  • Bishop: 5.1, 5.3, 5.5
  • Murphy: 5.1, 5.3, 5.5
  • Geron: 10
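
To give a feel for what a from-scratch multi-layer perceptron involves, here is a tiny one-hidden-layer network trained on XOR with manually derived backpropagation; the architecture and hyperparameters are arbitrary illustrative choices, not the assignment specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR is not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with 4 tanh units, sigmoid output
W1 = rng.normal(0, 1, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    p = sigmoid(h @ W2 + b2)                 # output probabilities

    # Backward pass (gradients of the mean cross-entropy loss)
    dlogits = (p - y) / len(X)               # dL/d(pre-sigmoid output)
    dW2 = h.T @ dlogits;  db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T * (1 - h**2)         # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ dh;       db1 = dh.sum(axis=0)

    # Gradient descent updates
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p.ravel(), 2))                # should approach [0, 1, 1, 0]
```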

Fuel your imagination
Analogy-based Learning: Support Vector Machine

Jupyter notebooks:
Readings:
  • Alpaydin: 10.3, 13.1, 13.2
  • Murphy: 14.5.2.2
  • Geron: chapter 5, appendix C
  • Boyd: chapter 4 & 5
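
As a simple illustration only: a Pegasos-style stochastic subgradient sketch for a linear soft-margin SVM on synthetic data. This is not the kernelized/QP formulation covered in the readings; it just shows the hinge-loss view of the margin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data with labels in {-1, +1}
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

# Pegasos: stochastic subgradient descent on the L2-regularized hinge loss
lam, T = 0.01, 2000
w, b = np.zeros(2), 0.0
for t in range(1, T + 1):
    i = rng.integers(len(y))
    eta = 1.0 / (lam * t)                       # standard Pegasos step size
    if y[i] * (X[i] @ w + b) < 1:               # the point violates the margin
        w = (1 - eta * lam) * w + eta * y[i] * X[i]
        b = b + eta * y[i]
    else:
        w = (1 - eta * lam) * w

accuracy = np.mean(np.sign(X @ w + b) == y)
print(w, b, accuracy)
```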

Fuel your imagination
Symbolic Rule-based Learning: Nonlinear Model 2 - Decision Tree

Jupyter notebooks:
Readings:
  • Bishop: 14.4
  • Murphy: 16, 16.1, 16.2.1, 16.2.2, 16.2.3, 16.2.4
  • Geron: chapter 6
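
Tying back to the Information Theory lectures, a minimal NumPy sketch of the information-gain computation a from-scratch decision tree uses to score a candidate split; the labels and feature values are made up.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, left_mask):
    """Entropy reduction from splitting `labels` into left/right by a boolean mask."""
    left, right = labels[left_mask], labels[~left_mask]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Made-up example: a feature threshold that separates the classes perfectly
y = np.array([0, 0, 0, 1, 1, 1, 1, 0])
x = np.array([1.0, 1.2, 0.8, 3.0, 2.9, 3.3, 2.7, 1.1])
print(information_gain(y, x < 2.0))   # gain of splitting at x < 2.0 (here 1 bit)
```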

Fuel your imagination
Ensemble-based Learning: Ensemble Methods for Performance Enhancement: Bagging and Random Forest

Jupyter notebooks:
Readings:
  • Bishop: 14.2
  • Geron: chapter 7
Probability-based Learning: Naive Bayes

Jupyter notebooks:
Readings:
  • Bishop: 2.1, 2.2
  • Murphy: 1.2, 1.3, 2.3.1, 2.3.2, 3.5, 3.5.1, 3.5.2, 3.5.3, 3.5.4, 3.5.5
  • Chapter 21 (Natural Language Processing) from Data Science from Scratch by Joel Grus (O’Reilly)
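
A minimal illustrative Bernoulli naive Bayes sketch with Laplace (add-one) smoothing on made-up binary features (NumPy only); not the course's reference implementation.

```python
import numpy as np

# Made-up binary feature matrix (e.g., word presence) and class labels
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]])
y = np.array([0, 0, 1, 1, 0, 1])

classes = np.unique(y)
log_prior = np.log(np.array([np.mean(y == c) for c in classes]))

# P(feature_j = 1 | class c) with Laplace (add-one) smoothing
theta = np.array([(X[y == c].sum(axis=0) + 1) / (np.sum(y == c) + 2) for c in classes])

def predict(x):
    # log P(c | x) is proportional to log P(c) + sum_j log P(x_j | c)
    # (the "naive" conditional-independence assumption)
    log_lik = np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta), axis=1)
    return classes[np.argmax(log_prior + log_lik)]

print(predict(np.array([1, 1, 0])), predict(np.array([0, 0, 1])))   # expect 0 and 1
```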
Probability-based Learning: Hidden Markov Model

Jupyter notebooks:
Readings:
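
A minimal illustrative sketch of the forward algorithm for computing the likelihood of an observation sequence under an HMM; the transition and emission numbers are made up.

```python
import numpy as np

# Made-up 2-state HMM (e.g., states = {Rainy, Sunny}, observations = {walk, shop, clean})
pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3],                 # A[i, j] = P(state_j at t+1 | state_i at t)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],            # B[i, k] = P(observation_k | state_i)
              [0.6, 0.3, 0.1]])

def forward_likelihood(obs):
    """P(observation sequence) via the forward algorithm (sums over all state paths)."""
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * B_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # alpha_{t+1}(j) = sum_i alpha_t(i) A_ij * B_j(o)
    return alpha.sum()

print(forward_likelihood([0, 2, 1]))      # likelihood of observing: walk, clean, shop
```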
Unsupervised Learning: Dimensionality Reduction

Jupyter notebooks:
Readings:
  • Alpaydin: 6.1, 6.3, 6.6, 6.8
  • Geron: chapter 8
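
A minimal illustrative PCA sketch (eigendecomposition of the covariance matrix of centered data), tying back to the linear-algebra background; the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

# 1. Center the data, 2. eigendecompose the covariance matrix,
# 3. project onto the top-k eigenvectors (the principal components).
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)          # ascending eigenvalue order
order = np.argsort(eigenvalues)[::-1]                    # sort descending by variance
components = eigenvectors[:, order[:1]]                  # keep the top principal component

Z = X_centered @ components                              # projected (dimension-reduced) data
explained = eigenvalues[order[0]] / eigenvalues.sum()
print(Z.shape, round(explained, 3))                      # (200, 1) and the variance ratio
```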
Unsupervised Learning: Clustering & Anomaly Detection

Jupyter notebooks:
Readings:
  • Bishop: 9.1, 9.1.1, 9.2, 9.3.2
  • Murphy: 11.1, 11.2, 11.3, 11.4.1, 11.4.2, 11.4.2.5, 11.4.2.6, 11.4.2.7
  • Geron: chapter 9
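
A minimal illustrative k-means (Lloyd's algorithm) sketch on synthetic data, NumPy only; a real implementation would use random or k-means++ initialization with multiple restarts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two well-separated blobs
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(4.0, 0.5, size=(50, 2))])

k = 2
centroids = X[[0, 50]].copy()            # simple initialization: one point from each half
for _ in range(100):
    # Assignment step: each point goes to its nearest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centroids, centroids):            # converged
        break
    centroids = new_centroids

print(centroids)   # should be near (0, 0) and (4, 4)
```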
Note: The following lectures were given in the Data Modeling course (CSCE 411/811), so you will find references to some course-specific artifacts.



Text Resources
  • Lecture Slides & Jupyter notebooks (thorough and extensive) should provide a detailed account of the topics.

  • Though there is no single required text for this course, my lectures will draw from the following books.

  • Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
  • Pattern Recognition and Machine Learning by Christopher M. Bishop
  • Introduction to Machine Learning (3rd ed.) by Ethem Alpaydin
For discussing practical implementation issues and hands-on insights, the following text will be used:
  • Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd Edition, 2019) by Aurélien Géron (O'Reilly)
The following books are useful as introductory texts:
  • Machine Learning by Tom Mitchell
  • Data Science from Scratch by Joel Grus (O’Reilly)
  • Python for Data Analysis (2nd Edition) by Wes McKinney (O'Reilly)
  • Python Machine Learning by Sebastian Raschka (Packt Publishing)
  • The Hundred-Page Machine Learning Book by Andriy Burkov
  • Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig
Optional Texts
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, Jerome Friedman
  • Pattern Classification by Richard O. Duda, Peter E. Hart, and David G. Stork
  • Bayesian Reasoning and Machine Learning by David Barber
  • Information Theory, Inference, and Learning Algorithms by David MacKay
  • An Introduction to Support Vector Machines and Other Kernel-based Learning Methods by Nello Cristianini, and John Shawe-Taylor
  • Boosting: Foundations and Algorithms by Robert E. Schapire and Yoav Freund
Advanced Texts
Statistics, Linear Algebra & Calculus Texts
  • Advanced Engineering Mathematics (10th Ed.) by Erwin Kreyszig
  • All of Statistics: A Concise Course in Statistical Inference by Larry Wasserman
  • Convex Optimization by Boyd and Vandenberghe
Interesting & Enlightening Texts
  • The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos
  • Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell
  • The Deep Learning Revolution by Terrence J. Sejnowski
  • Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans and Avi Goldfarb
  • Thinking, Fast and Slow by Daniel Kahneman
  • The Drunkard's Walk: How Randomness Rules Our Lives by Leonard Mlodinow
  • The Signal and the Noise: Why So Many Predictions Fail - but Some Don't by Nate Silver
  • Calculated Risks: How to Know When Numbers Deceive You by Gerd Gigerenzer
  • The Black Swan: The Impact of the Highly Improbable by Nassim Nicholas Taleb
  • Surfaces and Essences: Analogy as the Fuel and Fire of Thinking by Douglas Hofstadter and Emmanuel Sander
  • The Book of Why: The New Science of Cause and Effect by Judea Pearl and Dana Mackenzie
  • Rebooting AI: Building Artificial Intelligence We Can Trust by Gary Marcus and Ernest Davis
  • Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar


Machine Learning & Related Courses/Talks


Collaboration Tool


Google Colab Tutorials


Python


Open Data Repositories


ML Podcasts


Journals


Conferences Proceedings