
Machine Learning
CS 691A
Fall 2016
Prof. Gianfranco Doretto
West Virginia University, Department of Computer Science and
Electrical Engineering


Shortcuts: [eCampus] [Class Notes] [Handouts] [Assignments] [Final Project]

Time and Place
First lecture: August 18, 2016
Last lecture: December 6, 2016
 Tuesday, 2:00pm - 3:15pm in ESBE G84
 Thursday, 2:00pm - 3:15pm in ESBE G84
Office Hours
Thursdays 3:30pm to 4:30pm or by
appointment (send email)


Important Dates
 Midterm Exam: December 1, 2016
 Final Project: 5pm on Friday, December 9, 2016


Syllabus [.pdf]
What could be achieved by building automated systems that
improve their performance through experience? Robots would learn
to navigate based on experience gained by roaming their
environment. Cellphones would get better at recognizing your
voice, and at executing and anticipating your commands.
Physicians would prescribe personalized therapies that best cure
diseases based on historical health records. Machine learning
addresses these and many more problems through the study and
development of techniques that enable a system to learn
automatically from observed data.
The purpose of this graduate course is to provide an
introduction to the fundamental techniques and algorithms that
constitute machine learning today, while providing a thorough
grounding in its methodologies and theoretical foundations. The
course will also discuss recent applications of machine
learning, such as computer vision, robotic control, data mining,
autonomous navigation, bioinformatics, speech recognition, and
text and web data processing.
The topics covered can be roughly summarized as follows:
 Basic statistics: Probabilities, MLE, MAP
 Instance-based Learning: k-Nearest Neighbors, collaborative filtering
 Generative Models: Naïve Bayes, linear discriminant analysis
 Linear Rules: Linear regression, logistic regression, perceptron
 Support Vector Machines: Optimal hyperplane, margin, kernels
 Statistical Learning Theory: PAC learning, VC dimension
 Component Analysis: PCA, Kernel PCA
 Clustering: HAC, k-means, mixture of Gaussians
 Hidden Markov Models: probabilistic model, estimation, Viterbi
 Graphical Models: directed, undirected, learning, inference
 Decision Trees: attribute selection, pruning and overfitting
 Ensemble Learning: Bagging, boosting
Prerequisites
Students entering the class are expected to have a pre-existing
working knowledge (at the level of an undergraduate class) of
multivariate calculus, linear algebra, probability and
statistics, and computer programming and algorithms, though the
class has been designed to allow students with a strong
quantitative background to catch up and fully participate. Basic
knowledge of optimization is a plus. Please ask the instructor
for the self-assessment exam to make sure you have the right
background.


Schedule and Class Notes
 08/18: Introduction [slides]
[slides
3/page]
 Course policies
 What is machine learning, and its applications
 Machine learning problem types
 Materials: M:1
 08/23: Basic statistics [slides]
[slides
3/page]
 Probabilities, independence
 Parameter estimation: maximum likelihood and maximum a posteriori
 Hoeffding's bounds
 Materials: M:2.1-2.5, M:3.1-3.4, M:4.6.1-4.6.2, M:6.4.2 or
the probability and statistics handout
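As a small illustration of the two estimators covered in this lecture, here is a minimal sketch of ML and MAP estimation for a Bernoulli parameter (in Python rather than the Matlab used in class; the coin-flip data and the Beta(2, 2) prior are made up):

```python
import numpy as np

# Hypothetical coin-flip data: 1 = heads, 0 = tails.
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# MLE: the argmax of p^k (1-p)^(n-k) is just the sample mean.
p_mle = flips.mean()

# MAP with a Beta(a, b) prior adds pseudo-counts to each outcome.
a, b = 2, 2
p_map = (flips.sum() + a - 1) / (len(flips) + a + b - 2)
```

With a uniform Beta(1, 1) prior, the MAP estimate reduces to the MLE.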
 08/25: Naive Bayes [slides]
[slides
3/page]
 Zero-one Bayesian predictor
 Naive Bayes assumption, learning
 Bag of words model
 Gaussian Naive Bayes
 Materials: M:5.7-5.7.1.1, M:3.5, M:4.2-4.2.1
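A minimal sketch of the Gaussian Naive Bayes classifier from this lecture (Python for illustration; the 1-D toy data and function names are made up):

```python
import numpy as np

# Hypothetical 1-D toy data: class 0 near 1.0, class 1 near 5.0.
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])

def fit_gnb(X, y):
    """Per class: feature-wise mean, variance, and the class prior."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0), len(Xc) / len(X))
    return params

def predict_gnb(params, x):
    """Pick the class with the largest log-posterior under the
    naive (feature-independence) assumption."""
    def log_post(c):
        mu, var, prior = params[c]
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return ll + np.log(prior)
    return max(params, key=log_post)

params = fit_gnb(X, y)
```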
 08/30: Logistic Regression [slides]
[slides
3/page]
 MLE and MAP training
 Gradient descent
 Generative vs. Discriminative Classifiers
 Materials: M:1.4.6, M:8.1-8.2, M:8.3.1-8.3.2, M:8.3.6-8.3.7,
M:4.2.2-4.2.5, M:8.6-8.6.2
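The MLE-by-gradient-descent training discussed here can be sketched in a few lines (a Python illustration with made-up 1-D data and a made-up step size, not code from the class):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: the label is 1 when the feature is positive.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Batch gradient descent on the average negative log-likelihood.
w, b, lr = np.zeros(1), 0.0, 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)               # predicted P(y = 1 | x)
    w -= lr * (X.T @ (p - y)) / len(y)   # gradient w.r.t. the weights
    b -= lr * np.mean(p - y)             # gradient w.r.t. the bias
```

After training, the model assigns high probability to clearly positive inputs and low probability to clearly negative ones.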
 09/06: Linear Regression [slides]
[slides
3/page]
 Least squares training
 Ridge regression and lasso
 Polynomial and nonlinear features
 Materials: M:1.4.5, M:7.1-7.3, M:7.5
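The closed-form least-squares/ridge solution from this lecture, as a short sketch (Python; the data are made up, drawn near the line y = 2x + 1):

```python
import numpy as np

# Hypothetical noisy samples from y = 2x + 1.
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0 + 0.01 * rng.standard_normal(20)

# Ridge regression: w = (A^T A + lam I)^(-1) A^T y, with a bias column in A.
A = np.hstack([X, np.ones((20, 1))])
lam = 1e-3
w = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ y)
```

Setting lam = 0 recovers ordinary least squares; larger values shrink the weights toward zero.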
 09/08: Instance Based Learning [slides]
[slides
3/page]
 Density estimation, binning, Parzen window, smooth kernels
 k-Nearest Neighbor
 Nadaraya-Watson model, kernel regression, classification,
locally weighted regression
 Materials: M:1.4.1-1.4.2, M:6.4.4, M:14.7
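A minimal k-nearest-neighbor classifier matching the description above (a Python sketch; the 2-D toy data are made up):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Hypothetical toy set: two well-separated clusters.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
```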
 09/13: Perceptron and Kernels [slides]
[slides
3/page]
 Perceptron criterion, algorithm, convergence
 Kernels, Mercer theorem, properties
 Materials: B:4.1.7, M:14.1-14.2
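The perceptron algorithm and its update rule, sketched in Python (the toy data are made up; on linearly separable data the convergence theorem from lecture guarantees termination):

```python
import numpy as np

def perceptron(X, y, epochs=20):
    """Perceptron algorithm with labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified: apply the update
                w += yi * xi
                b += yi
    return w, b

# Hypothetical separable toy data.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
```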
 09/15: Optimization [slides]
[slides
3/page]
 Convexity
 Unconstrained optimization, gradient descent, Newton's
method
 Constrained optimization, Lagrange duality, LP, QP
 Materials: optimization handouts Tutorial 1 and Tutorial 2
 09/22: Support Vector Classification [slides]
[slides
3/page]
 Large margin separation, optimization, support vectors
 Soft margin formulation, dual problem, nonlinear
separation, risk and loss
 Materials: M:14.5, B:7.1
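The lecture derives the dual QP; for intuition, here is a sketch that instead minimizes the soft-margin primal objective by subgradient descent (Python; the data, step size, and regularization constant are all made up):

```python
import numpy as np

# Hypothetical separable toy data with labels in {-1, +1}.
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -2.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])

# Subgradient descent on (lam/2)||w||^2 + mean(max(0, 1 - y (w.x + b))).
w, b, lam, lr = np.zeros(2), 0.0, 0.1, 0.1
for _ in range(200):
    margins = y * (X @ w + b)
    active = margins < 1                     # points inside the margin
    grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
    grad_b = -y[active].sum() / len(y)
    w -= lr * grad_w
    b -= lr * grad_b
```

Only the margin-violating ("active") points contribute to the hinge-loss subgradient, mirroring the role of support vectors in the dual.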
 09/29: Risk Minimization and Model Selection [slides]
[slides
3/page]
 Risk and loss, ERM, overfitting, estimation and
approximation errors
 Cross-validation, regularized risk minimization,
information criteria
 Materials: Sections 1 through 4.2, and 9 of handout,
M:6.5
 10/04: Support Vector Regression [slides]
[slides
3/page]
 Representer theorem, kernel ridge regression
 Sparse vector formulation
 Materials: M:15.4.7, B:7.1.4
 10/06: Dimensionality Reduction [slides]
[slides
3/page]
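No topic list is posted for this lecture, but judging from the topics overview above it covers component analysis; here is a minimal PCA sketch via the SVD (Python; the rank-one toy data are made up):

```python
import numpy as np

def pca(X, d):
    """Project X onto its top-d principal components via the SVD."""
    Xc = X - X.mean(axis=0)                   # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                      # low-dimensional scores

# Hypothetical 2-D points lying exactly on a line: one component
# captures all of the variance.
X = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
Z = pca(X, d=1)
```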
 10/13: Clustering [slides]
[slides
3/page]
 Hierarchical Clustering
 Partitional Clustering, k-means, spectral clustering
 Materials: M:25.1, M:25.5, M:11.4.2.5-11.4.2.7, M:11.5.2,
handout Spectral Clustering
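The k-means (Lloyd's) algorithm from this lecture, sketched with made-up data (Python; initializing from the first k points is a simplification of the usual random initialization):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centers = X[:k].copy()                    # simple initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        for j in range(k):
            if np.any(labels == j):           # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Hypothetical toy data: points near (0,0) and (10,10), interleaved so
# the first two points seed one center in each cluster.
X = np.array([[0.0, 0.0], [10.0, 10.0], [0.5, 0.0], [10.0, 10.5],
              [0.0, 0.5], [10.5, 10.0]])
centers, labels = kmeans(X, k=2)
```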
 10/20: Mixture Models and Expectation Maximization [slides]
[slides
3/page]
 Mixture models, density estimation, clustering, parameter
estimation
 EM, general formulation, convergence, applications
 Materials: M:11.1, M:11.2.1, M:11.2.3, M:11.3,
M:11.4.1-11.4.2, M:11.4.7, M:21.1-21.2
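A minimal EM loop for a two-component 1-D Gaussian mixture, matching the E-step/M-step structure discussed here (Python; the initialization scheme and toy data are made up):

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture (a minimal sketch)."""
    mu = np.array([x.min(), x.max()])         # crude initialization
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities r[n, k] of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

# Hypothetical data: two tight clusters around -5 and +5.
x = np.array([-5.1, -5.0, -4.9, 4.9, 5.0, 5.1])
mu, var, pi = em_gmm_1d(x)
```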
 10/27: Hidden Markov Models [slides]
[slides
3/page]
 Markov models
 Forward-Backward, Viterbi, and Baum-Welch algorithms
 Materials: M:17.1-17.2.1, M:17.3, M:17.4-17.5, M:18.1-18.2
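The Viterbi decoder listed above, as a log-space dynamic-programming sketch (Python; the two-state toy HMM parameters are made up):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path by dynamic programming in log space.
    pi[i]: initial prob, A[i, j]: transition i -> j, B[i, o]: emission."""
    T, n = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[:, obs[0]])  # best log-prob per end state
    back = np.zeros((T, n), dtype=int)        # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):             # trace the backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical two-state HMM with sticky transitions, reliable emissions.
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
path = viterbi([0, 0, 1, 1], pi, A, B)
```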
 11/3: Decision Trees [slides]
[slides
3/page]
 Representation, prediction,
learning, pruning
 Materials: M:16.1, M:16.2
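Attribute selection in decision-tree learning is typically driven by information gain; a short sketch of that criterion (Python; the toy labels are made up):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, mask):
    """Entropy reduction from splitting labels y with a boolean mask."""
    n = len(y)
    left, right = y[mask], y[~mask]
    gain = entropy(y)
    if len(left):
        gain -= len(left) / n * entropy(left)
    if len(right):
        gain -= len(right) / n * entropy(right)
    return gain

# A perfect split removes all label uncertainty.
y = np.array([0, 0, 1, 1])
gain = information_gain(y, np.array([True, True, False, False]))
```

A split that leaves both children as mixed as the parent has zero gain; pruning then guards against choosing splits that only fit noise.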
 11/10: Graphical Models [slides]
[slides
3/page]
 BN, and MRF representation
 Inference, variable elimination,
learning tree structures
 Materials: M:10.1-10.2, M:10.5, M:19.1-19.3, M:19.4.1,
M:10.3, M:20.3, M:10.4, M:26.1-26.3
 11/17: Ensemble Learning [slides]
[slides
3/page]
 Boosting, AdaBoost, analysis
 Bagging, random forests
 Materials: M:16., handout AdaBoost, HTF:8.7, HTF:15.1-15.3
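A compact AdaBoost with 1-D decision stumps, following the reweighting scheme covered here (Python; the data and function names are made up):

```python
import numpy as np

def stump_predict(x, t, s):
    """A decision stump: predict s for x > t, -s otherwise."""
    return s * np.where(x > t, 1, -1)

def adaboost(x, y, rounds=5):
    """AdaBoost with threshold stumps; labels must be in {-1, +1}."""
    n = len(x)
    w = np.full(n, 1.0 / n)                   # example weights
    ensemble = []                             # (alpha, threshold, sign)
    for _ in range(rounds):
        # Pick the stump with the smallest weighted error.
        err, t, s = min(((w[stump_predict(x, t, s) != y].sum(), t, s)
                         for t in x for s in (1, -1)), key=lambda c: c[0])
        err = np.clip(err, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)  # stump weight
        # Reweight: misclassified examples gain weight.
        w = w * np.exp(-alpha * y * stump_predict(x, t, s))
        w = w / w.sum()
        ensemble.append((alpha, t, s))
    return ensemble

def predict(ensemble, x):
    return np.sign(sum(a * stump_predict(x, t, s) for a, t, s in ensemble))

# Hypothetical 1-D data separable at x = 2.5.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, -1, 1, 1, 1])
ensemble = adaboost(x, y)
```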
 11/29: Neural Networks [slides]
[slides
3/page]
 Structure, representational power
 Backpropagation, overfitting
 Materials: M:16.5
 12/6: Deep Learning [slides]
[slides
3/page]
 Convolutional Neural Networks
 Deep Belief Networks
 Materials: M:27.7, M:28.1-28.4


Assignments and Attendance
Homework assignments can be downloaded from eCampus.
All assignments are due at the beginning of the class on the
due date.
 Homework 1, due Thursday, September 29
 Homework 2, due Tuesday, October 27, download HW2_data
 Homework 3, due Tuesday, November 29, download HW3_data
 Homework 4, due Tuesday, December 6, download HW4_data
Programming Languages
In-class examples will be given in Matlab, which should provide
an effective way to address the programming assignments and the
final project.
Assignment Policy
I expect you to try to solve each assignment on your own.
However, when stuck on a problem, I encourage you to collaborate
with other students in the class, subject to the following
rules:
 You may discuss a problem with any student in this class and
work together on solving it. This can involve brainstorming,
verbally discussing the problem, and going through possible
solutions together, but it should not involve one student
telling another a complete solution.
 Once you solve the homework, you must write up your solutions
on your own, without looking at other people's write-ups or
giving your write-up to others.
 In your solution for each problem, you must write down the
name of every person with whom you discussed it. This will not
affect your grade.
 Do not consult solution manuals or other people's solutions
from similar courses.
Late Assignment Policy
Each student may use three “late days” for the whole course.
The late-day count starts from the due date and time of each
assignment, and fractional late days are counted. Each
additional late day beyond the first three costs 20% of the
assignment grade.
Attendance Policy
Consistent with WVU guidelines, students absent from regularly
scheduled examinations because of authorized University
activities will have the opportunity to take them at an
alternate time. Make-up exams for absences due to any other
reason will be at the discretion of the instructor.


Final Project
Final project directions


Grading
The course format will predominantly consist of lectures.
Students will have a few simple programming assignments, as well
as homework problems. Each student will complete a final
project and give a project presentation to the class.
Final grades will be based approximately on the following
distribution:
 40%: Homework assignments (~4 assignments)
 30%: Midterm exam
 30%: Final project
Bonus points may be assigned based on class participation. In
addition, creative solutions to homework problems and
programming assignments are appreciated and may be rewarded with
extra bonus points.
Grades will be assigned according to the following scale:
A=90-100; B=80-89; C=70-79; D=60-69; F=below 60.


Reference Material
Textbook
The main textbooks for the class are:
 Class notes and handouts
 M: Murphy, Machine Learning: A
Probabilistic Perspective, The MIT Press, 2012. (online)
Additional books that can serve as secondary reference on the
topics covered in class:
 B: Bishop, Pattern Recognition and
Machine Learning, Springer, 2006. (online)
 HTF: Hastie, Tibshirani, Friedman, The
Elements of Statistical Learning, Springer, 2009. (online)
 Duda, Hart, Stork, Pattern Classification, Wiley,
2000. (online)
 Mitchell, Machine Learning, McGraw-Hill, 1997. (online)
In addition, these are some books for further reading beyond
the scope of the course:
 Boyd, Vandenberghe, Convex Optimization, 2004 (online)
 Vapnik, The Nature of Statistical Learning Theory,
Springer, 2000. (online)
 Schoelkopf, Smola, Learning with Kernels, The MIT
Press, 2001. (online)
 Cristianini, Shawe-Taylor, Support Vector Machines and
Other Kernel-based Learning Methods, Cambridge
University Press, 2000. (online)
 Joachims, Learning to Classify Text using Support
Vector Machines, Kluwer, 2002. (online)
 Manning, Schuetze, Foundations of Statistical Natural
Language Processing, MIT Press, 1999. (online)
 Manning, Raghavan, Schuetze, Introduction to
Information Retrieval, Cambridge, 2008. (online)
 Koller, Friedman, Probabilistic Graphical Models:
Principles and Techniques, The MIT Press, 2009. (online)
Very Important Background Handouts
 Probability and Statistics
 Linear Algebra
 Optimization
 Matlab


Inclusivity and Academic Integrity
Inclusivity Statement
The West Virginia University community is committed to
creating and fostering a positive learning and working
environment based on open communication, mutual respect, and
inclusion.
If you are a person with a disability and anticipate needing any
type of accommodation in order to participate in this class,
please advise me and make appropriate arrangements with the
Office of Disability Services (293-6700). For more information
on West Virginia University's Diversity, Equity, and Inclusion
initiatives, please see http://diversity.wvu.edu.
Integrity Statement
The integrity of the classes offered by any academic
institution solidifies the foundation of its mission and cannot
be sacrificed to expediency, ignorance, or blatant fraud.
Therefore, I will enforce rigorous standards of academic
integrity in all aspects and assignments of this course. For the
detailed policy of West Virginia University regarding the
definitions of acts considered to fall under academic dishonesty
and possible ensuing sanctions, please see the Student Conduct
Code at http://www.arc.wvu.edu/admissions/integrity.html. Should
you have any questions about possibly improper research
citations or references, or any other activity that may be
interpreted as an attempt at academic dishonesty, please see me
before the assignment is due to discuss the matter.
Page layout by Thorsten Joachims
