CS 517A - Machine Learning
- Spring 2008
Textbook:
- Tom M. Mitchell, Machine
Learning, McGraw Hill, 1997
Location: McDonnell
Hall 362
Time: Tuesdays and Thursdays, 2:30pm - 4:00pm
Instructor office hours: Tuesday and Thursday after class, Jolley
506, or by
appointment.
TAs and TA hours: Yunpeng Xu
(yunpengtsu (at) gmail.com or xuyp (at) cse.wustl.edu), Tu 7-8pm,
Jolley Hall 509
Brief description from the course
catalog: Formerly CS 527A. The field of machine learning
is concerned with the question of how to construct computer programs
that automatically improve with experience. Recently, many successful
machine learning applications have been developed, ranging from
data-mining programs that learn to detect fraudulent credit card
transactions, to information-filtering systems that learn users'
reading preferences, to autonomous vehicles that learn to drive. There
have also been important advances in the theory and algorithms that
form the foundation of this field. This course will provide a broad
introduction to the field of machine learning.
How the course will be taught
(which may affect how you take it): I will not use slides
to deliver lectures. Instead, I will talk through and write on
the board the main concepts, ideas, formulae and examples of the topics
to be discussed. This way the content can be delivered at a good
pace, so that students can follow relatively easily while taking notes
and learn "on the fly". This means that students are not
required to know the topics before class, but can learn
directly from the lectures. It also implies that,
although attendance is not required, students are better off attending
class. To reinforce the knowledge learned in the classroom,
students will spend time on reading materials and homework assignments
for each of the main topics of the course and will complete a course
project. To help with learning, lecture notes will be available
online.
Schedule
Policies on homework, project, and
grading
Collaboration Policy
Project description
Schedule (Note: This schedule may be adjusted to meet
the needs of the course.)
******** Introduction and
background ********
Main topics:
Admin stuff; What is machine learning? ...
Number
of lectures: 1
Reading: Chapter
1
Lecture notes
******** Instance-based
learning ********
Main topics: k-nearest neighbor,
locally weighted averaging, locally weighted regression ...
Number
of lectures: 2
Reading: Chapter
8
Lecture notes
******** Decision trees ********
Main topics:
Basic concepts;
Entropy; Information gain; ID3; Bias; Overfitting and pruning;
continuous attributes; Gain ratio splits ...
Number
of lectures: 3
Reading: Chapter 3
Lecture notes
******** Multi-layer
perceptrons (artificial neural networks) ********
Main topics:
linear and
non-linear units; Gradient descent; Multi-layer networks ...
Number
of lectures: 3
Reading: Chapter 4
Lecture notes
******** Support vector
machines ********
Main topics:
Basic concepts; Maximal margin classifier; Kernel functions; ...
Number
of lectures: 3
Reading: There
are many good tutorial materials on the web. One place is
here
Lecture notes
******** Evaluating hypotheses ********
Main topics:
Many different
quality measures; confusion matrix; ROC; Sampling techniques; Confidence
interval; Comparing
learning algorithms
...
Number
of lectures: 2
Reading: Chapter
5
Lecture notes
******** Bayesian
learning ********
Main topics:
Basic concepts and Bayes rule; Bayesian networks; Bayesian decision theory; MAP hypothesis;
Bayesian classifier ...
Number
of lectures: 3
Reading: Chapter 6 (6.1, 6.2, 6.4, 6.5,
6.7-6.10)
Lecture notes
******** Reinforcement learning
********
Main topics:
Basic concepts; model-based learning; Temporal difference learning ...
Number
of lectures: 3
Reading: Chapter 13
Lecture notes
******** Combining multiple
learners
********
Main topics:
Basic concepts; voting; Bagging; Boosting ...
Number
of lectures: 2
Reading: see lecture notes
Lecture notes
********
Course projects ********
Main topics:
Student project presentations
Number
of lectures: 4
Policies on homework, project
and grading
- Homework Assignments: There will be four sets of
homework
assignments, each of which covers two of the eight main topics.
Assignments are due on the given due date, either in the instructor's office
in Jolley 506 by 2:15pm or in class at the beginning of
class. Any homework submitted in class after 3pm
will be given a 30% late penalty. If you arrive in class
after 3pm, wait until the end of class to hand in your homework.
It is VERY disruptive to have someone walk in late and come to the
front of the class to submit homework. No assignments will
be accepted after the instructor leaves the classroom. You are strongly advised to get started
early so that you can get help if needed. These homework assignments are not
designed to be done in just a few days. There may be more than one
way to solve a given problem. You are expected to spend several hours on a
two-week homework assignment. It will be very frustrating if you try to do
it all in the last few days. For some problems you may need to think
about them for a while and then set them aside for a little.
- Project: There will be no exam for the course. Instead,
everyone will complete a course project. (The technical details will be
provided later.) A project will have two parts:
-
Reading some papers on a particular machine learning topic
that we
do not have time to cover, or do not cover in detail, in class, and
then giving an in-depth presentation that describes and discusses the
problem, objectives, data used, procedure and
techniques adopted, your critique of the existing work, and your
thoughts and suggestions on possible future research.
-
Designing an algorithm/method for a particular data mining
problem, most likely from the paper(s) you read.
In addition to the presentation slides,
every student must submit a final report on his/her project. The
following items must be included and covered in detail: problem
description, data and method used, details of existing algorithms,
design and implementation of your own algorithm, algorithm
analysis and comparison, result analysis, and future
work.
- Computation of the Final Grade: The following elements and
scores will go into the final grade:
- Homework: 60 points total.
- Project: 40 points total: 10 for the presentation, 20
for your design and implementation, and 10 for the report.
The following scale will be used to
compute the final grade from your total points earned:
- A: 85-100
- B: 70-84
- C: 60-69
- D: 50-59
- F: < 50
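As an illustration only, the grade computation above can be sketched in Python. The function and constant names are hypothetical. The point totals and cutoffs come from the policy above, but two details are assumptions: that the 30% late penalty deducts 30% of the score earned on that assignment, and that each cutoff is a lower bound for totals that fall between whole numbers (for example, 84.5 counts as a B).

```python
# Sketch of the grading policy; all names here are hypothetical.
HOMEWORK_MAX = 60    # homework points available in total
PROJECT_MAX = 40     # 10 presentation + 20 design/implementation + 10 report
LATE_PENALTY = 0.30  # assumption: 30% of the earned score is deducted

# (minimum total, letter), checked from the highest cutoff down
GRADE_CUTOFFS = [(85, "A"), (70, "B"), (60, "C"), (50, "D")]


def late_adjusted(score, late):
    """Return one homework score after the late penalty, if any."""
    return score * (1 - LATE_PENALTY) if late else score


def letter_grade(homework_points, project_points):
    """Map total points earned (out of 100) to a letter grade."""
    if not 0 <= homework_points <= HOMEWORK_MAX:
        raise ValueError("homework points must be between 0 and 60")
    if not 0 <= project_points <= PROJECT_MAX:
        raise ValueError("project points must be between 0 and 40")
    total = homework_points + project_points
    for cutoff, letter in GRADE_CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"
```

For example, 50 homework points and 35 project points give a total of 85, which falls in the A range.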
Policy on collaboration
When solving your homework problems and working on your project, you
may discuss HIGH-LEVEL approaches to the problems with
your classmates. HOWEVER, you are to work out all details of any
solutions discussed and write up the solution completely on your own.
In particular, when working with another student on an assigned homework
problem, you should do so verbally: nothing should be written.
Remember to keep your discussion at a high level so that everyone can
work out the details on their own. You must also clearly
acknowledge anyone (except the instructor) with whom you discussed any
problem and say briefly what you discussed.
Please keep any discussions you have with other students to a small
group of no more than 3 students and be sure that each of you is
equally involved. If you just listen in and are then able to understand
and write up the solution, you have missed at least half of the benefit
of the homework. It is really important to work through the process of
recognizing when you are heading the wrong way and learning how to work
through the problem-solving process.
Violations of any of the above rules will be dealt with harshly!
The homework problems and projects are designed to help you learn the
material being taught. Being told the solution and understanding it is
VERY different from working through the process of actually finding a
solution. If you do not take an active role in the process of solving
the homework problems and project, you won't get much out of it,
and hence you won't learn the material.
Created by Weixiong
Zhang,
January 2008.