ORIE 5260: Machine Learning in Finance

This is an archive of course materials for ORIE 5260, taught at Cornell Tech in 2018.

Course description

Machine learning is a field at the intersection of computer science and statistics that aims to develop computational systems that learn from data and improve with experience. Though its origins lie in the field of artificial intelligence, modern machine learning has transformed a huge variety of areas, such as biology, medicine, e-commerce, retail, marketing, operations, logistics, politics, journalism, and, of course, finance.

This course provides a general introduction to machine learning with a view towards applications in finance. The goal is to provide both a solid grounding in the foundations of machine learning and a conceptual map of the field and its relation to areas like statistics and optimization. The focus is on mathematical and conceptual understanding; the course will occasionally touch on implementation issues and financial examples, but will not emphasize either aspect in coursework.

Topics include linear regression, logistic regression, exponential families, generalized linear models, generative models, support vector machines, loss functions and regularization, sparsity, Bayesian methods, model selection, the EM algorithm, clustering, principal components analysis, convex optimization, and optimization algorithms.

Prerequisites

The course requires background in linear algebra, probability, and optimization at the level of MATH 2940, ORIE 5500, and ORIE 5300.

Course information

The course will meet on Tuesdays, 11 AM - 12:15 PM and 1:15 PM - 2:30 PM. These times may be adjusted during the semester.
There is no course textbook; lecture notes, in the form of slides, will be posted online.
Office hours are by appointment, though this may change during the semester. Course staff can be reached by email.

Course requirements and grading

The course grade will depend on two factors: attendance and problem sets (there will be no exams). There will be roughly 6 problem sets in total, each generally spanning 2-3 weeks depending on the material being covered. Homework should be typed, preferably in LaTeX, and submitted to the TA by email.

Every student is expected to abide by the Code of Academic Integrity of Cornell University. In particular, you must work on the problem sets alone; you may discuss the problems with other students, but only at the level of a hallway discussion. You should also not consult external references, though it is fine to look up standard mathematical results as long as they are not the subject of a given problem.

Syllabus

Introduction
Convex optimization
Supervised learning
- Linear regression
- Logistic regression
- Exponential families and generalized linear models
- Generative models for classification
- Support vector machines, duality, and kernelization
Model selection, regularization, and Bayesian methods
Optimization algorithms
Unsupervised learning

The syllabus may be adjusted over the course of the semester. Several diagrams in the slides are due to Andrew Ng; Boyd and Vandenberghe; and Hastie, Tibshirani, and Friedman.

Homework

Problem Set 1, due February 13.
Problem Set 2, due February 27.
Problem Set 3, due March 13.
Problem Set 4, due March 27.
Problem Set 5, due April 26.
Problem Set 6, due May 8.

Readings

These readings will be posted intermittently during the semester and are entirely optional. Their goal is to give some exposure to the history, culture, and debates of machine learning, statistics, and data science, and to offer additional perspective. Some are included purely for historical interest and are not intended to be read cover to cover.

A. Halevy, P. Norvig, and F. Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems, 2009.
D. Mumford. The dawning of the age of stochasticity. From Mathematics towards the Third Millennium, 1999.
J. Gleick. Breakthrough in problem solving. The New York Times, 1984.
S. Stigler. Gauss and the invention of least squares. Annals of Statistics, 1981.
L. Breiman. Statistical modeling: the two cultures. Statistical Science, 2001.
T. Bayes. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 1764.
B. Efron. Controversies in the foundations of statistics. American Mathematical Monthly, 1978.
D. Freedman. Some issues in the foundation of statistics. Foundations of Science, 1995.
D. Donoho. 50 years of data science. From Tukey Centennial Workshop, Princeton, NJ, 2015.
Z. Tufekci. YouTube, the great radicalizer. The New York Times, 2018.