AMS 380, Data Mining
Catalog Description:
This course will teach the basic ingredients of classical and contemporary statistical
data mining methods, including dimension reduction, model selection, pattern recognition,
and predictive modeling using traditional general linear models and generalized linear
models, and modern statistical learning methods, such as decision trees, random forests,
neural networks, etc. The course will teach how to employ and implement these methods.
Prerequisite: AMS 210 or MAT 211; and AMS 311
3 credits
Offered initially spring 2021; thereafter, spring, summer and fall.
Course Materials for Fall 2026:
Required Textbooks:
"An Introduction to Statistical Learning with Applications in Python (ISLP)" by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and Jonathan Taylor; Springer Texts in Statistics; 2023 edition, Springer; ISBN: 978-3031387463
"Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" by Aurélien Géron; 3rd edition; O'Reilly Media, November 8, 2022; ISBN: 978-1098125974
Recommended Textbooks:
"Learning from Data, A Short Course" by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin; 2012; ISBN: 978-1-60049-006-4
"Probabilistic Machine Learning: An Introduction" by Kevin P. Murphy; The MIT Press; 2022; ISBN: 978-0262046824
"Dive into Deep Learning" by Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola; 1st edition; Cambridge University Press, 2023; ISBN: 978-1009389433
"Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares" by Stephen Boyd, Lieven Vandenberghe; 1st edition; Cambridge University Press, 2018; ISBN: 978-1316518960
"scikit-learn User Guide"
SYLLABUS
- Some basic statistical tests
- Linear regression and classic variable selection
- Regularized linear regression
- General linear model
- Cluster analysis
- Principal Component Analysis
- Statistical Resampling methods
- Random Forests
- Neural Networks
Learning Outcomes for AMS 380, Data Mining:
1) Demonstrate understanding of classical and contemporary data mining methods including:
*Dimension reduction;
*Variable selection;
*Pattern recognition.
2) Demonstrate understanding of predictive modeling using:
*Traditional linear models;
*Generalized linear models.
3) Demonstrate understanding of modern statistical learning models, including:
*Classification and regression trees;
*Random forests;
*Neural networks.
4) Demonstrate mastery of using these statistical procedures with the programming language:
*Python.
