Disciplines/Subjects: Mathematics, Linear Algebra, Statistics, Machine Learning
Key Themes: Matrix Decomposition, Dimensionality Reduction, Statistical Modeling, Real-World Applications
This project explores the application of Principal Components Analysis (PCA) as a statistical tool for dimensionality reduction in real-world datasets. Starting with the foundational theory, learners examine the relationship between Singular Value Decomposition (SVD) and PCA, and how PCA can address common statistical problems such as high-dimensional data. Using Python, learners apply PCA to the "Prostate Cancer" dataset, exploring how the method extracts the most important components for predicting prostate-specific antigen (PSA) levels from various clinical measurements. Through this process, learners identify and analyze the principal components, evaluate the results, and compare the PCA-derived model with traditional linear regression models. The project emphasizes both the mathematical theory behind PCA and its practical application in data science. In addition, learners write their own PCA code from scratch using SVD, reflecting on the underlying algorithm and comparing their implementation to established Python implementations.
Habits of mind: Curiosity, Continuous Learning, Strive for Excellence
Transferable skills: Organizing and Representing Information, Identifying Patterns and Relationships, Modeling
Content Knowledge:
Understanding PCA as a method for dimensionality reduction and its application in machine learning.
Linking Singular Value Decomposition (SVD) theory to PCA.
Using Python or Excel for statistical analysis, including computing loading vectors, drawing biplots, and fitting regression models.
Evaluating statistical models using metrics such as R-squared and residual plots.
Reflecting on PCA algorithms and implementing them through coding.
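As an illustration of the SVD-to-PCA link listed above, the following is a minimal sketch of PCA written from scratch with NumPy. It is not the project's reference solution: the function name `pca_svd`, its return values, and the synthetic data are assumptions made for this example, and the random matrix is only a stand-in for a real dataset. The idea is that if the column-centered data matrix factors as Xc = U S V^T, then the columns of V are the loading vectors, U S gives the component scores, and the squared singular values divided by (n - 1) are the component variances.

```python
import numpy as np


def pca_svd(X, n_components):
    """Hypothetical helper: PCA via SVD of the column-centered data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each column

    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    loadings = Vt[:n_components].T               # columns are loading vectors
    scores = U[:, :n_components] * s[:n_components]  # same as Xc @ loadings

    # Variance of each component: squared singular value over (n - 1)
    explained_variance = s**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return scores, loadings, explained_variance[:n_components], ratio[:n_components]


# Synthetic stand-in data (assumption: NOT the prostate cancer dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

scores, loadings, var, ratio = pca_svd(X, n_components=3)

# Cross-check: component variances should match the largest eigenvalues
# of the sample covariance matrix
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
matches_covariance = np.allclose(var, cov_eigvals[:3])
print("Variances match covariance eigenvalues:", matches_covariance)
print("Explained variance ratio:", np.round(ratio, 3))
```

The cross-check against the covariance eigenvalues is one way to validate a from-scratch implementation before comparing it with a library routine. Note that loading vectors are only determined up to sign, so a direct comparison with another implementation may need to allow for sign flips.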