In the Fall of 2017, I developed a Sports Analytics course at UC-Berkeley that teaches the fundamentals of sports analytics and data science.
This post gives an overview of the course: who teaches it, what it covers, how the project works, and where to find the materials.
For starters, the course has been taught by Professor Alex Papanicolau, a graduate student in Berkeley’s School of Information. He has taught the course for three semesters and has further developed it past my initial work.
For this project, I met with him regularly and led a team of five undergraduate students who developed various materials for the course.
Course Description: The principles of data science meet sports analytics. What makes a good hitter in baseball? How do you measure that? What are the flaws of plus/minus in basketball? Do Steph Curry or Klay Thompson ever get a hot hand? When should a coach go for it on 4th down? This course covers a wide range of topics on the analytical thinking behind the data revolution in sports and explores data science through the lens of sports analytics.
Course Objectives: This course will demystify the analytical thinking behind the data revolution in sports and teach data science through a wide range of topics. We will discuss the theory, development, and application of data science analytics in sports. Students will learn about measuring performance, inference, regression modeling, and the idiosyncrasies and subtleties of data as well as common pitfalls. At the end of the course, students will have a deeper appreciation of how the methods of data science applied to sports have broader applicability and will be well on their way to engaging with data in their own projects, education, or career.
The course was aimed at students who were enrolled in or had completed UC-Berkeley’s Introduction to Data Science course.
It was also meant to provide a cursory look at introductory topics in sports analytics, and at data science through a sports lens, drawing on various books and other courses on the subject.
Topics:
- 01 - Intro
- 02 - Pythagorean Expectation
- 03 - Measuring Performance
- 04 - Run Expectancy
- 05a - Linear Weights
- 05b - Efficiency
- 06a - Breakeven Probability
- 06b - Shooting Metrics
- 06c - Four Factor Model
- 07 - Football
- 08 - Spatial Analysis in Basketball
- 09a - WAR and Win Shares
- 09b - New Toolbox
- 10 - Regression Modeling
- 11a - Data
- 11b - The Hot Hand
- 12a - Regression to the Mean
- 12b - Ranking Systems
- 12c - Summary
The course also featured a project, which students were required to complete.
Students worked in groups of 3-4, and project ideas were provided. Each group chose an idea and wrote a proposal elaborating on it and on what they hoped to achieve.
Milestones: 1) Group formation and proposal (~Week 4) 2) Data acquisition (~Week 5) 3) Preliminary results (~Week 8) 4) Final results, report, and presentation (~Week 14/End)
At the end of the semester, students completed a 3-5 page report, provided accompanying code or notebooks, and gave a short presentation for the class in the last week.
Course Links with the Materials:
- https://github.com/ds-modules/SPORTS (see Notes for our best work)
- https://github.com/ds-modules/SPORTS/blob/master/notes/Run%20Potential%2C%20Run%20Production%2C%20and%20RE24.ipynb
- https://github.com/apapanico/LS88
- https://github.com/apapanico/data8-sports-materials
Developing a course was definitely a fun process, and I am proud of the work we all put in.