Machine learning is an increasingly popular field in computer science, and it is heavily mathematical. Those starting to read top published ML papers may feel like a deer in the headlights when confronted with the math behind it all. In industry and research, it is important to have not only software development and computer science fundamentals but also math fundamentals (at the undergraduate engineering level, to start). The following is a six-month mastery guide for the advanced mathematics behind machine learning algorithms, useful whether you aspire to be an ML researcher or an ML engineer.
The Superpower of Math
Developing strong math knowledge is like having a superpower for picking up any STEM subject. Mathematics is the language of scientists and engineers, and acquiring the right foundation will prepare you for any topic in medicine, physics, engineering, finance, and many other STEM fields. You will find that a wide variety of fields use the same math fundamentals to explain seemingly unrelated phenomena. For instance, eigendecomposition is widely used in signal processing, stress/strain modeling in civil engineering, portfolio optimization in finance, dimensionality reduction in data science, and vaccine design in medicine, among many other examples.
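To make eigendecomposition concrete, here is a minimal NumPy sketch of my own (the matrix values are illustrative, not from any application mentioned above). It factors a symmetric matrix into eigenvectors and eigenvalues, then rebuilds it:

```python
import numpy as np

# A small symmetric matrix, the kind that shows up as a covariance
# matrix in data science or a stiffness matrix in stress/strain modeling.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Eigendecomposition of a symmetric matrix: A = V diag(w) V^T,
# where the columns of V are eigenvectors and w holds eigenvalues.
w, V = np.linalg.eigh(A)

# Reconstruct A from its eigenvalues and eigenvectors.
A_reconstructed = V @ np.diag(w) @ V.T
print(np.allclose(A, A_reconstructed))  # True
```

The same factorization underlies PCA, spectral methods in signal processing, and more; only the matrix being decomposed changes.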
ML Math Breakdown
I hope this guide will be useful to you in understanding the fantastic algorithms in machine learning. The math involved is going to be roughly the following:
- Linear Algebra (35%)
- Probability Theory and Statistics (25%)
- Multivariate Calculus (15%)
- Multivariate Statistics (8%)
- Convex Optimization (8%)
- Algorithms & Others (9%)
If no one has told you this before, mathematics is a subject that can only be mastered through practicing problems. It is easy to watch someone else solve problems and think that you know how to do it, too. When the moment of truth comes, you may find that watching others do math only leaves you with superficial understanding. The correct way to develop substantial intuition is by also struggling through the problems on your own. This goes for all of the topics on the list above. This guide assumes that you are already competent with calculus, at the minimum.
Month 1: Linear Algebra
Start with linear algebra in the first month. The best free resource for this is by MIT's Professor Gilbert Strang: Linear Algebra. The mechanics of calculating for vector and matrix operations will be the simpler parts of this course. The most important skill to develop is intuition and the ability to visualize these operations in 2D and 3D space. More specifically, you want to be able to visualize eigendecomposition and change of basis.
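As a small sketch of what "change of basis" means computationally (my own example, not from the course), the code below expresses the same vector in the standard basis and in a rotated basis:

```python
import numpy as np

# Columns of B are the new basis vectors: a 45-degree rotation
# of the standard basis.
theta = np.pi / 4
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])       # coordinates in the standard basis
v_new = np.linalg.solve(B, v)  # coordinates of the same vector in basis B
v_back = B @ v_new             # change back to the standard basis
print(np.allclose(v_back, v))  # True
```

The vector itself never moves; only its coordinates change. Being able to picture this in 2D and 3D is exactly the intuition the lectures aim to build.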
An amazing resource for visualizing linear algebra is the Essence of Linear Algebra series by "3blue1brown" (Grant Sanderson, who studied math at Stanford) on YouTube. The series consists of 15 short videos of about 10-15 minutes each.
If you need a textbook with problems to work on, there is a free linear algebra textbook by Jeff Hefferon. You can also find problem sets for past semesters at MIT or other universities with free online resources.
Mastering linear algebra is going to help you in representing massive amounts of data as vectors and matrices as well as visualizing/transforming N-dimensional feature space.
An essential component of deep learning is matrix algebra. If you get stuck on a matrix manipulation that a researcher has done in a paper, I recommend keeping as a reference "The Matrix Cookbook," a 66-page book by Kaare Brandt Petersen and Michael Syskind Pedersen.
Month 2: Probability Theory and Statistics
This month is going to be the hardest. Probability theory is a very prominent part of understanding ML. Key concepts include Bayes' Theorem, maximum likelihood estimation, distributions, inference, and regression. Probability theory is very unlike the deterministic mathematics of calculus or algebra: it is the science of uncertainty, and it is well suited to real-world data that is dynamic and random.
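As a taste of Bayes' Theorem in action, here is a short numerical sketch (the diagnostic-test numbers are made up for illustration):

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Illustrative diagnostic-test example with made-up numbers.
p_disease = 0.01            # prior P(disease)
p_pos_given_disease = 0.95  # sensitivity, P(positive | disease)
p_pos_given_healthy = 0.05  # false-positive rate, P(positive | healthy)

# Law of total probability: P(positive) over both possibilities.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Despite a 95%-sensitive test, the posterior is only about 16% because the disease is rare, the kind of counterintuitive result that makes practicing probability problems essential.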
An excellent resource on this subject is a class on MIT OCW: EECS 6.041+6.431 Probabilistic Systems Analysis and Applied Probability by Professor John Tsitsiklis.
The textbook for this class is: Introduction to Probability by Dimitri Bertsekas and John Tsitsiklis.
Alternatively, if you are comfortable with this level of probability, then watch the more advanced Probability Foundations lectures by Professor Krishna Jagannathan. The companion textbook to this series is: Probability and Random Processes by Grimmett & Stirzaker.
If you are going into industry, do the series by Professor Tsitsiklis. If you are starting PhD studies, do the series by Professor Jagannathan.
Month 3: Multivariate Calculus
For ML, you really only need differential calculus, so you can save integral calculus for later. For the first half of the month, focus on the differential-calculus videos in 3blue1brown's Essence of Calculus series (average 5-8 minutes each); the later videos cover integral calculus. In the remaining days, watch the first 20 lectures of Multivariable Calculus by Professor Denis Auroux or the first 6 multivariate calculus lectures by Adrian Banner. If you are behind, you can just do the 3blue1brown videos.
By this point, you will have covered 75% of the math for ML algorithms!
Month 4: Multivariate Statistics
Multivariate statistics combines linear algebra, probability, and statistics in higher-dimensional feature spaces. The material in this section covers some of the most well-known ML algorithms. The subject is vast, with no single syllabus; since our goal is ML, we will focus on a few topics. Complete one of these three tasks:
- Ch. 4, 7, 8 from the book Applied Multivariate Statistical Analysis by R. Johnson.
- Ch. 2, 3, 11 from the book An Introduction to Multivariate Statistical Analysis by T. W. Anderson.
- Look for well-regarded content on the internet on the Multivariate Normal Distribution, Principal Component Analysis (PCA), Singular Value Decomposition (SVD), the Covariance Matrix, and Gaussian Mixture Models.
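These topics connect directly: the sketch below (my own minimal example, not from either textbook) performs PCA via the SVD of a centered data matrix and checks the result against the covariance matrix's eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # 100 samples, 3 features

# Center the data, then take the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The principal components are the rows of Vt. The variance explained
# by each component is s**2 / (n - 1), which equals the eigenvalues of
# the covariance matrix, tying SVD back to eigendecomposition.
explained_variance = s**2 / (len(X) - 1)
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc.T)))[::-1]
print(np.allclose(explained_variance, cov_eigvals))  # True
```

This identity between the SVD of the data and the eigendecomposition of the covariance matrix is the core fact behind PCA-based dimensionality reduction.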
You probably only need about half of the month to complete this part. Either take a break or start early on the next subject, which is going to be harder.
Month 5: Convex Optimization
This subject is going to require all of the math that you learned previously, as well as more rigorous multivariate calculus than was covered in month 3. Convex optimization starts from a model with tunable parameters; optimization theory estimates the values of those parameters, which determine how well the model represents empirical data. The best resource on this is by Professor Stephen Boyd of Stanford. Be sure to download the companion book by Boyd. You will probably not read this book in one go; pick and choose sections as needed or as you find them interesting. The class is quite theoretical, appropriate for PhD candidates or researchers, but may be too much for ML engineers.
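As a tiny illustration of "tunable parameters estimated by optimization" (a plain gradient-descent sketch of my own, far simpler than the methods in Boyd's book), the code below fits a linear model by minimizing a convex least-squares cost:

```python
import numpy as np

# Fit y ≈ X @ w by minimizing the convex cost f(w) = (1/n) ||X w - y||^2
# with plain gradient descent.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))       # synthetic data
w_true = np.array([2.0, -1.0])     # "true" parameters that generated y
y = X @ w_true

w = np.zeros(2)                    # initial guess for the parameters
lr = 0.1                           # learning rate (step size)
for _ in range(500):
    grad = (2 / len(X)) * X.T @ (X @ w - y)  # gradient of the cost
    w -= lr * grad

print(np.round(w, 3))
```

Because the cost is convex, gradient descent with a small enough step size converges to the global minimum, here recovering parameters very close to [2, -1]. Non-convex problems (like deep network training) lose that guarantee, which is why convexity matters.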
Alternatively, industry professionals may be interested in the more practical course, Applied Optimization for Wireless, by Professor Aditya Jagannatham. Students without an electrical engineering background may find the section on wireless communication bewildering. So, watch lectures 1-33 and 42-50 (average 20 minutes each). Then, read chapter 6 of Boyd's book and practice the programming for this chapter on Boyd's website.
Month 6 and Beyond
When you get here, you will have grasped the foundational mathematics behind machine learning. Now go back to the introductory ML classes you took previously. With the mathematical foundations behind you, you may find that ML is more fun and intuitive than before! You will now see the probabilistic properties of data, adapt cost functions to the data for better results, and craft your own custom algorithms.
Start collecting research papers for your quiver. Many good ones can be found on arXiv, as well as papers from top AI conferences such as NeurIPS, AAAI, and ICML. Add the top papers on the subjects that interest you to your reading list. As you finish each paper, pick three of its cited references to add to your reading list. Doing this regularly will exercise your ML/AI muscles. As you read each paper, test yourself with three questions:
- Can you understand the jargon?
- Can you understand the math?
- Can you implement the math into code without relying on libraries?
If you can answer all three, then you know you are up to speed. Sometimes you may not be able to fully implement a paper due to data constraints and other issues, but if you can follow the mathematical reasoning, then you can say that your training is complete. Don't focus on learning more frameworks or packages; focus on learning concepts.
It is important not to reach for ML as the tool for every problem; not every problem is a nail for your hammer. Don't neglect your other computer science skills either: brush up on data structures and algorithms as well.
For additional learning, the following are highly recommended for ML enthusiasts:
- Books and lectures by Professor Hastie and Professor Tibshirani of Stanford. In particular, The Elements of Statistical Learning is an excellent resource; a lighter alternative is An Introduction to Statistical Learning.
- Pattern Recognition and Machine Learning by Christopher Bishop.
- Probabilistic Machine Learning: An Introduction by Kevin Patrick Murphy.
- All of Statistics by Larry Wasserman.