It’s often hard to get into new fields like machine learning or data science. One reason for it being hard, lies in the fact that we don’t even know what do not know.
This list is a compilation of sources that will help you, dear reader, to fill the gaps towards using data science and machine learning. Here, we will not define what these things are, or why exactly you need them. They are more geared towards something to come back to, when you find gaps in your education, while delving into machine learning.
Many readers will be students or come from underprivileged backgrounds. Here I have tried my best to curate free resources, or resources that in my opinion are worth the investment. You will see that some links are monetized, but these links were added after I have curated the content, you will not see any funny business here.
However, although they are arranged in the conventional “start here, then move on” way, as this is the easiest way to structure text, I believe this is not the best way to learn. I would suggest to you, to dive in deep and coming back to this resource, once you find roadblocks. You will find the fast.ai course is very interesting and will throw you right into the fascinating world of deep learning. Sometimes you will realize that maybe your math or your python skills are lacking, then this can be used as a central resource to find a way to fill the gap.
Or read it back to back, I’m not your mother after all.
Mathematics lies at the core of machine learning. The main components that you have to understand are linear algebra and calculus. Linear algebra is about understanding how neural networks themselves work and calculus is all about training neural networks. If you had troubles in school with math, like me, you may want to start at getting the basics right. Having a good foundation will make your life easier in every subsequent step.
Precalculus and General applications in Math
Khan Academy is fantastic at replacing the bad math education you endured in school. If you find you are out of your depth with the videos below, start where school left you with Khan Academy then use both 3Blue 1Brown and Khan Academy to learn the math necessary for modern machine learning.
This is the best introduction to vectors and matrix calculations I have ever seen.
Just as well this is the best overview of calculus I have ever seen.
Coding Focused Linear Algebra
This course focuses on translating math to the text editor. It’s good to understand how you can actually put vector logic into your computer. This course has written material and video content, as touching the code yourself is always the best way to learn.
Numerical optimization is a very general skill, but it lies at the basis of every machine learning application. Be it support vector machines or neural networks, they all minimize an objective function. These functions are extremely important if you want to start doing some work in machine learning. There is a nice introductory video series available. Additionally, you can check out the other two books detailing all of optimization and focusing on convex problems.
Programming in Python
I am a Python fan. It’s an extremely readable and high-level programming language. Python itself is very slow, but it can utilize almost any other programming language. When we use the deep learning framework Tensorflow in Python, it is just an abstraction of the real C++ Tensorflow that is extremely optimized and fast. Learning at least some Python will take you a long way in data science. Data has to be cleaned, transformed and feed into the hand-crafted algorithms. Learn the basics in Python and it will take you a long way. Pandas is a Python library that you will find very helpful, when working with data, therefore Pandas is included in this list.
Machine learning is just statistics at scale. Knowing statistical basics is very important to not make a fool of ourselves when throwing a learning machine at data. There is a free classic book and several free courses. I like the following to give specific insights into descriptive and inferential statistics as well as a general basic stats course.
Design thinking may be the odd one out. However, this thought process has helped me immensely in improving both my code and my interaction with colleagues. Ego is the Enemy is a book about stoic philosophy that isn’t directly Design thinking, but a good reminder to keep ourselves in check, even when we are doing cutting edge research.
Making your research reproducible is very important. However, just sharing your code often isn’t enough to make it easily reproducible.
Let’s be honest here, we scientists are mostly hacking together our code and sometimes we don’t know ourselves why it’s working. The following resources will help you create documentation and bundle your code together with the software to run and share. Jupyter is fantastic for exploring data with code. Docker is fantastic at bundling your code and the necessary software for it to be run by peers.
Data Science is the category where machine learning fits into. Analyzing data in various forms is sort of an art and takes practice. Kaggle stands uncommented, as it’s a resource you have to know in this field. And of course, following the Way of the Geophysicist.
There is classic machine learning that includes methods such as logistic regression and support vector machines. While deep learning is very shiny, often times a simpler model is preferable, learn these first.
All the rage at the moment. Three free courses, of which my favorite are the Fast.AI courses. The deep learning book is free and fantastic as well.
Cloud Computing and HPC on clusters is at the heart of deep learning. Many do not have a desktop PC anymore, therefore flocking to Google Cloud or AWS is natural. On Google Colab you will get a free GPU to use, but the setup can be tricky for all of them. These resources try to get you on your way.
Once you have these concepts down, you can dive off the deep end. Neural networks can mathematically be described as acyclic directed graphs. In the case of some recurrent neural networks, the topology is actually a cyclic directed graph. Additionally, a new type of network that is promised to be changing everything about neural nets are capsule networks. These are only possible due to dynamic routing and stray from the basic linear concepts that come with standard convolutional neural networks. To understand these concepts, graph theory comes in very handy.
Johns Hopkins Uni script on adaptive routing essential for caps nets and helpful for graph convolutional nets.
Useful Tips and Tricks
These are links that don’t quite fit in other categories, but are concepts and resources that are important.
Nash Equilibrium on Khan Academy – concept from game theory that is important for generative adversarial networks
Stay Up to Date
A field that evolves at this kind of speed is hard to keep up with. Here are three resources I use to learn about the newest flavors of deep learning.