The Deep Learning Series: History of Machine Learning

A brief introduction to the history of Machine Learning.


History of Machine Learning

Deep Learning is now widely used, with industries investing huge amounts of time and money into it. While it is safe to say that the industry has been shifting toward Machine Learning algorithms, it has not all been Deep Learning.

Think of Deep Learning as a hammer to hit a problem with. We can use many different hammers, but we need to find the right hammer to solve our problem easier and more efficiently.

— Nikhil Dev Deshmudre

Deep Learning is not always the right approach to a problem. Think about the issues: sometimes we do not have a large enough dataset, sometimes we do not have the computing power, and sometimes there is simply a better algorithm.

If we start by learning Deep Learning before learning Machine Learning, Deep Learning becomes our only hammer, and every ML problem starts to look like a nail. We must find the right hammer. Although we will not be going over a full course on Machine Learning (maybe later; who knows?), we will go over some classical algorithms to better understand where Deep Learning stands in the broader context of Machine Learning and Artificial Intelligence.

Early Machine Learning Algorithms

Naive Bayes

Naive Bayes uses the principles of statistics for data analysis. It is one of the earliest forms of ML and is still used today.

Naive Bayes is a type of Machine Learning classifier based on applying Bayes' theorem while assuming that the features in the input are all independent (a naive assumption). This kind of analysis predates computers and was originally done by hand; its first computer implementations most likely date back to the 1950s.
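To make the naive assumption concrete, here is a minimal sketch of a Naive Bayes classifier for categorical features, written with the standard library only. The toy weather data, the function names, and the use of add-one (Laplace) smoothing are all assumptions made for illustration; they do not come from any particular library.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count how often each class and each (class, feature value) pair occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value -> count
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_naive_bayes(model, row):
    """Return the class with the highest log-probability score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum of log P(feature_i | class).
        # Adding the per-feature terms independently IS the naive assumption.
        score = math.log(count / total)
        for i, value in enumerate(row):
            seen = value_counts[(label, i)][value]
            # Add-one smoothing (an assumption here) avoids log(0) for unseen values.
            score += math.log((seen + 1) / (count + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, windy) -> what we did that day
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"), ("rainy", "no"), ("sunny", "no")]
labels = ["play", "play", "stay", "stay", "play"]

model = train_naive_bayes(rows, labels)
sunny_prediction = predict_naive_bayes(model, ("sunny", "yes"))
rainy_prediction = predict_naive_bayes(model, ("rainy", "no"))
```

Notice that training is nothing more than counting, which is why this kind of analysis could be done by hand.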

A closely related model is Logistic Regression (logreg for short). Despite its name, it is a classification algorithm rather than a regression one. A data scientist will often try logreg first to get a feel for the classification problem at hand.
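Why is logreg a classifier? It outputs a probability between 0 and 1, and we turn that probability into a class by thresholding it. The following is only a rough sketch, assuming a single input feature, plain gradient descent, and made-up data (hours studied versus passing an exam); the learning rate and epoch count are arbitrary choices.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Probability that x belongs to class 1."""
    return sigmoid(w * x + b)


# Hypothetical data: hours studied -> passed the exam (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logreg(hours, passed)
low_hours_class = int(predict_proba(1.0, w, b) >= 0.5)
high_hours_class = int(predict_proba(4.0, w, b) >= 0.5)
```

The "regression" in the name refers to fitting the probability curve; the final answer is still a class label.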

Support Vector Machines (SVMs)

As Neural Networks were gaining popularity, a new approach called Support Vector Machines came onto the scene and, for a time, largely pushed Neural Networks out of favor.

SVMs solve classification problems by finding a good decision boundary between two sets of points. A decision boundary can be thought of as a line that separates the training data into two spaces. To classify a new data point, we just need to check which side of the decision boundary it falls on.

Decision Boundary — Towards Data Science
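Checking which side of the boundary a point falls on comes down to the sign of a single number. Here is a small sketch of just that classification step, assuming a straight-line boundary with hand-picked weights and bias. A real SVM would learn these values from the training data; this sketch does not do any training.

```python
def side_of_boundary(point, weights, bias):
    """Return +1 or -1 depending on which side of the boundary the point lies."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1


# Hypothetical boundary: the diagonal line x2 = x1, written as 1*x1 - 1*x2 + 0 = 0
weights = (1.0, -1.0)
bias = 0.0

below_line = side_of_boundary((3.0, 1.0), weights, bias)  # point under the diagonal
above_line = side_of_boundary((1.0, 3.0), weights, bias)  # point over the diagonal
```

Everything hard about an SVM is in choosing `weights` and `bias` well; once we have them, prediction is this cheap.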

This way of classifying looks good on paper, but in real life our data points are much more complicated and can rarely be separated by a simple line. A common trick here is to use kernel functions:

The function of the kernel is to take data as input and transform it into the required form. Different SVM algorithms use different types of kernel functions. These functions can be of different types. For example, linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid.

— Data Flair (
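To see what "transforming the data" means in practice, here is a short sketch of two kernels from that list, the polynomial kernel and the RBF kernel. It also checks that the degree-2 polynomial kernel gives the same number as first mapping both points into a larger feature space and then taking an ordinary dot product. The function names, the `gamma` value, and the sample points are assumptions for illustration.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(a, b, degree=2):
    """Homogeneous polynomial kernel: (a . b) ** degree."""
    return dot(a, b) ** degree


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: exp(-gamma * squared distance); equals 1.0 for identical points."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def explicit_degree2_map(point):
    """The 3-D feature space that the degree-2 kernel works in implicitly (2-D input)."""
    x1, x2 = point
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


a, b = (1.0, 2.0), (3.0, 0.5)
implicit = polynomial_kernel(a, b)                                # never builds the 3-D points
explicit = dot(explicit_degree2_map(a), explicit_degree2_map(b))  # builds them, same answer
similarity_to_self = rbf_kernel(a, a)
```

The kernel lets the SVM work in the larger space without ever computing the transformed points, which is what makes the trick cheap.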

In their time, SVMs were state of the art and widely popular, backed by extensive theory and math. But over time, SVMs proved hard to scale to large datasets and did not provide good results for perceptual problems such as image classification. An SVM is what is called a shallow method, which means that applying it to perceptual problems requires first extracting useful features manually (a step called feature engineering).

Decision Trees, Random Forests and Gradient Boosting Machines

Decision Trees

Decision Trees are flowchart-like structures that let us classify input data points or predict output values given the inputs. They are easy to visualize and interpret.
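The flowchart idea maps directly onto code. Below is a small sketch of a hand-written tree stored as nested dictionaries, plus a function that walks it from the root to a leaf. The features, thresholds, and labels are invented for illustration; a real decision tree algorithm would learn them from data.

```python
# Each internal node asks one question; each leaf holds a final label.
toy_tree = {
    "feature": "temperature", "threshold": 20.0,
    "left": {"label": "stay in"},               # temperature <= 20
    "right": {                                   # temperature > 20
        "feature": "wind_speed", "threshold": 30.0,
        "left": {"label": "go out"},             # wind_speed <= 30
        "right": {"label": "stay in"},           # wind_speed > 30
    },
}


def classify(node, sample):
    """Follow the branches until we reach a leaf, then return its label."""
    while "label" not in node:
        goes_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["label"]


warm_and_calm = classify(toy_tree, {"temperature": 25.0, "wind_speed": 10.0})
warm_and_windy = classify(toy_tree, {"temperature": 25.0, "wind_speed": 40.0})
cold = classify(toy_tree, {"temperature": 15.0, "wind_speed": 10.0})
```

Because every prediction is just a path of yes/no questions, we can always explain why the tree gave a particular answer.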

Building on decision trees, the Random Forest algorithm introduced a more robust, efficient, and practical take on decision tree learning. It involves building a large number of specialized decision trees and then ensembling their outputs. Random Forests are often described as almost always the second-best algorithm for any shallow Machine Learning task.
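Here is a rough sketch of the two ingredients, building many different trees and then ensembling them by majority vote. To keep it short, every "tree" is a one-level tree (a stump), each one is trained on a random resample of the data and on one randomly chosen feature, and the toy points are made up. Real Random Forests grow much deeper trees; this simplification is an assumption of the sketch.

```python
import random
from collections import Counter


def fit_stump(points, labels, feature):
    """Find the single threshold on one feature that makes the fewest mistakes."""
    best = None
    for threshold in sorted({p[feature] for p in points}):
        left = [l for p, l in zip(points, labels) if p[feature] <= threshold]
        right = [l for p, l in zip(points, labels) if p[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls on the right side, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(points, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        indices = [rng.randrange(len(points)) for _ in points]  # sample with replacement
        sample = [points[i] for i in indices]
        sample_labels = [labels[i] for i in indices]
        feature = rng.randrange(len(points[0]))
        forest.append(fit_stump(sample, sample_labels, feature))
    return forest


def predict_forest(forest, point):
    """Ensemble step: every stump votes and the majority wins."""
    votes = Counter(
        left_label if point[feature] <= threshold else right_label
        for feature, threshold, left_label, right_label in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
points = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

forest = fit_forest(points, labels)
near_first_cluster = predict_forest(forest, (1, 1))
near_second_cluster = predict_forest(forest, (9, 9))
```

Any single stump can be unlucky with its resample, but the vote across many of them is much harder to fool. That is the robustness the ensemble buys us.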

In 2014, Gradient Boosting Machines started to be favored over Random Forests. Like a Random Forest, a Gradient Boosting Machine ensembles weak prediction models, but it builds them using gradient boosting: a way to improve a Machine Learning model by iteratively training new models that specialize in addressing the weak points of the previous models. It is probably one of the best algorithms for dealing with non-perceptual problems today.
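The phrase "addressing the weak points of the previous models" can be sketched in a few lines. In this minimal version, which assumes a regression problem with squared error and a single input feature, each new weak model is a one-split stump trained on the residuals, that is, on whatever the ensemble so far still gets wrong. The data, the number of rounds, and the learning rate are arbitrary choices for illustration.

```python
def fit_regression_stump(xs, targets):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_output(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the remaining errors."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_output(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_output(s, x) for s in stumps)


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical toy data: y = x squared
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [x ** 2 for x in xs]

model = fit_boosted(xs, ys)
baseline_error = mean_squared_error(ys, [model[0]] * len(xs))
boosted_error = mean_squared_error(ys, [predict_boosted(model, x) for x in xs])
```

Compare this with the Random Forest idea: there the trees are trained independently and vote, while here each new model depends on the mistakes of all the ones before it.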


Now that we have a basic understanding of Machine Learning algorithms, we are ready to dive deeper into Deep Learning. As always, thank you for making it this far, and I hope to see you in the next article, where we talk about the Mathematical Blocks of Neural Networks.

See you there!
