Supervised Machine Learning

IEEE-PCS, VIT-Vellore
Aug 29, 2021


Supervised learning is a type of machine learning in which models are trained on labelled datasets. Labelled data is data that has been tagged with one or more labels identifying certain characteristics, properties or classifications. In supervised learning, we provide supervision in the form of labels, so the mapping from inputs to their correct outputs is known.

How Does Supervised Learning Work?

Let’s understand it with an example. Suppose a model needs to classify images of apples and mangoes. For classification, the model is fed a labelled dataset and, using these labels, it learns patterns that distinguish the two fruits. After training, when an unseen image is fed in, the model can accurately recognize it as an apple or a mango.

Applications of Supervised Learning:

Spam Detection- Gmail filters a new email into the Inbox (normal) or the Junk folder (spam) based on patterns learned from previously labelled spam.

Weather Prediction- The predictions made by weather apps at a given time are based on prior knowledge and analysis of weather over a period of time for a particular place.

Diagnosis- In the healthcare sector, it is used to detect diseases like diabetes, cancer, COVID, etc.

Recommendation Systems- Netflix uses supervised learning algorithms to recommend shows that users may want to watch, based on their viewing history and the ratings given by similar classes of users.

Fraud Detection

Image Classification

Speech and Object Recognition

Bioinformatics

Types Of Supervised Machine Learning:

Supervised learning is broadly classified into two types:

· Regression

· Classification

Classification can be further categorized into three types, illustrated in the sketch after this list:

● Binary classification: each input is assigned to one of exactly two classes.

● Multiclass/Multinomial classification: each input is assigned to one of three or more classes.

● Multilabel classification: each input may be assigned several labels at once; this generalizes multiclass classification, where every input receives exactly one label.
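As a minimal illustrative sketch (the toy label arrays below are our own, not from any real dataset), here is how the target labels differ across the three types:

```python
# Illustrative sketch: how target labels differ for each classification type.
import numpy as np

# Binary: each sample belongs to one of exactly two classes.
y_binary = np.array([0, 1, 1, 0])

# Multiclass: each sample belongs to one of three or more classes.
y_multiclass = np.array([0, 2, 1, 2])

# Multilabel: each sample may carry several labels at once,
# encoded as one indicator column per label.
y_multilabel = np.array([
    [1, 0, 1],   # sample 1 has labels 0 and 2
    [0, 1, 0],   # sample 2 has label 1
    [1, 1, 1],   # sample 3 has all three labels
    [0, 0, 0],   # sample 4 has no labels
])
```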

Supervised Learning Flow

Must-Know Supervised Algorithms:

1. Linear Regression:

This algorithm assumes a linear relationship between the input (X) variable and the output (Y) variable. The input and output variables are also called the independent and dependent variables respectively. In this type of regression, the data is modelled with a straight line, which means we fit a straight line through the data points. The quality of the fit is measured by metrics such as mean squared error and R-squared.
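As a minimal sketch, here is a linear regression fitted with scikit-learn; the synthetic data (y roughly 3x + 2 plus noise) is our own illustration, not from the original article:

```python
# Minimal linear-regression sketch on toy data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Toy data: y is roughly 3x + 2 with a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=50)

model = LinearRegression()
model.fit(X, y)                      # fit the straight line
y_pred = model.predict(X)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("MSE:", mean_squared_error(y, y_pred))
print("R^2:", r2_score(y, y_pred))
```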

2. Logistic Regression:

Logistic regression predicts discrete class labels for the set of independent variables passed to it. The data is modelled using the S-shaped logistic (sigmoid) function. The algorithm predicts the probability that a new data point belongs to a class, so its output lies in the range 0 to 1. Accuracy can be assessed with metrics such as the F1 score, precision, recall and the confusion matrix.
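A minimal sketch of this in scikit-learn, using a synthetic dataset of our own choosing, shows both the probability outputs and the evaluation metrics mentioned above:

```python
# Minimal logistic-regression sketch: the model outputs
# probabilities between 0 and 1 via the sigmoid function.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]   # probabilities in [0, 1]
pred = clf.predict(X_test)                # thresholded at 0.5 by default

print("F1:", f1_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall:", recall_score(y_test, pred))
print("confusion matrix:\n", confusion_matrix(y_test, pred))
```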

3. KNN (K-Nearest Neighbours):

KNN is a non-parametric algorithm: it makes no assumptions about the underlying distribution of the dataset. To classify each point, it relies on a distance metric; the two most common choices are Euclidean distance and Manhattan distance. Euclidean distance is calculated as the square root of the sum of the squared differences between a new point (x) and an existing point (y). Manhattan distance is calculated as the sum of the absolute differences between the coordinates of the two points.

In the first step, the user chooses the value of K, which tells the algorithm how many neighbours (surrounding data points) to consider when making a judgement. In the second step, the model computes the distance between the target example and every example in the dataset. The distances are collected in a list and sorted, and the labels of the top K elements are returned; the majority label among them is the prediction.
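The steps above translate directly into a short from-scratch sketch; the tiny apple/mango training set below is a made-up illustration:

```python
# From-scratch sketch of the KNN steps described above: compute distances,
# sort, take the labels of the top K neighbours, return the majority label.
from collections import Counter
import numpy as np

def euclidean(x, y):
    # square root of the sum of squared differences
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # sum of absolute differences
    return np.sum(np.abs(x - y))

def knn_predict(X_train, y_train, x_new, k=3, distance=euclidean):
    # distance from the target example to every training example
    distances = [distance(x, x_new) for x in X_train]
    # sort by distance and keep the labels of the K nearest points
    nearest = np.argsort(distances)[:k]
    top_labels = [y_train[i] for i in nearest]
    # majority vote among the K neighbours
    return Counter(top_labels).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [6, 6], [7, 7]])
y_train = ["apple", "apple", "mango", "mango"]
print(knn_predict(X_train, y_train, np.array([6, 5]), k=3))  # -> "mango"
```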

4. Support Vector Machine (SVM):

SVM algorithms are based on statistical learning theory. Kernel functions map the data into a higher-dimensional space, where the algorithm constructs a hyperplane that separates the classes with the widest possible margin.

For example, in a dog-vs-cat classifier, a data point falling on one side of the hyperplane’s margin is classified as a dog, and a point falling on the other side is classified as a cat. A new point is classified according to which side of the margin it lies on.
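A minimal scikit-learn sketch of this idea follows; the RBF kernel and the synthetic two-feature dataset are illustrative choices of ours, not prescribed by the article:

```python
# Minimal SVM sketch: an RBF-kernel SVC learns a hyperplane
# (in the kernel-induced space) that separates two classes.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)  # kernel maps data to a higher-dimensional space
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("support vectors:", clf.support_vectors_.shape[0])
```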

5. Decision Tree:

To build a tree, we use the CART algorithm, which stands for Classification And Regression Tree. A decision tree has two types of nodes: decision nodes and leaf nodes. Decision nodes are where a test is made and have multiple branches, whereas leaf nodes hold the outputs of those decisions and have no further branches. This algorithm is useful for decision-making problems.

While building a decision tree, we need to select the best attribute for the root node and sub-nodes, and for that we use a technique known as an attribute selection measure (ASM). Two popular ASM techniques are information gain and the Gini index. Information gain measures the reduction in entropy after the dataset is split on an attribute.

Decision trees use information gain to find the feature that provides the most information, make it the root node, and repeat the process until every instance of the dataset can be classified. Each internal node of the tree tests a feature of the dataset. A sketch of the information-gain calculation follows.
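As a minimal sketch (the toy split below is our own example), information gain is the entropy of the parent node minus the weighted entropy of the children after the split:

```python
# Sketch of the information-gain attribute selection measure:
# entropy(parent) - weighted sum of entropy(children).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent_labels, child_label_groups):
    n = len(parent_labels)
    weighted_child_entropy = sum(
        len(child) / n * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Splitting 8 samples on a binary attribute:
parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])            # entropy = 1.0
left, right = np.array([1, 1, 1, 0]), np.array([1, 0, 0, 0])
print(information_gain(parent, [left, right]))          # ~0.19
```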

6. Random Forest:

Random forest is an ensemble learning method that combines multiple decision trees, each trained on a different random subset of the dataset, to produce the final result. For classification it takes the majority vote of the trees, and for regression it averages their predictions. A greater number of trees generally improves accuracy and reduces overfitting.
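A minimal scikit-learn sketch on synthetic data of our own (the tree count and feature-subset setting are illustrative defaults, not from the article):

```python
# Minimal random-forest sketch: an ensemble of decision trees
# whose majority vote gives the final class.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,    # number of trees; more trees usually stabilise accuracy
    max_features="sqrt", # each split considers a random subset of features
)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```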

7. Naïve Bayes:

Naïve Bayes classifies data based on conditional probability values. It applies Bayes’ theorem, using the feature values to compute the probability of each class label. The algorithm is “naïve” because it assumes that the features of the dataset are independent of one another.

Its main advantage is that it works well on large datasets. The algorithm relies on the posterior probability given by Bayes’ theorem: P(A|B) = P(B|A) · P(A) / P(B), which gives the probability of event A given that event B has already occurred.
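As a minimal sketch, here is a Gaussian Naïve Bayes classifier in scikit-learn; using the Iris dataset here is our own illustrative choice:

```python
# Minimal Naive Bayes sketch: GaussianNB applies Bayes' theorem
# assuming features are independent given the class.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("test accuracy:", nb.score(X_test, y_test))
# posterior class probabilities P(class | features) for one sample
print(nb.predict_proba(X_test[:1]))
```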

Comparison of Supervised Machine Learning with Unsupervised Machine Learning:

Supervised learning trains on labelled data, learns a known mapping from inputs to outputs, and is used for tasks such as classification and regression. Unsupervised learning, in contrast, works on unlabelled data and must discover structure on its own, as in clustering and dimensionality reduction.

-Priyanka Kumari

IEEE-PCS Chapter, VIT Vellore
