Supervised Learning with Azure

Shatakshi Pachori
School of ML
Aug 4, 2020


Machine learning sounds cool, doesn’t it? I’m a biology student who had no idea about this branch of computer science, and this lockdown gave me the time and motivation to explore it. For those who need a layman’s intro to machine learning, let me share an example. One day my dad asked me what I keep studying, and I didn’t know how to explain it to him. The words going through my mind were normalization, overfitting, models, Azure, and so on. The next minute, he was typing a text to a friend using Google speech recognition on his phone. My next sentence was: that’s what I am studying, dad! The science behind that process is called machine learning. It is a subset of artificial intelligence that focuses on creating programs that are capable of learning without explicit instruction.

The following article covers one of the basic concepts of machine learning: supervised learning. I hope you all enjoy it!

1. Supervised Learning: Classification

The first type of supervised learning that we’ll look at is classification. Recall that the main distinguishing characteristic of classification is the type of output it produces:

In a classification problem, the outputs are categorical or discrete.
Within this broad definition, there are several main approaches, which differ based on how many classes or categories are used, and whether each output can belong to only one class or multiple classes. Let’s have a look.

Source: Udacity course for ML in Azure

Some of the most common types of classification problems include:

· Classification on tabular data: The data is available in the form of rows and columns, potentially originating from a wide variety of data sources.

· Classification on image or sound data: The training data consists of images or sounds whose categories are already known.

· Classification on text data: The training data consists of texts whose categories are already known.

As we know, machine learning requires numerical data. This means that with images, sound, and text, several steps need to be performed during the preparation phase to transform the data into numerical vectors that can be accepted by the classification algorithms.

Source: Udacity course for ML in Azure

The following images are just an introduction to the various algorithms with their major characteristics. No need to get overwhelmed! Learning about algorithms is a slow and steady process.

Source: Udacity course for ML in Azure

*One-vs-all method: A binary model is created for each of the multiple output classes. Each of these binary models is assessed against its complement (all the other classes combined) as though it were a binary classification problem. Prediction is then performed by running these binary classifiers and choosing the prediction with the highest confidence score.

In essence, an ensemble of individual models is created and the results are then merged, to create a single model that predicts all classes. Thus, any binary classifier can be used as the basis for a one-vs-all model.
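The article works with Azure ML Designer modules, but the same idea is easy to see in code. As a hedged sketch (scikit-learn is my choice here, not something the article uses), `OneVsRestClassifier` wraps any binary classifier and trains one binary model per class:

```python
# A minimal one-vs-all sketch using scikit-learn (an assumption:
# the article uses Azure ML Designer, but the strategy is the same).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class tabular data.
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=5, n_classes=3,
                           random_state=0)

# One binary LogisticRegression is trained per class; prediction
# picks the class whose binary model is most confident.
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ova.estimators_))  # one binary model per class
```

Because the ensemble is built from independent binary models, swapping in any other binary classifier (a decision tree, an SVM) works without changing the rest of the code.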

*SMOTE (synthetic minority oversampling technique) is one of the most commonly used oversampling methods for the class imbalance problem. Unlike naive oversampling, which simply replicates existing minority examples, SMOTE balances the class distribution by synthesizing new minority instances along the line segments between existing minority instances and their nearest neighbors.
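The core of SMOTE is just interpolation between minority points. This is a toy NumPy sketch of that single step (real implementations, such as the one in the imbalanced-learn library, also restrict the pairing to nearest neighbors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical minority-class points (2-D feature vectors).
minority = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.5]])

def smote_sample(points, rng):
    """Synthesize one new minority instance: pick two existing
    minority points and interpolate a random fraction of the way
    between them."""
    i, j = rng.choice(len(points), size=2, replace=False)
    gap = rng.random()  # fraction in [0, 1)
    return points[i] + gap * (points[j] - points[i])

new_point = smote_sample(minority, rng)
print(new_point)  # lies on a segment between two minority points
```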

2. Multi-Class Algorithms
a) Multi-class Logistic Regression
*Logistic Regression
is a classification method used to predict the value of a categorical dependent variable from its relationship to one or more independent variables assumed to have a logistic distribution. If the dependent variable has only two possible values (success/failure), then the logistic regression is binary. If the dependent variable has more than two possible values (blood type given diagnostic test results), then the logistic regression is multinomial.

Two key parameters configure this algorithm:
-Optimization tolerance: Controls when to stop the iterations. If the improvement between iterations is less than the specified threshold, the algorithm stops and returns the current model.

-Regularization weight: Regularization is a method to prevent overfitting by penalizing the models with extreme coefficient values. This factor determines how much to penalize the models at each iteration.
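These two knobs are set in Azure's module configuration pane, but they have rough code-level analogues; as a hedged sketch, scikit-learn's `LogisticRegression` exposes `tol` (the optimization tolerance) and `C` (the *inverse* of the regularization strength, so a smaller `C` means a stronger penalty):

```python
# Multi-class logistic regression with the two key parameters made
# explicit (scikit-learn analogue, not the Azure ML module itself).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3-class tabular dataset

clf = LogisticRegression(
    tol=1e-4,      # stop when per-iteration improvement falls below this
    C=1.0,         # inverse regularization weight: smaller C, stronger penalty
    max_iter=200,
).fit(X, y)

print(clf.score(X, y))  # training accuracy on the 3 classes
```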

b) Multi-class Neural Network
A multi-class neural network includes an input layer, a hidden layer, and an output layer. The relationship between input and output is learned by training the neural network on input data.
Three key parameters include:
-The number of hidden nodes: Lets you customize the number of hidden nodes in the neural network.
-Learning rate: Controls the size of the step taken at each iteration before correction.
-The number of Learning Iterations: The maximum number of times the algorithm should process the training cases.
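The three parameters above map fairly directly onto scikit-learn's `MLPClassifier` (again a sketch by analogy, not the Azure module): `hidden_layer_sizes`, `learning_rate_init`, and `max_iter`.

```python
# A small multi-class neural network with the three key parameters
# spelled out (scikit-learn analogue of the Azure ML module).
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

net = MLPClassifier(
    hidden_layer_sizes=(10,),  # number of hidden nodes
    learning_rate_init=0.01,   # step size taken at each iteration
    max_iter=500,              # maximum number of learning iterations
    random_state=0,
).fit(X, y)

print(net.predict(X[:5]))  # class labels for the first five rows
```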

c) Multi-class Decision Forest
A multi-class decision forest is an ensemble of decision trees. It works by building multiple decision trees and then voting on the most popular output class.
Five key parameters include:
-Resampling method: This controls the method used to create the individual trees.
-The number of decision trees: This specifies the maximum number of decision trees that can be created in the ensemble.
-Maximum depth of the decision trees: This is a number to limit the maximum depth of any decision tree.
-The number of random splits per node: The number of splits to use when building each node of the tree.
-The minimum number of samples per leaf node: This controls the minimum number of cases that are required to create any terminal node in a tree.
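All five parameters have close analogues in scikit-learn's `RandomForestClassifier`, which I use here as a hedged stand-in for the Azure module:

```python
# Multi-class decision forest with the five key parameters made
# explicit (scikit-learn analogue, not the Azure ML module itself).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    bootstrap=True,        # resampling method: bootstrap sampling per tree
    n_estimators=100,      # number of decision trees in the ensemble
    max_depth=8,           # maximum depth of each decision tree
    max_features="sqrt",   # random subset of features tried per split
    min_samples_leaf=2,    # minimum samples required at a leaf node
    random_state=0,
).fit(X, y)

print(len(forest.estimators_))  # the trees that vote on the class
```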

3. Supervised Learning: Regression
In a regression problem, the output is numerical or continuous.

3.1 Introduction to Regression
Common types of regression problems include:

· Regression on tabular data: The data is available in the form of rows and columns, potentially originating from a wide variety of data sources.

· Regression on image or sound data: Training data consists of images/sounds whose numerical scores are already known. Several steps need to be performed during the preparation phase to transform images/sounds into numerical vectors accepted by the algorithms.

· Regression on text data: Training data consists of texts whose numerical scores are already known. Several steps need to be performed during the preparation phase to transform the text into numerical vectors accepted by the algorithms.

Examples: Housing prices, Customer churn, Customer Lifetime Value, Forecasting (time series), and Anomaly Detection.

3.2 Categories of Algorithms

Common machine learning algorithms for regression problems include:

· Linear Regression: fast training, linear model

· Decision Forest Regression: accurate, fast training times

· Neural Net Regression: accurate, long training times

Source: Udacity course for ML in Azure

In regression, the numerical outcome being predicted is the dependent variable.
*Ordinary least squares method: Calculates error as the sum of the squares of the distances from the actual values to the predicted line, and fits the model by minimizing this squared error. This method assumes a strong linear relationship between the inputs and the dependent variable.
*Gradient Descent: Iteratively minimizes the error, adjusting the model parameters a small step at a time in the direction that reduces the error at each step of the training process.
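To make both ideas concrete, here is a toy NumPy sketch that fits a line by gradient descent on the sum-of-squared-errors objective that ordinary least squares minimizes (the data and learning rate are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with a known linear relationship: y ≈ 3x + 2, plus noise.
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=100)

# Gradient descent on the mean squared error (the OLS objective):
w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    err = (w * x + b) - y            # residuals at the current step
    w -= lr * 2 * np.mean(err * x)   # gradient with respect to the slope
    b -= lr * 2 * np.mean(err)       # gradient with respect to the intercept

print(w, b)  # should land near the true slope 3 and intercept 2
```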

Source: Udacity course for ML in Azure

The algorithm supports some of the same hyper-parameters discussed for multi-class decision forest algorithms such as the number of trees, maximum depth, etc.

Source: Udacity course for ML in Azure

Since it is a supervised learning method, it requires a tagged dataset with a label column of a numerical data type. The algorithm also supports the same hyper-parameters (the number of hidden nodes, the learning rate, and the number of iterations) that were covered for the multi-class neural network algorithm.

*Regularization is a technique that restricts, penalizes, or shrinks the coefficient estimates toward zero; how strongly it does so is controlled by a hyperparameter, the regularization weight. It reduces the risk of overfitting by discouraging the learning of a more complex or flexible model.
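The shrinking effect is easy to demonstrate. As a hedged sketch, ridge regression (an L2-penalized linear regression, here via scikit-learn) fits the same data with a weak and a strong penalty; the stronger penalty pulls the coefficients toward zero:

```python
# Regularization shrinks coefficients toward zero; `alpha` is the
# penalty weight (illustrative data, not from the article).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_coefs = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_coefs + rng.normal(0, 0.1, 50)

weak = Ridge(alpha=0.01).fit(X, y)    # almost no penalty
strong = Ridge(alpha=100.0).fit(X, y)  # heavy penalty

# Stronger penalty, smaller coefficient magnitudes.
print(abs(strong.coef_).sum() < abs(weak.coef_).sum())  # → True
```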

4. Automate the training of Regressors
Key challenges in successfully training a machine learning model include:
-selecting features from those available in the dataset
-choosing the right algorithm for the task
-tuning the hyperparameters of the selected algorithm
-selecting the right evaluation metrics to measure the performance of the trained model
On top of this, the entire process is highly iterative.

The idea behind Automated ML is to enable the automated exploration of the combinations needed to successfully produce a trained model. It intelligently tests multiple algorithms and hyper-parameters in parallel and returns the best one. The next steps include the deployment of the model into production and further customization or refinement if needed to improve performance.
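Azure's Automated ML itself is configured through the Azure ML studio or SDK. As a simplified stand-in for the idea (an assumption on my part, not the actual Azure API), scikit-learn's `GridSearchCV` shows the same pattern: try many hyperparameter combinations, score each with the chosen metric, and keep the best one:

```python
# A miniature "automated" search over hyperparameter combinations,
# illustrating the idea behind Automated ML (not the Azure API).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [3, None]},
    scoring="accuracy",  # the evaluation metric being optimized
    cv=3,                # 3-fold cross-validation per combination
).fit(X, y)

print(search.best_params_)  # the winning combination
```

Automated ML extends this idea beyond a single algorithm's grid: it also searches across algorithms and featurization steps, and runs the candidates in parallel.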

Source: Udacity course for ML in Azure

Material Reference:
Udacity Fundamental Course in Machine Learning for Microsoft Azure
https://docs.microsoft.com/en-us/azure/?product=featured
https://docs.microsoft.com/en-us/

Happy learning :)
