Softmax Function in Deep Learning!

Bhaumik Tyagi
3 min read · Sep 15, 2022


Multi-class classification using Softmax activation function in Deep Learning

The softmax function, also known as softargmax or the normalized exponential function, is a generalization of the logistic function to multiple dimensions. It is used in multinomial logistic regression and is often used as the last activation function of a neural network, normalizing the network's raw output into a probability distribution over the predicted output classes, based on Luce's choice axiom.

The standard (unit) softmax function is defined by:

σ(z)_i = exp(z_i) / Σ_{j=1}^{K} exp(z_j),  for i = 1, …, K

In simple words, it applies the standard exponential function to each element z_i of the input vector z and normalizes these values by dividing by the sum of all the exponentials. This normalization ensures that the components of the output vector σ(z) sum to 1. Here K is the number of classes in the multi-class classifier.
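The definition above can be sketched in a few lines of code. This is a minimal illustration using NumPy (my choice here, not something the article specifies); subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the result unchanged, because the common factor cancels in the ratio.

```python
import numpy as np

def softmax(z):
    """Exponentiate each element of z, then divide by the sum of the
    exponentials so the outputs form a probability distribution."""
    # Subtracting max(z) avoids overflow for large inputs; it cancels
    # in the numerator and denominator, so the result is identical.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

print(softmax(np.array([7.0, 4.0, 0.0])))  # ≈ [0.9517 0.0474 0.0009]
```

Note that the outputs are always strictly between 0 and 1 and always sum to 1, regardless of the scale or sign of the inputs.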

Calculating the Softmax :

Imagine we have an array of three real values. These values could typically be the output of a machine learning model such as a neural network. We want to convert the values into a probability distribution.

An array of three real values: z = [7, 4, 0]

First, we calculate the exponential of each element of the input array:

exp(7) ≈ 1096.6, exp(4) ≈ 54.6, exp(0) = 1

These values do not look like probabilities yet. Note that although the input element 7 is only a little larger than 4, 1096.6 is much larger than 54.6 due to the effect of the exponential. We obtain the normalization term, the denominator of the softmax equation, by summing all three exponentials:

1096.6 + 54.6 + 1 ≈ 1152.2

The normalization term is dominated by the first element, z₁ = 7.

Finally, dividing each exponential by the normalization term, we obtain the softmax output for each of the three elements:

σ(z) ≈ [0.9517, 0.0474, 0.0009]

It is informative to check that the three output values are all valid probabilities: that is, they lie between 0 and 1, and they sum to 1.
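The three steps of this worked example can be reproduced directly in code. Again this is just an illustrative sketch in NumPy (an assumption on my part, not the article's tooling):

```python
import numpy as np

z = np.array([7.0, 4.0, 0.0])

# Step 1: exponentiate each element of the input array.
exp_z = np.exp(z)        # ≈ [1096.6, 54.6, 1.0]

# Step 2: compute the normalization term (sum of the exponentials).
norm = exp_z.sum()       # ≈ 1152.2

# Step 3: divide each exponential by the normalization term.
probs = exp_z / norm     # ≈ [0.9517, 0.0474, 0.0009]

print(probs, probs.sum())
```

Printing `probs.sum()` confirms the outputs form a valid probability distribution.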

Note also that due to the exponential operation, the first element, the 7, has dominated the softmax output and has squeezed the 4 and the 0 into very low probability values.

If you use the softmax function in a machine learning model, you should be careful before interpreting its output as a true probability, since it has a tendency to produce values very close to 0 or 1. If a neural network had output scores of [7, 4, 0], as in this example, the softmax function would assign 95% probability to the first class, when in reality there could have been more uncertainty in the network's predictions. This could give the impression that the prediction was made with high confidence when that was not the case.
