The reason for the "zeros" lies in the data type of the inputs, which are of the "int" type. Converting the input to "float" solved the problem:
import numpy as np

#scores = np.array([1.0, 2.0, 3.0])
scores = np.array([
    [1, 2, 3, 6],
    [2, 4, 5, 6],
    [3, 8, 7, 6]
])

def softmax(x):
    x = x.astype(float)
    if x.ndim == 1:
        S = np.sum(np.exp(x))
        return np.exp(x) / S
    elif x.ndim == 2:
        result = np.zeros_like(x)
        M, N = x.shape
        for n in range(N):
            S = np.sum(np.exp(x[:, n]))
            result[:, n] = np.exp(x[:, n]) / S
        return result
    else:
        print("The input array is not 1- or 2-dimensional.")

s = softmax(scores)
print(s)
Note that I've added x = x.astype(float) as the first line of the function body. This yields the expected output:
[[0.09003057 0.00242826 0.01587624 0.33333333]
 [0.24472847 0.01794253 0.11731043 0.33333333]
 [0.66524096 0.97962921 0.86681333 0.33333333]]
The problem in your code is how you instantiate the placeholder for the results that you're about to compute, that is

result = np.zeros_like(x)

because if x is an array of integers, result is also an array of integers, and when you assign to it with

result[:, n] = np.exp(x[:, n]) / S

the float values on the right-hand side are truncated to integers, which is where the zeros come from. A possible solution, which you can use in your code as is, consists in instantiating an array of floats, irrespective of the type of x:

result = np.zeros(x.shape)
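To see the truncation in isolation, here is a minimal check (variable names are just for illustration):

r_int = np.zeros_like(np.array([1, 2, 3]))   # inherits the int dtype of the input
r_int[:] = np.array([0.2, 0.3, 0.5])         # float values are truncated on assignment
print(r_int)                                 # [0 0 0]

r_float = np.zeros(3)                        # np.zeros defaults to float64
r_float[:] = np.array([0.2, 0.3, 0.5])
print(r_float)                               # [0.2 0.3 0.5]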
A small test:

In [32]: sm(np.array([
    ...:     [1, 2, 3, 6],
    ...:     [2, 4, 5, 6],
    ...:     [3, 8, 7, 6]
    ...: ]))
Out[32]:
array([[0.09003057, 0.00242826, 0.01587624, 0.33333333],
       [0.24472847, 0.01794253, 0.11731043, 0.33333333],
       [0.66524096, 0.97962921, 0.86681333, 0.33333333]])
Following the suggestion from n13, the function can be rewritten as

def sm(a):
    s = np.exp(a)
    if a.ndim < 3:
        return s / s.sum(0)

The sum() method takes an axis argument which allows us to restrict the sum to a given axis; summing down each column corresponds to axis 0 in our case.
def softmax(x):
    exp = np.exp(x)          # np.exp computes the exponential of every element in the matrix
    return exp / exp.sum(0)  # the axis=0 argument sums over the axis representing the columns
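As a quick check with the scores array from the example above (integer input is fine here, since np.exp already returns floats):

s = softmax(scores)
print(s.sum(axis=0))   # each column sums to 1 -> [1. 1. 1. 1.]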
The SciPy library also provides this functionality as scipy.special.softmax; its implementation uses shifting to avoid overflow (see [1] for more details). The softmax function transforms each element of a collection by computing the exponential of each element divided by the sum of the exponentials of all the elements. That is, if x is a one-dimensional numpy array:

softmax(x) = np.exp(x) / sum(np.exp(x))

The softmax function is the gradient of logsumexp.
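Equivalently, softmax(x) equals np.exp(x - logsumexp(x)), which is easy to verify numerically; a small sketch using scipy.special.logsumexp:

import numpy as np
from scipy.special import softmax, logsumexp

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(softmax(x), np.exp(x - logsumexp(x))))   # True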
An example session with scipy.special.softmax:

>>> import numpy as np
>>> from scipy.special import softmax
>>> np.set_printoptions(precision=5)
>>> x = np.array([[1, 0.5, 0.2, 3],
...               [1,  -1,   7, 3],
...               [2,  12,  13, 3]])
>>> m = softmax(x)
>>> m
array([[4.48309e-06, 2.71913e-06, 2.01438e-06, 3.31258e-05],
       [4.48309e-06, 6.06720e-07, 1.80861e-03, 3.31258e-05],
       [1.21863e-05, 2.68421e-01, 7.29644e-01, 3.31258e-05]])
>>> m.sum()
1.0

To compute the softmax transformation along a single axis instead (here axis 0, i.e. down the columns):

>>> m = softmax(x, axis=0)
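Each column of the result should now sum to 1, which can be confirmed by continuing the session (a small follow-up check; the exact printed form may differ slightly):

>>> m.sum(axis=0)
array([1., 1., 1., 1.])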
I use the softmax function constantly. It's handy anytime I need to model choice among a set of mutually exclusive options. In the canonical example, you have some metric of evidence, \(X = \{X_1, X_2, \ldots, X_N\}\), that an item belongs to each of \(N\) classes: \(C = \{C_1, C_2, \ldots, C_N\}\). The item can only belong to one class, and larger values indicate more evidence for class membership. So you need to convert the relative amounts of evidence into probabilities of membership within each of the classes.

I use this sort of function all the time to simulate how people make decisions based on evidence. But unfortunately, there is no built-in numpy function to compute the softmax. For years I have been writing code like this:
import numpy as np

X = np.array([1.1, 5.0, 2.8, 7.3])  # evidence for each choice
theta = 2.0                         # determinism parameter

ps = np.exp(X * theta)
ps /= np.sum(ps)

Of course, usually X and theta come from somewhere else. This works well if you are only simulating one decision: the softmax requires literally two lines of code and it's easily readable. But things get thornier if you want to simulate many choices. For example, what if X is a matrix where rows correspond to the different choices, and the columns correspond to the options?
X = np.array([[1.1, 5.0, 2.2, 7.3],
              [6.5, 3.2, 8.8, 5.3],
              [2.7, 5.1, 9.6, 7.4]])

# looping through rows of X
ps = np.empty(X.shape)
for i in range(X.shape[0]):
    ps[i, :] = np.exp(X[i, :] * theta)
    ps[i, :] /= np.sum(ps[i, :])

Looping like this works, but a more general function can handle arbitrary axes without the explicit loop:
def softmax(X, theta=1.0, axis=None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X : ND-Array. Probably should be floats.
    theta (optional) : float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional) : axis to compute values along. Default is the
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis=axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis=axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1:
        p = p.flatten()

    return p

The function works beautifully and has a nice safeguard against overflow in the exponential. And, if you're like me, including it will prevent you from writing a handful of one-off implementations of the softmax. I'll round this out with a few examples of its usage:
X = np.array([[1.1, 5.0, 2.2, 7.3],
              [6.5, 3.2, 8.8, 5.3],
              [2.7, 5.1, 9.6, 7.4]])

# softmax over rows
softmax(X, theta=0.5, axis=0)
>>> array([[0.055, 0.407, 0.015, 0.413],
           [0.822, 0.165, 0.395, 0.152],
           [0.123, 0.428, 0.59 , 0.435]])

# softmax over columns
softmax(X, theta=0.5, axis=1)
>>> array([[0.031, 0.22 , 0.054, 0.695],
           [0.204, 0.039, 0.645, 0.112],
           [0.022, 0.072, 0.68 , 0.226]])

# softmax over columns, and squash it!
softmax(X, theta=500.0, axis=1)
>>> array([[0., 0., 0., 1.],
           [0., 0., 1., 0.],
           [0., 0., 1., 0.]])
X = np.random.uniform(size=(3, 3, 2))
>>> X
array([[[0.844, 0.237],
        [0.364, 0.768],
        [0.811, 0.959]],
       [[0.511, 0.06 ],
        [0.594, 0.029],
        [0.963, 0.292]],
       [[0.463, 0.869],
        [0.704, 0.786],
        [0.173, 0.89 ]]])

softmax(X, theta=0.5, axis=2)
>>> array([[[0.575, 0.425],
            [0.45 , 0.55 ],
            [0.482, 0.518]],
           [[0.556, 0.444],
            [0.57 , 0.43 ],
            [0.583, 0.417]],
           [[0.449, 0.551],
            [0.49 , 0.51 ],
            [0.411, 0.589]]])
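For what it's worth, with theta=1.0 this general function should agree with scipy.special.softmax; a small sketch of the comparison (assuming the softmax function above is in scope):

import numpy as np
from scipy.special import softmax as scipy_softmax

X = np.array([[1.1, 5.0, 2.2, 7.3],
              [6.5, 3.2, 8.8, 5.3],
              [2.7, 5.1, 9.6, 7.4]])
print(np.allclose(softmax(X, theta=1.0, axis=1), scipy_softmax(X, axis=1)))   # True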
numpy also offers apply_along_axis, which executes func1d(a, *args, **kwargs), where func1d operates on 1-D arrays and a is a 1-D slice of arr along axis. The supplied function should accept 1-D arrays; it is applied to 1-D slices of arr along the specified axis. The shape of the output array out is identical to the shape of arr, except along the axis dimension, which is removed and replaced with new dimensions equal to the shape of the return value of func1d; so if func1d returns a scalar, out will have one fewer dimension than arr. This is equivalent to (but faster than) the following use of ndindex and s_, which sets each of ii, jj, and kk to a tuple of indices:
Ni, Nk = a.shape[:axis], a.shape[axis+1:]
for ii in ndindex(Ni):
    for kk in ndindex(Nk):
        f = func1d(arr[ii + s_[:,] + kk])
        Nj = f.shape
        for jj in ndindex(Nj):
            out[ii + jj + kk] = f[jj]
Equivalently, eliminating the inner loop over jj:

Ni, Nk = a.shape[:axis], a.shape[axis+1:]
for ii in ndindex(Ni):
    for kk in ndindex(Nk):
        out[ii + s_[...,] + kk] = func1d(arr[ii + s_[:,] + kk])
>>> def my_func(a):
...     """Average first and last element of a 1-D array"""
...     return (a[0] + a[-1]) * 0.5
>>> b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> np.apply_along_axis(my_func, 0, b)
array([4., 5., 6.])
>>> np.apply_along_axis(my_func, 1, b)
array([2., 5., 8.])
>>> b = np.array([[8, 1, 7], [4, 3, 9], [5, 2, 6]])
>>> np.apply_along_axis(sorted, 1, b)
array([[1, 7, 8],
       [3, 4, 9],
       [2, 5, 6]])
>>> b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> np.apply_along_axis(np.diag, -1, b)
array([[[1, 0, 0],
        [0, 2, 0],
        [0, 0, 3]],
       [[4, 0, 0],
        [0, 5, 0],
        [0, 0, 6]],
       [[7, 0, 0],
        [0, 8, 0],
        [0, 0, 9]]])
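Tying this back to the softmax discussion, apply_along_axis can also apply a 1-D softmax to each row, although the vectorized implementations above are faster; a small sketch with an illustrative helper:

import numpy as np

def softmax_1d(v):
    e = np.exp(v - v.max())   # shift by the max for numerical stability
    return e / e.sum()

b = np.array([[1., 2., 3.],
              [4., 5., 6.]])
print(np.apply_along_axis(softmax_1d, 1, b))   # softmax of each row; each row sums to 1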
The Python implementations below are written for a batch of inputs: the expected input is a 2D array in which the rows are the different samples and the columns are the different nodes. These methods are quite fast, as are the TensorFlow and PyTorch equivalents. However, there is another method that can be used to accelerate the softmax in Python: CuPy (CUDA), an open-source array library for GPU-accelerated computing with Python.
Note: A softmax output layer in the Keras deep learning library for a three-class classification task might look like the example given below.

model.add(Dense(3, activation='softmax'))
The softmax itself works for a batch of inputs given as a 2D array, where the n rows are n samples and the n columns are n nodes. It can be implemented with the following code.
import numpy as np

def Softmax(x):
    '''
    Performs the softmax activation on a given set of inputs
    Input: x (N, k) ndarray (N: no. of samples, k: no. of nodes)
    Returns: softmax of the input
    Note: Works for 2D arrays only (rows for samples, columns for nodes/outputs)
    '''
    max_x = np.amax(x, 1).reshape(x.shape[0], 1)  # Get the row-wise maximum
    e_x = np.exp(x - max_x)                       # Subtract it for numerical stability
    return e_x / e_x.sum(axis=1, keepdims=True)
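A quick usage check with illustrative values: each row of the output should sum to 1.

x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
p = Softmax(x)
print(p.sum(axis=1))   # -> [1. 1.]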
This is the simplest implementation of the softmax in Python. Its derivative, the Jacobian, can also be computed efficiently; an example is given below.
import numpy as np

def Softmax_grad(x):  # Best implementation (VERY FAST)
    '''
    Returns the Jacobian of the softmax function for the given set of inputs.
    Inputs: x: should be a 2d array where the rows correspond to the samples
        and the columns correspond to the nodes.
    Returns: jacobian
    '''
    s = Softmax(x)
    a = np.eye(s.shape[-1])
    temp1 = np.zeros((s.shape[0], s.shape[1], s.shape[1]), dtype=np.float32)
    temp2 = np.zeros((s.shape[0], s.shape[1], s.shape[1]), dtype=np.float32)
    temp1 = np.einsum('ij,jk->ijk', s, a)   # diag(s) for each sample
    temp2 = np.einsum('ij,ik->ijk', s, s)   # outer(s, s) for each sample
    return temp1 - temp2
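As a sanity check with illustrative values: for a single sample the Jacobian equals np.diag(s) - np.outer(s, s).

x = np.array([[0.5, 1.5, -0.3]])   # one sample, three nodes
s = Softmax(x)[0]
J = Softmax_grad(x)[0]
print(np.allclose(J, np.diag(s) - np.outer(s, s)))   # True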
Here’s the code for the softmax derivative (Jacobian) NUMBA implementation.
from numba import njit
import numpy as np

@njit(cache=True, fastmath=True)
def Softmax_grad(x):  # Best implementation (VERY FAST)
    '''
    Returns the Jacobian of the softmax function for the given set of inputs.
    Inputs: x: should be a 2d array where the rows correspond to the samples
        and the columns correspond to the nodes.
    Returns: jacobian
    Note: for this to compile in nopython mode, Softmax must itself be an
        @njit-compiled function.
    '''
    s = Softmax(x)
    a = np.eye(s.shape[-1])
    temp1 = np.zeros((s.shape[0], s.shape[1], s.shape[1]), dtype=np.float32)
    temp2 = np.zeros((s.shape[0], s.shape[1], s.shape[1]), dtype=np.float32)
    # einsum is unsupported in Numba (nopython mode), so the products are
    # written as explicit loops, which Numba compiles to fast machine code:
    # temp1 = np.einsum('ij,jk->ijk', s, a)
    # temp2 = np.einsum('ij,ik->ijk', s, s)
    for i in range(s.shape[0]):
        for j in range(s.shape[1]):
            for k in range(s.shape[1]):
                temp1[i, j, k] = s[i, j] * a[j, k]
                temp2[i, j, k] = s[i, j] * s[i, k]
    return temp1 - temp2
Here's what the CuPy implementation looks like:
import cupy as cp

def Softmax_cupy(x):
    '''
    Performs the softmax activation on a given set of inputs
    Input: x (N, k) ndarray (N: no. of samples, k: no. of nodes)
    Returns: softmax of the input
    Note: Works for 2D arrays only (rows for samples, columns for nodes/outputs)
    '''
    max_x = cp.amax(x, 1).reshape(x.shape[0], 1)
    e_x = cp.exp(x - max_x)  # Subtract the row-wise max for stability, as exp is prone to overflow and underflow
    # return e_x / e_x.sum(axis=1, keepdims=True)    # Alternative 1
    return e_x / e_x.sum(axis=1).reshape((-1, 1))    # Alternative 2

def Softmax_grad_cupy(x):  # Best implementation (VERY FAST)
    '''
    Returns the Jacobian of the softmax function for the given set of inputs.
    Inputs: x: should be a 2d array where the rows correspond to the samples
        and the columns correspond to the nodes.
    Returns: jacobian
    '''
    s = Softmax_cupy(x)
    a = cp.eye(s.shape[-1])
    temp1 = cp.zeros((s.shape[0], s.shape[1], s.shape[1]), dtype=cp.float32)
    temp2 = cp.zeros((s.shape[0], s.shape[1], s.shape[1]), dtype=cp.float32)
    temp1 = cp.einsum('ij,jk->ijk', s, a)   # diag(s) for each sample
    temp2 = cp.einsum('ij,ik->ijk', s, s)   # outer(s, s) for each sample
    return temp1 - temp2
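A minimal usage sketch (requires a CUDA-capable GPU; the array values are illustrative):

x_gpu = cp.asarray([[1.0, 2.0, 3.0],
                    [2.0, 4.0, 6.0]])     # move the batch to the GPU
p = Softmax_cupy(x_gpu)
print(cp.asnumpy(p.sum(axis=1)))          # each row sums to 1 -> [1. 1.]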
The remaining material covers the concept of "broadcasting" (broadcasting is extremely useful), the sigmoid function and its gradient, and the exercise of implementing the sigmoid function using numpy.
### START CODE HERE ### (≈ 1 line of code)
test = "Hello World"
### END CODE HERE ###
print("test: " + test)
test: Hello World
# GRADED FUNCTION: basic_sigmoid

import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + math.exp(-x))
    ### END CODE HERE ###

    return s
basic_sigmoid(3)
0.9525741268224334
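basic_sigmoid only accepts a scalar because it relies on math.exp. A minimal numpy-based sketch (not the graded solution) works elementwise on arrays as well:

import numpy as np

def sigmoid(x):
    """Compute the sigmoid of x elementwise, where x is a scalar or a numpy array."""
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([1, 2, 3])))   # [0.73105858 0.88079708 0.95257413]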