Issue
I am reviving this GitHub issue because I believe it is valid and needs to be explained. tf.keras has a Masking layer whose docs read:
For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
If any downstream layer does not support masking yet receives such an input mask, an exception will be raised.
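You can inspect the mask the layer produces by calling its compute_mask method directly. A minimal sketch (the (1, 3, 2) input and its values here are made up for illustration):
import tensorflow as tf

# (batch, timesteps, features): timestep 0 is all zeros, so it gets masked.
x = tf.constant([[[0.0, 0.0], [0.5, 0.1], [0.2, 0.3]]])
layer = tf.keras.layers.Masking(mask_value=0.0)
print(layer.compute_mask(x))
# tf.Tensor([[False  True  True]], shape=(1, 3), dtype=bool)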
import numpy as np
import tensorflow as tf

# Create padded zeros and change two valid entries.
inputs = np.zeros([1, 5])
inputs[0, 1] = 0.5
inputs[0, 2] = 0.1
inputs = tf.Variable(inputs)

masked_inputs = tf.keras.layers.Masking(mask_value=0.0)(inputs)
with_masking = tf.keras.layers.Softmax()(masked_inputs)
without_masking = tf.keras.layers.Softmax()(inputs)
The two results are virtually identical:
with_masking
<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
array([[0.1737954 , 0.28654018, 0.19207363, 0.1737954 , 0.1737954 ]],
dtype=float32)>
without_masking
<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.1737954 , 0.28654017, 0.19207362, 0.1737954 , 0.1737954 ]])>
Expected behavior
I expected it to just take the softmax of the valid entries, similar to
# Softmax over only the two valid entries, no padding
inputs = np.zeros([1, 2])
inputs[0, 0] = 0.5
inputs[0, 1] = 0.1
inputs = tf.Variable(inputs)
without_masking = tf.keras.layers.Softmax()(inputs)
without_masking
<tf.Tensor: shape=(1, 2), dtype=float64, numpy=array([[0.59868766, 0.40131234]])>
padded with zeros at the correct positions:
with_masking
<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
array([[0.        , 0.59868766, 0.40131234, 0.        , 0.        ]],
dtype=float32)>
To make a softmax ignore the 0s, could we swap them out for massively negative numbers?
Related: tensorflow - softmax ignore negative labels (just like caffe)
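A quick numeric check of that idea, as a plain NumPy sketch:
import numpy as np

logits = np.array([0.5, 0.1, -1e9])  # a hugely negative stand-in for a masked entry
p = np.exp(logits - logits.max())    # numerically stabilized softmax
p /= p.sum()
print(p)  # [0.59868766 0.40131234 0.        ]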
from tensorflow import __version__
__version__
'2.3.1'
Solution
I think this is already explained well in the GitHub issue you have linked. The underlying problem is that, regardless of whether an array is masked, softmax() still operates on the 0.0 values: since exp(0) = 1, every padded zero contributes to the denominator and receives a non-zero probability, exactly as the math dictates (link).
The only way to get a zero output from softmax() is to pass in a hugely negative float. If you set the masked positions to the most negative value representable in float64, Softmax() maps them to zero. That machine limit is available as tf.float64.min, which equals -1.7976931348623157e+308. More info about machine limits in this post.
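You can verify that value yourself (np.finfo reports the same limit):
import numpy as np
import tensorflow as tf

print(tf.float64.min)            # -1.7976931348623157e+308
print(np.finfo(np.float64).min)  # -1.7976931348623157e+308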
Here is an implementation for your reference, showing tf.boolean_mask, which returns only the unmasked elements, and then the correct method: using tf.where to build the masked input and pass it to softmax():
import numpy as np
import tensorflow as tf

inputs = np.zeros([1, 5])
inputs[0, 1] = 0.5
inputs[0, 2] = 0.1
inputs = tf.Variable(inputs)

# Returns only the elements that are not masked, shape (2,)
with_boolmask = tf.boolean_mask(inputs, inputs != 0)
with_boolmask = tf.keras.layers.Softmax()(with_boolmask)

# Correct way to do it: replace masked positions with the float64 minimum
masked_inp = tf.where(inputs != 0, inputs, tf.float64.min)  # <----
with_where = tf.keras.layers.Softmax()(masked_inp)
print('BOOLEAN MASK (NOT EXPECTED)')
print(with_boolmask)
print('')
print('MASKED INPUT - ')
print(masked_inp)
print('')
print('SOFTMAX OUTPUT')
print(with_where)
BOOLEAN MASK (NOT EXPECTED)
tf.Tensor([0.59868765 0.40131232], shape=(2,), dtype=float32)
MASKED INPUT -
tf.Tensor(
[[-1.79769313e+308 5.00000000e-001 1.00000000e-001 -1.79769313e+308
-1.79769313e+308]], shape=(1, 5), dtype=float64)
SOFTMAX OUTPUT
tf.Tensor([[0. 0.59868765 0.40131232 0. 0. ]], shape=(1, 5), dtype=float32)
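As a side note, recent TensorFlow releases make the tf.where step unnecessary: tf.keras.layers.Softmax accepts a boolean mask argument in its call and internally adds a large negative bias to the masked positions. A minimal sketch, assuming a version whose Softmax layer supports mask (it may not be available in 2.3.1):
import numpy as np
import tensorflow as tf

inputs = np.zeros([1, 5])
inputs[0, 1] = 0.5
inputs[0, 2] = 0.1
inputs = tf.constant(inputs)

mask = inputs != 0  # True = keep, False = mask out
print(tf.keras.layers.Softmax()(inputs, mask=mask))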
Answered By - Akshay Sehgal