Issue
I have a NumPy array data of shape [1000, 20, 20, 5, 10] which has both positive and negative values. I am trying to scale the data using MinMaxScaler, but the values are different after inverse_transform.
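Code:
import numpy as np
from sklearn import preprocessing

# Flatten each sample to one row, since MinMaxScaler expects 2-D input
reshaped_data = data.reshape((1000, 20*20*5*10))
scaler = preprocessing.MinMaxScaler()
encodes = scaler.fit_transform(reshaped_data)
encodes = encodes.reshape((1000, 20, 20, 5, 10))
# Undo the scaling and restore the original shape
decodes = scaler.inverse_transform(encodes.reshape((1000, 20*20*5*10)))
decodes = decodes.reshape((1000, 20, 20, 5, 10))
np.array_equal(data, decodes)  # False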
How can I solve this?
Solution
Because of floating-point rounding in any or all of MinMaxScaler(), fit_transform(), and inverse_transform(), the values you get back in decodes will not be bit-for-bit identical to the original data. The rounding only affects the least significant digits, however, so rather than testing for exact equality, you can use the np.where() function to check whether any element deviates by more than your allowed tolerance.
For example, if you are OK with the decoded values agreeing to within 1e-5, you can perform the check as:
tolerance = 1.0e-5
len(np.where(np.abs(data - decodes) > tolerance)[0]) == 0
The above expression evaluates to False when any element of data differs from decodes by more than tolerance. It should replace the line:
np.array_equal(data, decodes)
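To see how small the round-trip error typically is, here is a minimal sketch that writes out a per-column min-max scaling by hand in plain NumPy. It is a stand-in for MinMaxScaler, not scikit-learn's actual code, and the random array, its shape, and the variable names are assumptions made for illustration.

```python
import numpy as np

# Illustrative stand-in for the reshaped data: 100 samples, 40 features
rng = np.random.default_rng(0)
data = rng.uniform(-5.0, 5.0, size=(100, 40))

# Hand-written per-column min-max scaling to the range [0, 1]
col_min = data.min(axis=0)
col_range = data.max(axis=0) - col_min
scale = 1.0 / col_range
offset = -col_min * scale

encodes = data * scale + offset        # forward transform
decodes = (encodes - offset) / scale   # inverse transform

# Exact equality can fail because each step rounds to the nearest float
print(np.array_equal(data, decodes))

# The largest deviation shows how far off the round trip actually is
max_abs_err = np.max(np.abs(data - decodes))
print(max_abs_err)
```

Printing the largest absolute deviation is a quick way to choose a sensible tolerance for your own data.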
EDIT: For future reference, the most idiomatic NumPy way of achieving this is to use the np.allclose function, like this:
rel_tolerance = 1.0e-5
np.allclose(data, decodes, rtol=rel_tolerance)
which evaluates to True if every element of data and decodes agrees to within the combined tolerance atol + rtol * abs(decodes), where atol is the absolute tolerance (default 1e-8).
Answered By - TC Arlen