Issue
I have been teaching myself NumPy. According to the NumPy manual, numpy.sum sums all the elements of an array or array-like. However, I have noticed that when the input lists have different lengths, numpy.sum concatenates them instead of summing them.
For example:
array_a = [1,2,3,4,5,6] # Same length
array_b = [4,5,6,7,8,9]
np.sum([array_a, array_b])
60
array_a = [1,2,3,4,5] # Different length
array_b = [4,5,6,7,8,9]
np.sum([array_a, array_b])
[1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
Why did numpy.sum not sum all the elements in the second case, as it is supposed to?
Solution
In [128]: array_a = [1,2,3,4,5,6] # Same length
...: array_b = [4,5,6,7,8,9]
Here you give sum a list:
In [129]: np.sum([array_a, array_b])
Out[129]: 60
The first thing it does is convert that list to an array:
In [130]: np.array([array_a, array_b])
Out[130]:
array([[1, 2, 3, 4, 5, 6],
       [4, 5, 6, 7, 8, 9]])
60 is the sum of all the elements. You can also give sum an axis number:
In [131]: np.sum([array_a, array_b],axis=0)
Out[131]: array([ 5, 7, 9, 11, 13, 15])
In [132]: np.sum([array_a, array_b],axis=1)
Out[132]: array([21, 39])
That's the normal, documented behavior.
Ragged input
In [133]: array_a = [1,2,3,4,5] # Different length
...: array_b = [4,5,6,7,8,9]
In [135]: x = np.array([array_a, array_b])
<ipython-input-135-5379fc40e73f>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
x = np.array([array_a, array_b])
In [136]: x.shape
Out[136]: (2,)
In [137]: x.dtype
Out[137]: dtype('O')
In [138]: np.sum(x)
Out[138]: [1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
That sums the two lists, the same as if we did:
In [139]: array_a + array_b
Out[139]: [1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
Despite the name, array_a is NOT an array; it is a plain Python list. With object dtype, numpy tries to apply the operator (here add) to the elements, and adding two lists concatenates them.
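To make that concrete, here is a minimal sketch of what the object-dtype sum amounts to. It assumes the object array is built explicitly with np.empty, because recent NumPy releases raise a ValueError instead of the deprecation warning when given ragged lists without dtype=object. The names combined and by_hand are only illustrative.

```python
import functools
import operator

import numpy as np

array_a = [1, 2, 3, 4, 5]        # plain Python lists, not arrays
array_b = [4, 5, 6, 7, 8, 9]

# Build the 1-D object array explicitly so this runs regardless of how
# the installed NumPy treats ragged nested sequences.
x = np.empty(2, dtype=object)
x[0] = array_a
x[1] = array_b

# np.sum reduces with "+" over the two elements; for lists "+" concatenates.
combined = np.sum(x)

# The same reduction written by hand on the original lists.
by_hand = functools.reduce(operator.add, [array_a, array_b])

print(combined)
print(combined == by_hand)
```

Both results should be the same concatenated list, which is why the output in the question looks like a "combined" list rather than a number.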
If instead we make a ragged array from arrays:
In [140]: y = np.array([np.array(array_a), np.array(array_b)])
...
In [142]: y
Out[142]: array([array([1, 2, 3, 4, 5]), array([4, 5, 6, 7, 8, 9])], dtype=object)
In [143]: np.sum(y)
Traceback ...
ValueError: operands could not be broadcast together with shapes (5,) (6,)
It's trying to do the equivalent of:
In [144]: np.array(array_a) + np.array(array_b)
When learning numpy, it's a good idea to focus on the numeric multidimensional arrays and leave these ragged object dtype arrays for later. They have nuances that aren't obvious from the "normal" array operations. Ragged arrays behave very much like lists, and are often the result of user errors. Intentionally making ragged arrays is usually not a useful approach.
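If the goal is simply the total of all the numbers in lists of different lengths, one possible approach (a sketch, not the only way) is to avoid the ragged array altogether: either join the lists into one flat numeric array first, or sum each list separately. The names flat, total and per_list are illustrative.

```python
import numpy as np

array_a = [1, 2, 3, 4, 5]
array_b = [4, 5, 6, 7, 8, 9]

# Option 1: join the 1-D lists into a single numeric array, then sum it.
flat = np.concatenate([array_a, array_b])
total = int(flat.sum())

# Option 2: sum each list on its own, keeping one result per list.
per_list = [int(np.sum(a)) for a in (array_a, array_b)]

print(total)     # 54
print(per_list)  # [15, 39]
```

Both options stay with ordinary numeric arrays, so the usual numpy.sum behavior applies.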
Answered By - hpaulj