Issue
I need the mean along the time axis of array (1), using numpy.
The catch: it's not going to be the mean of all values along this axis, but rather a subset that starts at an index that is given in array (2).
The arrays I'm working with:
(array1) 3 axes: time, x, y
array([[[ 820,  820,  720,  720],
        [ 860,  860,  500,  500],
        [ 860,  860,  500,  500],
        [ 860,  860,  500,  500]],

       [[5980, 5980, 4760, 4760],
        [7500, 7500, 7940, 7940],
        [7500, 7500, 7940, 7940],
        [7500, 7500, 7940, 7940]],

       [[ 740,  740,  440,  440],
        [1240, 1240, 1140, 1140],
        [1240, 1240, 1140, 1140],
        [1240, 1240, 1140, 1140]],

       [[3200, 3200, 7600, 7600],
        [ 900,  900,  400,  400],
        [ 900,  900,  400,  400],
        [ 900,  900,  400,  400]]])
(array2) 2 axes: x, y
array([[1, 2, 1, 1],
       [1, 0, 3, 3],
       [4, 0, 2, 2],
       [4, 0, 1, 2]])
To illustrate the example further:
Values in array1 represent rainfall per day at locations x/y. Values in array2 represent from which day on the mean needs to be calculated for location x/y.
Looking at the first cell, we would exclude the first day from the calculation, as array2[0,0] = 1. So our result would be np.mean(array1[1:, 0, 0]) = 3306.67.
What I can't wrap my head around is how to specify the subset for each cell based on array 2. I know I can use np.mean along any axis, but how can I dynamically exclude values (slice the array) from the calculation?
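For reference, the per-cell slicing from the first-cell example can be written out directly as a double loop over x and y. This is a minimal sketch and not part of the original question: the helper name mean_from_start_loop is hypothetical, and it uses the arr2 from the answer below (last two rows use 3 instead of 4), because with only 4 days a start of 4 selects no days and np.mean of an empty slice gives nan with a warning.

```python
import numpy as np

# (time, x, y) rainfall data from the question
arr1 = np.array([
    [[820, 820, 720, 720], [860, 860, 500, 500], [860, 860, 500, 500], [860, 860, 500, 500]],
    [[5980, 5980, 4760, 4760], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940]],
    [[740, 740, 440, 440], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140]],
    [[3200, 3200, 7600, 7600], [900, 900, 400, 400], [900, 900, 400, 400], [900, 900, 400, 400]],
])
# Start day per (x, y) cell; assumption: 3 instead of 4 in the last two rows
arr2 = np.array([[1, 2, 1, 1],
                 [1, 0, 3, 3],
                 [3, 0, 2, 2],
                 [3, 0, 1, 2]])

def mean_from_start_loop(values, start):
    """Hypothetical helper: mean of values[start[i, j]:, i, j] for every cell."""
    out = np.empty(start.shape, dtype=float)
    for i in range(start.shape[0]):
        for j in range(start.shape[1]):
            # A plain slice works here because start[i, j] is a single integer
            out[i, j] = values[start[i, j]:, i, j].mean()
    return out

result = mean_from_start_loop(arr1, arr2)
print(round(float(result[0, 0]), 2))  # 3306.67
```

The loop is easy to read but runs in Python for every cell, which is what a vectorized version avoids; it is still handy as a correctness check.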
Solution
import numpy as np

arr1 = np.array(
    [[[ 820,  820,  720,  720],
      [ 860,  860,  500,  500],
      [ 860,  860,  500,  500],
      [ 860,  860,  500,  500]],

     [[5980, 5980, 4760, 4760],
      [7500, 7500, 7940, 7940],
      [7500, 7500, 7940, 7940],
      [7500, 7500, 7940, 7940]],

     [[ 740,  740,  440,  440],
      [1240, 1240, 1140, 1140],
      [1240, 1240, 1140, 1140],
      [1240, 1240, 1140, 1140]],

     [[3200, 3200, 7600, 7600],
      [ 900,  900,  400,  400],
      [ 900,  900,  400,  400],
      [ 900,  900,  400,  400]]]
)
# Start day per (x, y) cell. The last two rows use 3 where the question has 4:
# with only 4 days (indices 0-3), a start of 4 would leave no days to average.
arr2 = np.array(
    [[1, 2, 1, 1],
     [1, 0, 3, 3],
     [3, 0, 2, 2],
     [3, 0, 1, 2]]
)
What we're trying to do is slice the time axis of arr1 using the start indices stored in arr2. But a slice such as arr1[1:] takes a single integer bound that applies to every cell alike; we can't pass a whole array of per-cell start positions to it. So we need a roundabout way of doing it.
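A quick check of that limitation, as a sketch on a small toy array (the data here is made up for illustration): a scalar start slices fine, while an array start is rejected with a TypeError because a slice bound must be a single integer.

```python
import numpy as np

arr1 = np.arange(4 * 2 * 2).reshape(4, 2, 2)  # toy (time, x, y) array
arr2 = np.array([[1, 2], [0, 3]])             # toy per-cell start days

# A scalar bound works, but it drops the same leading days from every cell.
print(arr1[1:].shape)  # (3, 2, 2)

# An array bound does not: a slice needs a single integer start.
raised = False
try:
    arr1[arr2:]
except TypeError as err:
    raised = True
    print("TypeError:", err)
```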
One way is to set every value in arr1 that the slice would have excluded to 0.
To find the positions of the values to be ignored, we do this:
no_days = arr1.shape[0]                      # number of days (time steps)
arr3 = np.arange(no_days).reshape(-1, 1, 1)  # day indices 0..3 as shape (4, 1, 1)
arr3
>>> [[[0]],
     [[1]],
     [[2]],
     [[3]]]
filter = arr3 < arr2
filter.shape
>>> (4, 4, 4)
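To see where the (4, 4, 4) shape comes from, here is a small sketch of the same comparison. It names the arrays days and mask instead of arr3 and filter (mask also avoids shadowing Python's built-in filter); np.broadcast_shapes needs NumPy 1.20 or later. Because each cell masks exactly the days before its start, summing the mask over time gives back arr2.

```python
import numpy as np

no_days = 4
arr2 = np.array([[1, 2, 1, 1],
                 [1, 0, 3, 3],
                 [3, 0, 2, 2],
                 [3, 0, 1, 2]])

days = np.arange(no_days).reshape(-1, 1, 1)          # shape (4, 1, 1)
print(np.broadcast_shapes(days.shape, arr2.shape))   # (4, 4, 4)

mask = days < arr2                   # True where day t comes before the cell's start day
print(mask[:, 0, 0].tolist())        # [True, False, False, False]  (cell (0, 0) starts at day 1)
print(mask[:, 1, 1].tolist())        # [False, False, False, False] (cell (1, 1) starts at day 0)
print((mask.sum(axis=0) == arr2).all())  # True: masked days per cell == start index
```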
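arr3 holds the indices of the time axis. Comparing it with arr2 broadcasts the (4, 1, 1) day indices against the (4, 4) start indices, so filter is a (4, 4, 4) boolean array that is True exactly where day t comes before that cell's start day, i.e. at the values to be ignored.
We can then set them to 0: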
arr1[filter] = 0
arr1
>>> [[[   0,    0,    0,    0],
      [   0,  860,    0,    0],
      [   0,  860,    0,    0],
      [   0,  860,    0,    0]],

     [[5980,    0, 4760, 4760],
      [7500, 7500,    0,    0],
      [   0, 7500,    0,    0],
      [   0, 7500, 7940,    0]],

     [[ 740,  740,  440,  440],
      [1240, 1240,    0,    0],
      [   0, 1240, 1140, 1140],
      [   0, 1240, 1140, 1140]],

     [[3200, 3200, 7600, 7600],
      [ 900,  900,  400,  400],
      [ 900,  900,  400,  400],
      [ 900,  900,  400,  400]]]
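Note that arr1[filter] = 0 overwrites arr1 in place. If the original rainfall values are still needed afterwards, a sketch of a non-destructive variant (a swapped-in alternative, not the answer's code) builds a zeroed copy with np.where instead:

```python
import numpy as np

arr1 = np.array([
    [[820, 820, 720, 720], [860, 860, 500, 500], [860, 860, 500, 500], [860, 860, 500, 500]],
    [[5980, 5980, 4760, 4760], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940]],
    [[740, 740, 440, 440], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140]],
    [[3200, 3200, 7600, 7600], [900, 900, 400, 400], [900, 900, 400, 400], [900, 900, 400, 400]],
])
arr2 = np.array([[1, 2, 1, 1], [1, 0, 3, 3], [3, 0, 2, 2], [3, 0, 1, 2]])

mask = np.arange(arr1.shape[0]).reshape(-1, 1, 1) < arr2  # days to ignore

# np.where returns a new array: 0 where mask is True, the arr1 value elsewhere
zeroed = np.where(mask, 0, arr1)
print(zeroed[:, 0, 0].tolist())  # [0, 5980, 740, 3200]
print(arr1[:, 0, 0].tolist())    # [820, 5980, 740, 3200]  (arr1 unchanged)
```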
We might be tempted to use arr1.mean(axis=0), but that would treat all the 0s as legitimate entries and pull the mean down, instead of ignoring them.
So instead we sum arr1 over the time axis and divide by the number of elements that would have been in each slice, i.e. no_days - arr2:
arr1.sum(axis=0) / (no_days - arr2)
>>> [[3306.66666667, 1970.        , 4266.66666667, 4266.66666667],
     [3213.33333333, 2625.        ,  400.        ,  400.        ],
     [ 900.        , 2625.        ,  770.        ,  770.        ],
     [ 900.        , 2625.        , 3160.        ,  770.        ]]
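Put together, the steps above fit in one small function. This is a sketch, not the answerer's exact code: the name mean_from_start is hypothetical, it zeroes a copy rather than arr1 itself, and it assumes every start index satisfies 0 <= start < number of days so that no cell is left empty. The last lines cross-check it against direct per-cell slicing.

```python
import numpy as np

def mean_from_start(values, start):
    """Hypothetical helper: mean of values[start[i, j]:, i, j] for every cell, no Python loop.

    Assumes 0 <= start < values.shape[0] everywhere, so each cell keeps at least one day.
    """
    no_days = values.shape[0]
    mask = np.arange(no_days).reshape(-1, 1, 1) < start  # days before each cell's start
    totals = np.where(mask, 0, values).sum(axis=0)        # sum over kept days only
    return totals / (no_days - start)                     # divide by number of kept days

arr1 = np.array([
    [[820, 820, 720, 720], [860, 860, 500, 500], [860, 860, 500, 500], [860, 860, 500, 500]],
    [[5980, 5980, 4760, 4760], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940]],
    [[740, 740, 440, 440], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140]],
    [[3200, 3200, 7600, 7600], [900, 900, 400, 400], [900, 900, 400, 400], [900, 900, 400, 400]],
])
arr2 = np.array([[1, 2, 1, 1], [1, 0, 3, 3], [3, 0, 2, 2], [3, 0, 1, 2]])

result = mean_from_start(arr1, arr2)

# Cross-check against slicing each cell individually
expected = np.array([[arr1[arr2[i, j]:, i, j].mean() for j in range(arr2.shape[1])]
                     for i in range(arr2.shape[0])])
print(np.allclose(result, expected))  # True
```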
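Alternatively, the number of kept days can be counted straight from the mask. Note that this sums over the whole (t, x, y) mask, so it is not expected to be faster than no_days - arr2, which only touches the x*y start indices:
arr1.sum(axis=0) / (~filter).sum(axis=0)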
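Another option, swapped in here rather than taken from the answer, is NumPy's masked arrays (numpy.ma): masked entries are skipped by the reduction itself, so neither the zeroing step nor the manual count is needed. A minimal sketch with the same data:

```python
import numpy as np

arr1 = np.array([
    [[820, 820, 720, 720], [860, 860, 500, 500], [860, 860, 500, 500], [860, 860, 500, 500]],
    [[5980, 5980, 4760, 4760], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940]],
    [[740, 740, 440, 440], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140]],
    [[3200, 3200, 7600, 7600], [900, 900, 400, 400], [900, 900, 400, 400], [900, 900, 400, 400]],
])
arr2 = np.array([[1, 2, 1, 1], [1, 0, 3, 3], [3, 0, 2, 2], [3, 0, 1, 2]])

mask = np.arange(arr1.shape[0]).reshape(-1, 1, 1) < arr2  # True = day before start, ignore

# Entries where mask is True are excluded from the mean entirely.
masked = np.ma.masked_array(arr1, mask=mask)
# filled(np.nan) turns the result into a plain array; a cell with every day masked would become NaN
result = masked.mean(axis=0).filled(np.nan)
print(round(float(result[0, 0]), 2))  # 3306.67
```

The trade-off is some extra overhead from the masked-array machinery in exchange for not having to keep the zeroing and the divisor in sync by hand.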
Answered By - Hammad