Tuesday, August 30, 2022

[FIXED] Get mean along axis but with different subset of that axis in each cell

August 30, 2022 multidimensional-array, numpy, numpy-ndarray, numpy-slicing, python

Issue

I need the mean along the time axis of array1, using NumPy.

The catch: it's not the mean of all values along this axis, but of a subset that starts at an index given in array2.

The arrays I'm working with:

 (array1) 3 axes: time, x, y
array([[[ 820,  820,  720,  720],
        [ 860,  860,  500,  500],
        [ 860,  860,  500,  500],
        [ 860,  860,  500,  500]],
       [[5980, 5980, 4760, 4760],
        [7500, 7500, 7940, 7940],
        [7500, 7500, 7940, 7940],
        [7500, 7500, 7940, 7940]],
       [[ 740,  740,  440,  440],
        [1240, 1240, 1140, 1140],
        [1240, 1240, 1140, 1140],
        [1240, 1240, 1140, 1140]],
       [[3200, 3200, 7600, 7600],
        [ 900,  900,  400,  400],
        [ 900,  900,  400,  400],
        [ 900,  900,  400,  400]]])
 (array2) 2 axes: x, y 
array([[  1,   2,   1,   1],
       [  1,   0,   3,   3],
       [  4,   0,   2,   2],
       [  4,   0,   1,   2]])

To illustrate the example further:

Values in array1 represent daily rainfall at locations x/y. Values in array2 give the day from which the mean should be calculated for each location x/y.

Looking at the first cell, we would exclude the first day from the calculation, as array2[0,0] = 1. So our result would be np.mean(array1[1:, 0, 0]) = 3306.67.

What I can't wrap my head around is how to specify the subset for each cell based on array2. I know I can use np.mean along any axis, but how can I dynamically exclude values (slice the array) from the calculation?
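For reference, the rule described in the question can be written directly as a loop over the cells. This is a minimal sketch, not part of the question or the answer: the helper name mean_from_start_loop is hypothetical, and a cell whose start index is at or past the number of days is assumed to yield NaN, since there is nothing left to average.

```python
import numpy as np

def mean_from_start_loop(values, starts):
    """Mean of values[starts[i, j]:, i, j] for every cell (hypothetical helper)."""
    n_days, n_x, n_y = values.shape
    out = np.full((n_x, n_y), np.nan)  # NaN marks cells whose slice is empty
    for i in range(n_x):
        for j in range(n_y):
            window = values[starts[i, j]:, i, j]  # an ordinary scalar slice per cell
            if window.size > 0:
                out[i, j] = window.mean()
    return out

# Two cells from the question: the series for (0, 0) with start 1,
# and the series for (1, 0) with start 4 (past the last day index 3).
rain = np.array([[820, 860], [5980, 7500], [740, 1240], [3200, 900]]).reshape(4, 1, 2)
start = np.array([[1, 4]])
result = mean_from_start_loop(rain, start)
print(result)  # first cell about 3306.67, second cell nan
```

The loop is easy to check by eye but runs in Python for every cell, which is what the vectorized answer below avoids.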

Solution

import numpy as np

arr1 = np.array(
    [[[ 820,  820,  720,  720],
      [ 860,  860,  500,  500],
      [ 860,  860,  500,  500],
      [ 860,  860,  500,  500]],
     
     [[5980, 5980, 4760, 4760],
      [7500, 7500, 7940, 7940],
      [7500, 7500, 7940, 7940],
      [7500, 7500, 7940, 7940]],
     
     [[ 740,  740,  440,  440],
      [1240, 1240, 1140, 1140],
      [1240, 1240, 1140, 1140],
      [1240, 1240, 1140, 1140]],
     
     [[3200, 3200, 7600, 7600],
      [ 900,  900,  400,  400],
      [ 900,  900,  400,  400],
      [ 900,  900,  400,  400]]]
)

arr2 = np.array(
    [[  1,   2,   1,   1],
     [  1,   0,   3,   3],
     [  3,   0,   2,   2],
     [  3,   0,   1,   2]]
)

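The goal is to slice the time axis of arr1 using the start indices stored in arr2. A slice (start:) takes a single scalar start for the whole axis, so we can't give each cell its own start by passing another array of indices. We need a roundabout way of doing it.

Note that this arr2 differs from the question's array2: the two 4s in the first column have been replaced by 3s. With only four days (indices 0 to 3), a start index of 4 would leave nothing to average.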
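As an aside, a different roundabout route (not the one this answer takes) is to precompute suffix sums along the time axis, so that the sum of any slice arr1[s:, i, j] becomes a single lookup that np.take_along_axis can perform with an index array. This is a minimal sketch: the helper name mean_from_start_suffix is hypothetical, start indices are assumed to lie between 0 and the number of days, and an appended row of zeros makes a start equal to the number of days select an empty sum, which is reported as NaN.

```python
import numpy as np

def mean_from_start_suffix(values, starts):
    """Mean of values[starts[i, j]:, i, j] via suffix sums (hypothetical helper)."""
    n_days = values.shape[0]
    # Append a zero row so that a start index equal to n_days is still valid.
    zero_row = np.zeros((1,) + values.shape[1:], dtype=values.dtype)
    padded = np.concatenate([values, zero_row], axis=0)
    # suffix[s, i, j] == values[s:, i, j].sum()
    suffix = np.cumsum(padded[::-1], axis=0)[::-1]
    # take_along_axis needs indices with the same number of dimensions as suffix.
    totals = np.take_along_axis(suffix, starts[np.newaxis], axis=0)[0]
    counts = n_days - starts
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, totals / counts, np.nan)

rain = np.array([[820, 860], [5980, 7500], [740, 1240], [3200, 900]]).reshape(4, 1, 2)
start = np.array([[1, 4]])
res = mean_from_start_suffix(rain, start)
print(res)  # first cell about 3306.67, second cell nan
```

The cumulative sum is one pass over the data, the same order of work as building a boolean mask over the time axis.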

One way is to set every value in arr1 that the slice would have excluded to 0.

To find which values to exclude, we first build an array of the time-axis indices, shaped so that it broadcasts against arr2:

no_days = arr1.shape[0]
arr3 = np.arange(no_days)
arr3.shape = [-1,1,1]
arr3

>>> [[[0]],

     [[1]],

     [[2]],

     [[3]]]

filter = arr3 < arr2
filter.shape

>>> (4, 4, 4)
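The shape (4, 4, 4) comes from NumPy broadcasting: arr3 has shape (4, 1, 1) and arr2 has shape (4, 4), which is treated as (1, 4, 4), so entry [t, i, j] of the result is t < arr2[i, j]. A small check of that reading with the answer's arr2 (the mask is named mask here, an illustrative choice that avoids shadowing Python's built-in filter):

```python
import numpy as np

arr2 = np.array([[1, 2, 1, 1],
                 [1, 0, 3, 3],
                 [3, 0, 2, 2],
                 [3, 0, 1, 2]])
days = np.arange(4).reshape(-1, 1, 1)  # shape (4, 1, 1): one entry per day
mask = days < arr2                     # broadcasts to shape (4, 4, 4)

print(mask.shape)              # (4, 4, 4)
print(mask[:, 0, 0].tolist())  # [True, False, False, False]: only day 0 is dropped for cell (0, 0)
print(mask[:, 1, 1].tolist())  # [False, False, False, False]: arr2[1, 1] == 0 keeps every day
```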

arr3 holds the indices of the time axis. Comparing it with arr2 gives filter, a boolean array that is True for every value that should be excluded, so we can set those values to 0:

arr1[filter] = 0
arr1

>>>   [[[   0,    0,    0,    0],
        [   0,  860,    0,    0],
        [   0,  860,    0,    0],
        [   0,  860,    0,    0]],

       [[5980,    0, 4760, 4760],
        [7500, 7500,    0,    0],
        [   0, 7500,    0,    0],
        [   0, 7500, 7940,    0]],

       [[ 740,  740,  440,  440],
        [1240, 1240,    0,    0],
        [   0, 1240, 1140, 1140],
        [   0, 1240, 1140, 1140]],

       [[3200, 3200, 7600, 7600],
        [ 900,  900,  400,  400],
        [ 900,  900,  400,  400],
        [ 900,  900,  400,  400]]]
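Note that arr1[filter] = 0 overwrites the rainfall data in place. If the original values are still needed, np.where can build a zeroed copy instead. A minimal sketch on the first cell's time series from the question (the variable names are illustrative):

```python
import numpy as np

rain = np.array([820, 5980, 740, 3200]).reshape(4, 1, 1)  # cell (0, 0) over four days
start = np.array([[1]])                                    # arr2[0, 0]
mask = np.arange(4).reshape(-1, 1, 1) < start              # True where a day is excluded

zeroed = np.where(mask, 0, rain)   # a new array; rain itself is left unchanged
print(zeroed[:, 0, 0].tolist())    # [0, 5980, 740, 3200]
print(rain[:, 0, 0].tolist())      # [820, 5980, 740, 3200]
```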

We might be tempted to use arr1.mean(axis=0), but that would count all the 0s as legitimate entries instead of ignoring them, which affects the mean.
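For the first cell the difference is easy to see: after zeroing, its time series is 0, 5980, 740, 3200. A plain mean divides by all four days, while the intended result divides only by the three days that were kept (arr2[0, 0] = 1).

```python
import numpy as np

cell = np.array([0, 5980, 740, 3200])  # cell (0, 0) after arr1[filter] = 0
kept_days = 4 - 1                      # no_days - arr2[0, 0]

naive = cell.mean()                    # 2480.0: the zero is counted as a real day
intended = cell.sum() / kept_days      # about 3306.67: the mean of days 1 to 3 only
print(naive, intended)
```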

So instead we sum arr1 over the time axis and divide by the number of elements each slice would have contained:

arr1.sum(axis=0) / (no_days - arr2)

>>>   [[3306.66666667, 1970.        , 4266.66666667, 4266.66666667],
       [3213.33333333, 2625.        ,  400.        ,  400.        ],
       [ 900.        , 2625.        ,  770.        ,  770.        ],
       [ 900.        , 2625.        , 3160.        ,  770.        ]]
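The steps above can be combined into one function that leaves the input untouched and returns NaN for a cell whose start index equals the number of days, which is what happens with the 4s in the question's original array2. This is a minimal sketch: the function name mean_from_start and the NaN convention are assumptions, not part of the answer, and the divisor is taken from the mask counts so that empty cells are detected.

```python
import numpy as np

def mean_from_start(values, starts):
    """Mean of values[starts[i, j]:, i, j] for every cell (i, j); hypothetical helper."""
    n_days = values.shape[0]
    days = np.arange(n_days).reshape(-1, 1, 1)
    keep = days >= starts                           # True for days that enter the mean
    totals = np.where(keep, values, 0).sum(axis=0)  # zeroed copy, values is not modified
    counts = keep.sum(axis=0)                       # number of kept days per cell
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, totals / counts, np.nan)

# Data from the question (axes: time, x, y), including the original 4s in array2.
array1 = np.array([
    [[820, 820, 720, 720], [860, 860, 500, 500], [860, 860, 500, 500], [860, 860, 500, 500]],
    [[5980, 5980, 4760, 4760], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940], [7500, 7500, 7940, 7940]],
    [[740, 740, 440, 440], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140], [1240, 1240, 1140, 1140]],
    [[3200, 3200, 7600, 7600], [900, 900, 400, 400], [900, 900, 400, 400], [900, 900, 400, 400]],
])
array2 = np.array([[1, 2, 1, 1],
                   [1, 0, 3, 3],
                   [4, 0, 2, 2],
                   [4, 0, 1, 2]])

res = mean_from_start(array1, array2)
print(res)  # cells (2, 0) and (3, 0) have no days left and come out as nan
```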

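The divisor can also be obtained by counting, in each cell, the entries that were not filtered out. This gives the same counts, but it works on all t*x*y entries of the mask rather than the x*y entries of arr2, so it does more work than no_days - arr2:

arr1.sum(axis=0) / (~filter).astype(int).sum(axis=0)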
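NumPy's masked arrays offer another way to express the same idea: mark the excluded days as masked and let mean skip them, so neither zeroing nor a manual division is needed. This is a minimal sketch of that alternative technique (not the one used in the answer), reusing the answer's convention that True means excluded; the data are two cells from the question.

```python
import numpy as np

rain = np.array([[820, 860], [5980, 7500], [740, 1240], [3200, 900]]).reshape(4, 1, 2)
start = np.array([[1, 0]])                         # cell (0, 0) skips day 0; cell (0, 1) keeps all days
excluded = np.arange(4).reshape(-1, 1, 1) < start  # same convention as filter: True = ignore

masked = np.ma.masked_array(rain, mask=excluded)
result = masked.mean(axis=0)        # masked entries are left out of both sum and count
out = np.ma.filled(result, np.nan)  # a cell with every day masked would become nan
print(out)                          # about 3306.67 and 2625.0
```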

Answered By - Hammad

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
