Issue
I have a multi-dimensional numpy array where one of the dimensions is age in years. I would like to convert this into 5-year increments.
I have an array of shape (10, 2), where the first dimension represents age in years and the second dimension represents sex. I would like to calculate the mean over every 5 years for the two sexes individually.
import numpy as np
arr = np.array([[0,1], [2,3], [3,4], [4,5], [5,6], [7,8], [8,9], [9,10], [10,11], [11,12]])
arr.shape
mean_1st_5_yrs_female = np.mean([0,2,3,4,5])
mean_1st_5_yrs_male = np.mean([1,3,4,5,6])
mean_2nd_5_yrs_female = np.mean([7,8,9,10,11])
mean_2nd_5_yrs_male = np.mean([8,9,10,11,12])
arr = np.array([[mean_1st_5_yrs_female, mean_1st_5_yrs_male],[mean_2nd_5_yrs_female, mean_2nd_5_yrs_male]])
arr
How would I do this automatically in numpy?
Thank you.
Solution
Assuming the length of your first (outer) dimension is an exact multiple of 5, you can do the following:
arr.reshape(-1, 5, 2).mean(axis=1)
array([[ 2.8,  3.8],
       [ 9. , 10. ]])
The -1 fills in the remaining dimension, here by calculating 10 // 5 = 2. The reshaped array then has three dimensions: the number of periods (2 in this example; the -1 above), the years within a period (5), and the sexes (2). Then it's just a matter of averaging along the correct axis: you want the average over the years within a single period, separately for each sex, so you average along the second axis (axis=1).
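To make the reshape step more concrete, here is a small sketch that inspects the intermediate array before averaging. It reuses the example array from the question; the names blocks and means are only illustrative.

```python
import numpy as np

arr = np.array([[0, 1], [2, 3], [3, 4], [4, 5], [5, 6],
                [7, 8], [8, 9], [9, 10], [10, 11], [11, 12]])

# Split the 10 yearly rows into blocks of 5 rows each.
# Resulting axes: (period, year within period, sex)
blocks = arr.reshape(-1, 5, 2)
print(blocks.shape)  # (2, 5, 2)

# The first block is simply the first five rows of the original array.
print(blocks[0])

# Average over the years within each period (axis 1), keeping the sexes apart.
means = blocks.mean(axis=1)
print(means)
```

Each blocks[i] is a contiguous run of five rows from arr, which is why averaging along axis 1 gives one row per 5-year period and one column per sex.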
A note in case you're not used to reshaping: the new shape has to keep the original axis order, so a shape of (-1, 2, 5) wouldn't work. Technically the reshape succeeds, since the total number of elements matches, but the values for different ages and sexes end up mixed along the new axes, so averaging along any axis won't give you what you want.
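As a rough illustration of this pitfall, the sketch below compares the (-1, 5, 2) reshape with the (-1, 2, 5) one on the example array from the question. The names correct and wrong are only illustrative.

```python
import numpy as np

arr = np.array([[0, 1], [2, 3], [3, 4], [4, 5], [5, 6],
                [7, 8], [8, 9], [9, 10], [10, 11], [11, 12]])

# Intended grouping: (period, year within period, sex)
correct = arr.reshape(-1, 5, 2).mean(axis=1)

# Same number of elements, so this reshape does not raise an error...
wrong = arr.reshape(-1, 2, 5)
print(wrong.shape)  # (2, 2, 5)

# ...but each row of five now interleaves both sexes,
# because reshape only regroups the flattened values in order.
print(wrong[0])

# Averaging over the last axis therefore does not give the per-sex means.
print(np.allclose(wrong.mean(axis=2), correct))  # False
```

The reshape never reorders elements; it only regroups the flattened sequence, so the sex axis has to stay last for each row to remain a (female, male) pair.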
Answered By - 9769953