Issue
I have the following dataframe df:
                      crs         Band1  level
lat       lon
34.595694 32.929028   b''  4.000000e+00   1000
          32.937361   b''  1.200000e+01    950
          32.945694   b''  2.900000e+01    925
34.604028 32.929028   b''  7.000000e+00   1000
          32.937361   b''  1.300000e+01    950
...                   ...            ...    ...
71.179028 25.679028   b''  6.000000e+01    750
71.187361 25.662361   b''  1.000000e+00    725
          25.670694   b''  6.000000e+01   1000
          25.679028   b''  4.000000e+01    800
71.529028 19.387361   b''  1.843913e-38   1000

[17671817 rows x 3 columns]
and two arrays:
lon1=np.arange(-11,47,0.25)
lat1=np.arange(71.5,34.5,-0.25)
These two arrays (lat1, lon1) produce coordinate pairs spaced 0.25 deg apart.
The dataframe df contains points (lat, lon) which are densely spaced between the points defined by the lon1 and lat1 arrays. What I want to do is:
- find (filter) all points from df within 0.125 deg of the points defined by lat1, lon1
- get the max and min value of level from this sub-dataframe and store them in separate arrays of the same size as lon1 and lat1.
What I did so far is filter the dataframe:
for x1 in lon1:
    for y1 in lat1:
        df3 = df[(df.index.get_level_values('lon') > x1 - 0.125) & (df.index.get_level_values('lon') < x1 + 0.125)]
        df3 = df3[(df3.index.get_level_values('lat') > y1 - 0.125) & (df3.index.get_level_values('lat') < y1 + 0.125)]
But this has very slow performance. I believe there is a faster way. I have also tagged scikit-learn since this can probably be done with it, but I lack experience with that package. Any help is appreciated.
Solution
Before we start, let's convert your arrays so they hold the bin edges rather than the bin centers:
lon1=np.arange(-11.125,47.125,0.25)
lat1=np.arange(71.625,34.125,-0.25)
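If you want to run the steps below end to end, here is a minimal, self-contained stand-in for df, built from a few of the rows shown in the question (same (lat, lon) MultiIndex and crs/Band1/level columns); it is purely for illustration, since the real dataframe already exists on your side:
import numpy as np
import pandas as pd

# Toy stand-in for df, using a handful of the rows printed in the question.
idx = pd.MultiIndex.from_tuples(
    [(34.595694, 32.929028), (34.595694, 32.937361), (34.604028, 32.929028),
     (71.179028, 25.679028), (71.187361, 25.670694), (71.529028, 19.387361)],
    names=['lat', 'lon'])
df = pd.DataFrame({'crs': [b''] * 6,
                   'Band1': [4.0, 12.0, 7.0, 60.0, 60.0, 1.843913e-38],
                   'level': [1000, 950, 1000, 750, 1000, 1000]},
                  index=idx)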
Assign latitude and longitude bins for every row (note the reversed order of lat1, otherwise you need to pass ordered=False to pd.cut()):
df['latcat'] = pd.cut(df.index.get_level_values(0), lat1[::-1])
df['loncat'] = pd.cut(df.index.get_level_values(1), lon1)
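For orientation (an equivalent spelling, not part of the original answer): level 0 of the index is lat and level 1 is lon, so the same assignment can be written with level names instead of positions:
df['latcat'] = pd.cut(df.index.get_level_values('lat'), lat1[::-1])
df['loncat'] = pd.cut(df.index.get_level_values('lon'), lon1)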
For your example data we now have:
                      crs         Band1  level            latcat            loncat
lat       lon
34.595694 32.929028   b''  4.000000e+00   1000  (34.375, 34.625]  (32.875, 33.125]
          32.937361   b''  1.200000e+01    950  (34.375, 34.625]  (32.875, 33.125]
          32.945694   b''  2.900000e+01    925  (34.375, 34.625]  (32.875, 33.125]
34.604028 32.929028   b''  7.000000e+00   1000  (34.375, 34.625]  (32.875, 33.125]
          32.937361   b''  1.300000e+01    950  (34.375, 34.625]  (32.875, 33.125]
71.179028 25.679028   b''  6.000000e+01    750  (71.125, 71.375]  (25.625, 25.875]
71.187361 25.662361   b''  1.000000e+00    725  (71.125, 71.375]  (25.625, 25.875]
          25.670694   b''  6.000000e+01   1000  (71.125, 71.375]  (25.625, 25.875]
          25.679028   b''  4.000000e+01    800  (71.125, 71.375]  (25.625, 25.875]
71.529028 19.387361   b''  1.843913e-38   1000  (71.375, 71.625]  (19.375, 19.625]
Now use groupby to get the min and max level in each region:
res = df.groupby([df.latcat.cat.codes, df.loncat.cat.codes])['level'].agg(['min', 'max'])
Which gives you:
          min   max
0   176   925  1000
147 147   725  1000
148 122  1000  1000
The first level of the index is the position in the reversed lat1 array, with -1 meaning "out of range", which applies to some of your data. The second level is the position in the lon1 array.
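If you prefer to see the actual intervals rather than integer positions, the codes map back through the categoricals (a small sketch; the codes 0 and 176 are simply taken from the sample output above):
lat_bins = df['latcat'].cat.categories   # ascending IntervalIndex over latitude
lon_bins = df['loncat'].cat.categories
print(lat_bins[0], lon_bins[176])        # (34.375, 34.625] (32.875, 33.125]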
To convert to matrices as requested:
minlevel = np.full((len(lat1), len(lon1)), np.nan)
maxlevel = np.full((len(lat1), len(lon1)), np.nan)
x = len(lat1) - res.index.get_level_values(0) - 1 # reverse to original order
y = res.index.get_level_values(1)
minlevel[x, y] = res['min']
maxlevel[x, y] = res['max']
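One caveat, depending on your data: groups whose coordinates fell outside the bin edges carry the category code -1, and with the indexing above a -1 latitude code would point one past the end of the matrices, while a -1 longitude code would silently wrap to the last column. A defensive variant (my addition, not part of the original answer) drops those groups before scattering:
# Keep only groups whose lat and lon codes are valid (>= 0, i.e. inside the bins).
valid = (res.index.get_level_values(0) >= 0) & (res.index.get_level_values(1) >= 0)
res_valid = res[valid]

x = len(lat1) - res_valid.index.get_level_values(0) - 1  # back to descending lat order
y = res_valid.index.get_level_values(1)
minlevel[x, y] = res_valid['min']
maxlevel[x, y] = res_valid['max']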
Answered By - John Zwinck