Issue
I have a GeoPandas dataframe with a Point geometry. The dataframe was created from a Pandas dataframe that contained separate columns for eastings and northings, using the following code:
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'easting': [545200.0, np.nan, 360800.0, 270500.0, np.nan],
                   'northing': [130600.0, np.nan, 510100.0, 80350.0, np.nan]})
# Drop the coordinate columns and build one Point per row (EPSG:27700 = British National Grid)
geodf = gpd.GeoDataFrame(df.drop(['easting', 'northing'], axis=1),
                         crs={'init': 'epsg:27700'},
                         geometry=[Point(xy) for xy in zip(df['easting'],
                                                           df['northing'])])
The Pandas dataframe and the GeoPandas dataframe are printed as follows:
Original dataframe:
id easting northing
0 1 545200.0 130600.0
1 2 NaN NaN
2 3 360800.0 510100.0
3 4 270500.0 80350.0
4 5 NaN NaN
Geopandas dataframe:
id geometry
0 1 POINT (545200 130600)
1 2 POINT (nan nan)
2 3 POINT (360800 510100)
3 4 POINT (270500 80350)
4 5 POINT (nan nan)
In reality, the dataframe contains 250k+ points. I would like to be able to identify all the points which were created from easting and northing values of np.nan (but the original 'easting' and 'northing' columns are no longer available).
I've tried using .isna() and .is_empty, but with no success:
print(geodf.loc[(geodf['geometry'].isna()),:])
print(geodf.loc[(geodf['geometry'].is_empty),:])
...both print empty dataframes.
Is there any way to identify the required geometry points?
Solution
geodf[geodf['geometry'].is_valid]
Result:
id geometry
0 1 POINT (545200.000 130600.000)
2 3 POINT (360800.000 510100.000)
3 4 POINT (270500.000 80350.000)
To get the NaN points, use geodf[~geodf['geometry'].is_valid]
Update:
The above is valid for Shapely versions up to 1.7.2. Starting from version 1.8, a point created from np.nan values is considered empty, and hence is_empty works as expected:
id geometry
0 1 POINT (545200.000 130600.000)
1 2 POINT EMPTY
2 3 POINT (360800.000 510100.000)
3 4 POINT (270500.000 80350.000)
4 5 POINT EMPTY
Result of geodf[~geodf['geometry'].is_empty]:
id geometry
0 1 POINT (545200.000 130600.000)
2 3 POINT (360800.000 510100.000)
3 4 POINT (270500.000 80350.000)
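If the same code has to run under both older and newer Shapely versions, one option is to inspect the coordinates directly instead of relying on is_valid or is_empty. The following is only a minimal sketch under stated assumptions: each geometry is a Shapely-like Point (or None) exposing is_empty and coords, and the helper name is_nan_point is hypothetical, not part of GeoPandas or Shapely.

```python
import math


def is_nan_point(geom):
    """Return True if geom is missing, empty, or has a NaN coordinate.

    Assumption: geom is None or a Shapely-like Point exposing
    `is_empty` and `coords` (hypothetical helper, not a library API).
    """
    # Missing or empty geometries carry no usable coordinates.
    if geom is None or geom.is_empty:
        return True
    # A Point has exactly one coordinate tuple; flag it if any value is NaN.
    return any(math.isnan(c) for c in geom.coords[0])
```

It could then be applied row by row, for example geodf[geodf['geometry'].apply(is_nan_point)] for the NaN points, or geodf[~geodf['geometry'].apply(is_nan_point)] for the rest. A Python-level apply is likely to be slower than the vectorised is_valid and is_empty attributes on 250k+ rows, so it is probably best kept as a fallback.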
Answered By - Stef