Issue
I am working through my first data project have a dataframe showing street intersections that looks like this:
from_station_name count
86 Canal St & Adams St 50575
152 Clinton St & Madison St 5990
157 Clinton St & Washington Blvd 45378
159 Columbus Dr & Randolph St 31370
252 Franklin St & Monroe St 30832
321 Kingsbury St & Kinzie St 30654
175 Daley Center Plaza 30423
89 Canal St & Madison St 27138
410 Michigan Ave & Washington St 25468
330 LaSalle St & Jackson Blvd 23021
I'm trying to find the latitude and longitude coordinates of the intersections and have that information placed in two additional columns of this dataframe.
I tried to write a loop using the Google Maps API like this:
def geocode_intersection(api_key, intersection):
base_url = "https://maps.googleapis.com/maps/api/geocode/json?"
params = {"address": intersection, "key": api_key}
response = requests.get(base_url, params=params)
if response.status_code == 200:
data = json.loads(response.text)
if data["status"] == "OK":
location = data["results"][0]["geometry"]["location"]
return location["lat"], location["lng"]
else:
return None
else:
return None
api_key = "MY_API_KEY"
# Initialize new columns
busiest_stations_subs["latitude"] = np.nan
busiest_stations_subs["longitude"] = np.nan
# Loop through the DataFrame
for index, row in busiest_stations_subs.iterrows():
intersection = row["from_station_name"]
result = geocode_intersection(api_key, intersection)
if result is not None:
latitude, longitude = result
busiest_stations_subs.at[index, "latitude"] = latitude
busiest_stations_subs.at[index, "longitude"] = longitude
And got many intersections correctly, but many are NaN like so:
580 Wabash Ave & Roosevelt Rd 15914 41.861202 -87.943284
544 State St & Kinzie St 15764 NaN NaN
191 Damen Ave & Pierce Ave 15546 41.909363 -87.677420
339 Lake Shore Dr & North Blvd 15520 28.107321 -82.729723
235 Fairbanks Ct & Grand Ave 15477 64.840051 -147.719976
424 Morgan St & Lake St 15307 NaN NaN
499 Sheffield Ave & Fullerton Ave 15129 41.925373 -87.653611
582 Wacker Dr & Washington St 15010 NaN NaN
619 Wilton Ave & Belmont Ave 14990 41.939913 -87.652890
122 Clark St & Armitage Ave 14881 41.918477 -87.636128
559 Streeter Dr & Grand Ave 14879 NaN NaN
329 LaSalle St & Illinois St 14412 41.890819 -87.632670
197 Dearborn Pkwy & Delaware Pl 14144 41.898748 -87.629835
What I learned was that the intersections with NaN have street names that are missing the N, E, W, S directional indicators of the actual street. I'm able to look on Google Maps and see the the missing directional indicators, however there are hundreds of thousands of rows in my data. Is there a way to automate looking at all the NaN and finding what the actual street names are and replacing the names in the dataframe? One issue was that a street like "Lake St" in Chicago has two versions, a "W Lake St" and an "E Lake St". I'm not exactly sure how to get around this problem. Any help or suggestions on how to approach this solution would be appreciated.
Solution
The addresses are ambiguous that the Geocoder could not return any results
The Geocoding Addresses Best Practices documentation says that:
In general, use the Geocoding API when geocoding complete addresses (for example, “48 Pirrama Rd, Pyrmont, NSW, Australia”). Use the Places API Place Autocomplete service when geocoding ambiguous (incomplete)
In your case, these addresses are incomplete and therefore the Geocoding API is working as intended. I tried geocoding the addresses that returned NaN
and confirmed that it is indeed returning ZERO_RESULTS
.
You could try either ways:
Improve search quality by querying non-ambiguous addresses
- If you have no automated way of doing this, then this would be a rigorous task (as you have mentioned in your question).
- For example, changing the address
State St & Kinzie St
by addingChicago
at the end (State St & Kinzie St Chicago
) will return a result:41.889261,-87.627984
instead ofZERO_RESULTS
.
Use Places API Place Autocomplete as mentioned by the documentation
- With this one, the cost of your requests can increase given that you are running a for loop and autocomplete could cost you more than geocoding (depending on how you implement it).
- But with this one, it is possible for the Autocomplete to return multiple results and you still need a checker if it is the correct one or not.
As it is still considered as best practice to geocode only unambiguous addresses, what I could only think of as a solution for now is to just clean-up your data.
With that said, I still hope this information helps!
Answered By - Yrll
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.