Issue
I am working on a project where my dataset looks like this:
Origin | Destination | Num_Trips |
---|---|---|
Hamburg | Frankfurt | 2 |
Hamburg | Cologne | 1 |
Cologne | Hamburg | 3 |
Frankfurt | Hamburg | 5 |
I only care about one direction per route, either "Hamburg - Frankfurt" or "Frankfurt - Hamburg", and want to add up the trips made between the two locations in both directions. How can I do this in pandas so that each pair of cities appears once in my dataset, with the total number of trips made between them in either direction?
Final Table:
Origin | Destination | Num_Trips |
---|---|---|
Hamburg | Frankfurt | 7 |
Hamburg | Cologne | 4 |
Thanks :)
Solution
Here's a simple solution to your problem:
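import pandas as pd

data = {
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5]
}
df = pd.DataFrame(data)
# Build a direction-independent key. sorted() is used instead of set()
# because set iteration order is not guaranteed to match for both directions.
df["Key"] = df[["Origin", "Destination"]].apply(lambda x: "|".join(sorted(x)), axis=1)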
# Origin Destination Num_Trips Key
# Hamburg Frankfurt 2 Frankfurt|Hamburg
# Hamburg Cologne 1 Cologne|Hamburg
# Cologne Hamburg 3 Cologne|Hamburg
# Frankfurt Hamburg 5 Frankfurt|Hamburg
# "first" keeps the direction of the first row seen for each pair.
df.groupby("Key").agg({"Origin": "first",
                       "Destination": "first",
                       "Num_Trips": "sum"}).reset_index(drop=True)
# Origin Destination Num_Trips
# 0 Hamburg Cologne 4
# 1 Hamburg Frankfurt 7
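For larger datasets, a row-wise apply can get slow. Below is a minimal sketch of an alternative that sorts each city pair with NumPy and groups on the two sorted columns. The column names City_A and City_B are made up for illustration and are not part of the original answer. Note that this variant reports each pair in alphabetical order (for example Cologne / Hamburg) rather than keeping the direction of the first row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5],
})

# Sort each (Origin, Destination) pair alphabetically so that both
# directions of a route hold the same two values in the same order.
pairs = np.sort(df[["Origin", "Destination"]].to_numpy(), axis=1)

# City_A / City_B are hypothetical column names for the sorted pair.
result = (
    df.assign(City_A=pairs[:, 0], City_B=pairs[:, 1])
      .groupby(["City_A", "City_B"], as_index=False)["Num_Trips"]
      .sum()
)
print(result)
```

If the original direction matters, the "first" aggregation from the answer above is the better fit; this sketch only applies when any consistent label for the pair is acceptable.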
Answered By - Prashant