Issue
I have a column of string values in my pandas dataframe called 'travel_time' with values like:
1 hour 10 mins
34 mins
58 mins
1 hour 32 mins
12 mins
I would like to make a new column that converts these strings to minutes (integers) so that I can do calculations (average, min, max, binning, etc.). For example, '1 hour 10 mins' becomes 70, '34 mins' becomes 34, '58 mins' becomes 58, '1 hour 32 mins' becomes 92, and '12 mins' becomes 12.
I know there are functions in Python that will let me strip non-numeric characters from the strings, but I'm not sure how to handle cases where travel_time is greater than 60 minutes. Any advice on how I could do this?
Solution
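You can use df.applymap to apply a custom function to every element of your dataframe. (In pandas 2.1 and later, applymap is deprecated in favor of df.map, which is called the same way here.)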
import pandas as pd

df = pd.DataFrame(['1 hour 10 mins', '34 mins', '58 mins', '1 hour 32 mins', '12 mins'])

timemap = {'mins': 1, 'hour': 60}  # Express time units in minutes. Add as needed.

def transform(s):
    n = 0
    count = {}
    # Split string by space and parse tokens.
    for tok in s.split():
        if tok in timemap:  # Token is a time unit.
            count[tok] = n
        else:
            try:  # Token is an integer?
                n = int(tok)
            except ValueError:  # Nope, not an integer. :(
                raise RuntimeError(f'unknown token: {tok}')
    # Add total.
    return sum(timemap[t] * val for t, val in count.items())

print(df.applymap(transform))
Output:
    0
0  70
1  34
2  58
3  92
4  12
If you want to apply the function to a specific column, then use df['the_column'].apply(transform).
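For the case in the question, where the strings live in a column named 'travel_time', the column-wise version might look like the sketch below. The new column name travel_minutes, the helper name to_minutes, and the extra unit spellings ('min', 'hours') are assumptions added for illustration and are not part of the original answer.

```python
import pandas as pd

# Minutes per unit. Only 'mins' and 'hour' appear in the question's data;
# the other spellings are assumptions about what real data might contain.
UNIT_MINUTES = {'min': 1, 'mins': 1, 'hour': 60, 'hours': 60}

def to_minutes(s):
    """Convert a string like '1 hour 10 mins' to a whole number of minutes."""
    total = 0
    n = None  # most recently seen number, waiting for its unit
    for tok in s.split():
        if tok in UNIT_MINUTES:
            if n is None:
                raise ValueError(f'unit without a number: {tok!r}')
            total += n * UNIT_MINUTES[tok]
            n = None
        else:
            n = int(tok)  # raises ValueError on an unrecognized token
    return total

df = pd.DataFrame({'travel_time': ['1 hour 10 mins', '34 mins', '58 mins',
                                   '1 hour 32 mins', '12 mins']})

# Series.apply runs the function on each value of a single column.
df['travel_minutes'] = df['travel_time'].apply(to_minutes)

print(df['travel_minutes'].tolist())  # [70, 34, 58, 92, 12]
print(df['travel_minutes'].mean())    # 53.2
print(df['travel_minutes'].min(), df['travel_minutes'].max())  # 12 92
```

Because the result is an integer column, the usual numeric methods (mean, min, max, and binning with pd.cut) work on it directly.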
Answered By - TrebledJ