Issue
I've seen many questions and answers on converting a column of seconds into minutes using pd.to_timedelta. Unfortunately, this works for me only when the data is entered manually, not when it comes from a CSV file.
These are the contents of the CSV file:
"Time"
"s"
"0.193"
"0.697"
"1.074"
"1.579"
"6.083"
"65.460"
"120.730"
"121.116"
"121.624"
I attempt to convert the seconds column to minutes via:
>>> df = pd.read_csv("sec.csv", header = [0,1])
>>> print (df.dtypes)
Time s float64
dtype: object
>>> df['Time'] = df['Time'].astype('float64')
>>> print (df.dtypes)
Time s float64
dtype: object
>>> df['Time'] = pd.to_timedelta(df['Time'], 'min')
TypeError: arg must be a string, timedelta, list, tuple, 1-d array, or Series
However, I don't get an error if I enter the data manually without the csv:
>>> csvdata = {'Time' : [ 0.193, 0.697, 1.074, 1.579, 6.083 , 65.460 , 120.730 , 121.116, 121.624 ]}
>>> df = pd.DataFrame(data=csvdata)
>>> print (df.dtypes)
Time float64
dtype: object
>>> df['Time'] = pd.to_timedelta(df['Time'], 'min')
I understand the CSV is text, but with or without astype the dtype shows float64, so I don't see why the data read from the CSV raises an error when converting seconds to minutes while the manually entered data does not.
Solution
Some functions (e.g. astype) work on many columns at once (a DataFrame), but to_timedelta needs a single column (a Series).
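Your CSV has a two-level header (Time, s), so with header = [0,1] the expression df['Time'] selects everything under the top-level label Time and returns a DataFrame, which is what triggers the TypeError. You have to use df['Time','s'] or df[('Time','s')] to get the column as a Series.
(Sometimes df['Time']['s'] also works when you only want to read the column, but not when you try to assign a new value to it.)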
df[('Time','s')] = pd.to_timedelta(df[('Time','s')], 'min')
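If the units row is not needed, one alternative (not part of the original answer) might be to skip it while reading, so the columns stay a plain single-level index and df['Time'] is already a Series. A minimal sketch, assuming the file layout from the question; the shortened sample text and the names multi and flat are only illustrative:

```python
import io
import pandas as pd

# Shortened sample with the same layout as the CSV in the question
text = '"Time"\n"s"\n"0.193"\n"0.697"\n"65.460"\n'

# header=[0, 1] builds two-level columns, so ['Time'] gives a DataFrame
multi = pd.read_csv(io.StringIO(text), header=[0, 1])
print(type(multi['Time']).__name__)
print(type(multi[('Time', 's')]).__name__)

# header=0 plus skiprows=[1] drops the units row ("s"),
# leaving a single-level column that is selected as a Series
flat = pd.read_csv(io.StringIO(text), header=0, skiprows=[1])
print(type(flat['Time']).__name__)

# Same conversion as in the answer, now without the tuple key
flat['Time'] = pd.to_timedelta(flat['Time'], unit='min')
print(flat)
```

The trade-off is that the unit label is lost from the column index, so this only fits when the second header row carries no information you need later.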
Full working code.
I use io.StringIO to simulate a file in memory.
text = '''"Time"
"s"
"0.193"
"0.697"
"1.074"
"1.579"
"6.083"
"65.460"
"120.730"
"121.116"
"121.624"'''
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text), header = [0,1])
print("['Time'] :", type(df['Time']))
print("['Time','s']:", type(df['Time','s']))
print('\n--- before ---\n')
print(df)
#df[('Time','s')] = pd.to_timedelta(df[('Time','s')], 'min')
df['Time','s'] = pd.to_timedelta(df['Time','s'], 'min')
print('\n--- after ---\n')
print(df)
Result:
['Time'] : <class 'pandas.core.frame.DataFrame'>
['Time','s']: <class 'pandas.core.series.Series'>
--- before ---
Time
s
0 0.193
1 0.697
2 1.074
3 1.579
4 6.083
5 65.460
6 120.730
7 121.116
8 121.624
--- after ---
Time
s
0 0 days 00:00:11.580000
1 0 days 00:00:41.820000
2 0 days 00:01:04.440000
3 0 days 00:01:34.740000
4 0 days 00:06:04.980000
5 0 days 01:05:27.600000
6 0 days 02:00:43.800000
7 0 days 02:01:06.960000
8 0 days 02:01:37.440000
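Note that the second argument of to_timedelta is the unit of the input values, so 'min' reads the numbers as minutes (65.460 becomes 01:05:27.6 in the result above). If the column really holds seconds, as the "s" header suggests, a possible variant is to pass 's' and then derive minutes from the timedelta. This is a sketch under that assumption, not the original answer's code; the Series and variable names are illustrative:

```python
import pandas as pd

# A few of the values from the question, assumed to be seconds
seconds = pd.Series([0.193, 65.460, 120.730])

# unit='min' treats each number as a count of minutes
read_as_minutes = pd.to_timedelta(seconds, unit='min')

# unit='s' treats each number as a count of seconds
read_as_seconds = pd.to_timedelta(seconds, unit='s')

# Minutes as plain floats, derived from the seconds-based timedelta
minutes_float = read_as_seconds.dt.total_seconds() / 60

print(read_as_minutes)
print(read_as_seconds)
print(minutes_float)
```

With a two-level header the same calls would apply to df[('Time','s')] instead of the standalone Series.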
Answered By - furas