Issue
I have a string column whose values sometimes contain carriage returns:
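import pandas as pd
from io import StringIO

# Columns are separated by two or more spaces
datastring = StringIO("""\
country  metric  2011  2012
USA      GDP     7     4
USA      Pop.    2     3
GB       GDP     8     7
""")
# Raw string keeps the regex intact; regex separators need the python engine
df = pd.read_table(datastring, sep=r'\s\s+', engine='python')
df.metric = df.metric + '\r'  # append carriage return
print(df)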
  country  metric  2011  2012
0     USA   GDP\r     7     4
1     USA  Pop.\r     2     3
2      GB   GDP\r     8     7
After writing the dataframe to CSV and reading it back, it is corrupted, because the reader treats each carriage return as the end of a row:
df.to_csv('data.csv', index=False)
print(pd.read_csv('data.csv'))
  country metric  2011  2012
0     USA    GDP   NaN   NaN
1     NaN      7     4   NaN
2     USA   Pop.   NaN   NaN
3     NaN      2     3   NaN
4      GB    GDP   NaN   NaN
5     NaN      8     7   NaN
Question
What's the best way to fix this? One obvious method is to clean the data first:
df.metric = df.metric.str.replace('\r', '')
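A minimal sketch of that cleaning approach, using an in-memory buffer instead of data.csv so it runs on its own. The frame is rebuilt by hand here, and the names clean, buffer and roundtrip are only for illustration:

```python
import pandas as pd
from io import StringIO

# Rebuild the example frame with a trailing carriage return in each metric value
df = pd.DataFrame({
    'country': ['USA', 'USA', 'GB'],
    'metric': ['GDP\r', 'Pop.\r', 'GDP\r'],
    '2011': [7, 2, 8],
    '2012': [4, 3, 7],
})

# Strip the carriage returns before writing (regex=False treats '\r' literally)
clean = df.copy()
clean['metric'] = clean['metric'].str.replace('\r', '', regex=False)

# Round-trip through an in-memory CSV buffer
buffer = StringIO()
clean.to_csv(buffer, index=False)
buffer.seek(0)
roundtrip = pd.read_csv(buffer)
print(roundtrip)
```

This discards the carriage returns for good, so it only fits cases where they carry no meaning.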
Solution
Specify the line_terminator:
print(pd.read_csv('data.csv', line_terminator='\n'))
  country  metric  2011  2012
0     USA   GDP\r     7     4
1     USA  Pop.\r     2     3
2      GB   GDP\r     8     7
UPDATE:
In more recent versions of pandas (the original answer is from 2015), the name of the argument changed to lineterminator.
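As a rough, self-contained sketch of the same idea with the newer argument name: the CSV text below is written by hand (csv_text is a hypothetical stand-in for the contents of data.csv), and it assumes a pandas version where read_csv accepts lineterminator:

```python
import pandas as pd
from io import StringIO

# CSV text whose metric values end in a carriage return; rows end in '\n' only
csv_text = (
    'country,metric,2011,2012\n'
    'USA,GDP\r,7,4\n'
    'USA,Pop.\r,2,3\n'
    'GB,GDP\r,8,7\n'
)

# Treat only '\n' as the end of a row, so '\r' stays inside the field
df = pd.read_csv(StringIO(csv_text), lineterminator='\n')
print(df['metric'].tolist())
```

Unlike cleaning the column first, this keeps the carriage returns in the data, which matters if they have to survive the round trip.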
Answered By - Mike Müller