Issue
I have a .csv file with 20+ columns. The problem is that I have a column which contains values like:
0, 1, "(10,12), "(20,11)", 9
When I read the file into a DataFrame with read_csv, the quoted values are not parsed correctly. What can I use to parse a value like (10,12) as a single cell?
I tried the read_csv options quotechar, quoting and doublequote, but none of them helped. I also tried removing the " characters, but then the value is split into two columns.
Solution
The issue with your example is that your quotes are unbalanced:
0, 1, "(10,12), "(20,11)", 9
Should be:
0, 1, "(10,12)", "(20,11)", 9
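To see what the missing quote does, here is a minimal sketch that uses Python's standard-library csv module as a stand-in for read_csv. This is an assumption for illustration: the csv module applies similar quoting rules, but it is not the pandas parser. Both lines are parsed with skipinitialspace=True, for the reason explained below.

```python
import csv
import io

# The line from the question (closing quote after "(10,12)" missing)
# and the corrected line
unbalanced = '0, 1, "(10,12), "(20,11)", 9\n'
balanced = '0, 1, "(10,12)", "(20,11)", 9\n'

def parse(line):
    """Return the fields of a single CSV line, skipping spaces after commas."""
    return next(csv.reader(io.StringIO(line), skipinitialspace=True))

# The quote meant to open "(20,11)" instead closes the field that began at
# "(10,12), so the field boundaries end up in the wrong places
print(parse(unbalanced))  # ['0', '1', '(10,12), (20', '11)"', '9']

# With balanced quotes each tuple stays in a single field
print(parse(balanced))    # ['0', '1', '(10,12)', '(20,11)', '9']
```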
You can fix this by looking for ")," and adding the missing quote (here using a regex). In addition, by default the parser does not treat a quote as an opening quote when a space separates it from the preceding comma. To avoid this, pass skipinitialspace=True to read_csv:
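import io
import re
import pandas as pd

with open('my_csv.csv') as f:
    # Insert the missing closing quote after any ")" that is followed by a comma
    fixed = re.sub(r'(\))\s*(,)', r'\1"\2', f.read())

df = pd.read_csv(io.StringIO(fixed),
                 skipinitialspace=True,  # skip spaces after each comma so the quotes are recognized
                 header=None,            # optional, only if you have no header
                 )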
Output:
   0  1        2        3  4
0  0  1  (10,12)  (20,11)  9
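Because the repair is plain text preprocessing, it can be checked without a file. The sketch below applies the same regular expression as the answer to an in-memory sample (the second row is hypothetical, added to show that already-balanced quotes are left alone) and parses the result with the standard-library csv module, again as an assumed stand-in for read_csv, both with and without skipinitialspace.

```python
import csv
import io
import re

# Hypothetical sample data: the first row is the one from the question,
# the second row already has balanced quotes
raw = (
    '0, 1, "(10,12), "(20,11)", 9\n'
    '2, 3, "(5,6)", "(7,8)", 4\n'
)

# Same pattern as in the answer: when a ")" is followed by optional spaces and a
# comma, insert a closing quote after the ")" (the spaces in between are dropped)
fixed = re.sub(r'(\))\s*(,)', r'\1"\2', raw)

def read_rows(text, skip_space):
    """Parse CSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), skipinitialspace=skip_space))

# With skipinitialspace=True the quotes are honoured and each row has 5 fields
for row in read_rows(fixed, skip_space=True):
    print(row)
# ['0', '1', '(10,12)', '(20,11)', '9']
# ['2', '3', '(5,6)', '(7,8)', '4']

# Without it, the space before each quote makes the quote an ordinary
# character, so the tuples are split on their inner commas
print(read_rows(fixed, skip_space=False)[0])
# ['0', ' 1', ' "(10', '12)"', ' "(20', '11)"', ' 9']
```

Note that the pattern fires on every ")," sequence, so a value that legitimately contains one, such as a nested tuple "((1,2),(3,4))", would get a stray quote; the fix suits data shaped like the example.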
Answered By - mozway