Issue
I'm trying to write dataframes to CSV. A lot of the incoming data is user-generated and may contain special characters. I can set escapechar='\\' (for example), but then if there is a backslash in the data it gets written as "\", which gets interpreted as an escaped double-quote rather than as a string containing a backslash. How can I escape the escapechar (i.e., how can I have to_csv write \\ by escaping the backslash)?
Example code:
import csv
import io

import pandas as pd

# One row whose middle field is a single backslash
data = [[1, "\\", "text"]]
df = pd.DataFrame(data)

sIo = io.StringIO()
df.to_csv(
    sIo,
    index=False,
    sep=',',
    header=False,
    quoting=csv.QUOTE_MINIMAL,
    doublequote=False,
    escapechar='\\',
)
sioText = sIo.getvalue()
print(sioText)
Actual output:
1,"\",text
What I need:
1,"\\",text
The engineering use case behind these constraints is that this will be core code for moving data from one system to another. I won't know the format of the data in advance and won't have much control over it (any column could contain the escape character), and I can't control the escape character on the receiving side, so the actual output above will be read as an error there. Hence the original question: how do you escape the escape character?
For reference this parameter's definition in the pandas docs is:
escapechar : str, default None
String of length 1. Character used to escape sep and quotechar when appropriate.
Solution
Huh. This seems like an open issue with round-tripping data from pandas to csv. See this issue: https://github.com/pandas-dev/pandas/issues/14122, and especially pandas creator Wes McKinney's post:
This behavior is present in the csv module https://gist.github.com/wesm/7763d396ae25c9fd5b27588da27015e4 . From first principles seems like the offending backslash should be escaped. If I manually edit the file to be
"a"
"Hello! Please \"help\" me. I cannot quote a csv.\\"
then read_csv returns the original input
I fiddled with R and it doesn't seem to do much better
> df <- data.frame(a=c("Hello! Please \"help\" me. I cannot quote a csv.\\"))
> write.table(df, sep=',', qmethod='e', row.names=F)
"a"
"Hello! Please \"help\" me. I cannot quote a csv.\"
Another example of CSV not being a high fidelity data interchange tool =|
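The reading half of that observation can be checked with the standard library alone. This is only a sketch: it uses csv.reader in place of the read_csv call mentioned in the quoted post, and the variable names (edited, original, rows) are made up for illustration.

```python
import csv
import io

# The manually edited file from the quoted post: the trailing backslash is
# written as two backslashes, and embedded quotes as backslash-quote.
edited = '"a"\n"Hello! Please \\"help\\" me. I cannot quote a csv.\\\\"\n'

# Tell the reader that backslash is the escape character and that quotes
# are escaped rather than doubled.
reader = csv.reader(io.StringIO(edited), escapechar="\\", doublequote=False)
rows = list(reader)

# The value we expect back: the text ends in a single literal backslash.
original = 'Hello! Please "help" me. I cannot quote a csv.\\'
print(rows[1][0] == original)
```

So the reader side copes with an escaped backslash; the gap is on the writing side.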
I'm as baffled as you that this doesn't work, but it seems like the official position is to escape the backslashes yourself before writing: df[col] = df[col].str.replace("\\", "\\\\", regex=False)?
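If the writer's handling of the escape character can't be relied on (it may differ between Python and pandas versions), one option is to do the escaping and quoting by hand. The following is a minimal sketch under that assumption, not a pandas feature: format_field and format_row are hypothetical helpers, and the separator, quote and escape characters are hard-coded to match the question.

```python
import csv
import io

ESC = "\\"   # escape character expected by the receiving side (assumed)
QUOTE = '"'
SEP = ","


def format_field(value):
    """Hypothetical helper: escape ESC and QUOTE, then quote if needed."""
    text = str(value)
    # Escape the escape character first, so the backslashes added for
    # quotes in the second step are not escaped again.
    escaped = text.replace(ESC, ESC + ESC).replace(QUOTE, ESC + QUOTE)
    needs_quotes = (
        escaped != text or SEP in text or "\n" in text or "\r" in text
    )
    return QUOTE + escaped + QUOTE if needs_quotes else escaped


def format_row(row):
    """Hypothetical helper: join the formatted fields into one CSV line."""
    return SEP.join(format_field(v) for v in row)


line = format_row([1, "\\", "text"])
print(line)  # 1,"\\",text

# Read the line back with matching reader settings to confirm the round trip.
parsed = next(csv.reader(io.StringIO(line + "\n"), escapechar=ESC, doublequote=False))
print(parsed)  # ['1', '\\', 'text']
```

With a dataframe, the rows could come from df.itertuples(index=False, name=None); missing values would need their own handling, since str() turns them into text such as "nan".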
Answered By - Michael Delgado