Issue
If I import this HTML file
import pandas as pd
pd_df = pd.read_html('./output.html')
pd_df[0]
the last field becomes a float, although it is a string (in the example, 05269 becomes 5269.0). I know I can apply pd_df = pd.read_html('./output.html', converters={'CAP': str}), but my question is: is there a way to apply str casting globally to all fields, using read_html?
This is a simple example file, but I often have many fields, so a global option would be great.
Solution
What you can do is read the file twice: once to get the column names, and a second time with converters so that every column is read as str:
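import pandas as pd
url = "https://gist.githubusercontent.com/aborruso/599153968878f452bd3c68f3de0f29c4/raw/1156d224a4290393409ceef285c238c09b6bd08e/input.html"
# First pass: read the table only to discover its column names.
df = pd.read_html(url)[0]
# Map every column to str so that no cell is converted to a number.
converters = {c: str for c in df.columns}
# Second pass: re-read the table with the converters applied.
df = pd.read_html(url, converters=converters)[0]
print(df)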
# results in:
  Beneficiario       Comune    CAP  Provincia  Importo
0  RNDFNC60E16  RIPACANDIDA  85020    POTENZA    09269
1  RNDFNC60E16          NaN    NaN    POTENZA    05269
print(df.dtypes)
# results in:
Beneficiario    object
Comune          object
CAP             object
Provincia       object
Importo         object
dtype: object
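The two-pass idea can also be wrapped in a small helper and tried without network access. The following is only a sketch: the helper name read_first_table_as_str and the inline HTML are assumptions made for illustration (the table is modelled on the output above, not taken from the original file), and it handles only the first table with a single header row.

```python
from io import StringIO

import pandas as pd

# Illustrative table modelled on the output above; the markup is an assumption.
HTML = """
<table>
  <thead>
    <tr><th>Beneficiario</th><th>CAP</th><th>Importo</th></tr>
  </thead>
  <tbody>
    <tr><td>RNDFNC60E16</td><td>85020</td><td>09269</td></tr>
    <tr><td>RNDFNC60E16</td><td></td><td>05269</td></tr>
  </tbody>
</table>
"""


def read_first_table_as_str(html: str) -> pd.DataFrame:
    """Hypothetical helper: parse the first table in `html`, keeping cells as str."""
    # First pass: parse once just to learn the column names.
    columns = pd.read_html(StringIO(html))[0].columns
    # Second pass: apply str to every cell of every column.
    converters = {column: str for column in columns}
    return pd.read_html(StringIO(html), converters=converters)[0]


df = read_first_table_as_str(HTML)
print(df["Importo"].tolist())
```

Note that pd.read_html needs an HTML parser such as lxml to be installed, and that the page is parsed twice, which may matter for large files.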
Answered By - Roy2012