Issue
I have CSV files which I read into pandas with:
#!/usr/bin/env python
import pandas as pd
import sys
filename = sys.argv[1]
df = pd.read_csv(filename)
Unfortunately, the last line of these files is often corrupt (has the wrong number of commas). Currently I open each file in a text editor and remove the last line.
Is it possible to remove the last line in the same Python/pandas script that loads the CSV, so that I can drop this extra manual step?
Solution
Pass on_bad_lines='skip' and pandas will skip this line automatically:
df = pd.read_csv(filename, on_bad_lines='skip')
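A minimal sketch to see what on_bad_lines='skip' actually does, using hypothetical in-memory CSV text (via io.StringIO) in place of a real file. It assumes pandas 1.3 or later, where on_bad_lines exists. Note that pandas only treats a row with too many fields as "bad"; a row with too few fields is kept and padded with NaN.

```python
import io

import pandas as pd

# Hypothetical sample data: the last row has one field too many
too_long = "a,b,c\n1,2,3\n4,5,6\n7,8,9,10\n"
df = pd.read_csv(io.StringIO(too_long), on_bad_lines="skip")
print(len(df))  # 2: the malformed last row is dropped

# Hypothetical sample data: the last row has one field too few
too_short = "a,b,c\n1,2,3\n4,5,6\n7,8\n"
df_short = pd.read_csv(io.StringIO(too_short), on_bad_lines="skip")
print(len(df_short))  # 3: the short row is kept, with NaN in column c
```

So if the corrupt last line is truncated rather than over-long, on_bad_lines='skip' alone will not remove it.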
The advantage of on_bad_lines='skip' is that it skips any line with too many fields instead of raising an error. A line with too few fields, however, is not treated as bad: pandas keeps it and fills the missing values with NaN. If the last line is always bad, then skipfooter=1 is better. Thanks to @DexterMorgan for pointing out that the skipfooter option forces pandas to use the Python engine, which is slower than the C engine for parsing a CSV.
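A short sketch of the skipfooter=1 route, again on hypothetical in-memory CSV text whose last row is truncated. Passing engine="python" explicitly is a choice made here, not something the answer requires; it just avoids the ParserWarning pandas emits when it silently switches engines for skipfooter.

```python
import io

import pandas as pd

# Hypothetical sample data: the last row is truncated
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8\n"

# skipfooter is only implemented by the Python engine; naming the engine
# explicitly avoids the ParserWarning pandas raises when it falls back to it.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")
print(len(df))  # 2
```

Unlike on_bad_lines='skip', this drops the last line regardless of its field count, so the truncated row is removed. The flip side is that it also drops the last line when that line happens to be valid.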
And here is the old version (don't use it: error_bad_lines was removed in pandas 2.0):
df = pd.read_csv(filename, error_bad_lines=False)
Deprecated since version 1.3.0: The on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.
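If a script has to run on both pre-1.3 and current pandas, one possible approach is to pick the parameter based on the installed version. This is a sketch under assumptions: the helper name read_csv_skip_bad_lines is hypothetical, and the version check only parses the first two components of pd.__version__, which holds for normal release strings such as "2.2.3".

```python
import io

import pandas as pd

# Major and minor version of the installed pandas, e.g. (2, 2)
PANDAS_VERSION = tuple(int(part) for part in pd.__version__.split(".")[:2])


def read_csv_skip_bad_lines(source, **kwargs):
    """Hypothetical helper: skip malformed lines on old and new pandas."""
    if PANDAS_VERSION >= (1, 3):
        # on_bad_lines was added in 1.3 and is the only option in pandas 2.x
        kwargs["on_bad_lines"] = "skip"
    else:
        # Older releases only understand the deprecated flag
        kwargs["error_bad_lines"] = False
    return pd.read_csv(source, **kwargs)


# Hypothetical sample data: the last row has one field too many
df = read_csv_skip_bad_lines(io.StringIO("a,b,c\n1,2,3\n4,5,6,7\n"))
print(len(df))  # 1
```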
Answered By - EdChum