Issue
I have a data file with columns A-G, as shown below, but when I read it with pd.read_csv('data.csv') it prints an extra unnamed column at the end for no apparent reason.
colA ColB colC colD colE colF colG Unnamed: 7
44 45 26 26 40 26 46 NaN
47 16 38 47 48 22 37 NaN
19 28 36 18 40 18 46 NaN
50 14 12 33 12 44 23 NaN
39 47 16 42 33 48 38 NaN
I have checked my data file several times, and there is no extra data in any other column. How should I remove this extra column while reading? Thanks
Solution
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
In [162]: df
Out[162]:
colA ColB colC colD colE colF colG
0 44 45 26 26 40 26 46
1 47 16 38 47 48 22 37
2 19 28 36 18 40 18 46
3 50 14 12 33 12 44 23
4 39 47 16 42 33 48 38
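For anyone who wants to reproduce this, here is a minimal sketch. It assumes the extra column comes from a trailing delimiter at the end of every line, which is one common cause; the question does not show the raw contents of data.csv, so the `raw` string below is hypothetical sample data.

```python
import io

import pandas as pd

# Hypothetical file contents (an assumption, not the asker's real file):
# every line, including the header, ends with a trailing comma.
raw = (
    "colA,ColB,colC,colD,colE,colF,colG,\n"
    "44,45,26,26,40,26,46,\n"
    "47,16,38,47,48,22,37,\n"
)

# The empty eighth header field is given an auto-generated "Unnamed: N" name.
df = pd.read_csv(io.StringIO(raw))
raw_columns = df.columns.tolist()
print(raw_columns)

# Keep only the columns whose name does not start with "Unnamed".
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
clean_columns = df.columns.tolist()
print(clean_columns)
```

If the file really does have trailing delimiters, fixing whatever writes the file is the cleaner long-term option; the filter above just hides the symptom after loading.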
NOTE: very often there is only one unnamed column, Unnamed: 0, which is the first column in the CSV file. This is the result of the following steps:
- a DataFrame is saved into a CSV file using the parameter index=True, which is the default behaviour
- we read this CSV file into a DataFrame using pd.read_csv() without explicitly specifying index_col=0 (default: index_col=None)
The easiest way to get rid of this column is to pass index_col=0 to pd.read_csv():
df = pd.read_csv('data.csv', index_col=0)
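The round trip described in the note can be sketched as follows. The small `original` frame and the in-memory CSV text are made up for illustration. The last part shows an alternative that is not in the answer above: writing the file with index=False so that no index column is saved in the first place.

```python
import io

import pandas as pd

# Hypothetical two-row frame, used only for illustration.
original = pd.DataFrame({"colA": [44, 47], "ColB": [45, 16]})

# to_csv() with no path returns the CSV text; index=True is the default,
# so the index is written as a first column with an empty header.
csv_with_index = original.to_csv()

# Reading it back without index_col turns that column into "Unnamed: 0".
with_unnamed = pd.read_csv(io.StringIO(csv_with_index))
print(with_unnamed.columns.tolist())

# index_col=0 tells read_csv to use the first column as the index instead.
restored = pd.read_csv(io.StringIO(csv_with_index), index_col=0)
print(restored.columns.tolist())

# Alternative: do not write the index at all.
csv_without_index = original.to_csv(index=False)
no_index = pd.read_csv(io.StringIO(csv_without_index))
print(no_index.columns.tolist())
```

Which one to use depends on whether the index carries information: index_col=0 preserves it, while index=False discards it when the file is written.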
Answered By - MaxU - stop genocide of UA