Issue
I read some data from a file. The first column is assigned the 'object' dtype because of the 'xxx' in the very first data row:
import pandas as pd

tips = pd.read_csv("tips.csv")
print(tips.head())
tips.info()  # info() prints its report itself and returns None
  total_bill   tip     sex smoker  day    time  size
0        xxx  1.01  Female     No  Sun  Dinner     2
1      10.34  1.66    Male     No  Sun  Dinner     3
2      21.01  3.50    Male     No  Sun  Dinner     3
3      23.68  3.31    Male     No  Sun  Dinner     2
4      24.59  3.61  Female     No  Sun  Dinner     4
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   total_bill  244 non-null    object
 1   tip         244 non-null    float64
 2   sex         244 non-null    object
 3   smoker      244 non-null    object
 4   day         244 non-null    object
 5   time        244 non-null    object
 6   size        244 non-null    int64
So this fails: the single 'xxx' in the first data row, where a number should be, makes total_bill a column of strings, and subtracting a float from a string raises a TypeError:
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])
How do I rewrite the above line to filter out the bad row, without actually changing the contents of the DataFrame?
Solution
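You can wrap the column that contains the 'xxx' in pd.to_numeric with errors='coerce'. This parses the numeric strings into floats and converts any value that cannot be parsed, such as 'xxx', to NaN. The conversion returns a new Series, so the calculation works while the total_bill column in your DataFrame stays unchanged; the bad row simply gets NaN in tip_pct: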
tips['tip_pct'] = tips['tip'] / (pd.to_numeric(tips['total_bill'], errors='coerce') - tips['tip'])
  total_bill   tip     sex smoker  day    time  size   tip_pct
0        xxx  1.01  Female     No  Sun  Dinner     2       NaN
1      10.34  1.66    Male     No  Sun  Dinner     3  0.191244
2      21.01  3.50    Male     No  Sun  Dinner     3  0.199886
3      23.68  3.31    Male     No  Sun  Dinner     2  0.162494
4      24.59  3.61  Female     No  Sun  Dinner     4  0.172069
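For anyone who wants to try this without the original file, here is a minimal self-contained sketch. It assumes the five rows shown above are representative of tips.csv; the inline csv_text and the names bill, original_unchanged and valid are made up for illustration and are not part of the original answer. The last step, building a filtered copy with notna(), is one possible way to actually drop the bad row without touching the DataFrame itself.

```python
import io

import pandas as pd

# Hypothetical stand-in for tips.csv: only the five rows shown above.
csv_text = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
"""
tips = pd.read_csv(io.StringIO(csv_text))

# Parse total_bill into a separate Series; unparseable entries become NaN.
bill = pd.to_numeric(tips["total_bill"], errors="coerce")

# The original column is untouched and still holds the raw strings.
original_unchanged = tips["total_bill"].iloc[0] == "xxx"

# NaN propagates through the arithmetic, so the bad row ends up as NaN.
tips["tip_pct"] = tips["tip"] / (bill - tips["tip"])

# Optional: a filtered copy without the bad row (tips itself keeps all rows).
valid = tips[bill.notna()]

print(tips[["total_bill", "tip", "tip_pct"]])
print(len(tips), len(valid))
```

Keeping the coerced values in their own variable avoids parsing the column twice if total_bill is needed in more than one calculation.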
Answered By - sophocles