Thursday, February 3, 2022

[FIXED] How to filter rows based on cell contents in a row-based expression

pandas, python, python-3.x

Issue

I read some data from a file. The first column is assigned the 'object' dtype because of the 'xxx' in the very first data row:

import pandas as pd

tips = pd.read_csv("tips.csv")
print(tips.head())
tips.info()  # info() prints its summary itself and returns None

  total_bill   tip     sex smoker  day    time  size
0        xxx  1.01  Female     No  Sun  Dinner     2
1      10.34  1.66    Male     No  Sun  Dinner     3
2      21.01  3.50    Male     No  Sun  Dinner     3
3      23.68  3.31    Male     No  Sun  Dinner     2
4      24.59  3.61  Female     No  Sun  Dinner     4
<class 'pandas.core.frame.DataFrame'>    
RangeIndex: 244 entries, 0 to 243    
Data columns (total 7 columns):    
 #   Column      Non-Null Count  Dtype      
---  ------      --------------  -----      
 0   total_bill  244 non-null    object     
 1   tip         244 non-null    float64    
 2   sex         244 non-null    object     
 3   smoker      244 non-null    object     
 4   day         244 non-null    object     
 5   time        244 non-null    object     
 6   size        244 non-null    int64

So this fails with a TypeError, because that one 'xxx' in the first data row makes total_bill a column of strings where numbers should be:

tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])

How do I rewrite the above line to filter out the bad row, without actually changing the contents of the DataFrame?

Solution

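You can wrap the column that contains the 'xxx' in pd.to_numeric with errors='coerce'. Any value that cannot be parsed as a number becomes NaN in the converted copy, so the arithmetic runs and the bad row simply gets NaN as its result. The total_bill column itself is left unchanged; only the new tip_pct column is added.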

tips['tip_pct'] = tips['tip'] / (pd.to_numeric(tips['total_bill'], errors='coerce') - tips['tip'])

  total_bill   tip     sex smoker  day    time  size   tip_pct
0        xxx  1.01  Female     No  Sun  Dinner     2       NaN
1      10.34  1.66    Male     No  Sun  Dinner     3  0.191244
2      21.01  3.50    Male     No  Sun  Dinner     3  0.199886
3      23.68  3.31    Male     No  Sun  Dinner     2  0.162494
4      24.59  3.61  Female     No  Sun  Dinner     4  0.172069
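
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The inline CSV text is a hypothetical stand-in for tips.csv that holds only the five rows shown in the question, and the names csv_text, bill and valid_rows are made up for illustration. It also shows one possible way to get a view without the bad row, using a boolean mask, while leaving tips itself intact.

```python
import io

import pandas as pd

# Hypothetical stand-in for tips.csv: only the five rows shown in the question.
csv_text = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
"""
tips = pd.read_csv(io.StringIO(csv_text))

# Numeric copy of total_bill; cells that cannot be parsed ('xxx') become NaN.
# The column stored in `tips` is not modified.
bill = pd.to_numeric(tips["total_bill"], errors="coerce")

# NaN propagates through the arithmetic, so row 0 gets NaN instead of raising.
tips["tip_pct"] = tips["tip"] / (bill - tips["tip"])

# Optional: a filtered result without the bad row; `tips` keeps all five rows.
valid_rows = tips[bill.notna()]
print(valid_rows[["total_bill", "tip", "tip_pct"]])
```

Keeping the converted values in a separate variable means the conversion runs once and the same Series can be reused both for the calculation and for the mask.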

Answered By - sophocles

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
