Issue
I have a df in which I'm trying to populate a new column based on a calculation. Here is an example below.
import pandas as pd
import numpy as np
df = pd.DataFrame(
{"Quantity" :[4721,1647],
"Total" : [236.05,82.35]},
index = [1,2])
df["CPS Gross"]= (df["Total"]/df["Quantity"])
conditions = [df["CPS Gross"] == 0.05]
values = [0.03]
df["CPS Calc"] = np.select(conditions,values)
print(df)
Can someone explain to me why the second row does not meet the condition test but the first row does even though they both meet the criteria of 0.05?
Any help would be much appreciated
Solution
I cannot comment on your original post, so I'll leave you an answer.
The second row does not meet the condition because floats cannot represent most decimal fractions exactly. Although the mathematical result of 82.35/1647 is 0.05, if you evaluate it in Python you will see that the result is 0.049999999999999996, which is not exactly 0.05. That is why 82.35/1647 == 0.05 is False: in floating-point arithmetic the two values really are different.
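To see what is actually stored, one option is to convert the floats to decimal.Decimal, which shows the exact binary value behind each number. This is only a small sketch for inspection; the variable name quotient is made up for the illustration.

```python
from decimal import Decimal

# The division from the second row of the example
quotient = 82.35 / 1647

# repr() shows the shortest string that round-trips to the same float
print(repr(quotient))

# Decimal(float) shows the exact value stored in the float, digit by digit
print(Decimal(quotient))
print(Decimal(0.05))

# Exact equality compares those stored values, not the "intended" decimals
print(quotient == 0.05)
```

Even the literal 0.05 is not stored as exactly 0.05, so an exact == test between two floats only succeeds when both happen to round to the same binary value.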
One solution you could try is a comparison that takes this behaviour into account, such as math.isclose(). It does not check whether two values are exactly the same, but whether they are close to each other within a certain range (called the tolerance).
from math import isclose
...
conditions = [df["CPS Gross"].apply(lambda x: isclose(x, 0.05, abs_tol=1e-8))]
values = [0.03]
df["CPS Calc"] = np.select(conditions, values)
If you run this, you will see that both rows are now selected!
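As a possible alternative to calling math.isclose row by row, NumPy's np.isclose can compare the whole column in one call. The sketch below rebuilds the DataFrame from the question so it runs on its own; the tolerance of 1e-8 is carried over from the snippet above, and the name is_five_cents is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    {"Quantity": [4721, 1647], "Total": [236.05, 82.35]},
    index=[1, 2],
)
df["CPS Gross"] = df["Total"] / df["Quantity"]

# Vectorized tolerance check; rtol=0 leaves only the absolute tolerance in play
is_five_cents = np.isclose(df["CPS Gross"].to_numpy(), 0.05, rtol=0, atol=1e-8)

# Rows that match get 0.03; the others get np.select's default of 0
df["CPS Calc"] = np.select([is_five_cents], [0.03])
print(df)
```

Another route, if the values are meant to be currency amounts, would be to round the column to a fixed number of decimals before comparing, though whether that suits your data is something only you can judge.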
Answered By - rmoret