Issue
I have a dataframe with very different kinds of entries (text, integers, floats, times, etc.), and I am trying to strip leading and trailing whitespace from the text entries so that my other code works as expected. However, my code does not seem to work.
Here is a simple example of what I'm trying to do:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.array(([np.nan, 2, 3], [4, 5, 6])), columns=["one", "two", "three"])
print(df1)
print("")
df2 = df1.map(lambda x: x.strip() if isinstance(x, str) else x)
print(df2)
print("")
print(df1==df2)
print("")
cell1 = df1.at[0, "one"]
cell2 = df2.at[0, "one"]
print(cell1, type(cell1))
print(cell2, type(cell2))
print(cell1==cell2)
When I run this code, the output is:
   one  two  three
0  NaN  2.0    3.0
1  4.0  5.0    6.0

   one  two  three
0  NaN  2.0    3.0
1  4.0  5.0    6.0

     one   two  three
0  False  True   True
1   True  True   True

nan <class 'numpy.float64'>
nan <class 'numpy.float64'>
False
As you can see, df1 and df2 have exactly the same entries (including the NaN), but print(cell1==cell2) claims that these cells are different.
What is going on here?
Solution
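That's how floats work: NaN never compares equal to anything, including another NaN, so you can't compare NaNs directly with == (see "Why is NaN not equal to NaN?").
Use DataFrame.equals to compare the dataframes; it treats NaNs in the same location as equal: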
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.array(([np.nan, 2, 3], [4, 5, 6])), columns=["one", "two", "three"]
)
df2 = df1.map(lambda x: x.strip() if isinstance(x, str) else x)
# equals() considers NaNs in the same position to be equal
print(df1.equals(df2))
Prints:
True
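If you need a cell-by-cell answer rather than a single True/False, one possible approach is to combine the ordinary == comparison with a check that both cells are missing. The sketch below is only an illustration: the variable names (same_mask, all_same, etc.) are made up for this example, and df2 is produced with copy() instead of the strip step just to keep it short.

```python
import numpy as np
import pandas as pd

nan = float("nan")
# IEEE 754 floats: NaN compares unequal to everything, including itself.
nan_equals_itself = nan == nan
# Use an explicit NaN test instead of ==.
nan_detected = bool(np.isnan(nan))

df1 = pd.DataFrame(
    [[np.nan, 2.0, 3.0], [4.0, 5.0, 6.0]], columns=["one", "two", "three"]
)
df2 = df1.copy()  # stand-in for the stripped dataframe (assumption for brevity)

# Plain == marks the NaN cell as different...
naive_mask = df1 == df2
# ...so additionally accept positions where both frames hold a missing value.
same_mask = naive_mask | (df1.isna() & df2.isna())
all_same = bool(same_mask.all().all())

print(nan_equals_itself, nan_detected)
print(same_mask)
print(all_same)
```

Here same_mask is a boolean dataframe of the same shape as the inputs, so it can be used to locate the cells that really differ, while all_same should agree with df1.equals(df2) for frames with matching dtypes and labels.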
Answered By - Andrej Kesely