Issue
Here is how I encountered the warning:
df.loc[a_list][df.a_col.isnull()]
The type of a_list is Int64Index; it contains a list of row indexes. All of these row indexes belong to df.
The df.a_col.isnull() part is a condition I need for filtering.
If I execute the following commands individually, I do not get any warnings:
df.loc[a_list]
df[df.a_col.isnull()]
But if I put them together as df.loc[a_list][df.a_col.isnull()], I get the warning message (but I can see the result):
Boolean Series key will be reindexed to match DataFrame index
What does this warning message mean? Does it affect the result that is returned?
Solution
Your approach will work despite the warning, but it's best not to rely on implicit, unclear behavior.
Solution 1, make the selection of indices in a_list a boolean mask:
df[df.index.isin(a_list) & df.a_col.isnull()]
Solution 2, do it in two steps:
df2 = df.loc[a_list]
df2[df2.a_col.isnull()]
Solution 3, if you want a one-liner, use a query trick: NaN is never equal to itself, so a_col != a_col selects exactly the null rows:
df.loc[a_list].query('a_col != a_col')
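To see how the three solutions above compare, here is a minimal sketch on made-up data. The values in df and a_list are assumptions chosen for illustration (only the names come from the question), and a plain pd.Index stands in for Int64Index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the names df, a_col and a_list come from the question.
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=[10, 11, 12, 13, 14],
)
a_list = pd.Index([11, 12, 14])  # row labels, all present in df

# Solution 1: a single boolean mask built on the full DataFrame.
sol1 = df[df.index.isin(a_list) & df.a_col.isnull()]

# Solution 2: select the rows first, then build the mask on the subset.
df2 = df.loc[a_list]
sol2 = df2[df2.a_col.isnull()]

# Solution 3: one-liner relying on NaN != NaN inside query().
sol3 = df.loc[a_list].query("a_col != a_col")

print(sol1)
print(sol1.equals(sol2), sol2.equals(sol3))
```

On this data all three return the same single row. One difference to keep in mind: Solution 1 keeps the row order of df, while Solutions 2 and 3 follow the order of a_list, so the results may be ordered differently when a_list is not sorted the same way as df.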
The warning comes from the fact that the boolean vector df.a_col.isnull() is the length of df, while df.loc[a_list] is of the length of a_list, i.e. shorter. Therefore, some indices in df.a_col.isnull() are not in df.loc[a_list].
What pandas does is reindex the boolean series on the index of the calling dataframe. In effect, it gets from df.a_col.isnull() the values corresponding to the indices in a_list. This works, but the behavior is implicit, and could easily change in the future, so that's what the warning is about.
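The implicit alignment described above can be written out by hand, which may make the mechanism easier to see. This is a sketch on assumed sample data, not a look at the pandas internals: it reindexes the full-length mask onto the subset's index before filtering, so the mask and the DataFrame share the same index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the names df, a_col and a_list come from the question.
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=[10, 11, 12, 13, 14],
)
a_list = pd.Index([11, 12, 14])

subset = df.loc[a_list]        # one row per label in a_list
full_mask = df.a_col.isnull()  # one entry per row of df, so longer than subset

# Explicit version of the alignment: keep only the mask entries
# whose labels appear in the subset, in the subset's order.
aligned_mask = full_mask.reindex(subset.index)

result = subset[aligned_mask]

print(len(full_mask), len(aligned_mask))
print(result)
```

Because every label in a_list exists in df, the reindexed mask contains no missing values and stays boolean. If some label were missing from the mask, reindexing would introduce NaN and the mask could no longer be used for filtering as is.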
Answered By - IanS