Issue
Let's say we have a DataFrame column df['Old'] that looks like this:
0 NaN
1 NEWARK, NJ
And we apply this to fill in the values of a new column:
df['New']=np.where(df['Old'].str.fullmatch('.*,...')==False,'Value','Else Value')
The result is
0 Else Value
1 Else Value
This makes sense for the second row: the regex matches, so fullmatch evaluates to True and we get the else value. But why does the NaN row also return the else value? NaN does not match the regex, so shouldn't fullmatch evaluate to False and return Value?
Solution
This is because df['Old'].str.fullmatch('.*,...') returns NaN for the first value, and NaN == False is False.
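The comparison can be checked in isolation. The snippet below is a minimal sketch: the Series named match is a hand-built stand-in (an assumption, with object dtype set explicitly) for what str.fullmatch returns on the example column, not the actual call.

```python
import numpy as np
import pandas as pd

# Assumption: a hand-built object-dtype Series standing in for the
# result of str.fullmatch on the example column (NaN for the missing row).
match = pd.Series([np.nan, True], dtype=object)

# NaN == False is False and True == False is False,
# so neither row passes the "== False" test.
is_false = (match == False).tolist()
print(is_false)  # [False, False]
```

Because both rows compare as False, np.where falls through to its second value for both of them.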
Reproducible example with intermediates:
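import numpy as np
import pandas as pd

df = pd.DataFrame({'Old': [float('nan'), 'NEWARK, NJ']})
# raw result of the regex test (NaN for the missing value)
df['match'] = df['Old'].str.fullmatch('.*,...')
# NaN == False is False, so the missing row does not pass the test
df['match==False'] = df['Old'].str.fullmatch('.*,...') == False
df['New'] = np.where(df['Old'].str.fullmatch('.*,...') == False,
                     'Value', 'Else Value')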
Output:
Old match match==False New
0 NaN NaN False Else Value
1 NEWARK, NJ True False Else Value
Workaround: first fillna with an empty string:
df['New'] = np.where(df['Old'].fillna('').str.fullmatch('.*,...') == False,
'Value', 'Else Value')
Or with the boolean NOT (~):
df['New'] = np.where(~df['Old'].fillna('').str.fullmatch('.*,...'),
'Value', 'Else Value')
Or reversing the values:
df['New'] = np.where(df['Old'].fillna('').str.fullmatch('.*,...'),
'Else Value', 'Value')
Output:
Old New
0 NaN Value
1 NEWARK, NJ Else Value
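Another option, offered here as a sketch rather than as one of the workarounds above, is the na parameter of str.fullmatch, which sets the result used for missing values. Assuming a pandas version that supports it (1.1 or later), na=False should yield a strictly boolean mask without the fillna step. The variable name mask is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Old': [float('nan'), 'NEWARK, NJ']})

# na=False makes missing values count as "no match" instead of NaN,
# so the mask contains only True/False.
mask = df['Old'].str.fullmatch('.*,...', na=False)

# Matching rows get 'Else Value'; everything else, including NaN, gets 'Value'.
df['New'] = np.where(mask, 'Else Value', 'Value')
print(df)
```

This leaves the Old column untouched and keeps the NaN handling visible at the point where the match is computed.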
Answered By - mozway