Issue
I have the dataframe with NaN and values, now I want to compare two columns in the same dataframe whether each row values in null or not null. For examples,
- if the column a_1 have null values, column a_2 have not null values, then for that particular row, the result should be 1 in the new column a_12.
- If the values in both a_1(value is 123) & a_2(value is 345) is not null, and the values are not equal, then the result should be 3 in column a_12.
below is the code snippet I have used for comparison, for the scenario 1, I am getting the result as 3 instead of 1. Please guide me to get the correct output.
try:
if (x[cols[0]]==x[cols[1]]) & (~np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):
return 0
elif (np.isnan(x[cols[0]])) & (np.isnan(x[cols[1]])):
return 0
elif (~np.isnan(x[cols[0]])) & (np.isnan(x[cols[1]])):
return 1
elif (np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):
return 2
elif (x[cols[0]]!=x[cols[1]]) & (~np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):
return 3
else:
pass
except Exception as exc:
if (x[cols[0]]==x[cols[1]]) & (pd.notna(x[cols[0]])) & (pd.notna(x[cols[1]])):
return 0
elif (pd.isna(x[cols[0]])) & (pd.isna(x[cols[1]])):
return 0
elif (pd.notna(x[cols[0]])) & (pd.isna(x[cols[1]])):
return 1
elif (pd.isna(x[cols[0]])) & (pd.notna(x[cols[1]])):
return 2
elif (x[cols[0]]!=x[cols[1]]) & (pd.notna(x[cols[0]])) & (pd.notna(x[cols[1]])):
return 3
else:
pass
I have used pd.isna() and pd.notna(), also np.isnan() and ~np.isnan(), because for some columns the second method (np.isnan()) is working, for some columns its just throwing an error.
Please guide me to achieve the result as excepted.
Expected Output:
| a_1 | a_2 | result |
|-----------|---------|--------|
| gssfwe | gssfwe | 0 |
| <NA> | <NA> | 0 |
| fsfsfw | <NA> | 1 |
| <NA> | qweweqw | 2 |
| adsadgsgd | wwuwquq | 3 |
Output Got with the above code:
| a_1 | a_2 | result |
|-----------|---------|--------|
| gssfwe | gssfwe | 0 |
| <NA> | <NA> | 0 |
| fsfsfw | <NA> | 3 |
| <NA> | qweweqw | 3 |
| adsadgsgd | wwuwquq | 3 |
Solution
Going by the logic in your code, you'd want to define a function and apply it across your DataFrame
.
df = pd.DataFrame({'a_1': [1, 2, np.nan, np.nan, 1], 'a_2': [2, np.nan, 1, np.nan, 1]})
The categories you want map neatly to binary numbers, which you can use to write a short function like -
def nan_check(row):
x, y = row
if x != y:
return int(f'{int(pd.notna(y))}{int(pd.notna(x))}', base=2)
return 0
df['flag'] = df.apply(nan_check, axis=1)
Output
a_1 a_2 flag
0 1.0 2.0 3
1 2.0 NaN 1
2 NaN 1.0 2
3 NaN NaN 0
4 1.0 1.0 0
Answered By - Mortz
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.