Issue
index | key | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10 | Average | Count_G10 | Count_L10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | a | 12 | 0 | 159 | 0 | 20 | 49 | 0 | 131 | 157 | 153 | 68.1 | 4 | 3 |
1 | b | 0 | 68 | 195 | 189 | 0 | 79 | 12 | 179 | 21 | 62 | 80.5 | 3 | 4 |
2 | c | 0 | 139 | 0 | 188 | 12 | 0 | 31 | 87 | 152 | 73 | 68.2 | 4 | 2 |
3 | d | 126 | 156 | 0 | 112 | 178 | 146 | 0 | 19 | 192 | 25 | 95.4 | 6 | 2 |
4 | e | 109 | 0 | 172 | 0 | 0 | 0 | 44 | 145 | 186 | 100 | 75.6 | 5 | 1 |
5 | f | 63 | 183 | 194 | 183 | 0 | 163 | 136 | 13 | 163 | 162 | 126 | 6 | 2 |
6 | g | 101 | 143 | 0 | 184 | 0 | 107 | 103 | 0 | 60 | 133 | 83.1 | 6 | 1 |
7 | h | 13 | 101 | 139 | 86 | 101 | 72 | 93 | 151 | 0 | 0 | 75.6 | 6 | 1 |
8 | i | 182 | 71 | 73 | 73 | 129 | 32 | 56 | 135 | 0 | 114 | 86.5 | 4 | 5 |
9 | j | 82 | 0 | 198 | 0 | 117 | 21 | 0 | 32 | 64 | 146 | 66 | 4 | 2 |
10 | k | 145 | 0 | 194 | 0 | 156 | 71 | 0 | 89 | 57 | 31 | 74.3 | 4 | 2 |
I would like to get the columns count_G10 and count_L10 where the logic for count_G10 is as follows: count of months(M1 to M10 columns) where value is >0 and ((value-average)/average) > 0.1
Similarly, count_L10 logic is: count of months(M1 to M10 columns) where value is >0 and ((value-average)/average) < -0.1
I have tried the following in Pandas:
Oct20_Nov21 = [202010,202011,202012,202101,202102,202103,202104,202105,202106,202107,202108,202109,202110,202111]
new_df3['G10%'] = new_df3[Oct20_Nov21].applymap(lambda x : 1 if (((x-new_df3.avg_w_s)/new_df3.avg_w_s) > 0.1).any() else 0).values.sum(axis=1)
I get the following error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Let me know what am I missing here. Thank you
Solution
Don't use applymap
as it is both very slow and will try to perform operations on scalar values and Series leading to the error shown.
Instead it is best to perform the operations at the DataFrame level.
In the most verbose way we could do:
m_df = df.filter(like='M')
df['Count_G10'] = (
m_df.gt(0) &
m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).gt(0.1)
).sum(axis=1)
df['Count_L10'] = (
m_df.gt(0) &
m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).lt(-0.1)
).sum(axis=1)
key | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10 | Average | Count_G10 | Count_L10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | 12 | 0 | 159 | 0 | 20 | 49 | 0 | 131 | 157 | 153 | 68.1 | 4 | 3 |
b | 0 | 68 | 195 | 189 | 0 | 79 | 12 | 179 | 21 | 62 | 80.5 | 3 | 4 |
c | 0 | 139 | 0 | 188 | 12 | 0 | 31 | 87 | 152 | 73 | 68.2 | 4 | 2 |
d | 126 | 156 | 0 | 112 | 178 | 146 | 0 | 19 | 192 | 25 | 95.4 | 6 | 2 |
e | 109 | 0 | 172 | 0 | 0 | 0 | 44 | 145 | 186 | 100 | 75.6 | 5 | 1 |
f | 63 | 183 | 194 | 183 | 0 | 163 | 136 | 13 | 163 | 162 | 126 | 6 | 2 |
g | 101 | 143 | 0 | 184 | 0 | 107 | 103 | 0 | 60 | 133 | 83.1 | 6 | 1 |
h | 13 | 101 | 139 | 86 | 101 | 72 | 93 | 151 | 0 | 0 | 75.6 | 6 | 1 |
i | 182 | 71 | 73 | 73 | 129 | 32 | 56 | 135 | 0 | 114 | 86.5 | 4 | 5 |
j | 82 | 0 | 198 | 0 | 117 | 21 | 0 | 32 | 64 | 146 | 66 | 4 | 2 |
k | 145 | 0 | 194 | 0 | 156 | 71 | 0 | 89 | 57 | 31 | 74.3 | 4 | 2 |
First filter
to select only the M columns (There are many other ways to select only the desired columns which would also work):
m_df = df.filter(like='M')
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
0 12 0 159 0 20 49 0 131 157 153
1 0 68 195 189 0 79 12 179 21 62
2 0 139 0 188 12 0 31 87 152 73
3 126 156 0 112 178 146 0 19 192 25
4 109 0 172 0 0 0 44 145 186 100
5 63 183 194 183 0 163 136 13 163 162
6 101 143 0 184 0 107 103 0 60 133
7 13 101 139 86 101 72 93 151 0 0
8 182 71 73 73 129 32 56 135 0 114
9 82 0 198 0 117 21 0 32 64 146
10 145 0 194 0 156 71 0 89 57 31
Then use the comparison operations gt
and lt
to do the value comparison operations.
Step 1: check is for values gt
0:
m_df.gt(0)
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
0 True False True False True True False True True True
1 False True True True False True True True True True
2 False True False True True False True True True True
3 True True False True True True False True True True
4 True False True False False False True True True True
5 True True True True False True True True True True
6 True True False True False True True False True True
7 True True True True True True True True False False
8 True True True True True True True True False True
9 True False True False True True False True True True
10 True False True False True True False True True True
Evaluate: ((value - average) / average). Both operations need to align on axis=0
.
Step 2: Subtract
m_df.sub(df['Average'], axis=0)
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
0 -56.1 -68.1 90.9 -68.1 -48.1 -19.1 -68.1 62.9 88.9 84.9
1 -80.5 -12.5 114.5 108.5 -80.5 -1.5 -68.5 98.5 -59.5 -18.5
2 -68.2 70.8 -68.2 119.8 -56.2 -68.2 -37.2 18.8 83.8 4.8
3 30.6 60.6 -95.4 16.6 82.6 50.6 -95.4 -76.4 96.6 -70.4
4 33.4 -75.6 96.4 -75.6 -75.6 -75.6 -31.6 69.4 110.4 24.4
5 -63.0 57.0 68.0 57.0 -126.0 37.0 10.0 -113.0 37.0 36.0
6 17.9 59.9 -83.1 100.9 -83.1 23.9 19.9 -83.1 -23.1 49.9
7 -62.6 25.4 63.4 10.4 25.4 -3.6 17.4 75.4 -75.6 -75.6
8 95.5 -15.5 -13.5 -13.5 42.5 -54.5 -30.5 48.5 -86.5 27.5
9 16.0 -66.0 132.0 -66.0 51.0 -45.0 -66.0 -34.0 -2.0 80.0
10 70.7 -74.3 119.7 -74.3 81.7 -3.3 -74.3 14.7 -17.3 -43.3
Step 3: Divide
m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0)
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
0 -0.823789 -1.000000 1.334802 -1.000000 -0.706314 -0.280470 -1.000000 0.923642 1.305433 1.246696
1 -1.000000 -0.155280 1.422360 1.347826 -1.000000 -0.018634 -0.850932 1.223602 -0.739130 -0.229814
2 -1.000000 1.038123 -1.000000 1.756598 -0.824047 -1.000000 -0.545455 0.275660 1.228739 0.070381
3 0.320755 0.635220 -1.000000 0.174004 0.865828 0.530398 -1.000000 -0.800839 1.012579 -0.737945
4 0.441799 -1.000000 1.275132 -1.000000 -1.000000 -1.000000 -0.417989 0.917989 1.460317 0.322751
5 -0.500000 0.452381 0.539683 0.452381 -1.000000 0.293651 0.079365 -0.896825 0.293651 0.285714
6 0.215403 0.720818 -1.000000 1.214200 -1.000000 0.287605 0.239471 -1.000000 -0.277978 0.600481
7 -0.828042 0.335979 0.838624 0.137566 0.335979 -0.047619 0.230159 0.997354 -1.000000 -1.000000
8 1.104046 -0.179191 -0.156069 -0.156069 0.491329 -0.630058 -0.352601 0.560694 -1.000000 0.317919
9 0.242424 -1.000000 2.000000 -1.000000 0.772727 -0.681818 -1.000000 -0.515152 -0.030303 1.212121
10 0.951548 -1.000000 1.611036 -1.000000 1.099596 -0.044415 -1.000000 0.197847 -0.232840 -0.582773
Step 4: Compare (gt or lt depending)
m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).gt(0.1)
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
0 False False True False False False False True True True
1 False False True True False False False True False False
2 False True False True False False False True True False
3 True True False True True True False False True False
4 True False True False False False False True True True
5 False True True True False True False False True True
6 True True False True False True True False False True
7 False True True True True False True True False False
8 True False False False True False False True False True
9 True False True False True False False False False True
10 True False True False True False False True False False
Step 5: Find where both conditions are True with logical AND (&
)
(
m_df.gt(0) &
m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).gt(0.1)
)
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
0 False False True False False False False True True True
1 False False True True False False False True False False
2 False True False True False False False True True False
3 True True False True True True False False True False
4 True False True False False False False True True True
5 False True True True False True False False True True
6 True True False True False True True False False True
7 False True True True True False True True False False
8 True False False False True False False True False True
9 True False True False True False False False False True
10 True False True False True False False True False False
Step 6: Count the number of True values in each row with sum (True
is 1
and False
is 0
(the additive identity) which is why sum works to count number of True
values)
(
m_df.gt(0) &
m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).gt(0.1)
).sum(axis=1)
0 4
1 3
2 4
3 6
4 5
5 6
6 6
7 6
8 4
9 4
10 4
dtype: int64
An almost identical process occurs for Count_L10
the only difference is checking .lt(-0.1)
instead of .gt(0.1)
.
This operation can be greatly simplified by extracting and reusing common operations and refactoring the expression:
m_df = df.filter(like='M')
# Shared Condition
m = m_df.gt(0)
# Values
v = m_df.div(df['Average'], axis=0) - 1
df['Count_G10'] = (m & v.gt(0.1)).sum(axis=1)
df['Count_L10'] = (m & v.lt(-0.1)).sum(axis=1)
Both conditions use the check for values greater than 0 so we can keep that in a variable (m
) to use multiple times. Both expressions compare against the same expression ((value - average) / average) we can also store this in a variable v
.
The expression ((value - average) / average) can also be simplified to just ((value / average) - 1).
Since ((v - a) / a) = ((v/a) - (a / a)) = ((v/a) - 1)
This will reduce overall computation time, at the expense of some readability, but produces the same results:
key | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10 | Average | Count_G10 | Count_L10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | 12 | 0 | 159 | 0 | 20 | 49 | 0 | 131 | 157 | 153 | 68.1 | 4 | 3 |
b | 0 | 68 | 195 | 189 | 0 | 79 | 12 | 179 | 21 | 62 | 80.5 | 3 | 4 |
c | 0 | 139 | 0 | 188 | 12 | 0 | 31 | 87 | 152 | 73 | 68.2 | 4 | 2 |
d | 126 | 156 | 0 | 112 | 178 | 146 | 0 | 19 | 192 | 25 | 95.4 | 6 | 2 |
e | 109 | 0 | 172 | 0 | 0 | 0 | 44 | 145 | 186 | 100 | 75.6 | 5 | 1 |
f | 63 | 183 | 194 | 183 | 0 | 163 | 136 | 13 | 163 | 162 | 126 | 6 | 2 |
g | 101 | 143 | 0 | 184 | 0 | 107 | 103 | 0 | 60 | 133 | 83.1 | 6 | 1 |
h | 13 | 101 | 139 | 86 | 101 | 72 | 93 | 151 | 0 | 0 | 75.6 | 6 | 1 |
i | 182 | 71 | 73 | 73 | 129 | 32 | 56 | 135 | 0 | 114 | 86.5 | 4 | 5 |
j | 82 | 0 | 198 | 0 | 117 | 21 | 0 | 32 | 64 | 146 | 66 | 4 | 2 |
k | 145 | 0 | 194 | 0 | 156 | 71 | 0 | 89 | 57 | 31 | 74.3 | 4 | 2 |
Setup used:
import pandas as pd
df = pd.DataFrame({
'key': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'],
'M1': [12, 0, 0, 126, 109, 63, 101, 13, 182, 82, 145],
'M2': [0, 68, 139, 156, 0, 183, 143, 101, 71, 0, 0],
'M3': [159, 195, 0, 0, 172, 194, 0, 139, 73, 198, 194],
'M4': [0, 189, 188, 112, 0, 183, 184, 86, 73, 0, 0],
'M5': [20, 0, 12, 178, 0, 0, 0, 101, 129, 117, 156],
'M6': [49, 79, 0, 146, 0, 163, 107, 72, 32, 21, 71],
'M7': [0, 12, 31, 0, 44, 136, 103, 93, 56, 0, 0],
'M8': [131, 179, 87, 19, 145, 13, 0, 151, 135, 32, 89],
'M9': [157, 21, 152, 192, 186, 163, 60, 0, 0, 64, 57],
'M10': [153, 62, 73, 25, 100, 162, 133, 0, 114, 146, 31],
'Average': [68.1, 80.5, 68.2, 95.4, 75.6, 126.0, 83.1, 75.6, 86.5, 66.0,
74.3]
})
Answered By - Henry Ecker
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.