Issue
I have a dataframe:
df = pd.DataFrame(np.random.randint(0,100,size=(15, 4)), columns=list('ABCD'))
I would like to create another Boolean (or Yes/No) column based on whether the sum of columns A and B is greater than 150.
I am trying a generator kind of solution:
df['Truth'] = ['Yes' for i in df.columns.values if (df.A+df.B > 150)]
I know this does not work, but I keep getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How do I code this and what does this error mean?
Solution
How to get a column of Boolean values:
(df.A + df.B) > 150 generates a pandas.Series of Boolean values. Assign it to a column name.
import pandas as pd
import numpy as np
# sample data
np.random.seed(2)
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
# create the Boolean column
df['Truth'] = (df.A + df.B) > 150
# display(df)
     A   B   C   D  Truth
0   40  15  72  22  False
1   43  82  75   7  False
2   34  49  95  75  False
3   85  47  63  31  False
4   90  20  37  39  False
5   67   4  42  51  False
6   38  33  58  67  False
7   69  88  68  46   True
8   70  95  83  31   True
9   66  80  52  76  False
10  50   4  90  63  False
11  79  49  39  46  False
12   8  50  15   8  False
13  17  22  73  57  False
14  90  62  83  96   True
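The question also asks for a YES/NO column. One possible way to get it, sketched here as an assumption and not as part of the original answer, is to map the Boolean mask to labels with numpy.where. The column name TruthLabel and the variable name mask are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# same sample data as above
np.random.seed(2)
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))

# element-wise comparison: one Boolean per row
mask = (df['A'] + df['B']) > 150

# Boolean column, as in the answer
df['Truth'] = mask

# Yes/No column ('TruthLabel' is a hypothetical name)
df['TruthLabel'] = np.where(mask, 'Yes', 'No')
```

numpy.where picks 'Yes' where the mask is True and 'No' elsewhere, so no explicit loop over rows is needed.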
What does this error mean:
- What is shown in the question is a list comprehension, not a generator.
- (df.A + df.B) returns a pandas.Series, which can be compared to a value like 150.
  - The issue with the list comprehension is if (df.A+df.B > 150), which causes the ValueError because the condition is a whole Series of Booleans, not a single Boolean.
  - Another issue is that df.columns.values is just an array of the column names, so the loop iterates over column labels rather than rows.
- See "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" for further details on the error.
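The points above can be reproduced with a short sketch. It triggers the ValueError on purpose, shows the reductions the message suggests, and then builds the column with a list comprehension that iterates over rows. The variable names (comparison, error_message, column_labels) are hypothetical, and the row-wise comprehension is only an illustration of how the original attempt could be repaired; the vectorized comparison from the solution is usually the simpler choice.

```python
import numpy as np
import pandas as pd

np.random.seed(2)
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))

# a Series with one Boolean per row, not a single True/False
comparison = (df['A'] + df['B']) > 150

# using the whole Series as an `if` condition raises the ValueError
try:
    if comparison:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# reducing the Series to one Boolean is what the message suggests
any_true = comparison.any()  # True if at least one row exceeds 150
all_true = comparison.all()  # True only if every row exceeds 150

# df.columns.values holds the column labels, not the rows
column_labels = list(df.columns.values)

# a list comprehension that does work: iterate over the rows' values
df['Truth'] = ['Yes' if a + b > 150 else 'No' for a, b in zip(df['A'], df['B'])]
```

Python's if needs exactly one truth value, which is why pandas refuses to guess whether "any" or "all" was intended.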
Answered By - Trenton McKinney