Issue
I have an issue with this function, which must return the proportion of each value in one column.
Here's some sample data to give an idea:
import pandas as pd
df2 = pd.DataFrame({'X': ['A', 'A', 'B', 'C'], 'Y': [1, 0, 0, 1], 'Z': [1, 0, 1, 1]})
df2['X'].value_counts()
When I count the values I get
A 2
B 1
C 1
Now, I need to get the proportion for each value of "X"
for freq in df2['X'].value_counts():
    #print(freq)
    print(freq/df2['X'].value_counts().sum())
The result is below:
0.5
0.25
0.25
Perfect.
Now I must apply this to my DataFrame and get a new column. Below is the function:
def get_proportion(df):
    for freq in df2['X'].value_counts():
        return (freq/df2['X'].value_counts().sum())
df2["A"]=df2.apply(get_proportion, axis=1)
result:
X Y Z A
0 A 1 1 0.5
1 A 0 0 0.5
2 B 0 1 0.5
3 C 1 1 0.5
I should get
X Y Z A
0 A 1 1 0.5
1 A 0 0 0.5
2 B 0 1 0.25
3 C 1 1 0.25
What's wrong?
If I set an argument
df2["A"]=df2.apply(get_proportion(df2), axis=1)
I get an error
TypeError: 'numpy.float64' object is not callable
Thank you if you can help.
Solution
df2["A"] = df2.X.apply(lambda x: (df2["X"].value_counts() / len(df2))[x])
len(df2) is the number of rows of the dataframe df2, and (df2["X"].value_counts() / len(df2)) is a series with the relative occurrences of the elements in the column "X".
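As for why the original attempt gives 0.5 everywhere: get_proportion ignores the row it receives and returns during the first pass of the loop, so every row gets the proportion of the most frequent value. Calling get_proportion(df2) hands that float to apply instead of a function, hence the TypeError. The one-liner above also recomputes value_counts() for every row. A possible alternative, offered as a sketch rather than the only way to do it, computes the proportions once and looks them up per row (the names proportions and alt are made up for illustration):

```python
import pandas as pd

df2 = pd.DataFrame({'X': ['A', 'A', 'B', 'C'], 'Y': [1, 0, 0, 1], 'Z': [1, 0, 1, 1]})

# Relative frequency of each distinct value in X, computed only once.
proportions = df2['X'].value_counts(normalize=True)

# Look up each row's value of X in that Series.
df2['A'] = df2['X'].map(proportions)

# Equivalent alternative: size of each row's group divided by the row count.
alt = df2.groupby('X')['X'].transform('size') / len(df2)

print(df2)
```

Both variants should produce the same column as the lambda version; map is usually the simpler choice when the lookup table is already a Series indexed by the values.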
Answered By - MarianD