Issue
I am trying to count occurrences in a NumPy array by grouping on one column and then counting the values of a second column within each group.
DataSet information:
data_dict = {
'Outlook' : ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny','Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
,'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild','Mild','Mild', 'Hot', 'Mild']
,'Humidity' : ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High','Normal','Normal', 'Normal', 'High', 'Normal', 'High']
,'Wind': ['False', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'True']
,'label': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}
Resulting DataFrame:
    Outlook  Temperature  Humidity   Wind  label
0     Sunny          Hot      High  False     No
1     Sunny          Hot      High   True     No
2  Overcast          Hot      High  False    Yes
3     Rainy         Mild      High  False    Yes
4     Rainy         Cool    Normal  False    Yes
...
I would like to get the following:
Outlook   Yes  No  All
Sunny       2   3    5
Overcast    4   0    4
Rainy       3   2    5
Here is my attempt; however, it counts the values of each column separately instead of counting the Outlook/label combinations:
result = np.where(df.columns.values == 'label')
result1 = np.where(df.columns.values == 'Outlook')
lst = rows[:, [result, result1]]
uni, data = np.unique(lst, return_counts=True)
Solution
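You can use a pivot table. The call below counts rows for each (Outlook, label) pair. It assumes df has a Day column, which the data_dict above does not show; since the column is only counted, any column without missing values (for example "Temperature") can be used as values instead:
pd.pivot_table(
    df,
    values="Day",
    index="Outlook",
    columns="label",
    aggfunc="count",
    margins=True,
    fill_value=0,
)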
The result is:
          Day
label      No  Yes  All
Outlook
Overcast    0    4    4
Rainy       2    3    5
Sunny       3    2    5
All         5    9   14
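If your DataFrame has no spare column to count, pd.crosstab may be a simpler route to the same table, since it only needs the two columns being tabulated. The following is a minimal sketch, not part of the original answer; it rebuilds just the Outlook and label columns from the question's data_dict, and the variable name table is only for illustration.

```python
import pandas as pd

# Only the two columns needed for the contingency table,
# copied from the data_dict in the question.
data_dict = {
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
                "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"],
    "label": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
              "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
}
df = pd.DataFrame(data_dict)

# Count rows for every (Outlook, label) pair; margins=True adds the "All" totals.
table = pd.crosstab(df["Outlook"], df["label"], margins=True, margins_name="All")
print(table)
```

Combinations that never occur (such as Overcast/No) come out as 0 here without needing fill_value.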
See the pandas documentation for pivot_table for the full list of parameters.
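Since the question started from NumPy, here is a rough sketch of the same count without pandas. It is an alternative, not the method from the answer above: np.unique with return_inverse=True maps each value to an integer code, and np.add.at accumulates one count per (Outlook, label) pair. The names outlook, label, counts and totals are made up for this example.

```python
import numpy as np

# The two columns from the question, as plain NumPy arrays.
outlook = np.array(["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
                    "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"])
label = np.array(["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                  "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"])

# Sorted unique values plus, for every row, the index of its value in that list.
row_names, row_idx = np.unique(outlook, return_inverse=True)
col_names, col_idx = np.unique(label, return_inverse=True)

# Add 1 to the cell of each (Outlook, label) pair; np.add.at handles repeated indices.
counts = np.zeros((row_names.size, col_names.size), dtype=int)
np.add.at(counts, (row_idx, col_idx), 1)

# Row totals, equivalent to the "All" column.
totals = counts.sum(axis=1)

for name, row, total in zip(row_names, counts, totals):
    print(name, row, total)
```

The rows follow the sorted order of row_names (Overcast, Rainy, Sunny) and the columns follow col_names (No, Yes).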
Answered By - itogaston