Issue
This is a rather tricky statistic that I want to produce. My dataframe contains the true classes and the prediction results of a machine learning model, for trips and the corresponding trips' segments. The problem is best explained with an example, so here is an example df:
import pandas as pd

df = pd.DataFrame(
    {'trip': [25, 25, 25, 25, 25, 25, 25, 25, 25, 54, 54, 54, 54, 73, 73, 73, 75, 75],
     'segment': [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 1, 1, 3],
     'class': [3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 1, 1],
     'prediction': [0, 0, 3, 3, 3, 4, 4, 2, 2, 0, 0, 1, 1, 4, 2, 4, 0, 2]
    }
)
df
    trip  segment  class  prediction
0     25        0      3           0
1     25        0      3           0
2     25        0      3           3
3     25        0      3           3
4     25        0      3           3
5     25        1      3           4
6     25        1      3           4
7     25        1      3           2
8     25        1      3           2
9     54        2      1           0
10    54        2      1           0
11    54        2      1           1
12    54        2      1           1
13    73        0      2           4
14    73        0      2           2
15    73        1      2           4
16    75        1      1           0
17    75        3      1           2
From the given df, I would like to produce statistics of the model's predictions at the trip and segment levels, using the majority vote of prediction and considering the actual class a trip or segment belongs to.
Segment-level statistics
So, considering the above df, I would like to produce the following segment-level statistics (explanation given below):
class total-segments correctly-predicted accuracy-rate
0 - - -
1 3 1 0.33
2 2 1 0.5
3 2 1 0.5
4 - - -
- no segment of class 0, so the dash.
- there are 3 distinct segments of class 1 (segment 2 of trip 54 and segments 1 & 3 of trip 75). Of the 3, only one (segment 2 of trip 54) has the majority vote of its prediction correct, so 1 correctly-predicted and 0.33 (i.e. 1/3) accuracy-rate.
- there are 2 segments belonging to class 2 (segments 0 & 1 of trip 73). Segment 0 has its majority vote correct, so 1 correctly-predicted and 0.5 (i.e. 1/2) accuracy-rate.
- there are 2 segments of class 3 (segments 0 & 1 of trip 25). Segment 0 has its majority vote correct, so 1 correctly-predicted and 0.5 (i.e. 1/2) accuracy-rate.
- no segment of class 4.
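The segment counts above can be checked by looking at each (trip, segment) pair on its own. The following is a minimal sketch, not the final statistics table: it only computes the share of rows in each segment whose prediction equals the true class. The column name share_correct is introduced here for illustration. Two segments (segment 2 of trip 54 and segment 0 of trip 73) come out at exactly 0.5, i.e. a tie, and the expected table counts both ties as correct.

```python
import pandas as pd

df = pd.DataFrame(
    {'trip': [25, 25, 25, 25, 25, 25, 25, 25, 25, 54, 54, 54, 54, 73, 73, 73, 75, 75],
     'segment': [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 1, 1, 3],
     'class': [3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 1, 1],
     'prediction': [0, 0, 3, 3, 3, 4, 4, 2, 2, 0, 0, 1, 1, 4, 2, 4, 0, 2]}
)

# Share of rows per (trip, segment) whose prediction matches the true class.
# 'share_correct' is an illustrative column name, not part of the original data.
share = (
    df.assign(correct=df['class'].eq(df['prediction']))
      .groupby(['trip', 'segment', 'class'])['correct']
      .mean()
      .rename('share_correct')
      .reset_index()
)
print(share)
```

Grouping by trip as well as segment matters here, because segment numbers restart in every trip (segment 0 exists in both trip 25 and trip 73).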
Trip-level statistics
Similarly, considering the class of the distinct trips in df and their prediction, I want to produce the following trip-level statistics (also explained below):
class total-trips correctly-predicted accuracy-rate
0 - - -
1 2 1 0.5
2 1 0 0.0
3 1 1 1.0
4 - - -
- no trip belongs to class 0.
- 2 trips of class 1 (trips 54 & 75). 1 trip was predicted correctly (majority vote of trip 54), so 1 correctly-predicted trip and 0.5 accuracy-rate.
- 1 trip of class 2 (trip 73). Its majority-vote prediction is incorrect, so 0 correctly-predicted trips and 0.0 accuracy-rate.
- 1 trip of class 3 (trip 25). Its majority-vote prediction is correct (3), so 1 correctly-predicted trip and 1.0 accuracy-rate.
- no trip of class 4.
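To make the trip-level reasoning concrete, the vote counts per trip can be tabulated directly. This is a small sketch under the assumption that "majority vote" means the most frequent prediction (a plurality), with a tie counted as correct when the true class is among the tied labels. The names votes and summary and the summary column names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'trip': [25, 25, 25, 25, 25, 25, 25, 25, 25, 54, 54, 54, 54, 73, 73, 73, 75, 75],
     'segment': [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 1, 1, 3],
     'class': [3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 1, 1],
     'prediction': [0, 0, 3, 3, 3, 4, 4, 2, 2, 0, 0, 1, 1, 4, 2, 4, 0, 2]}
)

# One row per trip, one column per predicted label (0..4), cells are vote counts.
votes = pd.crosstab(df['trip'], df['prediction']).reindex(columns=range(5), fill_value=0)

summary = pd.DataFrame({
    'true_class': df.groupby('trip')['class'].first(),  # assumes one class per trip
    'top_votes': votes.max(axis=1),                     # votes of the winning label(s)
    'rows': votes.sum(axis=1),                          # number of rows in the trip
})
# Votes received by the trip's own class.
summary['votes_for_true_class'] = [
    votes.at[trip, cls] for trip, cls in summary['true_class'].items()
]
# A trip counts as correct when its own class is (one of) the top-voted labels.
summary['correct'] = summary['votes_for_true_class'] == summary['top_votes']
print(summary)
```

For trip 25, class 3 is the top-voted label but receives only 3 of the 9 votes, so it wins by plurality rather than by a strict majority.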
Please forgive the long explanation, but this is a problem that can only be understood when it is explained well.
Solution
You can do it this way. To see what each step does, comment out every line of the chain except the first one, then uncomment them one by one and inspect the intermediate result.
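res_seg = (
    df['class'].eq(df['prediction'])                # True where the prediction matches the class
    # Segment numbers repeat across trips; here the class keeps them apart.
    # In general, add df['trip'] to this groupby as well.
    .groupby([df['class'], df['segment']]).mean()   # share of correct rows per (class, segment)
    .ge(0.5)                                        # correct when at least half of the rows match
    .groupby(level='class').agg(['size', 'sum'])    # segments per class, and how many are correct
    .rename(columns={'size': 'total_segments', 'sum': 'correctly_predicted'})
    .assign(accuracy_rate=lambda x: x['correctly_predicted'] / x['total_segments'])
    .reindex(range(5), fill_value='-')              # classes 0..4, '-' where a class has no segment
    .reset_index()
)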
print(res_seg)
# class total_segments correctly_predicted accuracy_rate
# 0 0 - - -
# 1 1 3 1 0.333333
# 2 2 2 1 0.5
# 3 3 2 1 0.5
# 4 4 - - -
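The trip-level table is built in a similar way: change df['segment'] to df['trip'] in the first groupby, and adjust the column names in the rename and the assign accordingly. One caveat: .ge(0.5) marks a unit as correct only when at least half of its rows match the class, which is not always the same as winning the vote. For trip 25, class 3 is the most frequent prediction but accounts for only 3 of its 9 rows, so this threshold marks the trip as incorrect, whereas the expected trip-level table counts it as correct.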
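As an alternative to the 0.5 threshold, the vote can be taken with Series.mode, which returns every most-frequent value and therefore keeps ties visible. The sketch below is not the chain shown above but a hypothetical helper (majority_vote_stats, with illustrative column names) that assumes each unit has a single true class and that a tie counts as correct when the true class is among the tied labels. Under those assumptions it gives the counts requested in the question for both levels.

```python
import pandas as pd

df = pd.DataFrame(
    {'trip': [25, 25, 25, 25, 25, 25, 25, 25, 25, 54, 54, 54, 54, 73, 73, 73, 75, 75],
     'segment': [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 1, 1, 3],
     'class': [3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 1, 1],
     'prediction': [0, 0, 3, 3, 3, 4, 4, 2, 2, 0, 0, 1, 1, 4, 2, 4, 0, 2]}
)


def majority_vote_stats(data, unit_cols, n_classes=5):
    """Per-class accuracy of the top-voted prediction over units given by unit_cols.

    Hypothetical helper: assumes one true class per unit; ties count as correct
    when the true class is one of the tied top labels.
    """
    records = []
    for _, unit in data.groupby(list(unit_cols)):
        true_class = unit['class'].iloc[0]
        winners = unit['prediction'].mode()  # all labels sharing the highest count
        records.append({'class': true_class,
                        'correct': bool(true_class in winners.values)})
    per_unit = pd.DataFrame(records)
    stats = per_unit.groupby('class')['correct'].agg(
        total='size', correctly_predicted='sum'
    )
    stats['accuracy_rate'] = stats['correctly_predicted'] / stats['total']
    # Classes without any unit show up as NaN rows instead of being dropped.
    return stats.reindex(range(n_classes))


seg_stats = majority_vote_stats(df, ['trip', 'segment'])
trip_stats = majority_vote_stats(df, ['trip'])
print(seg_stats)
print(trip_stats)
```

Missing classes are left as NaN here so that the columns stay numeric; calling .fillna('-') on the result would reproduce the dashes used in the tables above.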
Answered By - Ben.T