Issue
I would like to calculate an approximation of the distribution function of my data using a system of equations:
$$F(t) := P(X \le t) \approx \frac{\#\{i : x_i \le t\}}{n} =: f(t)$$
where x_i is the i-th observation and n is the total number of observations.
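The right-hand side of this definition can be computed directly: count the observations that are at most t and divide by the total. A minimal plain-Python sketch; the function name `empirical_cdf` is hypothetical and not part of any library.

```python
def empirical_cdf(observations, t):
    """Return f(t): the fraction of observations that are <= t."""
    if not observations:
        raise ValueError("need at least one observation")
    # Count how many observations do not exceed t
    hits = sum(1 for x in observations if x <= t)
    return hits / len(observations)


list_goals = [1, 2, 2, 1, 2]
for t in range(max(list_goals) + 1):
    print(t, empirical_cdf(list_goals, t))
```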
My data:
List_Goals: [1, 2, 2, 1, 2]
Matches played: 5
For example, if a football club, in the last 5 matches, scores [1, 2, 2, 1, 2], it means that:
- 0 or fewer goals: 0 matches (none);
- 1 or fewer goals: 2 matches (1, 1);
- 2 or fewer goals: 5 matches (1, 2, 2, 1, 2).
If the goals scored are <= 0, I get 0/5 = 0;
If the goals scored are <= 1, I get 2/5 = 0.4;
If the goals scored are <= 2, I get 5/5 = 1.
f(0) = 0/5 = 0;
f(1) = 2/5 = 0.4;
f(2) = 5/5 = 1.
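To keep the values as exact fractions (0/5, 2/5, 5/5) rather than floats, Python's standard `fractions.Fraction` can be used. A small sketch; note that `Fraction` reduces automatically, so 5/5 is displayed as 1 and 0/5 as 0.

```python
from fractions import Fraction

list_goals = [1, 2, 2, 1, 2]
n = len(list_goals)

# f(t) as an exact fraction: (number of matches with <= t goals) / n
f = {t: Fraction(sum(x <= t for x in list_goals), n)
     for t in range(max(list_goals) + 1)}

for t, value in f.items():
    print(f"f({t}) = {value} = {float(value)}")
```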
System:
$$\begin{cases} f(0) = F(0) = 0/5 = 0 \\ f(1) = F(1) = 2/5 = 0.4 \\ f(2) = F(2) = 5/5 = 1 \end{cases}$$
In this case I have three equations in one unknown, but in general the system must contain one equation for each value from F(0) to F(max_goals_scored), i.e. max_goals_scored + 1 equations. The number of equations should therefore start at max_goals_scored + 1 and grow with it. For example, if List_Goals contained a 6 (List_Goals: [1, 2, 6, 1, 2]), the system would need the functions F(0), F(1), F(2), F(3), F(4), F(5), F(6).
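Since there is one equation per goal count from 0 to max_goals_scored, all values can be built in a single pass with a running count. A plain-Python sketch, assuming goals are non-negative integers; `cdf_table` is a hypothetical helper name.

```python
from collections import Counter


def cdf_table(goals):
    """Return [F(0), F(1), ..., F(max(goals))] as empirical frequencies."""
    n = len(goals)
    counts = Counter(goals)          # matches with exactly k goals
    table = []
    running = 0
    for k in range(max(goals) + 1):  # one entry per equation F(0) .. F(max)
        running += counts[k]         # matches with at most k goals
        table.append(running / n)
    return table


print(cdf_table([1, 2, 2, 1, 2]))  # 3 values: F(0)..F(2)
print(cdf_table([1, 2, 6, 1, 2]))  # 7 values: F(0)..F(6)
```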
How can I automate all of this in Python? I am open to using any library.
Solution
Use numpy.bincount:
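```python
import numpy as np

goals = [1, 2, 2, 1, 2]

# Relative frequency of each goal count 0, 1, ..., max(goals)
f = np.bincount(goals) / len(goals)
print(f)  # [0.  0.4 0.6]

# Running total of f: proportion of matches with at most t goals
F = f.cumsum()
print(F)  # [0.  0.4 1. ]
```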
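The arrays above are indexed by goal count, so F[t] only works for integers 0 <= t <= max(goals), and np.bincount itself only accepts non-negative integers. To evaluate the empirical CDF at any t (for example t = -1 or t = 10), one option is to sort the data and use np.searchsorted with side="right", which returns the number of sorted elements <= t. A sketch assuming NumPy; the helper name `ecdf` is hypothetical.

```python
import numpy as np


def ecdf(observations, t):
    """Empirical CDF: fraction of observations <= t (t may be a scalar or array)."""
    data = np.sort(np.asarray(observations))
    # side="right" gives the count of elements <= t
    return np.searchsorted(data, t, side="right") / data.size


goals = [1, 2, 6, 1, 2]
print(ecdf(goals, [-1, 0, 1, 2, 5, 6, 10]))
```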
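Note that here f is the empirical probability mass function (the proportion of matches with exactly t goals), while F is the empirical cumulative distribution function (the proportion of matches with at most t goals). The two are not the same: F is what the question calls f(t).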
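The two arrays carry the same information: F is the running sum of f, and f can be recovered from F by taking successive differences. A short check using np.diff (its prepend argument requires NumPy 1.16 or later):

```python
import numpy as np

goals = [1, 2, 2, 1, 2]
f = np.bincount(goals) / len(goals)  # P(X = k) for k = 0..max
F = f.cumsum()                       # P(X <= k) for k = 0..max

# Differencing the CDF (taking F(-1) = 0) gives the PMF back
f_back = np.diff(F, prepend=0.0)
print(np.allclose(f, f_back))
```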
Answered By - Onyambu