Issue
Say I need to find the sample standard deviation of the data set sampleA, where
sampleA = [34.6, 40.7, 37.5, 45.8, 41.4, 44.2, 44.5, 51.8, 47.5, 45.4, 36.4, 46.2, 43.0, 43.3, 42.0] # mass (g)
Using the np.std() function, I obtain 4.309. However, this is not the value I want. Sample A is a sample of n = 15 measurements rather than an entire population, so I need the sample formula, which divides by n - 1 instead of n.
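Written out, the two estimators differ only in the divisor. Plugging in sample A's numbers (values rounded to three decimals) reproduces both figures that appear in this question:

```latex
\sigma_N = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}

n = 15,\quad \bar{x} = \frac{644.3}{15} \approx 42.953,\quad
\sum_{i=1}^{15}(x_i-\bar{x})^2 \approx 278.497

\sigma_N \approx \sqrt{\frac{278.497}{15}} \approx 4.309,
\qquad
s \approx \sqrt{\frac{278.497}{14}} \approx 4.460
```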
The correct function would look something like this:
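def sam_stddev(data_set):
    # Sample standard deviation: divide by n - 1 (Bessel's correction)
    n = len(data_set)
    mean = sum(data_set) / n  # computed once, not on every loop iteration
    total = 0  # avoid shadowing the built-in sum()
    for x in data_set:
        total += (x - mean) ** 2
    S_x = (total / (n - 1)) ** 0.5
    return S_x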
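As a cross-check on the hand-written function, Python's standard library already ships both formulas in the statistics module (Python 3.4+): statistics.stdev divides by n - 1 and statistics.pstdev divides by n. A minimal sketch using the question's data:

```python
import statistics

# mass (g), copied from the question
sampleA = [34.6, 40.7, 37.5, 45.8, 41.4, 44.2, 44.5, 51.8,
           47.5, 45.4, 36.4, 46.2, 43.0, 43.3, 42.0]

s = statistics.stdev(sampleA)       # sample SD: divides by n - 1
sigma = statistics.pstdev(sampleA)  # population SD: divides by n

print(f"stdev  = {s:.3f}")      # stdev  = 4.460
print(f"pstdev = {sigma:.3f}")  # pstdev = 4.309
```

This needs no third-party packages, which can be handy when NumPy is not otherwise in use.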
Using this function, I obtain the correct result, 4.460. Obviously I can just use this function, but I was wondering whether there is some kind of modifier, maybe something like np.std(sampleA, "n" = 15), that would make NumPy do this directly. Alternatively, is there another library with this built in? I have looked at the numpy.std() documentation at https://numpy.org/doc/stable/reference/generated/numpy.std.html, but honestly, I'm not experienced at reading it.
Solution
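By default, NumPy divides by N, the number of elements (the population formula). The ddof ("delta degrees of freedom") parameter changes the divisor to N - ddof, so setting ddof=1 divides by N - 1 and gives the sample standard deviation:
>>> np_arr = np.array(sampleA)
>>> np_arr.std(ddof=1)
4.460119579861807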
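A minimal sketch (assuming NumPy is installed) that checks the N - ddof divisor against a hand computation and shows that the ddof=0 and ddof=1 results differ only by the factor sqrt(N / (N - 1)). The helper manual_std is hypothetical, written here only for the comparison:

```python
import numpy as np

sampleA = np.array([34.6, 40.7, 37.5, 45.8, 41.4, 44.2, 44.5, 51.8,
                    47.5, 45.4, 36.4, 46.2, 43.0, 43.3, 42.0])  # mass (g)
n = sampleA.size


def manual_std(x, ddof=0):
    """Hypothetical helper: sum of squared deviations divided by n - ddof."""
    dev = x - x.mean()
    return np.sqrt(np.sum(dev ** 2) / (x.size - ddof))


pop_sd = np.std(sampleA)           # default ddof=0: divides by n
samp_sd = np.std(sampleA, ddof=1)  # divides by n - 1

# NumPy agrees with the hand computation for both divisors
assert np.isclose(pop_sd, manual_std(sampleA, ddof=0))
assert np.isclose(samp_sd, manual_std(sampleA, ddof=1))

# Converting one result into the other only needs sqrt(n / (n - 1))
assert np.isclose(pop_sd * np.sqrt(n / (n - 1)), samp_sd)

print(f"ddof=0: {pop_sd:.3f}, ddof=1: {samp_sd:.3f}")  # ddof=0: 4.309, ddof=1: 4.460
```

The same ddof keyword is accepted by np.var, so the sample variance is np.var(sampleA, ddof=1).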
Answered By - the_ordinary_guy