Issue
I have a dictionary that looks like this (although it is much larger):
{100: 8,
110: 2,
1000: 4,
2200: 3,
4000: 1,
11000: 1,
}
Each pair maps a value to its number of occurrences in my dataset. I need to calculate the median of my dataset. Any hints or ideas on how to do it?
I am using Python 3.6.
EDIT:
I don't want to create a list (because of the size of my dataset). The size of the list was actually the very reason to use a dictionary instead. So, I am looking for another way.
Solution
So, not finding a satisfying answer, this is what I have come up with:
from collections import OrderedDict
import statistics
d = {
    100: 8,
    110: 2,
    1000: 4,
    2200: 3,
    4000: 1,
    11000: 1,
}
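# Sort the dictionary by value
values_sorted = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
# Half the number of records; a whole number exactly when the record count is even
index = sum(values_sorted.values())/2
even = index.is_integer()
# Becomes True once the lower of the two middle records is found (even count only)
lower_found = False
# Compute median
for value, occurrences in values_sorted.items():
    index -= occurrences
    if index < 0 and not lower_found:
        # The middle record (or both middle records) has this value
        median_manual = value
        break
    elif index == 0 and even and not lower_found:
        # The lower middle record is the last record with this value
        median_manual = value/2
        lower_found = True
    elif index < 0 and lower_found:
        # The upper middle record is the first record with this value
        median_manual += value/2
        break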
# Create a list of all records and compute the median with the statistics module
# (for verification only; this is the list the manual computation avoids)
values_list = list()
for val, count in d.items():
    for _ in range(count):
        values_list.append(val)
median_computed = statistics.median(values_list)
# Test the two results are equal
if median_manual != median_computed:
    raise RuntimeError
I have tested it with different datasets, comparing each result with the median computed by statistics.median(), and the results were the same.
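For reuse, the same idea can be wrapped in a function. The following is only a sketch, not part of the tested code above: the name median_from_counts is made up for illustration, and it assumes the counts are non-negative integers. It builds running totals over the sorted distinct values and looks up the middle rank(s) with bisect, so memory grows with the number of distinct values rather than the number of records.

```python
from bisect import bisect_left
from itertools import accumulate


def median_from_counts(counts):
    """Median of a dataset given as a {value: occurrences} mapping.

    Hypothetical helper; assumes non-negative integer counts.
    """
    # Keep only the values that actually occur, in ascending order
    values = sorted(v for v, n in counts.items() if n > 0)
    if not values:
        raise ValueError("no data points")
    # cumulative[i] = number of records with a value <= values[i]
    cumulative = list(accumulate(counts[v] for v in values))
    total = cumulative[-1]

    def record_at(rank):
        # Value of the record at the given 1-based rank in sorted order:
        # the first value whose running total reaches that rank
        return values[bisect_left(cumulative, rank)]

    if total % 2:
        # Odd number of records: a single middle record
        return record_at(total // 2 + 1)
    # Even number of records: mean of the two middle records
    return (record_at(total // 2) + record_at(total // 2 + 1)) / 2


d = {100: 8, 110: 2, 1000: 4, 2200: 3, 4000: 1, 11000: 1}
print(median_from_counts(d))
```

With the dictionary from the question there are 19 records, so the median is the 10th record in sorted order, which is 110. Entries with a count of 0 are skipped, so they cannot be picked as one of the middle records.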
Answered By - Jan Pisl