Issue
I have some data as a list of tuples, each containing two floats. Let's call them (A, B). A can be any positive float and B can be positive or negative.
The goal here is to take the mean of all the B values in ranges of A values (0-5, 5-10, etc.), and display those means on a matplotlib chart (something that looks like a histogram, but isn't). In other words, for the tuple (6.5, 2.1), the B value 2.1 would be included in the mean for all B values whose paired A value is in the 5-10 range.
The solution I have penciled out is to construct bins based on the range of A, then somehow sort the values of B into those bins based on the value of A. Create the bins with:
A_vals = [pairs[0] for pairs in tuple_list]
bins = range(min(A_vals), max(A_vals) + bin_width, bin_width)
But that's where I get stumped. My first thought was to create either a dictionary with the bin range as the key and a list of B values as the values, or create a list of lists where each sublist has the same index as the index of each bin. However, even going over the logistics of that on paper suggested that solution to be so complex, inefficient, and inelegant that it probably wouldn't work and it would almost certainly choke on a large dataset.
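For reference, the dictionary idea described above is less unwieldy than it may sound. The following is only a minimal sketch, assuming bins start at 0 and are `bin_width` wide. The names `tuple_list` and `bin_width` follow the question; the sample values and the names `b_by_bin` and `bin_means` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data in the (A, B) form described above
tuple_list = [(6.5, 2.1), (3.0, 3.0), (4.0, 4.0), (5.0, 6.5), (7.0, 1.0), (11.0, 5.5)]
bin_width = 5  # assumed bin width

# Map each bin's lower edge to the B values whose A falls in [edge, edge + bin_width)
b_by_bin = defaultdict(list)
for a, b in tuple_list:
    lower_edge = int(a // bin_width) * bin_width
    b_by_bin[lower_edge].append(b)

# Mean of B per bin, keyed by lower edge, in ascending order of edge
bin_means = {edge: sum(vals) / len(vals) for edge, vals in sorted(b_by_bin.items())}
print(bin_means)
```

This makes a single pass over the data, so the cost grows linearly with the number of tuples. Bins that receive no values simply do not appear in `bin_means`.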
Solution
import numpy as np

data = [(6.5, 2), (3, 3), (4, 4), (5, 6.5), (7, 1), (11, 5.5)]
data = np.array(data)
bin_width = 5
# Go one bin past the maximum so the largest A value falls in a bin of its own range
edge = (data[:, 0].max() // bin_width + 1) * bin_width
bins = np.arange(0, edge, bin_width)
# bins contains just the lower edges
indices = np.digitize(data[:, 0], bins)
# One slot per bin; NaN marks bins that receive no values
means = np.full(len(bins), np.nan)
for index in np.unique(indices):
    # np.digitize returns 1-based bin numbers, hence index - 1
    means[index - 1] = data[indices == index, 1].mean()
print("lower edges:", bins, "upper edges:", bins + bin_width)
print("means for each interval:", means)

Output:
lower edges: [ 0.  5. 10.] upper edges: [ 5. 10. 15.]
means for each interval: [3.5        3.16666667 5.5       ]
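As a possible alternative, the per-bin means can also be computed without a Python loop. The sketch below is not part of the original answer. It assumes bins start at 0 and all A values are non-negative, and it swaps `np.digitize` for integer division plus `np.bincount`. The names `bin_idx`, `sums`, `counts` and `lower_edges` are illustrative.

```python
import numpy as np

# Same sample data as in the answer above
data = np.array([(6.5, 2), (3, 3), (4, 4), (5, 6.5), (7, 1), (11, 5.5)])
bin_width = 5  # assumed bin width, with bins starting at 0

a, b = data[:, 0], data[:, 1]
# Integer bin number for each A value: 0 for [0, 5), 1 for [5, 10), ...
bin_idx = (a // bin_width).astype(int)

# Per-bin sum of B and per-bin count, computed without a Python loop
sums = np.bincount(bin_idx, weights=b)
counts = np.bincount(bin_idx)

# Empty bins have count 0; leave their mean as NaN instead of dividing by zero
means = np.full(len(counts), np.nan)
np.divide(sums, counts, out=means, where=counts > 0)

lower_edges = np.arange(len(counts)) * bin_width
print(lower_edges, means)
```

Here the A value 11 lands in the 10-15 bin. The resulting `lower_edges` and `means` could then be drawn with matplotlib, for example `plt.bar(lower_edges, means, width=bin_width, align="edge")`, which should give the histogram-like chart the question asks for.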
Answered By - 9769953