Issue
I am currently trying to generate a NumPy array of normally distributed random data, seeded with np.random.seed(0):
np.random.seed(0)
normal = np.round(np.random.normal(loc=0.0, scale=1000, size=(size)), 1).astype(int)
and then categorize the values into equal-width intervals, like this:
d = 10
data = np.ndarray.flatten(np.asarray(normal, dtype=int))
interval = np.divide(np.max(data) - np.min(data), d)
intervals = np.arange(np.min(data), np.max(data), interval, dtype=int)
for x in range(len(data)):
    for z in range(d-1):
        if data[x] >= intervals[z] and data[x] < intervals[z+1]:
            data[x] = z
        elif data[x] > intervals[-1]:
            data[x] = d-1
Ideally, I would expect the values in my data array to be replaced by labels from 0 to 9, but whenever I run this, I end up with values only from 4 to 8. Does anyone have an idea what I might be doing wrong, or how to improve this method?
Here, interval is the width (delta) of each bin, and intervals holds the actual boundary values of the respective bins.
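The symptom can be reproduced on a tiny, made-up example. The sketch below copies the loop from the question into a hypothetical helper, label_in_place, using four hypothetical boundaries that straddle zero (as the boundaries of zero-centred normal data do), and compares it with a hypothetical variant, label_separately, that writes the labels into a separate array instead.

```python
import numpy as np

# Hypothetical boundaries for d = 4 bins; like the question's data,
# they straddle zero. All numbers here are made up for illustration.
intervals = np.array([-30, -10, 10, 30])
d = 4

def label_in_place(values):
    """Same logic as the question's loop: labels overwrite the input."""
    data = values.copy()
    for x in range(len(data)):
        for z in range(d - 1):
            if data[x] >= intervals[z] and data[x] < intervals[z + 1]:
                data[x] = z
            elif data[x] > intervals[-1]:
                data[x] = d - 1
    return data

def label_separately(values):
    """Same checks, but labels go into a separate array."""
    data_d = np.zeros_like(values)
    for x in range(len(values)):
        for z in range(d - 1):
            if values[x] >= intervals[z] and values[x] < intervals[z + 1]:
                data_d[x] = z
            elif values[x] > intervals[-1]:
                data_d[x] = d - 1
    return data_d

values = np.array([-25, -5, 15, 35])
print(label_in_place(values))    # [1 1 2 1]
print(label_separately(values))  # [0 1 2 3]
```

In the in-place version, the label 0 assigned to -25 and the label 3 assigned to 35 are both small numbers that fall inside the bin [-10, 10), so later iterations of the inner loop relabel them as 1.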
Solution
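You are overwriting the data while you are still checking it for the right interval. After data[x] = z, the remaining iterations of the inner loop compare the new label z, a small integer, against the boundaries; because the boundaries straddle zero, that label falls inside one of the middle intervals and is overwritten again, so the output collapses to a narrow range of middle labels. Introduce a separate array that you fill as you go along, leaving the source data untouched: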
import numpy as np
import matplotlib.pyplot as plt

size = 1000
np.random.seed(0)  # np.random.seed returns None, so there is nothing to assign
normal = np.round(np.random.normal(loc=0.0, scale=1000, size=(size)), 1).astype(int)

d = 10
data = np.ndarray.flatten(np.asarray(normal, dtype=int))
interval = np.divide(np.max(data) - np.min(data), d)
intervals = np.arange(np.min(data), np.max(data), interval, dtype=int)

data_d = np.zeros_like(data)  # added: separate array for the bin labels
for x in range(len(data)):
    for z in range(d-1):
        if data[x] >= intervals[z] and data[x] < intervals[z+1]:  # check original data
            data_d[x] = z  # fill new array so as not to overwrite
        elif data[x] >= intervals[-1]:  # >= so a value equal to the last boundary is not left at 0
            data_d[x] = d-1

plt.hist(data_d)
plt.show()
which produces a histogram in which all labels from 0 to 9 occur.
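Since the question also asks how to improve the method, here is a vectorized sketch that swaps in two different NumPy tools: np.linspace for the boundaries and np.digitize for the labelling, replacing both np.arange and the nested loops. The NumPy documentation recommends np.linspace over np.arange when the step is not an integer, and with an integer dtype np.arange derives its step from the cast start values rather than using interval exactly. The helper name equal_width_labels is hypothetical.

```python
import numpy as np

def equal_width_labels(data, d=10):
    """Label each value with its equal-width bin index in 0..d-1.

    Sketch only: edges come from np.linspace instead of np.arange, and
    labels from np.digitize instead of nested Python loops.
    """
    data = np.asarray(data).ravel()
    # d + 1 evenly spaced edges from min to max, both ends included.
    edges = np.linspace(data.min(), data.max(), d + 1)
    # Passing only the d - 1 inner edges makes np.digitize return 0 for
    # values below edges[1] and d - 1 for values >= edges[-2], so the
    # minimum and the maximum land in the first and last bins.
    return np.digitize(data, edges[1:-1])

np.random.seed(0)
normal = np.round(np.random.normal(loc=0.0, scale=1000, size=1000), 1).astype(int)
labels = equal_width_labels(normal, d=10)
print(np.bincount(labels, minlength=10))  # number of values per bin
```

Because only the inner edges are passed to np.digitize, no special case is needed for values on or above the last boundary, and the source array is never modified.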
Answered By - GrimTrigger