Saturday, July 30, 2022

[FIXED] How to categorize numerical data in numpy array iteratively?

July 30, 2022 arrays, for-loop, numpy, python No comments

Issue

I am currently trying to generate a numpy array of random data, normal = np.round(np.random.normal(loc=0.0,scale=1000,size=(size)),1).astype(int), seeded with np.random.seed(0), and then categorize the values into equidistant bins like this:

d=10
data = np.ndarray.flatten(np.asarray(normal,dtype=int))

interval = np.divide(np.max(data)-np.min(data),d)
intervals = np.arange(np.min(data),np.max(data),interval,dtype=int)

for x in range(len(data)):
    for z in range(d-1):
        if data[x] >= intervals[z] and data[x] < intervals[z+1]:
            data[x] = z
        elif data[x] > intervals[-1]:
            data[x] = d-1

Ideally, I would expect the values in my data array to be replaced by values from 0 to 9, but whenever I run this, I end up with values from 4 to 8. Does anyone have an idea what I might be doing wrong, or how to improve this method?

Here, interval is the bin width (the delta between neighbouring boundaries), and intervals holds the actual lower boundary of each bin.
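For reference, equal-width binning with d bins needs d + 1 boundaries running from the minimum to the maximum, spaced interval = (max - min) / d apart. A minimal sketch of computing them, assuming np.linspace instead of the question's np.arange (NumPy's documentation recommends linspace for non-integer steps, and building the edges with dtype=int truncates them). The helper name equal_width_edges and the five input values are hypothetical:

```python
import numpy as np

def equal_width_edges(values, d):
    """Return the bin width and d + 1 equally spaced boundaries covering [min, max].

    Hypothetical helper for illustration only.
    """
    lo, hi = np.min(values), np.max(values)
    interval = (hi - lo) / d             # width of each bin ("interval" in the question)
    edges = np.linspace(lo, hi, d + 1)   # all boundaries, including the top edge at the maximum
    return interval, edges

values = np.array([-50, -10, 0, 20, 50])  # made-up sample data
interval, edges = equal_width_edges(values, 10)
print(interval)  # 10.0
print(edges)     # 11 boundaries: -50.0, -40.0, ..., 40.0, 50.0
```

Unlike the question's intervals array, this includes the maximum as the final edge, so every value has a well-defined bin without a separate "above the last boundary" check.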

Solution

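You are overwriting the data while you are still checking for the right interval. Once data[x] has been replaced by a bin index z, the inner loop keeps running and compares that small number (0 to 9) against the remaining boundaries, so the value can be moved again into whichever bin contains those small numbers, which for this data is near the middle of the range. Introduce a separate array that you fill as you go along, leaving the source data untouched: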

import numpy as np
import matplotlib.pyplot as plt

size = 1000
np.random.seed(0)  # seeds the global generator; np.random.seed returns None, so there is nothing to assign
normal = np.round(np.random.normal(loc=0.0, scale=1000, size=size), 1).astype(int)

d = 10
data = np.ndarray.flatten(np.asarray(normal, dtype=int))

interval = np.divide(np.max(data) - np.min(data), d)  # bin width
intervals = np.arange(np.min(data), np.max(data), interval, dtype=int)  # lower boundary of each bin

data_d = np.zeros_like(data)  # added: separate array for the bin indices

for x in range(len(data)):
    for z in range(d-1):
        if data[x] >= intervals[z] and data[x] < intervals[z+1]:  # check original data
            data_d[x] = z  # fill new array so not to overwrite
        elif data[x] >= intervals[-1]:  # top bin, including its lower boundary
            data_d[x] = d-1

plt.hist(data_d)
plt.show()

which produces a histogram in which every category from 0 to 9 occurs.
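To illustrate why the separate array matters, here is a small sketch that runs the question's in-place loop and the answer's loop side by side on a hypothetical five-element array with hand-picked lower boundaries -50, -40, ..., 40 (not the question's random data). The function names bin_in_place and bin_separate are made up for this example:

```python
import numpy as np

def bin_in_place(data, intervals, d):
    """The question's loop: writes each bin index back into data."""
    data = data.copy()  # work on a copy so the caller's array is untouched
    for x in range(len(data)):
        for z in range(d - 1):
            if data[x] >= intervals[z] and data[x] < intervals[z + 1]:
                data[x] = z  # data[x] is now a small index, re-checked for z + 1, z + 2, ...
            elif data[x] > intervals[-1]:
                data[x] = d - 1
    return data

def bin_separate(data, intervals, d):
    """The answer's fix: bin indices go into a separate output array."""
    data_d = np.zeros_like(data)
    for x in range(len(data)):
        for z in range(d - 1):
            if data[x] >= intervals[z] and data[x] < intervals[z + 1]:
                data_d[x] = z  # original data[x] is never modified
            elif data[x] > intervals[-1]:
                data_d[x] = d - 1
    return data_d

# Hypothetical example: lower boundaries -50, -40, ..., 40 (d = 10 bins of width 10)
intervals = np.arange(-50, 50, 10)
data = np.array([-45, -25, 5, 35, 50])

print(bin_in_place(data, intervals, 10))   # [5 5 5 8 5]
print(bin_separate(data, intervals, 10))   # [0 2 5 8 9]
```

In the in-place version, every value that starts below the bin containing 0 to 9 is pulled up into that bin (index 5 here), and even the maximum, first set to 9, ends up there too. This is consistent with the narrowed 4 to 8 range reported in the question.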

Answered By - GrimTrigger
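The question also asks how to improve the method. One option, not part of the original answer, is to replace the double loop with NumPy's vectorized np.digitize. A minimal sketch, assuming d + 1 equal-width edges from np.linspace; categorize is a hypothetical helper name:

```python
import numpy as np

def categorize(values, d=10):
    """Map each value to an equal-width bin index in 0..d-1 (hypothetical helper).

    Builds d + 1 evenly spaced edges from min to max and passes only the
    d - 1 inner edges to np.digitize, so the minimum falls in bin 0 and
    the maximum falls in bin d - 1.
    """
    values = np.asarray(values)
    edges = np.linspace(values.min(), values.max(), d + 1)
    # For inner bins, index i means edges[i] <= v < edges[i + 1];
    # bin 0 and bin d - 1 are open-ended below and above.
    return np.digitize(values, edges[1:-1])

np.random.seed(0)  # same seeding as in the question
size = 1000
normal = np.round(np.random.normal(loc=0.0, scale=1000, size=size), 1).astype(int)

data_d = categorize(normal, d=10)
print(data_d.min(), data_d.max())         # 0 9
print(np.bincount(data_d, minlength=10))  # number of values in each category
```

Because the edges stay as floats rather than being truncated to integers, a few values near the boundaries may be assigned to a different category than with the np.arange-based loop.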

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
