Issue
I got the following numpy array named 'data'. It consists of 15118 rows and 2 columns. The first column mostly consist of 0.01 steps, but sometimes there is a step in between (shown in red) which I would like to remove/filter out.
I achieved this with the following code:
# Create array [0, 0.01 .... 140], rounded 2 decimals to prevent floating point error
b = np.round(np.arange(0,140.01,0.01),2)
# New empty data array
new_data = np.empty(shape=[0, 2])
# Loop over values to remove/filter out data
for x in b:
Index = np.where(x == data[:,0])[0][0]
new_data = np.vstack([new_data,data[Index]])
I feel like this code is far from optimal and I was wondering if anyone knows a faster/better way of achieving this?
Solution
Here's a solution using pandas for resampling, you can probably achieve the same result in pure numpy but there are a number of floating point and rounding error pitfalls you are going to face, maybe it's better to let a trusted library do the work for you.
Let's say arr
is your data array and assume your index to be in fractions of seconds. You can convert your array to a dataframe with a timedelta index:
df = pd.DataFrame(arr[:,1], index=arr[:,0])
df.index = pd.to_timedelta(df.index, unit="s")
Than resampling it's pretty easy, 10ms
is the frequency you want, first()
should give you the expected result dropping everything but the records at 10ms
ticks, but feel free to experiment with other functions
df = df.resample("10ms").first()
Eventually you could get back to your array with something like:
np.vstack([pd.to_numeric(df.index, downcast="float").values / 1e9,
df.values.squeeze()]).T
Answered By - filippo
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.