Issue
I want to iterate over a large list where I need to do some computation using the n elements before the Nth index of the list. I've solved it using the following code snippet:
mylist = [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
for i in range(len(mylist)):
    j = i + 3
    data_till_i = mylist[:j]           # everything up to (but excluding) index j
    current_window = data_till_i[-3:]  # keep only the last 3 elements
    print(current_window)
The above snippet prints:
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]
[5, 6, 7]
[6, 7, 8]
[7, 8, 9]
[8, 9, 10]
[9, 10, 11]
[10, 11, 12]
[11, 12, 13]
[12, 13, 14]
[12, 13, 14]
[12, 13, 14]
Is there a one-liner or a more efficient way to do exactly the same thing in less computation time? My list is very large (length > 100K), so I'm worried about time complexity.
Thank you.
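For reference, the loop above copies the whole prefix mylist[:j] on every iteration, so for a list of length n it does O(n²) work in total. A minimal sketch that slices only the window instead reproduces the exact output above, including the repeated [12, 13, 14] at the end, in O(n·k) time for window size k. The helper name windows_like_original is hypothetical:

```python
def windows_like_original(seq, k=3):
    """Return the same windows as the loop above, without copying prefixes.

    Hypothetical helper: for each i, the original loop computed seq[:i + k][-k:].
    Clamping the end index to len(seq) yields that same slice directly.
    """
    n = len(seq)
    result = []
    for i in range(n):
        end = min(i + k, n)      # seq[:i + k] never extends past len(seq)
        start = max(0, end - k)  # [-k:] keeps at most the last k items
        result.append(seq[start:end])
    return result


mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
for window in windows_like_original(mylist):
    print(window)
```

Each window is still a new list of k elements, but nothing proportional to i is copied per step.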
UPDATE:
My actual list is in the following format:
[('string_attribute',1659675302861,3544.0), ('string_attribute', 1659675304443, 3544.0).........]
Here, string_attribute is an attribute that is the same for every entry and can be excluded from the computation.
SOLUTION:
from numpy.lib.stride_tricks import sliding_window_view
dummyList = [(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),(6,'f'),(7,'g'),(8,'h'),(9,'i'),(10,'j')]
rolling_window = sliding_window_view(dummyList, window_shape=3, axis=0)
print(rolling_window)
The output is:
[[['1' '2' '3']
['a' 'b' 'c']]
[['2' '3' '4']
['b' 'c' 'd']]
[['3' '4' '5']
['c' 'd' 'e']]
[['4' '5' '6']
['d' 'e' 'f']]
[['5' '6' '7']
['e' 'f' 'g']]
[['6' '7' '8']
['f' 'g' 'h']]
[['7' '8' '9']
['g' 'h' 'i']]
[['8' '9' '10']
['h' 'i' 'j']]]
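Note that because dummyList mixes integers and strings, NumPy converts every element to a string (hence '1', '2', ... in the output), so the numbers cannot be used arithmetically without converting them back. A minimal sketch for the format from the UPDATE, assuming each tuple is (string_attribute, timestamp, value) and the constant string can simply be dropped, keeps only the numeric fields in a float64 array before windowing. The sample records below are made up for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Made-up records in the (string_attribute, timestamp, value) layout from the UPDATE
records = [
    ('string_attribute', 1659675302861, 3544.0),
    ('string_attribute', 1659675304443, 3544.0),
    ('string_attribute', 1659675305990, 3545.5),
    ('string_attribute', 1659675307512, 3546.0),
]

# Drop the constant string and keep the numeric fields as float64.
# Millisecond timestamps of this size are far below 2**53, so float64 stores them exactly.
numeric = np.array([(ts, value) for _, ts, value in records], dtype=np.float64)

# Windows of 3 consecutive rows along axis 0; the window axis is appended last,
# so the result has shape (n_rows - 2, 2, 3), indexed as [window, field, position].
windows = sliding_window_view(numeric, window_shape=3, axis=0)
print(windows.shape)

# Example computation: mean of the value field within each window
print(windows[:, 1, :].mean(axis=-1))
```

Keeping the data numeric avoids the string round-trip and lets NumPy reductions run directly on the windows.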
Solution
What you're after is called a rolling (or sliding) window operation. If you want to work with the list type specifically, there is a shorter formulation, as proposed here (that proposal uses islice; the version below uses plain slicing):
window_size = 3
for i in range(len(mylist) - window_size + 1):
    print(mylist[i: i + window_size])
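Since islice came up: the itertools documentation includes a sliding_window recipe built on islice and a collections.deque with maxlen. It yields windows lazily without indexing, so it also works on iterators that cannot be sliced. A sketch adapted from that recipe:

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield overlapping tuples of length n: (1, 2, 3), (2, 3, 4), ...

    Adapted from the sliding_window recipe in the itertools documentation.
    """
    it = iter(iterable)
    # Pre-fill with the first n - 1 items; maxlen makes append() drop the oldest item
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)
        yield tuple(window)


mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
for w in sliding_window(mylist, 3):
    print(list(w))
```

Like the slicing loop, this produces only full windows, and each step costs O(k) to build the tuple.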
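Note that this stops at the last full window, so unlike your loop it does not repeat [12, 13, 14] at the end.
If your data is numerical, as in the example, I'd suggest using NumPy instead, as this gives much better performance. Using the proposal from here, your example becomes: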
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
sliding_window_view(np.array(mylist), window_shape=3)
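The result is a read-only view of shape (len(mylist) - 2, 3) that shares memory with the array, so the windows themselves are not copied; the per-window computation is then vectorized along the last axis. A minimal sketch, using a rolling sum and mean purely as examples of "some computation":

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
arr = np.array(mylist)

# One row per window; the view shares memory with arr (no per-window copies)
windows = sliding_window_view(arr, window_shape=3)
print(windows.shape)  # (12, 3)

# Reduce along the window axis to get one value per window
rolling_sum = windows.sum(axis=-1)
rolling_mean = windows.mean(axis=-1)
print(rolling_sum)
print(rolling_mean)
```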
To give you a feel for the timing, we can turn the options above into functions, create a much longer list, and compare them with %timeit, e.g. in Jupyter:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_window_using_iterator(list_, window_size):
    # One list slice per full window
    result = []
    for i in range(len(list_) - window_size + 1):
        result.append(list_[i: i + window_size])
    return result

def rolling_window_using_numpy(list_, window_size):
    # Convert to an array, then return a strided view of the windows
    return sliding_window_view(np.array(list_), window_shape=window_size)

long_list = list(range(10000000))
%timeit rolling_window_using_iterator(long_list, 3)
%timeit rolling_window_using_numpy(long_list, 3)
This prints (on my machine):
1.8 s ± 22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
422 ms ± 967 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
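One caveat when reading these numbers: sliding_window_view itself only creates a view, so most of the time measured for rolling_window_using_numpy is likely spent in np.array(list_), converting the Python list. A sketch for timing the steps separately with the standard timeit module outside Jupyter (list size reduced to 1,000,000 to keep it quick; the variable names are illustrative and no timings are claimed here):

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

long_list = list(range(1_000_000))
arr = np.array(long_list)  # converted once, outside the timed code

n_runs = 5
convert = timeit.timeit(lambda: np.array(long_list), number=n_runs) / n_runs
view = timeit.timeit(lambda: sliding_window_view(arr, window_shape=3), number=n_runs) / n_runs
view_and_sum = timeit.timeit(
    lambda: sliding_window_view(arr, window_shape=3).sum(axis=-1), number=n_runs
) / n_runs

print(f"list -> array conversion: {convert * 1e3:.2f} ms")
print(f"sliding_window_view only: {view * 1e3:.4f} ms")
print(f"view + rolling sum:       {view_and_sum * 1e3:.2f} ms")
```

If the data can be kept in a NumPy array from the start, the conversion cost is paid only once.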
Answered By - carlo_barth