Issue
I have the following lists:
a = [ 1, 6, 76, 15, 46, 55, 47, 15, 72, 58, ..] # there could be more than 10 elements in each
b = [17, 48, 22, 7, 35, 19, 91, 85, 49, 35, ..]
c = [46, 8, 53, 49, 28, 82, 30, 86, 57, 9, ..]
d = [82, 12, 24, 60, 66, 17, 13, 69, 28, 99, ..]
e = [ 1, 53, 17, 82, 21, 20, 88, 10, 82, 41, ..]
I want to write a function which takes any number of those lists (could be all, could be only a and c, for example) as its arguments and selects the leftmost 10 unique elements, taken equally from every list. I will illustrate with an example.
The initial data we have (assuming a length of 10).
We look at the first elements of every row and see that a and e have the same value (1). We randomly select one of them, let's say e, remove that element and shift e to the left.
Here we see that there is an overlap again: 17 already appears (it is the first element of b), so we shift e one more time.
Again a similar problem (82 is the first element of d), so we shift it one last time.
Finally, we can select the first two elements of each list and there will be no duplicates:
[1, 6, 17, 48, 46, 8, 82, 12, 21, 53]
It could be that more than one list has identical values; the same rules should apply.
I came up with this, and to handle the randomness I decided to shuffle the list of arrays before using it:
def prepare_unique_array(
    arrays: list = [],
    max_length: int = 10,
    slice_number: int = 2
):
    unique_array = []
    for array in arrays:
        for i in range(slice_number):
            while not len(unique_array) == max_length:
                if array[i] not in unique_array:
                    unique_array.append(array[i])
                else:
                    while array[i+1] in unique_array:
                        i += 1
                    unique_array.append(array[i+1])
    return unique_array
This gives the desired result for those initial values, but as soon as anything changes it no longer works.
Maybe there is a numpy approach which does it faster and more easily as well.
I would appreciate any guidance or help.
Solution
Using cycle and iter to pick one element from each iterable, alternately:
from itertools import cycle

def uniques_evenly(n, *iterables):
    its = cycle(iter(seq) for seq in iterables)
    seen = set()
    it = next(its)
    for _ in range(n):
        x = next(it)
        while x in seen:
            x = next(it)  # pick next unique number
        seen.add(x)
        yield x
        it = next(its)  # switch to next iterator
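Note that this will crash if one of the iterators is too short: next(it) then raises StopIteration, which Python 3.7+ turns into a RuntimeError when it escapes from inside a generator.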
Testing:
a = [ 1, 6, 76, 15, 46, 55, 47, 15, 72, 58, 37756, 712, 666]
b = [17, 48, 22, 7, 35, 19, 91, 85, 49, 35, 42]
c = [46, 8, 53, 49, 28, 82, 30, 86, 57, 9]
d = [82, 12, 24, 60, 66, 17, 13, 69, 28, 99]
e = [ 1, 53, 17, 82, 21, 20, 88, 10, 82, 41, 216]
print( list(uniques_evenly(10, a,b,c,d,e)) )
# [1, 17, 46, 82, 53, 6, 48, 8, 12, 21]
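The note about too-short iterators can be worked around in several ways. Below is one minimal sketch, not part of the original answer: the name uniques_evenly_safe and the choice to silently drop exhausted iterators (so fewer than n values may come back) are assumptions made for illustration.

```python
def uniques_evenly_safe(n, *iterables):
    """Round-robin over the iterables, yielding at most n unseen values.

    Hypothetical variant: an exhausted iterator is dropped from the
    rotation instead of letting StopIteration escape.
    """
    active = [iter(seq) for seq in iterables]
    seen = set()
    produced = 0
    while active and produced < n:
        still_active = []
        for it in active:
            if produced == n:
                break
            # Advance this iterator until it gives an unseen value.
            for x in it:
                if x not in seen:
                    seen.add(x)
                    produced += 1
                    yield x
                    still_active.append(it)  # it may have more values
                    break
            # If the inner loop ended without break, `it` is exhausted
            # and is simply not kept for the next round.
        active = still_active


print(list(uniques_evenly_safe(10, [1, 2], [1, 3, 4])))
# [1, 3, 2, 4]
```

With lists that are long enough, this sketch picks values in the same order as uniques_evenly; it only differs once an iterator runs out.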
Explanations
We use iter() to transform a list into an iterator. An iterator is something that "consumes" values and returns them, one at every call of next(). For instance:
l = [3, 4, 7] # l is a list
i = iter(l) # i is an iterator
print( next(i) )
# 3
print( next(i) )
# 4
print( next(i) )
# 7
print( next(i) )
# raises exception StopIteration
Then, we use itertools.cycle to alternate between the five iterators. cycle returns an infinite iterator that cycles between the items in the list we gave it. We gave it a list of five iterators:
its = cycle(iter(seq) for seq in iterables)
This is the same thing as if we had written:
its = cycle([iter(a), iter(b), iter(c), iter(d), iter(e)])
Here is a demonstration with only two iterators instead of 5:
a = ['hello', 'world']
b = [5, 12]
its = cycle([iter(a), iter(b)])
it = next(its) # it is now the iterator on a
print( next(it) ) # 'hello'
it = next(its) # it is now the iterator on b
print( next(it) ) # 5
it = next(its) # it cycles back to a
print( next(it) ) # 'world'
it = next(its) # it is now b
print( next(it) ) # 12
it = next(its) # it is a
print( next(it) ) # raises exception StopIteration
So this is essentially what happens in uniques_evenly.
In addition, we use a set, seen, to remember the elements we have already seen. Sets are cool because testing for membership is a fast operation: the test x in seen takes constant time on average, no matter how big seen is.
Now it is one of the five iterators, say the iterator on list d. We pick the next element in d with x = next(it); then we run the loop:
while x in seen:
    x = next(it)
Which means: if x is an element already seen previously, then pick the next element in d. Keep picking the next element until you find an element that wasn't seen previously. Once we've finally picked an element x that hasn't been seen previously, we add it to the set seen so it won't be picked again in the future, and then:
yield x
This is a bit special. It means uniques_evenly is not a regular function; if it were, we would have used the keyword return and not yield. Instead, uniques_evenly is a generator. The alternative to this syntax with yield would have been to declare a list result, initially empty; then instead of yield x, I would have written result.append(x); then at the very end of the function, return result. Then uniques_evenly would have been a function that returns a list.
The difference between a function that returns a list, and a generator, is a bit subtle. Basically a generator behaves like a lazy list, whose elements are computed and produced only when needed.
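As a small illustration of that laziness, here is a sketch using a made-up generator, squares, which is not part of the answer. It would never finish if all its values were computed up front, yet taking the first five with itertools.islice works because values are produced one at a time, on demand.

```python
from itertools import count, islice


def squares():
    """Hypothetical infinite generator: 1, 4, 9, 16, ..."""
    for k in count(1):
        yield k * k  # computed only when the caller asks for it


first_five = list(islice(squares(), 5))
print(first_five)
# [1, 4, 9, 16, 25]
```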
When testing my code, I immediately converted the generator to a list anyway:
print( list(uniques_evenly(10, a,b,c,d,e)) )
So the difference doesn't matter here. If you're more comfortable with having a variable result and using result.append(x) instead of yield x, then returning result at the end of the function, you can do it this way instead.
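For reference, the list-returning variant described above might look like the sketch below. The name uniques_evenly_list is only chosen here to keep it apart from the generator version; the logic is otherwise the same, including the crash when an iterator is too short.

```python
from itertools import cycle


def uniques_evenly_list(n, *iterables):
    """Same logic as the generator, but collects values into a list."""
    its = cycle([iter(seq) for seq in iterables])
    seen = set()
    result = []
    it = next(its)
    for _ in range(n):
        x = next(it)
        while x in seen:
            x = next(it)  # skip values that were already picked
        seen.add(x)
        result.append(x)  # instead of: yield x
        it = next(its)    # switch to the next iterator
    return result


a = [1, 6, 76, 15, 46, 55, 47, 15, 72, 58, 37756, 712, 666]
b = [17, 48, 22, 7, 35, 19, 91, 85, 49, 35, 42]
c = [46, 8, 53, 49, 28, 82, 30, 86, 57, 9]
d = [82, 12, 24, 60, 66, 17, 13, 69, 28, 99]
e = [1, 53, 17, 82, 21, 20, 88, 10, 82, 41, 216]
print(uniques_evenly_list(10, a, b, c, d, e))
# [1, 17, 46, 82, 53, 6, 48, 8, 12, 21]
```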
Answered By - Stef