Issue
I have a list of tuples with each tuple carrying the following information:
(start_position,end_position,list of strings)
A sample list is given as follows:
aList = [(6, 9, ['ataH']),
(4, 9, ['svataH']),
(0, 9, ['vEvasvataH']),
(2, 5, ['vasu', 'vasU']),
(1, 3, ['Eva', 'eva']),
(0, 1, ['vA', 'vE'])]
I need to find all sequences of tuples such that each sequence has to cover all positions from the start_position
to end_position
, in this case from 0
to 9
. In a sequence, say a
, adjacent tuples need to satisfy the constraints that a[i+1][0] - a[i][1] <= 1
.
Summarily, the output should be as follows:
[[(0, 1, ['vA', 'vE']), (2,5,['vasu', 'vasU']), (6, 9, ['ataH']) ],
[(0, 1, ['vA', 'vE']), (1, 3, ['Eva', 'eva']), (4, 9, ['svataH'])],
[(0, 9, ['vEvasvataH'], [7])]]
I have used the following code to achieve the same.
maxVal = max(aList,key=lambda item:item[1])[1]
diffSets = list()
for i,item in enumerate(aList):
if maxVal == item[1]:
_temp = [item]
currStart = item[0]
for j,stuff in enumerate(aList):
if i != j:
if currStart == stuff[1] or currStart == stuff[1]+1:
_temp.append(stuff)
currStart = stuff[0]
diffSets.append(_temp) #the output needs to be reversed to get the sequence in the order as shown above
Is there a more efficient and faster way to achieve the same, say using itertools
?
Solution
Here's how I would go about it (not to say that it's necessarily faster). First, you can start by sorting the data based on both start and end. This will mean when we look at combinations later we won't have to backtrack to other values in our result (we will know that the start of entry[i]
must be less than or equal to the start of entry[i+1]
)
import itertools
import operator
data = [
(6, 9, ['ataH']),
(4, 9, ['svataH']),
(0, 9, ['vEvasvataH']),
(2, 5, ['vasu', 'vasU']),
(1, 3, ['Eva', 'eva']),
(0, 1, ['vA', 'vE'])
]
data = sorted(data, key=operator.itemgetter(0, 1))
start = min(data, key=operator.itemgetter(0))[0]
end = max(data, key=operator.itemgetter(1))[1]
Now we have the data sorted, and know our start and end values.
To find all of the combinations of the data for any size of subset, we can use this trick: https://stackoverflow.com/a/5898031/3280538
def all_combinations(data):
for i in range(len(data)):
yield from itertools.combinations(data, r=i+1)
Here we use yield
to avoid creating expensive containers. Now we write another method that uses these combinations, and for each one checks it's validity:
def valid_combinations(data):
for comb in all_combinations(data):
if comb[0][0] != start or comb[-1][1] != end:
continue
for i in range(len(comb) - 1):
if not 0 <= comb[i+1][0] - comb[i][1] <= 1:
break
else:
yield comb
Here I'm using a neat trick with the for-loop. The else
block on the loop will only execute if the for loop completes naturally and does not break, and if we don't break then we know each item is valid.
All together:
import itertools
import operator
def all_combinations(data):
for i in range(len(data)):
yield from itertools.combinations(data, r=i+1)
def valid_combinations(data):
data = sorted(data, key=operator.itemgetter(0, 1))
start = min(data, key=operator.itemgetter(0))[0]
end = max(data, key=operator.itemgetter(1))[1]
for comb in all_combinations(data):
if comb[0][0] != start or comb[-1][1] != end:
continue
for i in range(len(comb) - 1):
if not 0 <= comb[i+1][0] - comb[i][1] <= 1:
break
else:
yield comb
Getting the results:
from pprint import pprint
pprint(list(valid_combinations(
[
(6, 9, ['ataH']),
(4, 9, ['svataH']),
(0, 9, ['vEvasvataH']),
(2, 5, ['vasu', 'vasU']),
(1, 3, ['Eva', 'eva']),
(0, 1, ['vA', 'vE'])
]
)))
[((0, 9, ['vEvasvataH']),),
((0, 1, ['vA', 'vE']), (1, 3, ['Eva', 'eva']), (4, 9, ['svataH'])),
((0, 1, ['vA', 'vE']), (2, 5, ['vasu', 'vasU']), (6, 9, ['ataH']))]
Answered By - flakes
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.