Issue
I have an array of 16 columns and 22 rows.
please advice how to find all possible combinations with the following rules:
- I can't choose more than 1 value per row
- sum of column's indexes should be equal to 16 (1+1+14,15+1,2+2+2+2+2+6 etc.)
- sum of values in selected cells should be between 39 and 42
For example I've choose 5 combinations:
- 1st: Sum of values(8.4+4.8+5.7+4+3.6+3.8=41.3) Sum of column values = 3+3+2+2+2+2+2=16
- 2nd: sum of values(36.7) Sum of column values = 3+3+3+3+3+1 = 16
- 3rd: sum of values (40.8) sum of column values = 3+3+3+7 = 16
- 4th: sum of values (41.7) sum of column values = 3+3+3+7 = 16
- 5h sum of values(41) sum of column values = 16 = 16
- etc.
I've found how to get all possible column combinations: import itertools
def summs(answer, *dig):
res = []
for i in range(1, answer + 1):
for j in itertools.combinations_with_replacement(list(dig), i):
if sum(list(j)) == answer:
res.append(j)
return res
Now I have 231 different column indexes combinations each sum is equal to 16.
[(1, 15), (2, 14), (3, 13), (4, 12), (5, 11), (6, 10), (7, 9), (8, 8), (1, 1, 14), (1, 2, 13), (1, 3, 12), (1, 4, 11), (1, 5, 10), (1, 6, 9), (1, 7, 8), (2, 2, 12), (2, 3, 11), (2, 4, 10), (2, 5, 9)... etc.
Now I'm trying to find all row sum combinations for above column index combinations:
for example for (1,15) array (2 columns, 22 rows) I should find all sum combinations, except the same row sums (row 1 and 2, 2 and 3 etc)
for (1, 5, 10) its array of 3 columns, 22 rows
and so on until (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) which is array of 16 columns, 22 rows
red colors are numbers grater than 42
other 5 colors are possible combinations with a sum between 39 and 42
col1 | col2 | col3 | col4 | col5 | |
---|---|---|---|---|---|
row1 | 2 | 4 | 7 | 9 | 11 |
row2 | 3 | 6 | 8 | 11 | 14 |
row3 | 2 | 5 | 7 | 10 | 12 |
row4 | 3 | 6 | 9 | 11 | 14 |
row5 | 2 | 4 | 6 | 8 | 10 |
row6 | 2 | 4 | 5 | 7 | 9 |
- Condition 1, sum of column index is 5: Possible combinations are [(5),(1, 4), (2, 3), (1, 1, 3), (1, 2, 2), (1, 1, 1, 2), (1, 1, 1, 1, 1)], this means that you can pick value from 5th column or 1 value from col1 & 1 value from col4, or 2 values from col1, 1 value col3, 1 value col1 and 2values from col2, 3 values from col1 and 1 value from col2, etc
- Condition 2: find all possible sum combinations between column's items except cells numbers in same row(for example, for (1,4): 2+9 in the same row need to skip it,2+11, 2+10, 2+11,2+8,2+7; 3+9,3+11 in the same row, teed to skip it 3+10, 3+11,3+8,3+7 etc.)
- Condition 3:sum of items should not exceed 11 for example, so I should accept only 2+8, 2+7 and 3+7 from the previous step
Output should be like (row5,col5), (row6,col5), [(row1,col1)&(row5,col4)], [(row1,col1)&(row6,col4) etc
Solution
It's very inefficient to generate all possible combinations of row/columns and then check the sums for the generated list. The number of required tests is extremely large. For example, if you want to test just 3 columns (2, 5, 9) in all possible rows, then you can select rows in 22!/(3!*(22-3)!) = 1540 ways. And if you want to test 11 columns (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6) in all possible rows, then you can select rows in 22!/(11!*11!) = 705432 ways. So, this way you'll need to test dozens of millions of combinations.
Instead, you can make a recursive solution, failing quickly when you know that a solution is not possible:
import pandas as pd
def find_combinations(df, from_row, current_sum, current_column_sum,
target_sum_min, target_sum_max, target_column_sum, cells_list):
"""
:param df: the dataframe
:param from_row: zero-based row in which to perform the search
:param current_sum: current sum of selected cells above this row
:param current_column_sum: current sum of column numbers
:param target_sum_min: minimum required sum of cells
:param target_sum_max: maximum required sum of cells
:param target_column_sum: required sum of column numbers
:param cells_list: list of selected cells above this row
"""
if from_row >= len(df) or \
current_sum > target_sum_max or \
current_column_sum >= target_column_sum :
return
max_column = max(len(df.columns), target_column_sum - current_column_sum)
for column in range(max_column):
number = df.iloc[from_row][column]
new_sum = current_sum + number
if new_sum > target_sum_max:
break
if target_sum_min <= new_sum <= target_sum_max and \
current_column_sum + column + 1 == target_column_sum:
print([*cells_list, (from_row + 1, column + 1)])
find_combinations(df, from_row + 1, new_sum, current_column_sum + column + 1,
target_sum_min, target_sum_max, target_column_sum,
[*cells_list, (from_row + 1, column + 1)])
find_combinations(df, from_row + 1, current_sum, current_column_sum,
target_sum_min, target_sum_max, target_column_sum,
cells_list)
df = pd.read_csv("data.csv", header=None)
find_combinations(df, 0, 0.0, 0, 39.0, 42.0, 16, [])
PS. I used OCR to extract text data from your image, so my data.csv file just contains data without any header line.
Answered By - Alex Sveshnikov
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.