Issue
I have a problem where I have a large binary numpy array (1000,2000). The general idea is that the array's columns represent time from 0 to 2000 and each row represents a task. Each 0 in the array represents a failure and each 1 represents success.
What I need to do is select 150 tasks(row axis) out of 1000 available and maximize the total successes (1s) over unique columns. It does not have to be consecutive and we are just looking to maximize success per time period (just need 1 success any additional is extraneous). I would like to select the best "Basket" of 150 tasks. The subarray rows can be taken anywhere from the 1000 initial rows. I want the optimal "Basket" of 150 tasks that lead to the most success across time (columns). (Edited for Additional Clarity)
A real basic example of what the array looks like :
array([[0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
[0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0],
[1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
[1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]])
I have successfully created a Monte Carlo simulation using randomly generated baskets of tasks in NumPy and then going through the array and summing. As you can imagine this takes a while and given the large number of potential combinations it is inefficient. Can someone point me to a algorithm or way to set this problem up in a solver like PuLP?
Solution
Try this:
n = 150
row_sums = np.sum(x, axis=1)
top_n_row_sums = np.argsort(row_sums)[-n:]
max_successes = x[top_n_row_sums]
This takes the sum of every row, grabs the indices of the highest n
sums, and indexes into x
with those row indices.
Note that the rows will end up sorted in ascending order of their sums across the columns. If you want the rows in normal order (ascending order by index), use this instead:
max_successes = x[sorted(top_n_row_sums)]
Answered By - ddejohn
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.