Issue
I have 2 lists:
texts = ["this is string 1", "this is string 2", "this is string 3"]
answers = ["this", "string 1", "string 3"]
I need to create a function that returns as many possible answers from answers
(and their corresponding text) that exist in (exact string match) exactly one of the returned texts. That is, each of the returned answers must be in exactly one of the returned texts.
It should return a list of lists, where each sublist has the [answer, corresponding text]
.
For example, in this example, only "string 1"
appear in the first text ("this is string 1"
), only "string 3"
appear in the last text ("this is string 3"
), and the first answer ("this"
) appear in all of the texts, so it shouldn't be returned.
Another example:
texts = ["this is string 1 this is string 2", "this is string 1", "this is string 3"]
answers = ["this", "string 1", "string 3"]
The returned output in this case can either be
[["string 1", "this is string 1"],["string 3", "this is string 3"]]
Or
[["string 1", "this is string 1 this is string 2"],["string 3", "this is string 3"]]
As "string 1" does not appear in the returned text "this is string 3".
I currently have the following code:
def find_unique_answers(texts, answers):
result = []
for answer in answers:
text_found = None
for text in texts:
if answer in text:
if text_found is None:
text_found = text
else:
text_found = None
break
if text_found is not None:
result.append([answer, text_found])
return result
texts = ["this is string 1", "this is string 2", "this is string 3"]
answers = ["this", "string 1", "string 3"]
output = find_unique_answers(texts, answers)
print(output) # Output: [['string 1', 'this is string 1'], ['string 3', 'this is string 3']]
But, if I change the lists to
texts = ["this is string 1 this is string 2", "this is string 1", "this is string 3"]
answers = ["this", "string 1", "string 3"]
But the output is [['string 3', 'this is string 3']]
Not entirely sure what I'm doing wrong.
Solution
Edit: with the new explanations, I think I know understand the requirements better. It's an optimization problem, which is quite harder.
This is a brute-force approach that tries all combinations of answers. It's not completely stupid, but still takes exponential time. Given your comment on number of answers and texts it should be ok, though.
It always returns the largest amount of answers possible.
def find_all_answer_combinations(texts, answers, used_texts=()):
if not texts or not answers:
# Nothing to do, return an empty combination.
yield ()
return
# Pick an answer to process.
answer, *rest_answers = answers
# Find solutions that *don't* use the selected answer.
yield from find_all_answer_combinations(texts, rest_answers, used_texts)
if any(answer in text for text in used_texts):
# The answer already appears in a returned text, so we can't use it again.
return
# Find texts that have this answer.
matched_texts = [text for text in texts if answer in text]
# Only continue checking texts that don't contain the answer,
# otherwise we'd have two or more matches for this answer.
other_texts = [text for text in texts if answer not in text]
# Try generating combinations with each of the matched texts.
for matched_text in matched_texts:
# Assuming we use `matched_text`, generate combinations
# for the rest of answers/texts.
for result in find_all_answer_combinations(other_texts, rest_answers, used_texts+(matched_text,)):
yield (answer, matched_text), *result
def find_best_answers(texts, answers):
# Finds the combination with most answers.
return max(find_all_answer_combinations(texts, answers), key=len)
texts = ["this is string 1", "this is string 2", "this is string 3"]
answers = ["this", "string 1", "string 3"]
print(find_best_answers(texts, answers)) # (('string 1', 'this is string 1'), ('string 3', 'this is string 3'))
texts = ["this is string 1 this is string 2", "this is string 1", "this is string 3"]
answers = ["this", "string 1", "string 3"]
print(find_best_answers(texts, answers)) # (('string 1', 'this is string 1 this is string 2'), ('string 3', 'this is string 3'))
texts = ["this is string 1 this is string 2", "tis is string 1", "this is string 3"]
answers = ["tis", "string 1", "string 3"]
print(find_best_answers(texts, answers)) # (('string 1', 'this is string 1 this is string 2'), ('string 3', 'this is string 3'))
And this is a greedy algorithm that finds one combination of answers/texts pairs without conflicts. No guarantee on number of answers/texts returned:
def find_unique_answers(texts, answers):
result = []
forbidden_texts = []
used_texts = []
# Reversed order is a hack to get better results with the given examples.
for answer in reversed(answers):
# If the answer already matches a returned text, including
# it here would result in two matching returned texts. So, we skip.
if any(answer in text for text in used_texts): continue
# Find all matching texts, ignoring the "forbidden ones"
# that would conflict with used answers.
matches = [text for text in texts if answer in text and text not in forbidden_texts]
# No valid text, nothing to do. Skip.
if not matches: continue
# Arbitraryly select the first matching text.
selected_text = matches[0]
# We got ourselves a valid pair!
result.append((answer, selected_text))
# Remember the text we just used.
used_texts.append(selected_text)
# And add the other valid texts to "forbidden_texts",
# so that future answers don't add extra matches to this one.
forbidden_texts.extend(matches)
return result
texts = ["this is string 1", "this is string 2", "this is string 3"]
answers = ["this", "string 1", "string 3"]
output = find_unique_answers(texts, answers)
print(output) # Output: [('string 3', 'this is string 3'), ('string 1', 'this is string 1')]
texts = ["this is string 1 this is string 2", "this is string 1", "this is string 3"]
answers = ["this", "string 1", "string 3"]
output = find_unique_answers(texts, answers)
print(output) # [('string 3', 'this is string 3'), ('string 1', 'this is string 1 this is string 2')]
Answered By - BoppreH
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.