Issue
I've been trying to run txtai in hopes of getting semantic search working with Elasticsearch. My main goal is to query tickets in a help desk system and return tickets that are similar to my query.
Example Query: What operating system should I use?
This would return a list of results (similar to what Stack Overflow does when you type in the title of a question).
In using txtai, I've noticed that it is abysmally slow. Requesting a single result takes almost 10 seconds, versus the near-instantaneous speed of Elasticsearch returning 50 results. Perhaps there is something I am missing about how this should perform.
I'll share the test code I'm currently working with:
from txtai.pipeline import Similarity
from elasticsearch import Elasticsearch, helpers

# Connect to ES instance
es = Elasticsearch(hosts=["http://localhost:9200"], timeout=60, retry_on_timeout=True)

def ranksearch(query, limit):
    # Pull 10x the requested results from Elasticsearch, then rerank with txtai
    results = [text for _, text in search(query, limit * 10)]
    return [(score, results[x]) for x, score in similarity(query, results)][:limit]

def search(query, limit):
    # Run a standard Elasticsearch query_string search
    query = {
        "size": limit,
        "query": {
            "query_string": {"query": query}
        }
    }

    results = []
    for result in es.search(index="articles", body=query)["hits"]["hits"]:
        source = result["_source"]

        # Cap scores at 18 and normalize to the 0-1 range
        results.append((min(result["_score"], 18) / 18, source["title"]))

    return results

# Create similarity pipeline using a zero-shot classification model
similarity = Similarity("valhalla/distilbart-mnli-12-3")

limit = 1
query = "Bad News"
print(ranksearch(query, limit))
Any help is appreciated.
Solution
The following answer is a summary of a discussion on GitHub. The full discussion can be found here: https://github.com/neuml/txtai/issues/319
From the post I made there:
GPUs do make a huge difference but you can get decent runtime performance with CPUs.
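As a quick sanity check on which device will be used, the minimal sketch below (not part of the original discussion) asks PyTorch, which txtai runs models through, whether a GPU is visible:

import torch

# txtai executes models through PyTorch, so this reports the device models will run on
if torch.cuda.is_available():
    print(f"GPU available: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected, models will run on CPU")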
The similarity pipeline is probably a bit heavy for the T410. Even the smallest models are around 1GB, which is still large. This appears to be a use case for a smaller embeddings similarity model. There will be a tradeoff in accuracy, but it could be the right balance of performance and accuracy.
The example code below modifies your example to compute similarity using an embeddings model. For reference, this model is 90MB.
from txtai.embeddings import Embeddings
from elasticsearch import Elasticsearch, helpers
import time

# Connect to ES instance
es = Elasticsearch(hosts=["http://localhost:9200"], timeout=60, retry_on_timeout=True)

def ranksearch(query, limit):
    results = [text for _, text in search(query, limit * 10)]
    return [(score, results[x]) for x, score in embeddings.similarity(query, results)][:limit]

def search(query, limit):
    query = {
        "size": limit,
        "query": {
            "query_string": {"query": query}
        }
    }

    results = []
    for result in es.search(index="articles", body=query)["hits"]["hits"]:
        source = result["_source"]
        results.append((min(result["_score"], 18) / 18, source["title"]))

    return results

start = time.time()
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
print(f"Load time {time.time() - start}")

start = time.time()
ranksearch("Bad News", 1)
print(f"Query 1 {time.time() - start}")

start = time.time()
ranksearch("Good News", 1)
print(f"Query 2 {time.time() - start}")
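Note that embeddings.similarity returns (id, score) pairs sorted by highest score, the same format the Similarity pipeline produces, so it works as a drop-in replacement in ranksearch. A minimal standalone sketch of the call, with illustrative query and candidate texts:

from txtai.embeddings import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})

# Returns (index, score) pairs sorted by best match
print(embeddings.similarity("Bad News", ["Markets fall on weak earnings", "Sunny skies expected this weekend"]))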
Answered By - David Mezzetti