Issue
Hi, I will try to stay on track, but I've done a lot of research and now I'm just lost. I could really use some expertise here. Below is the situation:
Preface
This is a follow-up to my question here. The issue there was that my Cypher queries were taking at least 1 second to return a response. Even queries like RETURN 123 took 1 second, which led to the conclusion that the Neo4j Bolt driver for Python is slower than an actual HTTP call to Neo4j.
I can back this up with research from GitHub Issues and this from Stack Overflow.
The problem statement
Each time my code runs, it generates up to 10 Cypher queries. All of them have to be fired, and then operations need to be performed based on the results.
The issue is that with Bolt the queries take 1 second to execute, and with HTTP I am stuck: I want to use query parameters to make the queries faster now that I am not on Bolt. Each HTTP call takes 30ms; multiply that by 10 (since I have 10 queries) and you have a very poorly performing Python API for fetching user relations.
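One way to avoid paying the per-request cost ten times might be to send all the statements in a single HTTP request. The sketch below only builds the request body; it assumes a Neo4j 3.x server exposing the transactional endpoint at /db/data/transaction/commit, and the helper name build_batch_payload and the sample queries are hypothetical, not taken from the original code.

```python
import json


def build_batch_payload(statements):
    """Build the JSON body for one request carrying several statements.

    `statements` is a list of (cypher, params) tuples. The name of this
    helper is hypothetical; the "statements"/"statement"/"parameters"
    keys follow the Neo4j 3.x transactional HTTP endpoint.
    """
    return {
        "statements": [
            {"statement": cypher, "parameters": params}
            for cypher, params in statements
        ]
    }


# Illustrative queries only; replace with the real ones.
queries = [
    ("RETURN 123 AS n", {}),
    ("MATCH (ct:city)-[:CHILD_OF]->(st:state) "
     "WHERE st.name_wr = {st} RETURN ct", {"st": "california"}),
]

body = json.dumps(build_batch_payload(queries))
# Assumption: POST `body` to http://localhost:7474/db/data/transaction/commit
# with the same Authorization and Content-Type headers as the curl sample.
print(body)
```

The response should then contain one result entry per statement, so the ten queries cost one round trip instead of ten.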
Where am I stuck
- A confirmation that yes, the Bolt driver is slow and that I am not doing anything wrong, since all the posts I've seen are dated a year back.
- My query has OR and AND conditions. How can I write those using parameters in Neo4j REST calls?
- Is there some other graph database I should look towards?
- Is there any way I can fire up to 10 queries and get a response time below 200ms?
Other reasons to think I am missing something:
- Legend has it that Neo4j is the most popular graph database. How is that possible with such drivers?
- There is over 1 year of reported issues with the Bolt drivers, and they still haven't fixed them.
Sample Request
curl -X POST \
  http://localhost:7474/db/data/cypher \
  -H 'Authorization: Basic bmVvNGo6Y29kZQ==' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
    "query" : "MATCH (ct:city)-[:CHILD_OF]->(st:state) WHERE (st.name_wr = {st}) AND (ct.name_wr = {ct}) RETURN st, ct",
    "params" : {
      "st" : "california",
      "ct" : "san francisco"
    }
  }'
But what if I want to add a clause so that st is either California OR Alaska, AND ct must be san francisco? How do I do that with the parameters in REST?
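One possible way to express "st is California OR Alaska, AND ct is san francisco" with parameters is to pass the states as a list and match with Cypher's IN operator. The sketch below builds a request body in the same shape as the curl sample; the function name city_in_states_request and the parameter name states are assumptions made for illustration.

```python
import json


def city_in_states_request(states, city):
    """Build a body for /db/data/cypher using a list parameter.

    Hypothetical helper: `IN {states}` stands in for a chain of ORs,
    and the city condition is combined with AND.
    """
    query = (
        "MATCH (ct:city)-[:CHILD_OF]->(st:state) "
        "WHERE st.name_wr IN {states} AND ct.name_wr = {ct} "
        "RETURN st, ct"
    )
    return {"query": query, "params": {"states": list(states), "ct": city}}


payload = city_in_states_request(["california", "alaska"], "san francisco")
print(json.dumps(payload, indent=2))
```

An explicit form such as (st.name_wr = {st1} OR st.name_wr = {st2}) AND ct.name_wr = {ct} should also work, with one parameter per value; the list form just avoids changing the query text when the number of states changes.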
EDIT:
I replicated the script and below is the verdict:
58 transactions, tps 0.97 maxdelay 1.08
The curl sample request is the one that I fire from Postman. The code that I am using can be found in the linked question (in the preface).
Solution
EDIT
Well, to be honest, the issue was with the IP. I was using localhost, and resolving localhost was taking time. As soon as I switched to 127.0.0.1, it started working perfectly fine.
Marking this as the answer, as it helped to actually benchmark the two approaches, which led to the discovery of the issue in host resolution.
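A quick way to check whether host resolution is the slow part is to time the lookup itself for both names. This is only a rough sketch using the standard library; the function name resolve_time and the default port 7687 are assumptions, and the numbers will differ from machine to machine.

```python
import socket
import time


def resolve_time(host, port=7687, repeats=5):
    """Return the slowest time in seconds taken to resolve `host`.

    Hypothetical helper: it only measures name resolution, not the
    connection or the query itself.
    """
    worst = 0.0
    for _ in range(repeats):
        start = time.time()
        socket.getaddrinfo(host, port)
        worst = max(worst, time.time() - start)
    return worst


for host in ("localhost", "127.0.0.1"):
    try:
        print("%-10s worst lookup %.4f s" % (host, resolve_time(host)))
    except socket.gaierror as exc:
        # Resolution failed outright; report it instead of crashing.
        print("%-10s could not be resolved: %s" % (host, exc))
```

If the localhost line shows a delay close to the 1 second seen in the queries while 127.0.0.1 is near zero, the name lookup is the likely culprit rather than the driver.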
I think there must be something wrong with your setup. I've been using the Python Bolt driver for a while now, and for simple queries I don't think I've ever seen a 1 second delay. I don't know what your code looks like, or what your network delay is, but I wrote a quick example to look at the delays I see in my local network (which has very low latency), using Neo4j 3.2.9 and Python driver 1.5.3.
#!/usr/bin/python
from __future__ import print_function
import sys
import time
from neo4j.v1 import GraphDatabase, basic_auth

ip = '10.10.10.10'
runtime = 60.0
querystr = 'RETURN 123'
runstart = time.time()
maxdelay = 0
cnt = 0

#driver = GraphDatabase.driver("bolt+routing://%s:7687" % ip,
driver = GraphDatabase.driver("bolt://%s:7687" % ip,
                              auth=basic_auth("neo4j", "password"))

while time.time() - runstart < runtime:
    start = time.time()
    session = driver.session(access_mode='READ')
    ret = session.run(querystr)
    session.close()
    result = ret.data()
    cnt += 1
    delay = time.time() - start
    if delay > maxdelay:
        maxdelay = delay
    if delay > 0.1:
        print('Large delay seen cnt %s delay %0.2f' % (cnt, delay))

print('%d transactions, tps %0.2f maxdelay %0.2f' % (cnt, cnt/runtime, maxdelay))
I get the output:
117360 transactions, tps 1956.00 maxdelay 0.06
This means the average read took about half a millisecond, and the max was 60ms.
I would look at network latency and issues with resources on both your client and server side.
Answered By - nortoon