Issue
I am in the process of migrating to google cloud spanner for an Django application, on GAE, and found the response times to be very long, in the order of 2-3 seconds for and indexed query which takes 10 ms when running in the spanner query editor.
It is also only the first query that takes this long, and any subsequent queries takes around 100 ms.
I encounter the same issue when running a bare minimum query with the python client, I've attached the code below:
from google.cloud import spanner_v1
client = spanner_v1.Client()
i = client.instance("my-instance")
db = i.database("my-database")
start = time.time()
with db.snapshot() as snapshot:
QUERY = (
"""
SELECT
*
FROM
dates
WHERE
date_id = 4507
""")
result = snapshot.execute_sql(QUERY)
for row in result:
print(row)
end = time.time()
print((end-start)*1000)
which in this case takes 1154 ms. Running the query again (without reinitialising the database object) takes 110ms.
Ill include some outputs from a line-profiler run on the function corresponding to db.snapshot():
Timer unit: 0.001 s
Total time: 1.05145 s
File: /Users/opt/anaconda3/envs/dev/lib/python3.8/site-packages/google/cloud/spanner_v1/database.py
Function: __enter__ at line 859
Line # Hits Time Per Hit % Time Line Contents
==============================================================
859 def __enter__(self):
860 """Begin ``with`` block."""
861 1 1051.4 1051.4 100.0 session = self._session = self._database._pool.get()
862 1 0.0 0.0 0.0 return Snapshot(session, **self._kw)
Running this python==3.8.0 and google-cloud-spanner==3.11.1 on a Mac.. Any help or ideas where to troubleshoot is much appreciated
Solution
I think that what you are seeing are down to the following:
- The first query will be slower than the rest, as the first query will have to wait for the session pool that is used by the Python client to be created, and for at least one session to have been created and returned by Cloud Spanner. All subsequent queries will reuse sessions from the pool, which is why you are seeing a lot lower values from the second query on.
- The total execution time for the second query will depend on the physical distance between your client and the Cloud Spanner instance and the network quality. I assume that the query is selecting one row based on the primary key (i.e.
date_id
is the only column in the primary key of the table). The client should preferably be running on a Google App Engine instance that is in the same physical region as your Cloud Spanner instance (e.g. both could be in the regionus-central1
, and not one inus-east1
and one inus-central1
).
If I understand you correctly, the numbers that you provided are based on executions locally from your own development machine. I think that a large part of the execution time difference that you are seeing (compared to the GCP Console) for the second and further queries, are due to network latency between your local machine and Cloud Spanner.
It would be interesting to see if you get better numbers if you execute the test query from a GAE instance or a simple Google Compute Virtual Machine that is in the same physical region as the Cloud Spanner database.
Answered By - Knut Olav Løite
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.