Issue
In BigQuery, I have a table with 608 GB of data, 50 million rows, and 2651 columns. I'm trying to load it into JupyterLab as a pandas dataframe before doing any modeling. I'm saving the query's results into a pandas dataframe as the destination using %%bigquery. However, because of the size of the table, I'm getting an error. I followed the documentation and a couple of Stack Overflow discussions that suggested using LIMIT and setting query.allowLargeResults = True. However, I am unable to determine how to apply them to my specific problem.
Kindly please advise.
Thanks.
Solution
If you want to use configuration.query.allowLargeResults, set it to true in your job configuration and add a destination table object. This option applies to legacy SQL queries.
If you are using Python, the following example sets allow_large_results to True:
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the destination table.
# The value below is a placeholder; replace it before running.
table_id = "your-project.your_dataset.your_table_name"

# Set the destination table and use_legacy_sql to True to use
# legacy SQL syntax.
job_config = bigquery.QueryJobConfig(
    allow_large_results=True, destination=table_id, use_legacy_sql=True
)

sql = """
    SELECT corpus
    FROM [bigquery-public-data:samples.shakespeare]
    GROUP BY corpus;
"""

# Start the query, passing in the extra configuration.
query_job = client.query(sql, job_config=job_config)  # Make an API request.
query_job.result()  # Wait for the job to complete.

print("Query results loaded to the table {}".format(table_id))
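Once the results are in the destination table, one possible way to get them into pandas is to read the table page by page rather than all at once. The following is only a sketch: the helper name `iter_table_frames`, the `columns` argument, and the page size are hypothetical choices, not part of the original answer. It relies on `Client.get_table`, `Client.list_rows`, and `RowIterator.to_dataframe_iterable` from google-cloud-bigquery, and it assumes pandas is installed.

```python
def iter_table_frames(client, table_id, columns=None, page_size=50_000):
    """Yield the rows of a BigQuery table as a sequence of pandas DataFrames.

    client    -- a google.cloud.bigquery.Client (or compatible object)
    table_id  -- "project.dataset.table" ID of the destination table
    columns   -- optional list of column names to read (hypothetical helper
                 argument); None reads every column
    page_size -- number of rows requested per page (an assumed default)
    """
    table = client.get_table(table_id)

    # Restrict the read to the requested columns, if any were given.
    selected = None
    if columns is not None:
        wanted = set(columns)
        selected = [field for field in table.schema if field.name in wanted]

    rows = client.list_rows(table, selected_fields=selected, page_size=page_size)

    # Each item produced here is one DataFrame covering one page of rows.
    yield from rows.to_dataframe_iterable()


# Example usage (requires credentials and a real table, so it is left commented):
# from google.cloud import bigquery
# client = bigquery.Client()
# for frame in iter_table_frames(client, table_id, columns=["corpus"]):
#     print(frame.shape)
```

Reading only the columns the model needs and processing one frame at a time keeps memory use bounded by the page size instead of by the full table.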
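If you are querying via the API, the destination table is given as an object with projectId, datasetId, and tableId:
"configuration": {
  "query": {
    "allowLargeResults": true,
    "query": "select uid from [project:dataset.table]",
    "destinationTable": {
      "projectId": "project",
      "datasetId": "dataset",
      "tableId": "table"
    }
  }
}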
Using allow_large_results has the following limitations:
- You must specify a destination table.
- You cannot specify a top-level ORDER BY, TOP, or LIMIT clause.
- Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
See the official documentation for details.
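For the API route, the request body can also be assembled in Python so that the first limitation (a destination table is mandatory) is checked before anything is sent. This is a minimal sketch: `build_large_results_config` is a hypothetical helper name, and the project, dataset, and table values are placeholders. The field names (`allowLargeResults`, `useLegacySql`, `destinationTable` with `projectId`, `datasetId`, `tableId`) are those of the BigQuery REST job configuration.

```python
import json


def build_large_results_config(sql, project_id, dataset_id, table_id):
    """Return a job request body for a legacy SQL query with large results."""
    # allowLargeResults is only accepted together with a destination table.
    if not all([project_id, dataset_id, table_id]):
        raise ValueError("allowLargeResults requires a destination table")
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": True,
                "allowLargeResults": True,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# Placeholder identifiers; replace them with real ones before use.
body = build_large_results_config(
    "SELECT uid FROM [project:dataset.table]", "project", "dataset", "results"
)
print(json.dumps(body, indent=2))
```

The helper does not inspect the SQL itself, so the restrictions on top-level ORDER BY, TOP, and LIMIT still have to be respected when writing the query.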
Answered By - Raul Saucedo