Issue
I am loading data from various sources (csv, xls, json etc...) into Pandas dataframes and I would like to generate statements to create and fill a SQL database with this data. Does anyone know of a way to do this?
I know pandas has a to_sql
function, but that only works on a database connection, it can not generate a string.
Example
What I would like is to take a dataframe like so:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
And a function that would generate this (this example is PostgreSQL but any would be fine):
CREATE TABLE data
(
index timestamp with time zone,
"A" double precision,
"B" double precision,
"C" double precision,
"D" double precision
)
Solution
If you only want the 'CREATE TABLE' sql code (and not the insert of the data), you can use the get_schema
function of the pandas.io.sql module:
In [10]: print pd.io.sql.get_schema(df.reset_index(), 'data')
CREATE TABLE "data" (
"index" TIMESTAMP,
"A" REAL,
"B" REAL,
"C" REAL,
"D" REAL
)
Some notes:
- I had to use
reset_index
because it otherwise didn't include the index - If you provide an sqlalchemy engine of a certain database flavor, the result will be adjusted to that flavor (eg the data type names).
Answered By - joris
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.