Issue
As a personal project, I am trying to load all of the VA pay data into a database. I wrote a code that will loop through a table that stores the state abbreviations and read this to populate the url for each file.
I use panda to read the url into a DF and then insert it into a database.
I was only able to successfully load the first file, but it looks like the loop stops after that.
How can I identify the issue?
#portion of script starts below
states = list()
states = cur.execute('SELECT abbr FROM ref_states')
for st in states:
test = st[0]
url = 'https://www.va.gov/OHRM/pay/2023/SSR/'+test+'.txt'
# Open file as a dataframe
data = pd.read_csv (url)
df = pd.DataFrame(data)
# Insert DataFrame to Table
for row in df.itertuples():
#define values that will be inserted
values = (row[1],row[2],row[3],row[4],row[5],row[6],row[7],row[8],row[9],row[10])
#print(values)
cur.execute(insert_query, values)
conn.commit()
I wrote a loop that generate the URL for each file that I want to load.
I plan on looping through each file, read it into a DF, insert it into a DB, and loop onto the next file.
Solution
Here is an example how you can load all CSV from the URL, concatenate all dataframes to one final dataframe and store it under the table my_data
:
import sqlite3
import pandas as pd
# create sample memory sqlite DB
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
create_table_query = """
CREATE TABLE ref_states (
abbr TEXT
)
"""
cursor.execute(create_table_query)
conn.commit()
insert_query = "INSERT INTO ref_states (abbr) VALUES ('AK'), ('AL'), ('AS');"
cursor.execute(insert_query)
# select abbreviation from ref_states
cursor.execute("SELECT abbr FROM ref_states")
rows = cursor.fetchall()
all_dfs = []
for row in rows:
url = f"https://www.va.gov/OHRM/pay/2023/SSR/{row[0]}.txt"
df = pd.read_csv(url)
print(df.head(3))
print()
all_dfs.append(df)
# create final dataframe and save it to DB:
final_df = pd.concat(all_dfs)
final_df.to_sql("my_data", conn, if_exists="replace")
# test it:
cursor.execute("SELECT * FROM my_data")
rows = cursor.fetchall()
for row in rows:
print(row)
Answered By - Andrej Kesely
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.