Issue
I receive an .xlsx file and need to update an SQLite3 table from it. The code below works, but it is rather slow, and I have a feeling that I am doing something wrong. How can I speed up the update process? Thanks in advance.
Step 1) Use regex to split the data into 3 DataFrames.
Step 2) Clean the data and create a dictionary.
Step 3) Update the SQLite3 table while iterating through the dictionary in a double for loop.
import pandas as pd
import sqlite3

def clean(data):
    # Keep only the columns that are needed
    df = data[['loc', 'date']].reset_index(drop=True)
    # Replace each date with its ISO week number
    df['date'] = df['date'].dt.isocalendar().week
    return df

def update_cycle_counting(df):
    # Regex to filter data
    m = df[df['loc'].str.contains(r'A-[a-zA-Z]\d{2}-\d{3}-\d{2}.\d{2}|E[a-zA-Z]\d{3}-\d{4}|M[a-zA-Z]\d{3}-\d{4}|SAFE\d*')]
    j1 = df[df['loc'].str.contains(r'C-[a-zA-Z]\d{2}-\d{3}-\d{2}.\d{2}')]
    j2 = df[df['loc'].str.contains(r'B-[a-zA-Z]\d{2}-\d{3}-\d{2}.\d{2}')]
    # Assign cleaned data to new variables
    m = clean(m)
    j1 = clean(j1)
    j2 = clean(j2)
    # Create dictionary to loop through
    wh = {'m': m, 'j1': j1, 'j2': j2}
    # Create path and connect to database
    path = 'count.db'
    conn = sqlite3.connect(path)
    # Loop over the dict; table names == dict keys
    for k, v in wh.items():
        # Update rows
        for i, row in v.iterrows():
            cur = conn.cursor()
            cur.execute(f'UPDATE {k} SET "{row[1]}"= 1 WHERE "loc" = "{row[0]}";')
            conn.commit()
            cur.close()
    conn.close()
Solution
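Your code doesn't work because you use the f-string the wrong way.
If you want to create a query like
UPDATE test SET "1" = "1" + 1 WHERE bin = 'c'
then you have to add the { } placeholders and the quotes. Use double quotes around the column name and single quotes around the value: in SQLite a single-quoted '1' on the right-hand side is read as a string literal, not as the column.
The f-string then looks like
f"""UPDATE test SET "{row[2]}" = "{row[2]}" + 1 WHERE bin = '{row[1]}';"""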
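As an alternative to formatting values into the SQL text, the values can be bound as `?` parameters and all updates run in a single transaction, which usually matters most for speed. The following is only a sketch under assumptions: the helper name `bump_counts` and its `allowed_columns` argument are hypothetical, the table `test` and column `bin` come from the example query above, and the table name is assumed to be trusted. Column names cannot be bound as parameters, so they are checked against a known set before being placed in the query.

```python
import sqlite3
from collections import defaultdict

def bump_counts(conn, table, rows, allowed_columns):
    """Increment one column per (bin, column) pair in a single transaction.

    Hypothetical helper for illustration; `table` is assumed to be trusted.
    """
    # Group the bin values by target column: one executemany call per column.
    by_column = defaultdict(list)
    for bin_value, column in rows:
        # Identifiers cannot be bound with "?", so validate them explicitly.
        if column not in allowed_columns:
            raise ValueError(f"unexpected column: {column!r}")
        by_column[column].append((bin_value,))
    # "with conn" commits once on success and rolls back on an exception.
    with conn:
        for column, params in by_column.items():
            conn.executemany(
                f'UPDATE "{table}" SET "{column}" = "{column}" + 1 WHERE bin = ?',
                params,
            )

# Small in-memory demonstration
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE test (bin TEXT PRIMARY KEY, "1" INTEGER, "2" INTEGER, "3" INTEGER)'
)
conn.executemany("INSERT INTO test VALUES (?, 0, 0, 0)", [("a",), ("b",), ("c",)])
conn.commit()

bump_counts(conn, "test", [("b", "2"), ("c", "3")], allowed_columns={"1", "2", "3"})
result = conn.execute('SELECT bin, "1", "2", "3" FROM test ORDER BY bin').fetchall()
print(result)
```

Committing once per batch instead of once per row avoids a disk sync for every UPDATE, and an index (here the primary key) on the column used in WHERE keeps each lookup cheap.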
But I don't know why you use SQLite if you can do it directly in the DataFrame:
for index, row in upd.iterrows():
    df.loc[df['bin'] == row['bin'], row['data']] += 1
Minimal working code:
import pandas as pd

df = pd.DataFrame({
    'bin': ['a', 'b', 'c', 'd', 'e'],
    '1': [0, 0, 0, 0, 0],
    '2': [0, 1, 0, 0, 0],
    '3': [0, 0, 0, 0, 0],
    'type': ['x', 'x', 'x', 'x', 'x']
})
upd = pd.DataFrame({'bin': ['b', 'c'], 'data': ['2', '3']})

print('--- before ---')
print(df)

# For every update row, add 1 to the named column of the matching bin
for index, row in upd.iterrows():
    df.loc[df['bin'] == row['bin'], row['data']] += 1

print('--- after ---')
print(df)
Result:
--- before ---
  bin  1  2  3 type
0   a  0  0  0    x
1   b  0  1  0    x
2   c  0  0  0    x
3   d  0  0  0    x
4   e  0  0  0    x
--- after ---
  bin  1  2  3 type
0   a  0  0  0    x
1   b  0  2  0    x
2   c  0  0  1    x
3   d  0  0  0    x
4   e  0  0  0    x
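If the update frame is large, the row-by-row loop can become the slow part. One possible variation, shown here only as a sketch on the same sample data, makes one pass per distinct target column instead of one pass per update row. The variable names `hits` and `increment` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'bin': ['a', 'b', 'c', 'd', 'e'],
    '1': [0, 0, 0, 0, 0],
    '2': [0, 1, 0, 0, 0],
    '3': [0, 0, 0, 0, 0],
    'type': ['x', 'x', 'x', 'x', 'x'],
})
upd = pd.DataFrame({'bin': ['b', 'c'], 'data': ['2', '3']})

# One pass per distinct target column instead of one pass per update row.
for column, group in upd.groupby('data'):
    # How many times each bin appears for this column.
    hits = group['bin'].value_counts()
    # Align the counts with df's rows; bins without an update get 0.
    increment = df['bin'].map(hits).fillna(0).astype(int)
    df[column] += increment

print(df)
```

Like the loop above, this adds 1 for every occurrence of a (bin, data) pair, so repeated pairs are counted more than once.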
Answered By - furas