Issue
I have a dataframe that I generate on a periodic basis that looks similar to below.
key_hash,hash_1,hash_2,date_updated,date_created,number_1,number_2,float_1,float_2
18f1a22863a51189370b88001715fa042f94f0305fd3ca17de48077b0bd9c3e7,0c71efcd973b0e06498ba31ad5146d4ef0265640657f86674f889b480a1d9824,f04930f26fb349f9b88cca78f95de10419ae5ef91ff7604ba94f7f17dc185e0f,11/5/2021,11/11/2021,33,24,0.2,0.43
38c01443127ffd97fe394172db254ee54a1c42ee32129a9d8c30328e53cfd970,61a93fd8a8746124ad2a8651b201368cf9cf6acbe989abdff883d1a94761ba91,7bbae6a90ca5bce7c26c1c1a1eae9904652bc4606bb05c6450e08ea5da410214,1/29/2022,12/9/2021,30,73,0.68,0.46
58717249dd6365fd9169cd15f595d121fa4f9f927660c210658ef1c592790863,56e4ac16a3d9c358674684651777a9b9f094000ab8fda6b0321ccbd5538c76d3,b54a469453d3890b00963e741d1925f8a4d32730fc22d8acabe5f3663997d895,2/16/2022,11/29/2021,8,6,0.25,0.66
684d09261e9c9fb52340062ef66906b74420605dbc786dc79903be4d75fb91d6,60d62e98e524c879ed90c271a854a63f320fc4c93cb6779c9e3b044b3cab4b74,ca4cf107d936381ff5ee2a9e08ec247b612d67c1cff5d3b7f2717512e9d598a0,11/23/2021,6/20/2022,25,5,0.29,0.07
72d98b8e582fb8797bfc2e6ea843773e0af1821bfb59855fd200939837a61185,5a87d4fdf31fb0619e3571c7fa3e6ff8de998a6bffa9c618610073de284ad106,0f8609994efa03a77609a592d25235fa8c7b5f25f01ff746c6c3f577eb0359a8,6/3/2022,2/17/2022,87,73,0.13,0.45
I also have a sql db with the same dataset but over a longer period of time.
My goal is to lookup data in the sql db based on the key_hash and store it in my dataframe, conduct some basic functions against that data such as threshold comparisons, and update my sql database based on the new dataframes metrics.
I have been able to connect through read_sql but am currently unsure how to conduct the query based on the key_hash column.
Any assistance or direction would be awesome, thank you.
Solution
Ended up making this work for me.
df = pd.read_csv("my_csv")
items = "\"" + "\",\"".join(map(str, df.key_hash.tolist())) + "\""
query = f"SELECT * FROM <dbname> WHERE key_hash in ({items});"
sql_df = pd.read_sql(query, connector)
combined = pd.merge(df, sql_df, how='left' on=['key_hash']
Answered By - johnnyb
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.