Issue
I have two dataframes, and I can either change the first one to match the second or the second one to match the first.
This is a snippet of dataframe 1
YYYYMMDD | FG |
---|---|
20140101 | 78 |
20140102 | 72 |
{'YYYYMMDD': {0: 20140101, 1: 20140102, 2: 20140103, 3: 20140104, 4: 20140105}, 'FG': {0: 78, 1: 72, 2: 89, 3: 68, 4: 56}, 'TG': {0: 74, 1: 90, 2: 88, 3: 82, 4: 60}, 'RH': {0: 65, 1: 2, 2: 59, 3: 4, 4: 0}, 'NG': {0: ' 6', 1: ' 4', 2: ' 6', 3: ' 6', 4: ' 5'}}
This is a snippet of dataframe 2
event_id | date |
---|---|
87680 | 2012-01-01 |
87681 | 2012-02-01 |
{'event_id': {0: 87680, 1: 87681, 2: 87682, 3: 87683, 4: 87684}, 'registered_crimes': {0: 442.0, 1: 370.0, 2: 355.0, 3: 275.0, 4: 307.0}, 'crime': {0: 'Diefstal/inbraak woning', 1: 'Diefstal/inbraak woning', 2: 'Diefstal/inbraak woning', 3: 'Diefstal/inbraak woning', 4: 'Diefstal/inbraak woning'}, 'region': {0: 'Rotterdam', 1: 'Rotterdam', 2: 'Rotterdam', 3: 'Rotterdam', 4: 'Rotterdam'}, 'date': {0: '2012-01-01', 1: '2012-02-01', 2: '2012-03-01', 3: '2012-04-01', 4: '2012-05-01'}}
As you can see, both dataframes have a column that holds the date. I want the values in these columns to share the same format, because I will later create a database in which I join the tables on the dates. I have tried a lot, but I keep getting error messages.
I want to convert the YYYYMMDD column of dataframe 1 to yyyy-mm-dd strings to match the date column of df2, or convert the date column of dataframe 2 to yyyymmdd strings to match df1. For some reason, when I convert the YYYYMMDD values with to_datetime, the dates do not match the original ones in the CSV file: they start from 2014-01-01 and keep repeating, with only the timestamp changing.
Solution
Based on the snippets of data, you can use the code below, which relies on pd.to_datetime. I have explicitly specified the format of each column:
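import pandas as pd
# Parse each date column with its own explicit format so both become datetime64
df2['date'] = pd.to_datetime(df2['date'], format="%Y-%m-%d")
df1['YYYYMMDD'] = pd.to_datetime(df1['YYYYMMDD'], format="%Y%m%d")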
This converts both columns to the same datetime type, which you can then use as the key to merge/combine the dataframes.
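If you would rather end up with matching yyyy-mm-dd strings (as the question asks), the following is a minimal sketch of one way to do it, not the only one. It rebuilds small stand-ins for the two dataframes from the snippets in the question. The new `date` column on `df1`, the `merged` name, and the last date in `df2` are assumptions made for illustration; that last date was changed so the join has a row to match. The integers are cast to strings before parsing, which is a defensive choice rather than a requirement.

```python
import pandas as pd

# Stand-ins built from the question's snippets (first three rows only).
df1 = pd.DataFrame({
    "YYYYMMDD": [20140101, 20140102, 20140103],
    "FG": [78, 72, 89],
})
df2 = pd.DataFrame({
    "event_id": [87680, 87681, 87682],
    # Hypothetical: the last date is altered from the question's data
    # so that the two frames share at least one date.
    "date": ["2012-01-01", "2012-02-01", "2014-01-02"],
})

# Cast the integers to strings, then parse them with an explicit format.
parsed = pd.to_datetime(df1["YYYYMMDD"].astype(str), format="%Y%m%d")

# Render the parsed dates as yyyy-mm-dd strings, matching df2["date"].
df1["date"] = parsed.dt.strftime("%Y-%m-%d")

# Join the two frames on the now-identical string column.
merged = df1.merge(df2, on="date", how="inner")
print(merged)
```

Going the other way (yyyymmdd strings on `df2`) would use the same two steps with the formats swapped: parse with "%Y-%m-%d", then call `dt.strftime("%Y%m%d")`.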
Answered By - Suraj Shourie