Issue
I have 21 pairs of lists (dates, number of items), one pair for each of 21 item types. I would like to add all of this data to a pandas DataFrame with 23 columns (the date, number of item A, number of item B, ..., number of item U, total items). On some days there will be only one type of item; on other days there could be items A, B, and F, for example.
My thought was to create a blank DataFrame, then append each list with the date in the first column and the item count in a new column for each item, then somehow sort the DataFrame to match the days. For example:
df = pd.DataFrame(columns=['date','itemA','itemB','itemC','itemD','itemE','itemF','itemG','itemH','itemI','itemJ','itemK','itemL','itemM','itemN','itemO','itemP','itemQ','itemR','itemS','itemT','itemU','total'])
For instance, Jan 1, 2020 might have 20 of item A, 40 of item C, and 5 of item M. I imagine that when first appended, this data would be on 3 separate rows: one with data in columns A and B, one with data in columns A and D, and one with data in columns A and N. Is there a way for the pandas DataFrame to recognize that the date in column A is the same for all 3 rows and consolidate the data onto one row with data in columns A, B, D, and N?
Lastly, how could I sum the items per day (columns B through V) into a final total column?
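The consolidation described in the question can be sketched with a `groupby` on the date column. This is only a minimal illustration, assuming the rows have already been appended one per (date, item) pair; the sample values and the names `rows`, `consolidated`, and `item_columns` are made up for the example.

```python
import pandas as pd

# Hypothetical sample rows: one row per (date, item) pair, as they would
# look right after appending each list separately. Missing items are NaN.
rows = pd.DataFrame(
    [
        {"date": "1/1/20", "itemA": 20},
        {"date": "1/1/20", "itemC": 40},
        {"date": "1/1/20", "itemM": 5},
        {"date": "1/2/20", "itemA": 7},
    ]
)

# Rows that share a date collapse into one row. sum() skips NaN,
# so an item that never appears on a given day ends up as 0.
consolidated = rows.groupby("date", as_index=False).sum()

# Every column except the date holds an item count.
item_columns = [c for c in consolidated.columns if c != "date"]
consolidated[item_columns] = consolidated[item_columns].astype(int)

# Row-wise total over the item columns only.
consolidated["total"] = consolidated[item_columns].sum(axis=1)

print(consolidated)
```

Grouping on the date avoids any manual sorting or matching of rows, at the cost of first building a sparse intermediate frame.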
Solution
import pandas as pd
# input data according to this comment
# https://stackoverflow.com/questions/72520487/#comment128113673_72520940
itemAdates = ['1/1/20', '1/2/20', '1/3/20', '1/4/20']
itemAcounts = [4, 10, 3, 6]
itemBdates = ['1/1/20', '1/3/20', '1/4/20']
itemBcounts = [9, 5, 6]
itemCdates = ['1/2/20', '1/3/20', '1/4/20']
itemCcounts = [2, 6, 7]
# parsing the data into 1 big list of (date, item_name, item_count)
data = [
    *[(date, 'itemA', item_count) for date, item_count in zip(itemAdates, itemAcounts)],
    *[(date, 'itemB', item_count) for date, item_count in zip(itemBdates, itemBcounts)],
    *[(date, 'itemC', item_count) for date, item_count in zip(itemCdates, itemCcounts)],
]
# parsing the big list into a dictionary with
# new_data = {date:[('date', date), (item_name, item_count), (item_name, item_count), ...]}
new_data = {}
for date, item_name, item_count in data:
    new_data[date] = new_data.get(date, [('date', date)]) + [(item_name, item_count)]
# converting the list of tuples into dict and appending it into the df_list
df_list = []
for date_values in new_data.values():
    df_list.append(dict(date_values))
# we sort our columns with the sequence of this list
# NOTE: the date must be in the first position
sorted_columns = ['date','itemA','itemB','itemC']
# we create a dataframe from the list of dictionaries
# we fill the empty items with zeros
df = pd.DataFrame(df_list, columns=sorted_columns).fillna(0)
# convert to integers
df[sorted_columns[1:]] = df[sorted_columns[1:]].astype(int)
# we make a new column 'Total' that sums all the items in each day
# NOTE: sorted_columns[1:] skips the first column, which holds the date
df['Total'] = df[sorted_columns[1:]].sum(axis=1)
output:
date | itemA | itemB | itemC | Total |
---|---|---|---|---|
1/1/20 | 4 | 9 | 0 | 13 |
1/2/20 | 10 | 0 | 2 | 12 |
1/3/20 | 3 | 5 | 6 | 14 |
1/4/20 | 6 | 6 | 7 | 19 |
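As a possible alternative to building the dictionaries by hand, the same table can be sketched with `pivot_table`. This is not the method used in the answer above, only a variation on the same sample data; the names `item_lists`, `long_df`, and `wide_df` are introduced here for illustration.

```python
import pandas as pd

# Same sample input as above, keyed by item name: (dates, counts).
item_lists = {
    "itemA": (["1/1/20", "1/2/20", "1/3/20", "1/4/20"], [4, 10, 3, 6]),
    "itemB": (["1/1/20", "1/3/20", "1/4/20"], [9, 5, 6]),
    "itemC": (["1/2/20", "1/3/20", "1/4/20"], [2, 6, 7]),
}

# Long format: one row per (date, item, count).
long_df = pd.DataFrame(
    [
        (date, name, count)
        for name, (dates, counts) in item_lists.items()
        for date, count in zip(dates, counts)
    ],
    columns=["date", "item", "count"],
)

# Wide format: one row per date, one column per item,
# with 0 wherever an item does not occur on that date.
wide_df = long_df.pivot_table(
    index="date", columns="item", values="count", aggfunc="sum", fill_value=0
).astype(int)

# Row-wise total across all item columns, then restore 'date' as a column.
wide_df["Total"] = wide_df.sum(axis=1)
wide_df = wide_df.reset_index()
wide_df.columns.name = None

print(wide_df)
```

Note that the dates here are plain strings, so rows are ordered lexicographically; with real data it is probably safer to convert the column with `pd.to_datetime` before pivoting so that the rows sort chronologically.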
Answered By - Alberto Hanna