Thursday, June 9, 2022

[FIXED] how to add multiple lists while adding multiple columns into pandas dataframe

Labels: pandas, python

Issue

I have 21 pairs of lists (dates, item counts), one pair for each of 21 item types. I would like to put all of this data into a pandas DataFrame with 23 columns (the date, the count of item A, the count of item B, ..., the count of item U, and the total number of items). On some days there is only one type of item; on other days there could be items A, B, and F, for example.

My thought was to create an empty DataFrame, then append each list with the date in the first column and the item count in a new column for each item, and then somehow sort the DataFrame to match up the days. For example:

df = pd.DataFrame(columns=['date','itemA','itemB','itemC','itemD','itemE','itemF','itemG','itemH','itemI','itemJ','itemK','itemL','itemM','itemN','itemO','itemP','itemQ','itemR','itemS','itemT','itemU','total'])

For instance, January 1, 2020 might have 20 of item A, 40 of item C, and 5 of item M. I imagine that when first appended, this data would sit on 3 separate rows, with data in columns a and b, columns a and d, and columns a and n. Is there a way for pandas to recognize that the date in column a is the same for all 3 rows and consolidate the data onto one row with data in columns a, b, d, and n?
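The consolidation described above can be sketched with groupby. This is only an illustration: the three rows below are a hypothetical reconstruction of what the frame might look like right after appending, using the counts from the example.

```python
import pandas as pd

# Three rows for the same day, one per item, as they might look
# right after appending. Item counts that were not supplied become NaN.
rows = pd.DataFrame([
    {"date": "1/1/20", "itemA": 20},
    {"date": "1/1/20", "itemC": 40},
    {"date": "1/1/20", "itemM": 5},
])

# Group the rows that share a date and sum each item column.
# sum() skips NaN, so each item keeps its single real value.
consolidated = rows.groupby("date", as_index=False).sum()
print(consolidated)
```

The result has one row per date. The item columns come back as floats because of the intermediate NaN values, so a cast to int may still be wanted afterwards.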

Lastly, how can I sum the items per day (columns b-v) into a final total column?
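A per-day total can be sketched as a row-wise sum over every column except the date. The small frame below is hypothetical and uses three item columns instead of 21; the same two lines would apply to the full set.

```python
import pandas as pd

# Hypothetical two-day frame with three item columns.
df = pd.DataFrame({
    "date": ["1/1/20", "1/2/20"],
    "itemA": [20, 0],
    "itemB": [0, 7],
    "itemC": [40, 3],
})

# Select every column except 'date' and sum across each row.
item_columns = df.columns.drop("date")
df["total"] = df[item_columns].sum(axis=1)
print(df)
```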

Solution

import pandas as pd

# input data according to this comment
# https://stackoverflow.com/questions/72520487/#comment128113673_72520940

itemAdates = ['1/1/20', '1/2/20', '1/3/20',  '1/4/20']
itemAcounts = [4, 10, 3, 6]

itemBdates = ['1/1/20', '1/3/20', '1/4/20']
itemBcounts = [9, 5, 6]

itemCdates = ['1/2/20', '1/3/20', '1/4/20']
itemCcounts = [2, 6, 7]

# parsing the data into 1 big list of (date, item_name, item_count)
data = [
    *[(date,  'itemA', item_count) for date, item_count in zip(itemAdates, itemAcounts)],
    *[(date,  'itemB', item_count) for date, item_count in zip(itemBdates, itemBcounts)],
    *[(date,  'itemC', item_count) for date, item_count in zip(itemCdates, itemCcounts)],
]

# parsing the big list into a dictionary with 
# new_data = {date:[('date', date), (item_name, item_count), (item_name, item_count), ...]}
new_data = {}
for date, item_name, item_count in data:
    new_data[date] = new_data.get(date, [('date', date)]) + [(item_name, item_count)]

# converting the list of tuples into dict and appending it into the df_list
df_list = []
for date_values in new_data.values():
    df_list.append(dict(date_values))

# we sort our columns with the sequence of this list
# NOTE: the date must be in the first position
sorted_columns = ['date','itemA','itemB','itemC']

# we create a dataframe from the list of dictionaries
# we fill the empty items with zeros
df = pd.DataFrame(df_list, columns=sorted_columns).fillna(0)

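# convert the item columns to integers (fillna leaves them as floats)
# NOTE: astype(int) replaces applymap(int), which is deprecated in pandas 2.1+
item_columns = sorted_columns[1:]
df[item_columns] = df[item_columns].astype(int)

# we make a new column 'Total' that sums all the items in each day
# NOTE: only the item columns are summed, so the date column is ignored
df['Total'] = df[item_columns].sum(axis=1)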

Output:

date	itemA	itemB	itemC	Total
1/1/20	4	9	0	13
1/2/20	10	0	2	12
1/3/20	3	5	6	14
1/4/20	6	6	7	19
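As an alternative to the answer's approach (not the author's method), the same table can be sketched with a long-format frame and pivot_table. The dictionary name item_lists is hypothetical; it assumes the 21 list pairs can be gathered under their item names, and only the three pairs from the answer are shown.

```python
import pandas as pd

# Hypothetical container: item name -> (dates, counts).
# With 21 item types it would hold 21 entries.
item_lists = {
    "itemA": (["1/1/20", "1/2/20", "1/3/20", "1/4/20"], [4, 10, 3, 6]),
    "itemB": (["1/1/20", "1/3/20", "1/4/20"], [9, 5, 6]),
    "itemC": (["1/2/20", "1/3/20", "1/4/20"], [2, 6, 7]),
}

# Long format: one row per (date, item, count).
long_df = pd.DataFrame(
    [
        (date, name, count)
        for name, (dates, counts) in item_lists.items()
        for date, count in zip(dates, counts)
    ],
    columns=["date", "item", "count"],
)

# Parse the dates so rows sort chronologically rather than as strings.
long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")

# Wide format: one row per date, one column per item,
# zeros where an item has no entry for that day.
wide = long_df.pivot_table(
    index="date", columns="item", values="count", aggfunc="sum", fill_value=0
)
wide["Total"] = wide.sum(axis=1)
wide = wide.reset_index()
print(wide)
```

Compared with the answer above, this avoids writing one comprehension per item and sorts the rows by date, at the cost of the dates being displayed as parsed timestamps rather than the original strings.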

Answered By - Alberto Hanna

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
