Issue
This is a bit of a follow-up to an earlier question I had asked a little over a year ago
I have a "master" dataframe that has the product codes and names of a lot of materials along with their monthly consumption. something like:
product code | Name | Consumption A | Consumption B | Consumption C | Consumption D |
---|---|---|---|---|---|
123 | AA | 100 | 120 | 130 | 140 |
456 | BB | 5 | 7 | 9 | 11 |
789 | CC | 12 | 5 | 33 | 89 |
134 | AD | 4 | 17 | 37 | 57 |
467 | BD | 1 | 3 | 5 | 7 |
179 | ED | 6 | 19 | 30 | 61 |
426 | FD | 8 | 5 | 2 | 13 |
And I also have a second table that has a list of some product codes that are "alternatives" or substitutes, like:
Product Code | Alt Code |
---|---|
123 | 134 |
123 | 179 |
123 | 426 |
456 | 467 |
(The difference from the previous one is that now, the same "Product Code" can have multiple different "Alt Code"s)
How can I can use this second dataframe to process the first one such that it becomes:
product code | Name | Consumption A | Consumption B | Consumption C | Consumption D |
---|---|---|---|---|---|
123 / 134 / 179 / 426 | AA / AD / ED / FD | 118 | 161 | 199 | 271 |
456 / 467 | BB / BD | 6 | 10 | 14 | 18 |
789 | CC | 12 | 5 | 33 | 89 |
where the product codes and names have been concatenated into the same cell, the quantities summed, and the "duplicates" of the alternatives deleted?
Unlike last time, there are now places where a single "main" code has multiple alternatives, but they will always be listed with the main code in the first column, and the alternative in the "alt code" column. I would ideally like to merge all alternative rows into a single one using the same bit of code.
This is the code I have been trying to use, based on the answer I got last time:
if alt_name != "":
altf = pd.read_excel(io=alt_name)
group = df['Material'].map(lambda x: altf.set_index('Material')['Alt Material'].get(x, x))
d = {c: 'sum' for c in df.columns}
out = (df
.astype({'Material': str})
.groupby([group], as_index=False)
.agg({**d, **{'Material': ' / '.join, 'Description': ' / '.join}})
)
df = out
alt_name is the name of the excel sheet with the columns, saved as "Material" and "Alt Material".
when I try running it with a table with multiple alternatives, I get the following error:
Traceback (most recent call last):
File "/Users/[Location]/function_code_v2.py", line 285, in <module>
trendcharts(file_name, zone, combinezones, month, numMonths, percent_wanted, trendgen, alt_name, save_directory)
File "/Users/[Location]/function_code_v2.py", line 106, in trendcharts
.agg({**d, **{'Material': ' / '.join, 'Description': ' / '.join}})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 1445, in aggregate
result = op.agg()
^^^^^^^^
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/apply.py", line 175, in agg
return self.agg_dict_like()
^^^^^^^^^^^^^^^^^^^^
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/apply.py", line 406, in agg_dict_like
return self.agg_or_apply_dict_like(op_name="agg")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/apply.py", line 1388, in agg_or_apply_dict_like
result_index, result_data = self.compute_dict_like(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/apply.py", line 480, in compute_dict_like
getattr(obj._gotitem(key, ndim=1), op_name)(how, **kwargs)
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 275, in aggregate
if self.ngroups == 0:
^^^^^^^^^^^^
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 825, in ngroups
return self.grouper.ngroups
^^^^^^^^^^^^^^^^^^^^
File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 758, in ngroups
return len(self.result_index)
^^^^^^^^^^^^^^^^^
File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 769, in result_index
return self.groupings[0].result_index.rename(self.names[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 718, in result_index
return self.group_index
^^^^^^^^^^^^^^^^
File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 722, in group_index
codes, uniques = self._codes_and_uniques
^^^^^^^^^^^^^^^^^^^^^^^
File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 801, in _codes_and_uniques
codes, uniques = algorithms.factorize( # type: ignore[assignment]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/algorithms.py", line 795, in factorize
codes, uniques = factorize_array(
^^^^^^^^^^^^^^^^
File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/algorithms.py", line 595, in factorize_array
uniques, codes = table.factorize(
^^^^^^^^^^^^^^^^
File "pandas/_libs/hashtable_class_helper.pxi", line 7280, in pandas._libs.hashtable.PyObjectHashTable.factorize
File "pandas/_libs/hashtable_class_helper.pxi", line 7194, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'Series'
Solution
You can use the same logic, aggregate using the common id:
d = {c: 'sum' for c in df.columns[1:]}
d['Name'] = lambda x: '/'.join(map(str, x))
out = (df
.groupby(df['product code'].replace(altdf.set_index('Alt Code')['Product Code']))
.agg(d).reset_index()
)
Then fix the product codes:
mapper = altdf.groupby('Product Code')['Alt Code'].agg(set)
out['product code'] = out['product code'].map(lambda x: '/'.join(map(str, sorted(mapper.get(x, set())|{x}))))
Output:
product code Name Consumption A Consumption B Consumption C Consumption D
0 123/134/179/426 AA/AD/ED/FD 118 161 199 271
1 456/467 BB/BD 6 10 14 18
2 789 CC 12 5 33 89
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.