Issue
I would like to reformat a dataframe so that each set of duplicate values in column one is replaced with a single header, which is then placed on its own row with the relevant data from columns two and three placed underneath it.
Here's what my current dataframe looks like:
import pandas as pd

# Raw strings keep the backslashes literal (otherwise '\t' is read as a tab)
df = pd.DataFrame({
    'Dups': ['Invoices-Invoices', 'Invoices-Invoices', 'Contracts-Invoices', 'Contracts-Contracts', 'Contracts-Contracts'],
    'FileOne': [r'C:\text1.doc', r'C:\text2.doc', r'C:\text3.doc', r'C:\text4.doc', r'C:\text5.doc'],
    'FileTwo': [r'C:\doc1.doc', r'C:\doc2.doc', r'C:\doc3.doc', r'C:\doc4.doc', r'C:\doc5.doc']
})
Here is what I would like my dataframe to look like:
I've tried df.pivot(), df.melt(), and df.stack(). These approaches rearrange the data, but not in the way that I am looking for.
Update thanks to Baron Legendre:
df['Files'] = df['FileOne'] + "-" + df['FileTwo']
df = df.melt(id_vars='Dups', value_vars=['Files']).groupby(['Dups', 'variable']).agg(list)
for header in df:
    for item in df['value']:
        print('header')
        for x in item:
            print(x)
I'm now just trying to figure out how to print the headers and then write the whole thing to a CSV file.
Solution
Your original dataframe is already close to the desired state. All that is left is to add the group subheaders on top of the values:
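# One header row per group: the group name goes in FileOne, FileTwo stays empty
groups = pd.Series(df['Dups'].unique())
header_df = pd.DataFrame({
    'Dups': groups,
    'FileOne': groups,
    'FileTwo': pd.Series([], dtype=str)
}).fillna({'FileTwo': ''})

# level=0 sorts each header above its group's data rows (level=1)
output_df = (
    pd.concat([df.assign(level=1), header_df.assign(level=0)])
    .sort_values(['Dups', 'level'])
    .reset_index(drop=True)
    .iloc[:, -3:-1]  # keep only FileOne and FileTwo
)
output_df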
               FileOne      FileTwo
0  Contracts-Contracts
1         C:\text4.doc  C:\doc4.doc
2         C:\text5.doc  C:\doc5.doc
3   Contracts-Invoices
4         C:\text3.doc  C:\doc3.doc
5    Invoices-Invoices
6         C:\text1.doc  C:\doc1.doc
7         C:\text2.doc  C:\doc2.doc
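Since the question also asks about getting the result into a CSV, here is a minimal sketch of one alternative way to do that. It is not the approach used above: it loops over groupby(sort=False), so the groups stay in the order they first appear instead of being sorted alphabetically. The helper name with_group_headers and its parameters are hypothetical, and the sample data simply repeats the dataframe from the question.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Dups': ['Invoices-Invoices', 'Invoices-Invoices', 'Contracts-Invoices',
             'Contracts-Contracts', 'Contracts-Contracts'],
    'FileOne': [r'C:\text1.doc', r'C:\text2.doc', r'C:\text3.doc',
                r'C:\text4.doc', r'C:\text5.doc'],
    'FileTwo': [r'C:\doc1.doc', r'C:\doc2.doc', r'C:\doc3.doc',
                r'C:\doc4.doc', r'C:\doc5.doc'],
})


def with_group_headers(frame, key='Dups', cols=('FileOne', 'FileTwo')):
    """Hypothetical helper: put each group's name on its own row above its data."""
    cols = list(cols)
    pieces = []
    # sort=False keeps the groups in order of first appearance
    for name, group in frame.groupby(key, sort=False):
        # Header row: group name in the first column, empty strings elsewhere
        header = pd.DataFrame([[name] + [''] * (len(cols) - 1)], columns=cols)
        pieces.append(header)
        pieces.append(group[cols])
    return pd.concat(pieces, ignore_index=True)


grouped = with_group_headers(df)

# Write to an in-memory buffer here; a file path works the same way
buffer = io.StringIO()
grouped.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of the buffer, pass a path such as "output.csv" (a placeholder name) as the first argument to to_csv.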
Update: in case the styling in your screenshot is also required:
import numpy as np

# Bold every cell whose value is one of the group names (the header rows)
output_df.style. \
    apply(lambda x: np.where(x.isin(groups), 'font-weight: bold', None), axis=1). \
    to_excel("test.xlsx", index=False, header=False)
This writes an Excel file in which the group header rows are bold.
Answered By - qaziqarta