Issue
I would like to get, for each sheet in my OrderedDict of DataFrames, only the rows that are duplicated, but my attempt using pandas.DataFrame.duplicated
merely returns a boolean Series with the repeated rows marked as 'True'.
- My initial OrderedDict :
{'Sheet_1': ID Name Surname Grade
0 104 Eleanor Rigby 6
1 104 Eleanor Rigby 6
2 168 Barbara Ann 8
3 450 Polly Cracker 7
4 90 Little Joe 10
5 90 Little Joe 10,
'Sheet_2': ID Name Surname Grade
0 106 Lucy Sky 8
1 128 Delilah Gonzalez 5
2 100 Christina Rodwell 3
3 100 Christina Rodwell 3
4 40 Ziggy Stardust 7,
'Sheet_3': ID Name Surname Grade
0 22 Lucy Diamonds 9
1 50 Grace Kelly 7
2 50 Grace Kelly 7
3 105 Uma Thurman 7
4 105 Uma Thurman 7
5 29 Lola King 3}
- My desired OrderedDict (which I will later write to excel) :
{'Sheet_1': ID Name Surname Grade
1 104 Eleanor Rigby 6
5 90 Little Joe 10,
'Sheet_2': ID Name Surname Grade
3 100 Christina Rodwell 3,
'Sheet_3': ID Name Surname Grade
2 50 Grace Kelly 7
4 105 Uma Thurman 7}
- So far I have gone with this piece of code:
# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# Defining the file path
path_excel_file = r'C:\Users\machukovich\Desktop\stack.xlsx'
# Loading the files into a dictionary of Dataframes
dfs = pd.read_excel(path_excel_file, sheet_name=None, skiprows=2)
# Looping through the different sheets to flag the duplicated IDs
for sheet_name, df in dfs.items():
    duplicated_values_df = df.duplicated(subset='ID', keep='last')
### At this point I am obtaining a list of booleans only for one sheet, while I would like the loop to run over all of the sheets of the Excel file.
# Then, I would create a new excel file with the duplicated_values_df data
Path_new_file = r'C:\Users\machukovich\Desktop\new_file.xlsx'
# Create a Pandas Excel writer using XlsxWriter as the engine.
with pd.ExcelWriter(Path_new_file, engine='xlsxwriter') as writer:
    for sheet_name, df in duplicated_values_df.items():
        df.to_excel(writer, sheet_name=sheet_name, startrow=2, index=False)
I looked through past responses, yet I struggled to find a clear solution. Looking forward to your appreciated help.
Solution
Assuming dic is your input dictionary, you could combine duplicated and drop_duplicates:
out = {k: d[d.duplicated()].drop_duplicates()
       for k, d in dic.items()}
Output:
{'Sheet_1': ID Name Surname Grade
1 104 Eleanor Rigby 6
5 90 Little Joe 10,
'Sheet_2': ID Name Surname Grade
3 100 Christina Rodwell 3,
'Sheet_3': ID Name Surname Grade
2 50 Grace Kelly 7
4 105 Uma Thurman 7}
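To see why both calls are needed, here is a minimal, self-contained sketch of the approach above. The two small sheets are made-up stand-ins for what pd.read_excel(..., sheet_name=None) would return, and one row is deliberately repeated three times (an assumption that is not in the question's data). duplicated() on its own only gives a boolean mask. Indexing the frame with that mask keeps every repeat after the first occurrence, and drop_duplicates() then collapses those repeats to a single row each.

```python
import pandas as pd

# Hypothetical stand-in for the dict of DataFrames read from the workbook;
# "Little Joe" appears three times on purpose to show the effect of drop_duplicates().
dic = {
    "Sheet_1": pd.DataFrame({
        "ID": [104, 104, 168, 90, 90, 90],
        "Name": ["Eleanor", "Eleanor", "Barbara", "Little", "Little", "Little"],
        "Surname": ["Rigby", "Rigby", "Ann", "Joe", "Joe", "Joe"],
        "Grade": [6, 6, 8, 10, 10, 10],
    }),
    "Sheet_2": pd.DataFrame({
        "ID": [106, 100, 100],
        "Name": ["Lucy", "Christina", "Christina"],
        "Surname": ["Sky", "Rodwell", "Rodwell"],
        "Grade": [8, 3, 3],
    }),
}

# duplicated() alone returns a boolean Series, not the rows themselves.
mask = dic["Sheet_1"].duplicated()

# Build a NEW dict so every sheet is kept, instead of overwriting one variable per loop.
# d[d.duplicated()] keeps all repeats after the first occurrence;
# drop_duplicates() then leaves one row per repeated record.
out = {k: d[d.duplicated()].drop_duplicates() for k, d in dic.items()}

for sheet_name, df in out.items():
    print(sheet_name)
    print(df)
```

Because out is again a dictionary of DataFrames keyed by sheet name, it can be passed to the same pd.ExcelWriter loop as in the question in place of duplicated_values_df. If rows should count as duplicates based on ID alone, passing subset='ID' to both calls should give that behaviour.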
Answered By - mozway