Wednesday, December 20, 2023

[FIXED] how to take intersection of multiple columns in excel and writing them into a new column BY python

December 20, 2023 dataframe, excel, pandas, python No comments

Issue

i have an excel file that according to users input , add columns called "cellname". for example: if user tells 3, i would have 3 columns called "cellname"

what i am looking for : find the values in 'cellname' columns , that are same and then add them into new coulmn called: result this the code till now

import pandas as pd
import os

v=int(input("how many periods do you have:\n"))
new_excel_by_user = str(input("Write the name of your new EXCEL file: "))
user_excel = pd.ExcelWriter(new_excel_by_user+'.xlsx', engine='xlsxwriter')


sys_excel = pd.ExcelWriter('sys_excel11.xlsx', engine='xlsxwriter')

merged_df_H = pd.DataFrame()
merged_df_L = pd.DataFrame()


for i in range(v):
  print("\nPeriod %d :" % (i+1))

  excelpathH = input("Insert your Excel path (HIGH): ")
  excelpathL = input("Insert your Excel path (LOW): ")

  excelpathH = excelpathH.replace('"', '')
  excelpathL = excelpathL.replace('"', '')

  dfH = pd.read_excel(os.path.join(excelpathH), engine='openpyxl')
  dfL = pd.read_excel(os.path.join(excelpathL), engine='openpyxl')

  #merged_df_H=merged_df_H.merge(dfH,how='inner',on='Azimuth')

  azi=dfH['CELL_FIRST_ANTENNA_AZIMUTH']
  merged_df_H['azi']=azi
  azi=merged_df_H['azi']

  lat=dfH['LATITUDE']
  merged_df_H['LATITUDE']=lat
  lat=merged_df_H['LATITUDE']

  long=dfH['LONGITUDE']
  merged_df_H['LONGITUDE']=long
  long=merged_df_H['LONGITUDE']

  bsc=dfH['BSC_RNC_NAME']
  merged_df_H['BSC_RNC_NAME']=bsc
  bsc=merged_df_H['BSC_RNC_NAME']


  Rnh=dfH['KPI VALUE']
  #Rnh=dfH['KPIVALUE']

  # for rrr in dfL:
   Rnl=dfL['KPI VALUE']
  #Rnl=dfL['KPIVALUE']

   R=Rnh-Rnl
   merged_df_H['R%s'%(i+1)]=R
   R=merged_df_H['R%s'%(i+1)]

   merged_df_H['CellName%s'%(i+1)]=dfH['CELL_NAME']
   merged_df_H['CellName']=merged_df_H['CellName%s'%(i+1)]

   #SN=merged_df_H['ss%s'%(i+1)]

   mask1 = merged_df_H['R%s'%(i+1)].ge(3)

   R = merged_df_H.filter(like='R').melt()['value']
   SN = merged_df_H.filter(like='CellName').melt()['value']
   azim=merged_df_H.filter(like='CELL_FIRST_ANTENNA_AZIMUTH').melt()['value']
   mask = SN.duplicated(keep=False)





   merged_df_H = pd.concat([merged_df_H.loc[mask1]],
                  axis=1)

   ### merged_df_H = pd.concat([merged_df_H,RESULT], axis=1)

   # out = merged_df_H.join(pd.concat([SN[mask], R[mask]], 
     #          axis=1, 
     #          keys=('SN-RESULT','R-RESULT')).reset_index(drop=True))




  print(merged_df_H)

  # Write merged dataframes to a new Excel file
  with pd.ExcelWriter(new_excel_by_user + '.xlsx', engine='xlsxwriter') as 
user_excel:
  merged_df_H.to_excel(user_excel, sheet_name='High', index=False)

WHAT I HAVE :

WHAT I WANT : EXPOER THE CELLVALUES WHERE CELLNAME IS GREATER THAN 10(the condition must be in every column):

bad output

Solution

According to your example, it first finds common values, evaluates (>10), and then writes to the result columns.

Code:

import pandas as pd

excel_file_path = 'C:\\test.xlsx' # absolute path of your excel file
df = pd.read_excel(excel_file_path, engine='openpyxl')
df['resultcellname'] = ''
df['resultcellnamevalue'] = ''

#common_values = set(df['cellname1']) & set(df['cellname2']) & set(df['cellname3']) # find common values

cellname_columns = [col for col in df.columns if col.startswith('cellname') and not(col.endswith('value'))]
common_values = set(df[cellname_columns[0]])
for col in cellname_columns[1:]:
    common_values &= set(df[col])

processed_values = set() # not to write the same result values again 
line=0

for index, row in df.iterrows():
    pairs = [(row.iloc[i], row.iloc[i + 1]) for i in range(0, len(row), 2)]  # pairs of "cellname" and "cellnamevalue" and convert them to tuples

    for cellname, cellnamevalue in pairs:
        if cellname in common_values and float(cellname) > 10 and cellname not in processed_values:
            df.at[line, 'resultcellname'] = cellname
            df.at[line, 'resultcellnamevalue'] = cellnamevalue
            processed_values.add(cellname)
            line=line+1

print(df)
df.to_excel('result_file.xlsx', index=False)

Output:

   cellname1 cellname1value  cellname2 cellname2value  cellname3 cellname3value resultcellname resultcellnamevalue
0         15              A          8              T         15              A             15                   A
1          2              B         15              A          5              R             11                   E
2          3              C         10              C          4              T             19                   H
3          9              D          1              D          1              B
4         11              E          4              T         11              E
5          6              F         11              E         25              L
6         19              H         13              G         19              H
7         12              T         19              H         13              T

EDIT:

You can select columns if col.startswith('cellname') and not(col.endswith('value')

Code:

cellname_columns = [col for col in df.columns if col.startswith('cellname') and not(col.endswith('value'))]
common_values = set(df[cellname_columns[0]])
for col in cellname_columns[1:]:
    common_values &= set(df[col])
print(common_values)

Answered By - Ömer Sezer

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, December 20, 2023

[FIXED] how to take intersection of multiple columns in excel and writing them into a new column BY python

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels