Saturday, March 5, 2022

[FIXED] Copy only Employee id to excel sheet

March 05, 2022 pandas, python, spyder No comments

Issue

I have a unformatted data containing in Notepad file as shown below.

#Civil
GROUP CIVIL RPatel66 LKohli12 m12 PSen72 m72
GROUP CIVIL SKumar22 ASekar32 m32 BSiva90 
#Mechanical
GROUP MECHANICAL OKhan78 m78 MShah81 JKumar11 
GROUP MECHANICAL VHiremath12 TVasu43 m43 NReddy21
#Electrical
GROUP ELECTRICAL LPathan88 SPatil56 m56 AParth33
GROUP ELECTRICAL HAnil45 m45 Khari67 m67 Skumar49

When i run the below code

from openpyxl import Workbook
wb = Workbook()
ws = wb.active
f = open('C:\\Users\\Kiran\\Desktop\\Input.txt', 'r+') 
data = f.readlines()
spaces = ""
for i in range(len(data)):
    row = data[i].split(" ")  
    ws.append(row)
wb.save("Output1.xlsx")

import openpyxl
book= openpyxl.load_workbook('Output1.xlsx')
sheet = book['Sheet']
sheet.delete_cols(1,2) #deletes Column 1 and 2
book.save("Output1.xlsx")

For the above i am getting error and not getting the required output as i need.

I need output in a excel sheet as shown below.I need output as Eg:Rpatel66,LKohli12 etc in the excel sheet it should not contain m12,m72

RPatel66
LKohli12
PSen72
SKumar22 
ASekar32
BSiva90
OKhan78
MShah81
JKumar11
VHiremath12
TVasu43
NReddy21
LPathan88
SPatil56
AParth33
HAnil45
Khari67
Skumar49

Solution

Refer the below code to get the desired output for your query. Assuming the data is present in <your-file-name.txt>

import re
import pandas as pd
with open("<your-file-name.txt>",'r') as f:
    content=f.readlines()
    
content = [x for x in content if not x.startswith('#')]
temp_content_1=list(map(lambda x: x.replace('GROUP','').replace('MECHANICAL','').replace('CIVIL','').replace('ELECTRICAL','').strip(), content))

temp_content_2=list(map(lambda x: re.sub(' m\d+','',x), temp_content_1))

final=' '.join(temp_content_2).split()
df=pd.DataFrame({"Employee":final})
df.to_excel("<your-output-file-name.xlsx>", index=None)

Alternate Solution

import re
import pandas as pd
with open("<your-file-name.txt>",'r') as f:
    content=f.readlines()

temp_content_1=list(map(lambda x: re.sub(' m\d+','',x), content))
temp_content_2=' '.join(temp_content_1)

final=re.findall(r'\w+\d+', temp_content_2)    
df=pd.DataFrame({"Employee":final})
df.to_excel("<your-output-file-name.xlsx>", index=None)

Answered By - Anant Kumar

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, March 5, 2022

[FIXED] Copy only Employee id to excel sheet

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels