Issue
I have a unformatted data containing in Notepad file as shown below.
#Civil
GROUP CIVIL RPatel66 LKohli12 m12 PSen72 m72
GROUP CIVIL SKumar22 ASekar32 m32 BSiva90
#Mechanical
GROUP MECHANICAL OKhan78 m78 MShah81 JKumar11
GROUP MECHANICAL VHiremath12 TVasu43 m43 NReddy21
#Electrical
GROUP ELECTRICAL LPathan88 SPatil56 m56 AParth33
GROUP ELECTRICAL HAnil45 m45 Khari67 m67 Skumar49
When i run the below code
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
f = open('C:\\Users\\Kiran\\Desktop\\Input.txt', 'r+')
data = f.readlines()
spaces = ""
for i in range(len(data)):
row = data[i].split(" ")
ws.append(row)
wb.save("Output1.xlsx")
import openpyxl
book= openpyxl.load_workbook('Output1.xlsx')
sheet = book['Sheet']
sheet.delete_cols(1,2) #deletes Column 1 and 2
book.save("Output1.xlsx")
For the above i am getting error and not getting the required output as i need.
I need output in a excel sheet as shown below.I need output as Eg:Rpatel66,LKohli12 etc in the excel sheet it should not contain m12,m72
RPatel66
LKohli12
PSen72
SKumar22
ASekar32
BSiva90
OKhan78
MShah81
JKumar11
VHiremath12
TVasu43
NReddy21
LPathan88
SPatil56
AParth33
HAnil45
Khari67
Skumar49
Solution
Refer the below code to get the desired output for your query. Assuming the data is present in <your-file-name.txt>
import re
import pandas as pd
with open("<your-file-name.txt>",'r') as f:
content=f.readlines()
content = [x for x in content if not x.startswith('#')]
temp_content_1=list(map(lambda x: x.replace('GROUP','').replace('MECHANICAL','').replace('CIVIL','').replace('ELECTRICAL','').strip(), content))
temp_content_2=list(map(lambda x: re.sub(' m\d+','',x), temp_content_1))
final=' '.join(temp_content_2).split()
df=pd.DataFrame({"Employee":final})
df.to_excel("<your-output-file-name.xlsx>", index=None)
Alternate Solution
import re
import pandas as pd
with open("<your-file-name.txt>",'r') as f:
content=f.readlines()
temp_content_1=list(map(lambda x: re.sub(' m\d+','',x), content))
temp_content_2=' '.join(temp_content_1)
final=re.findall(r'\w+\d+', temp_content_2)
df=pd.DataFrame({"Employee":final})
df.to_excel("<your-output-file-name.xlsx>", index=None)
Answered By - Anant Kumar
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.