Issue
At the moment I am using an Excel sheet as an example for evaluation. It is a copy of an import in which the program tends to fill a column with mixed entries.
In Excel it looks like this: [screenshot of the Excel sheet]
My code in VS Code looks like this:
# 03.01.2024
# Regex - regular expressions for filtering, here using the frame from the evaluation
import pandas as pd
import numpy as np
from datetime import datetime

# Read the Excel file
excel_file = 'F:\\Google Drive\\Matthias\\Arbeit\\ISQM\\08 - Tool Menü - Skripte DD 19.12.2023\\Testskripte\\value_cnts_2.xlsx'
xlsx = pd.ExcelFile(excel_file)
excel_Tabelle = xlsx.sheet_names[0]  # name of the first sheet
df = xlsx.parse(excel_Tabelle)
# df['Vorgänger'] = df['Vorgänger'].fillna(0)
#df = df['Nachfolger'].str.replace('----', '')
df['Verknüpfung1'] = df['Nachfolger'].str.extract('(?P<EE>EE)')
# this attempt raises an error:
#df['Verknüpfung3'] = df['Nachfolger'].str.extract('[EA+,AA,EA-]')
df['Verknüpfung2'] = df['Nachfolger'].str.extract('(?P<AA>AA)')
# rows without a match contain NaN; replace it with 0
df['Verknüpfung1'] = df['Verknüpfung1'].fillna(0)
df['Verknüpfung2'] = df['Verknüpfung2'].fillna(0)
df
I read the Excel sheet into a pandas DataFrame. As you can see, I am trying out regular expressions in the extract function.
With the regex101 editor I found a regex using a named capturing group, which gives me this output:
    Nachfolger           Verknüpfung1  Verknüpfung2
0   54;20                0             0
1   ----                 0             0
2   ----                 0             0
3   ----                 0             0
4   ----                 0             0
5   ----                 0             0
6   ----                 0             0
7   ----                 0             0
8   ----                 0             0
9   ----                 0             0
10  ----                 0             0
11  ----                 0             0
12  ----                 0             0
13  ----                 0             0
14  ----                 0             0
15  ----                 0             0
16  21;4;49;50;51        0             0
17  52                   0             0
18  ----                 0             0
19  52;128AA;207;22;223  0             AA
20  ----                 0             0
21  52;24                0             0
22  28                   0             0
23  ----                 0             0
24  52;227;27            0             0
25  30                   0             0
26  227                  0             0
27  ----                 0             0
28  52;31                0             0
29  ----                 0             0
30  138EE;34             EE            0
31  ----                 0             0
32  139EE;36             EE            0
33  ----                 0             0
34  140EE;38             EE            0
35  ----                 0             0
36  141EE;40             EE            0
37  ----                 0             0
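To make the role of the named capturing group concrete, here is a minimal sketch using a few values copied from the output above instead of the Excel file (the small in-memory DataFrame is an assumption for illustration). With the default `expand=True`, `Series.str.extract` returns a DataFrame and each `(?P<name>...)` group becomes a column of that name; rows where a group does not match hold NaN, which is why the code above needs `fillna`.

```python
import pandas as pd

# A few sample values copied from the Nachfolger column in the output above
df = pd.DataFrame({"Nachfolger": ["54;20", "----", "52;128AA;207;22;223", "138EE;34"]})

# With expand=True (the default), str.extract returns a DataFrame;
# each named capturing group (?P<name>...) becomes a column with that name.
codes = df["Nachfolger"].str.extract(r"(?P<EE>EE)|(?P<AA>AA)")
print(codes.columns.tolist())  # ['EE', 'AA']

# Groups that did not match in a row are NaN; count the matches per group
print(codes.notna().sum().tolist())  # [1, 1]
```

Both named groups live in one pattern here, so a single `extract` call fills both columns at once.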
I want to filter the entries in the Nachfolger column that contain letters, and count them.
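One way to do the filtering and counting is a boolean mask from `Series.str.contains`. The sketch below assumes that "contain characters" means "contain at least one letter" and uses a hypothetical subset of the values shown above. Because Excel cells such as `52` may be read as numbers, the sample includes an integer; `.str` methods return NaN for non-string values, and `na=False` treats those as non-matches.

```python
import pandas as pd

# Hypothetical subset of the Nachfolger column; 52 is an int, as Excel
# may deliver purely numeric cells as numbers rather than strings
df = pd.DataFrame({"Nachfolger": ["54;20", "----", "52;128AA;207;22;223",
                                  52, "138EE;34", "139EE;36"]})

# True where the entry contains at least one letter; non-strings -> False
has_letters = df["Nachfolger"].str.contains(r"[A-Za-z]", na=False)

filtered = df[has_letters]   # keep only the rows with letters
print(len(filtered))         # 3
print(filtered["Nachfolger"].tolist())
```

`has_letters.sum()` gives the same count directly, since `True` counts as 1.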
My question now: is there a way to use a single regular expression so that all the matches end up in one column? I tried another expression, [EA+,AA,EA-], but I just get an error.
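The error with `[EA+,AA,EA-]` has two causes, sketched below on hypothetical sample values (the `"10EA+;11"` entry is made up to include an `EA+` code). First, `str.extract` requires at least one capturing group and raises `ValueError` otherwise. Second, square brackets form a character class, which matches a single character from the set {E, A, +, comma, -}, not the multi-letter codes. Alternation with `|` inside parentheses matches whole codes instead.

```python
import re
import pandas as pd

# Hypothetical sample values; "10EA+;11" is made up to include an EA+ code
s = pd.Series(["138EE;34", "52;128AA;207", "10EA+;11", "----"])

# 1) The pattern has no capturing group, so str.extract raises ValueError
raised = False
try:
    s.str.extract(r"[EA+,AA,EA-]")
except ValueError as err:
    raised = True
    print("ValueError:", err)

# 2) [...] is a character class: it matches ONE character from the set
#    {E, A, +, ',', -}, not the codes EE, AA, EA+ or EA-
print(re.findall(r"[EA+,AA,EA-]", "138EE;34"))  # ['E', 'E']
print(re.findall(r"[EA+,AA,EA-]", "----"))      # ['-', '-', '-', '-']

# 3) Alternation (|) inside a capturing group matches whole codes;
#    rows without a code become NaN
out = s.str.extract(r"(EE|EA[+-]|AA)", expand=False)
print(out.tolist())
```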
Thank you!
Solution
It is not entirely clear what output you expect. However, you can try:
df['Verknüpfung3'] = df['Nachfolger'].str.extract(r'(EA[+-]|AA)')
# OR
df['Verknüpfung'] = df['Nachfolger'].str.extract(r'(EE|EA[+-]|AA)')
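Building on the second variant, a minimal sketch of how the codes could then be counted, assuming "count them" means "how often each code occurs". Note that `str.extract` keeps only the first match per cell; if a cell can hold several codes, `str.extractall` returns one row per match. The sample data is hypothetical, and `"54AA;55EE"` is made up to show a cell with two codes.

```python
import pandas as pd

# Hypothetical sample; "54AA;55EE" is made up to show a cell with two codes
df = pd.DataFrame({"Nachfolger": ["54;20", "----", "52;128AA;207;22;223",
                                  "138EE;34", "139EE;36", "54AA;55EE"]})

pattern = r"(EE|EA[+-]|AA)"

# str.extract: first match per cell only, all codes in one column
df["Verknüpfung"] = df["Nachfolger"].str.extract(pattern, expand=False)
print(df["Verknüpfung"].value_counts())  # NaN rows are dropped by default

# str.extractall: every match, one row per match (MultiIndex: row, match)
all_codes = df["Nachfolger"].str.extractall(pattern)[0]
print(all_codes.value_counts())
```

With this sample, the first-match counts are EE 2 and AA 2, while `extractall` also picks up the second code in `"54AA;55EE"` and reports EE 3 and AA 2.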
Answered By - Corralien