Issue
In some code, I ended up with a MultiIndex header containing some intruder columns that I'm trying to get rid of. These columns match the pattern '[A-Z]\d+'
and can be located anywhere in the inner level.
My DataFrame looks like this:
import pandas as pd
section_1 = pd.MultiIndex.from_product([['SECTION 1'], ['CED', 'D', 'C1']])
section_2 = pd.MultiIndex.from_product([['SECTION 2'], ['BCD', 'FGH', 'CE', 'L1', 'L2']])
section_3 = pd.MultiIndex.from_product([['SECTION 3'], ['A6', 'AB', 'KLMN', 'A5']])
columns = section_1.append(section_2).append(section_3)
df = pd.DataFrame([[None]*len(columns)], columns=columns)
print(df)
SECTION 1 SECTION 2 SECTION 3
CED D C1 BCD FGH CE L1 L2 A6 AB KLMN A5
0 None None None None None None None None None None None None
I tried this code, but it gives me an empty result:
wanted = df.stack(0, dropna=False).filter(regex=r'[^[A-Z]\d]').unstack()
print(wanted)
Empty DataFrame
Columns: []
Index: [0]
I expect this output:
SECTION 1 SECTION 2 SECTION 3
CED D BCD FGH CE AB KLMN
0 None None None None None None None
Does anyone have an idea how to deal with this problem? I'm especially interested in how to fix my regex approach.
Solution
One possibility is to drop columns matching the regex:
import re
df = df.drop(columns=[c for c in df.columns if re.match(r'[A-Z]\d+$', c[1])])
Output:
SECTION 1 SECTION 2 SECTION 3
CED D BCD FGH CE AB KLMN
0 None None None None None None None
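If you prefer a vectorized mask over a Python-level loop, here is a minimal sketch that selects on the inner level with get_level_values(1) and Index.str.fullmatch (assuming pandas 1.1 or later, where str.fullmatch exists). The DataFrame is rebuilt from the question so the snippet runs on its own; the names inner, is_intruder and cleaned are illustrative only.

```python
import pandas as pd

# Rebuild the example DataFrame from the question
section_1 = pd.MultiIndex.from_product([['SECTION 1'], ['CED', 'D', 'C1']])
section_2 = pd.MultiIndex.from_product([['SECTION 2'], ['BCD', 'FGH', 'CE', 'L1', 'L2']])
section_3 = pd.MultiIndex.from_product([['SECTION 3'], ['A6', 'AB', 'KLMN', 'A5']])
columns = section_1.append(section_2).append(section_3)
df = pd.DataFrame([[None] * len(columns)], columns=columns)

# Inner-level labels, one per column
inner = df.columns.get_level_values(1)
# True where the whole label is an uppercase letter followed by one or more digits
is_intruder = inner.str.fullmatch(r'[A-Z]\d+')
# Keep only the columns whose inner label is not an intruder
cleaned = df.loc[:, ~is_intruder]
print(cleaned)
```

Because fullmatch anchors both ends of the label, it behaves like the re.match(r'[A-Z]\d+$', ...) call in the answer for these labels.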
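Note that adding [] around the pattern in your question changes its meaning completely: [^[A-Z]\d] is parsed as the character class [^[A-Z] (any single character except '[' or an uppercase letter), followed by a digit, followed by a literal ']'. None of the column labels contains ']', so nothing matches and the result is empty. See the explanation at regex101.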
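To see the difference concretely, here is a small sketch using Python's re module on the inner-level labels from the question. The lookahead pattern ^(?![A-Z]\d+$) is an illustrative suggestion, not part of the original answer: it matches every label that is not entirely an uppercase letter followed by digits, which is the kind of "keep" pattern a regex filter needs.

```python
import re

labels = ['CED', 'D', 'C1', 'BCD', 'FGH', 'CE', 'L1', 'L2', 'A6', 'AB', 'KLMN', 'A5']

# The question's pattern: one character that is not '[' or A-Z,
# then one digit, then a literal ']'
bracketed = re.compile(r'[^[A-Z]\d]')
bracketed_hits = [s for s in labels if bracketed.search(s)]
print(bracketed_hits)               # no label contains ']'
print(bool(bracketed.search('x1]')))  # a string shaped like the parsed pattern

# Negative lookahead: match labels that are NOT an uppercase letter plus digits
keep = re.compile(r'^(?![A-Z]\d+$)')
kept = [s for s in labels if keep.match(s)]
print(kept)
```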
Answered By - Nick