Issue
import pandas as pd

data = {
    'T1_sometext': [1, 2, 3, 4, 5, 6],
    'T1_anothertext': [1, 2, 3, 4, 5, 6],
    'T1_anothertext2': [1, 2, 3, 4, 5, 6],
    'T2_anothertext2': [1, 2, 3, 4, 5, 6],
    'T2_anothertext4': [1, 2, 3, 4, 5, 6],
    'T2_anothertext5': [1, 2, 3, 4, 5, 6],
}
df = pd.DataFrame(data)
How do I extract the T1 columns together and the T2 columns together, so that I can plot them separately?
Is there a regex way to do it? Let's say I have up to T20. I would like to automate it using Python code.
In other words, I would like to extract (or select) each group of 3 columns that share the same prefix, that is, that start with the same string (T1, T2, T3, etc.).
Thank you!
Solution
You can extract the IDs and use groupby to get subgroups:
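# take everything before the first underscore as the group ID (T1, T2, ...)
names = df.columns.str.extract(r'([^_]+)', expand=False)
# note: axis=1 in DataFrame.groupby is deprecated since pandas 2.1
for name, d in df.groupby(names, axis=1):
    print(f'>>> {name}')
    print(d)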
Output:
>>> T1
   T1_sometext  T1_anothertext  T1_anothertext2
0            1               1                1
1            2               2                2
2            3               3                3
...
>>> T2
   T2_anothertext2  T2_anothertext4  T2_anothertext5
0                1                1                1
1                2                2                2
...
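For newer pandas versions, where axis=1 in groupby is deprecated, the same split can be sketched with a boolean mask over the column prefixes instead. This is only one possible variant, not the method from the answer above; the names prefixes and subframes are made up for illustration.

```python
import pandas as pd

data = {
    'T1_sometext': [1, 2, 3, 4, 5, 6],
    'T1_anothertext': [1, 2, 3, 4, 5, 6],
    'T1_anothertext2': [1, 2, 3, 4, 5, 6],
    'T2_anothertext2': [1, 2, 3, 4, 5, 6],
    'T2_anothertext4': [1, 2, 3, 4, 5, 6],
    'T2_anothertext5': [1, 2, 3, 4, 5, 6],
}
df = pd.DataFrame(data)

# prefix of each column: everything before the first underscore
prefixes = df.columns.str.extract(r'([^_]+)', expand=False)

# hypothetical container: list of (prefix, sub-DataFrame) pairs
subframes = []
for prefix in prefixes.unique():
    # boolean mask selects the columns whose prefix matches
    sub = df.loc[:, prefixes == prefix]
    subframes.append((prefix, sub))
    print(f'>>> {prefix}')
    print(sub)
```

Each sub keeps the original column names, so it can be passed to a plotting call one group at a time.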
Or as dictionary:
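# map each ID to its sub-DataFrame (distinct names avoid shadowing inside the comprehension)
groups = {name: sub for name, sub in df.groupby(names, axis=1)}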
Output:
{'T1': subdf_with_T1,
 'T2': subdf_with_T2}
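Since the question asks for a regex way, here is a minimal sketch using DataFrame.filter with one pattern per prefix. The upper bound max_id and the dictionary name subsets are assumptions for illustration; the trailing underscore in the pattern keeps T1 from also matching T10 to T19.

```python
import pandas as pd

data = {
    'T1_sometext': [1, 2, 3, 4, 5, 6],
    'T1_anothertext': [1, 2, 3, 4, 5, 6],
    'T1_anothertext2': [1, 2, 3, 4, 5, 6],
    'T2_anothertext2': [1, 2, 3, 4, 5, 6],
    'T2_anothertext4': [1, 2, 3, 4, 5, 6],
    'T2_anothertext5': [1, 2, 3, 4, 5, 6],
}
df = pd.DataFrame(data)

max_id = 20  # assumed upper bound (T1 .. T20), taken from the question
subsets = {}
for i in range(1, max_id + 1):
    # keep only the columns whose name starts with "T<i>_"
    sub = df.filter(regex=rf'^T{i}_')
    if not sub.empty:  # skip prefixes that are absent from df
        subsets[f'T{i}'] = sub
```

This needs the range of IDs to be known in advance, whereas extracting the prefixes from the column names does not.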
Alternative using a MultiIndex:
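# split each column name once at the first underscore; n must be passed by keyword in pandas >= 2.0
df2 = df.set_axis(df.columns.str.split('_', n=1, expand=True), axis=1)
for c in df2.columns.unique(level=0):
    print(f'>>> {c}')
    print(df2[c])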
Output:
>>> T1
   sometext  anothertext  anothertext2
0         1            1             1
1         2            2             2
...
>>> T2
   anothertext2  anothertext4  anothertext5
0             1             1             1
1             2             2             2
...
df2:
        T1                                    T2
  sometext anothertext anothertext2 anothertext2 anothertext4 anothertext5
0        1           1            1            1            1            1
1        2           2            2            2            2            2
2        3           3            3            3            3            3
3        4           4            4            4            4            4
4        5           5            5            5            5            5
...
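To illustrate why the split is limited to one cut, here is a small sketch with made-up column names that contain more than one underscore. With n=1, only the first underscore separates the ID from the rest, so the remainder of the name stays intact in the second level.

```python
import pandas as pd

# hypothetical columns with extra underscores in the descriptive part
df = pd.DataFrame({
    'T1_some_text': [1, 2],
    'T1_other': [3, 4],
    'T2_some_text': [5, 6],
})

# split once at the first underscore to build a two-level column index
df2 = df.set_axis(df.columns.str.split('_', n=1, expand=True), axis=1)

# selecting a top-level key returns that group's columns without the prefix
t1 = df2['T1']
print(t1)
```

Selecting df2['T2'] works the same way, so each group can be pulled out by its ID alone.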
Answered By - mozway