Sunday, August 14, 2022

[FIXED] Pandas extract every nth column that start with same string prefix


Issue

import pandas as pd

data = {'T1_sometext': [1,2,3,4,5,6],
        'T1_anothertext': [1,2,3,4,5,6],
        "T1_anothertext2": [1,2,3,4,5,6],
        "T2_anothertext2": [1,2,3,4,5,6],
        "T2_anothertext4": [1,2,3,4,5,6],
        "T2_anothertext5": [1,2,3,4,5,6],
        }
df = pd.DataFrame(data)

How do I extract the T1 columns together and the T2 columns together, so that I can plot them separately?

Is there a regex-based way to do it? Let's say I have up to T20. I would like to automate it using Python code.

In other words, I would like to extract (or select) every 3 columns with the same prefix, that is, starting with the same string (T1, T2, T3, etc.).

Thank you!

Solution

You can extract the IDs and use groupby to get subgroups:

names = df.columns.str.extract(r'([^_]+)', expand=False)

for name, d in df.groupby(names, axis=1):
    print(f'>>> {name}')
    print(d)

Output:

>>> T1
   T1_sometext  T1_anothertext  T1_anothertext2
0            1               1                1
1            2               2                2
2            3               3                3
...
>>> T2
   T2_anothertext2  T2_anothertext4  T2_anothertext5
0                1                1                1
1                2                2                2
...
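
As far as I know, the `axis=1` argument of `groupby` is deprecated in recent pandas releases (2.1 and later), so the loop above may emit a warning or fail there. A minimal sketch of the same split using a boolean column mask instead of `groupby`; the dictionary name `groups` and the shortened three-row frame are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Same columns as in the question, shortened to three rows for brevity.
df = pd.DataFrame({
    'T1_sometext': [1, 2, 3],
    'T1_anothertext': [1, 2, 3],
    'T1_anothertext2': [1, 2, 3],
    'T2_anothertext2': [1, 2, 3],
    'T2_anothertext4': [1, 2, 3],
    'T2_anothertext5': [1, 2, 3],
})

# Prefix = everything before the first underscore, as in the answer.
names = df.columns.str.extract(r'([^_]+)', expand=False)

# `groups` (hypothetical name): one sub-frame per prefix,
# selected with a boolean mask over the columns.
groups = {name: df.loc[:, names == name] for name in names.unique()}

for name, sub in groups.items():
    print(f'>>> {name}')
    print(sub)
```

This keeps the original column order within each prefix and does not rely on any `groupby` axis handling.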

Or as dictionary:

d = {name: d for name, d in df.groupby(names, axis=1)}

Output:

{'T1': subdf_with_T1,
 'T2': subdf_with_T2,
}
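
If only a single prefix is needed, `DataFrame.filter` with a regular expression is a possible shortcut. A small sketch, assuming hypothetical column names (`T1_a`, `T10_a`, and so on) chosen to show why the pattern should be anchored and include the underscore once the prefixes run up to T20:

```python
import pandas as pd

# Hypothetical columns: note that 'T10_a' also starts with 'T1'.
df = pd.DataFrame({
    'T1_a': [1, 2],
    'T1_b': [3, 4],
    'T2_a': [5, 6],
    'T10_a': [7, 8],
})

# '^' anchors at the start of the name; the trailing '_' keeps
# 'T1' from also matching 'T10', 'T11', ...
t1 = df.filter(regex=r'^T1_')
t10 = df.filter(regex=r'^T10_')

print(t1)
print(t10)
```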

Alternative using a `MultiIndex`:

# n must be passed by keyword; positional use fails in pandas 2.0+
df2 = df.set_axis(df.columns.str.split('_', n=1, expand=True), axis=1)

for c in df2.columns.unique(level=0):
    print(f'>>> {c}')
    print(df2[c])

Output:

>>> T1
   sometext  anothertext  anothertext2
0         1            1             1
1         2            2             2
...
>>> T2
   anothertext2  anothertext4  anothertext5
0             1             1             1
1             2             2             2
...

df2:

        T1                                    T2                          
  sometext anothertext anothertext2 anothertext2 anothertext4 anothertext5
0        1           1            1            1            1            1
1        2           2            2            2            2            2
2        3           3            3            3            3            3
3        4           4            4            4            4            4
4        5           5            5            5            5            5
...
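
Once the columns form a `MultiIndex`, a single group can also be pulled out directly by its top-level key. A minimal sketch under the same assumptions as the answer (prefix and remainder separated by the first underscore); the variable names `t1` and `t2` and the shortened frame are illustrative only:

```python
import pandas as pd

# Same columns as in the question, shortened to two rows.
df = pd.DataFrame({
    'T1_sometext': [1, 2],
    'T1_anothertext': [1, 2],
    'T1_anothertext2': [1, 2],
    'T2_anothertext2': [1, 2],
    'T2_anothertext4': [1, 2],
    'T2_anothertext5': [1, 2],
})

# Split each name once at the first underscore: (prefix, rest).
df2 = df.set_axis(df.columns.str.split('_', n=1, expand=True), axis=1)

# Two equivalent ways to select one top-level group.
t1 = df2['T1']
t2 = df2.xs('T2', axis=1, level=0)

print(t1)
print(t2)
```

Both selections drop the prefix level, so the resulting frames carry only the remainder of each column name.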

Answered By - mozway

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
