Issue
I want to reorder the hsp_treatment_df
dataframe based on the column name in ascending order for each patient ID, where:
- if the numeric value after the first underscore is the smallest, the column comes first, regardless of the time
- if the numeric value before the first underscore is the smallest (i.e., time), the column comes first; likewise if the value is largest, it comes last
In the column names, the substring before the first underscore is the time, and the substring after the first underscore is the patient ID. For example, in 12h_P1_T4_TimeC2_PIDC4_Non-Survivor
:
- 12h represents the time (t)
- P1 represents the patient ID (pid)
Code:
import pandas as pd
import numpy as np
# Analyse time points for each patient
## Get the patient ID from each column
for cols in hsp_treatment_df.columns:
pid = cols.split('_', 2)[1]
pid = pid.lstrip("P") # get patient ID without the "P" substring
time = cols.split('_', 2)[0]
time = time.rstrip(time[-1]) # get time without the "h" substring
# Rearrange the columns so that for each "pid", the "t" is in ascending order
if (min(pid)) and (min(time)):
c = []
c.append(cols)
hsp_treatment_df.columns = c
Traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-128-f48ee0e3d3de> in <module>()
13 c = []
14 c.append(cols)
---> 15 hsp_treatment_df.columns = c
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
56 elif new_len != old_len:
57 raise ValueError(
---> 58 f"Length mismatch: Expected axis has {old_len} elements, new "
59 f"values have {new_len} elements"
60 )
ValueError: Length mismatch: Expected axis has 29 elements, new values have 1 elements
Dataframe:
`hsp_treatment_df.iloc[:,0:12].head().to_dict()`
{'0h_P1_T1_TimeC1_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '9.25',
'DNAJC14': '8.44',
'DNAJC15': '8.66',
'DNAJC30': '7.34',
'DNAJC9': '7.62'},
'0h_P2_T1_TimeC2_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '9.26',
'DNAJC14': '8.34',
'DNAJC15': '8.63',
'DNAJC30': '7.42',
'DNAJC9': '7.19'},
'12h_P1_T4_TimeC2_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.58',
'DNAJC14': '8.64',
'DNAJC15': '8.98',
'DNAJC30': '7.2',
'DNAJC9': '7.23'},
'12h_P2_T4_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.94',
'DNAJC14': '8.38',
'DNAJC15': '8.03',
'DNAJC30': '7.32',
'DNAJC9': '7.26'},
'24h_P1_T5_TimeC4_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.13',
'DNAJC14': '8.43',
'DNAJC15': '8.33',
'DNAJC30': '7.41',
'DNAJC9': '6.68'},
'24h_P2_T5_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.85',
'DNAJC14': '8.54',
'DNAJC15': '8.73',
'DNAJC30': '7.26',
'DNAJC9': '7.49'},
'48h_P1_T6_TimeC3_PIDC1_Non-Survivor': {'DNAJB6 /// TMEM135': '9.05',
'DNAJC14': '8.27',
'DNAJC15': '7.63',
'DNAJC30': '7.46',
'DNAJC9': '7.42'},
'48h_P2_T6_TimeC3_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '8.57',
'DNAJC14': '8.23',
'DNAJC15': '8.6',
'DNAJC30': '7.45',
'DNAJC9': '7.62'},
'4h_P1_T2_TimeC1_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.25',
'DNAJC14': '8.69',
'DNAJC15': '8.78',
'DNAJC30': '7.53',
'DNAJC9': '7.29'},
'4h_P2_T2_TimeC2_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.49',
'DNAJC14': '8.47',
'DNAJC15': '8.7',
'DNAJC30': '7.41',
'DNAJC9': '7.23'},
'8h_P1_T3_TimeC4_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.49',
'DNAJC14': '8.58',
'DNAJC15': '8.22',
'DNAJC30': '7.29',
'DNAJC9': '7.13'},
'8h_P2_T3_TimeC2_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '9.16',
'DNAJC14': '8.13',
'DNAJC15': '7.88',
'DNAJC30': '7.58',
'DNAJC9': '7.18'}}
Expected output:
Gene Symbol | 0h_P1_T1_TimeC1_PIDC4_Non-Survivor | 4h_P1_T2_TimeC1_PIDC4_Non-Survivor | 8h_P1_T3_TimeC4_PIDC4_Non-Survivor | 12h_P1_T4_TimeC2_PIDC4_Non-Survivor | 24h_P1_T5_TimeC4_PIDC4_Non-Survivor | 48h_P1_T6_TimeC3_PIDC1_Non-Survivor | 0h_P2_T1_TimeC2_PIDC2_Survivor | 4h_P2_T2_TimeC2_PIDC1_Survivor | 8h_P2_T3_TimeC2_PIDC2_Survivor | 12h_P2_T4_TimeC3_PIDC2_Survivor | 24h_P2_T5_TimeC3_PIDC2_Survivor | 48h_P2_T6_TimeC3_PIDC3_Survivor |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DNAJC9 | 7.62 | 7.29 | 7.23 | 7.13 | 6.68 | 7.42 | 7.19 | 7.23 | 7.18 | 7.26 | 7.49 | 7.62 |
DNAJC30 | 7.34 | 7.53 | 7.29 | 7.2 | 7.41 | 7.46 | 7.42 | 7.41 | 7.58 | 7.32 | 7.26 | 7.45 |
DNAJC14 | 8.44 | 8.69 | 8.58 | 8.64 | 8.43 | 8.27 | 8.34 | 8.47 | 8.13 | 8.38 | 8.54 | 8.23 |
DNAJC15 | 8.66 | 8.78 | 8.22 | 8.98 | 8.33 | 7.63 | 8.63 | 8.7 | 7.88 | 8.03 | 8.73 | 8.6 |
DNAJB6 /// TMEM135 | 9.25 | 8.25 | 8.49 | 8.58 | 8.13 | 9.05 | 9.26 | 9.49 | 9.16 | 8.94 | 8.85 | 8.57 |
Solution
You can use sort_index
with a key
function that extracts patient ID and time, puts the patient ID first and adds the zero filled the time:
def key_fun(idx):
new = idx.str.extract(r'(\d+)h_P(\d+)')
return pd.Index(new[1] + new[0].str.zfill(2))
df = df.sort_index(axis=1, key=key_fun)
Answered By - Stef
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.