Issue
I have several columns with the same name in a data frame. How can I rename the normal and KIRC columns below to normal_1, normal_2, KIRC_1, KIRC_2?
import pandas as pd

gene_exp.columns = gene_exp.iloc[-1]
gene_exp = gene_exp.iloc[:-1]
gene_exp

# Append "_[number]"
c = pd.Series(gene_exp.columns)
for dup in gene_exp.columns[gene_exp.columns.duplicated(keep=False)]:
    c[df.columns.get_loc(dup)] = ([dup + '_' + str(d_idx)
                                   if d_idx != 0
                                   else dup
                                   for d_idx in range(gene_exp.columns.get_loc(dup).sum())]
                                  )
gene_exp
Traceback:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
/opt/conda/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
/opt/conda/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'KIRC'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/tmp/ipykernel_27/3403075751.py in <module>
5 if d_idx != 0
6 else dup
----> 7 for d_idx in range(gene_exp.columns.get_loc(dup).sum())]
8 )
9 gene_exp
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'KIRC'
Sample data
| | Gene | NAME | KIRC | normal | normal | KIRC |
|---|---|---|---|---|---|---|
| 0 | ABC | DEF | GHI | JKL | MNO | PQR |
| 1 | STU | VWX | YZ | ABC | DEF | GHI |
Desired output:
| | Gene | NAME | KIRC_1 | normal_1 | normal_2 | KIRC_2 |
|---|---|---|---|---|---|---|
| 0 | ABC | DEF | GHI | JKL | MNO | PQR |
| 1 | STU | VWX | YZ | ABC | DEF | GHI |
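For anyone who wants to reproduce this, the sample frame can be built as in the sketch below. The name `gene_exp` follows the question and the values come from the sample table. `dup_mask` is a name introduced here only for illustration.

```python
import pandas as pd

# Rebuild the sample data, including the repeated column names.
gene_exp = pd.DataFrame(
    [["ABC", "DEF", "GHI", "JKL", "MNO", "PQR"],
     ["STU", "VWX", "YZ", "ABC", "DEF", "GHI"]],
    columns=["Gene", "NAME", "KIRC", "normal", "normal", "KIRC"],
)

# keep=False flags every occurrence of a repeated name, not only the later ones.
dup_mask = gene_exp.columns.duplicated(keep=False)
print(gene_exp.columns[dup_mask].tolist())
# ['KIRC', 'normal', 'normal', 'KIRC']
```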
Solution
# df is the data frame to fix (gene_exp in the question)
# set Gene and NAME as the index, as these columns don't need renaming
df.set_index(['Gene', 'NAME'], inplace=True)

# create a dataframe from the remaining column names
df2 = pd.DataFrame(df.columns.values, columns=['col'])

# number each repeated name: cumcount() starts at 0, so add 1,
# then assign the suffixed names back to the data frame
df.columns = df2['col'] + '_' + (df2.groupby('col').cumcount() + 1).astype(str)

# reset index to turn Gene and NAME back into ordinary columns
out = df.reset_index()
Gene NAME KIRC_1 normal_1 normal_2 KIRC_2
0 ABC DEF GHI JKL MNO PQR
1 STU VWX YZ ABC DEF GHI
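
The same idea can also be sketched without moving columns into the index, by suffixing only the names that actually repeat. This is an alternative under stated assumptions, not part of the answer above: `number_duplicates` is a hypothetical helper name, and the frame is rebuilt from the sample data so the snippet runs on its own.

```python
import pandas as pd


def number_duplicates(columns):
    """Suffix repeated names with _1, _2, ...; leave unique names unchanged."""
    s = pd.Series(list(columns))
    # 1-based running count of each name, in order of appearance
    counts = s.groupby(s).cumcount() + 1
    # True for every occurrence of a name that appears more than once
    is_dup = s.duplicated(keep=False)
    # keep unique names as they are, replace repeated ones with name_count
    return s.where(~is_dup, s + "_" + counts.astype(str)).tolist()


df = pd.DataFrame(
    [["ABC", "DEF", "GHI", "JKL", "MNO", "PQR"],
     ["STU", "VWX", "YZ", "ABC", "DEF", "GHI"]],
    columns=["Gene", "NAME", "KIRC", "normal", "normal", "KIRC"],
)
df.columns = number_duplicates(df.columns)
print(df.columns.tolist())
# ['Gene', 'NAME', 'KIRC_1', 'normal_1', 'normal_2', 'KIRC_2']
```

Because unique names are left alone, `Gene` and `NAME` need no special handling here. If other columns also repeat, they would be numbered too.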
Answered By - Naveed