Tuesday, October 25, 2022

[FIXED] Rename pandas column iteratively


Issue

I have several columns with the same name in a DataFrame. How can I rename the duplicated columns below, normal and KIRC, to normal_1, normal_2, KIRC_1, KIRC_2?

import pandas as pd

gene_exp.columns = gene_exp.iloc[-1]
gene_exp = gene_exp.iloc[:-1]
gene_exp

# Append "_[number]" 
c = pd.Series(gene_exp.columns)
for dup in gene_exp.columns[gene_exp.columns.duplicated(keep=False)]: 
    c[df.columns.get_loc(dup)] = ([dup + '_' + str(d_idx) 
                                     if d_idx != 0 
                                     else dup 
                                     for d_idx in range(gene_exp.columns.get_loc(dup).sum())]
                                    )
gene_exp

Traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/opt/conda/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/conda/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'KIRC'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_27/3403075751.py in <module>
      5                                      if d_idx != 0
      6                                      else dup
----> 7                                      for d_idx in range(gene_exp.columns.get_loc(dup).sum())]
      8                                     )
      9 gene_exp

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'KIRC'

Sample data

	Gene	NAME	KIRC	normal	normal	KIRC
0	ABC	DEF	GHI	JKL	MNO	PQR
1	STU	VWX	YZ	ABC	DEF	GHI

Desired output:

	Gene	NAME	KIRC_1	normal_1	normal_2	KIRC_2
0	ABC	DEF	GHI	JKL	MNO	PQR
1	STU	VWX	YZ	ABC	DEF	GHI

Solution

# df is the question's DataFrame (called gene_exp there)
# move Gene and NAME into the index, since they do not need renaming
df.set_index(['Gene', 'NAME'], inplace=True)

# put the remaining column names in a one-column DataFrame
df2 = pd.DataFrame(df.columns.values, columns=['col'])

# number each repeated name with a 1-based running count (cumcount starts at 0)
# and assign the suffixed names back as the columns
df.columns = df2['col'] + '_' + (df2.groupby('col').cumcount() + 1).astype(str)

# restore Gene and NAME as ordinary columns
out = df.reset_index()

  Gene NAME KIRC_1 normal_1 normal_2 KIRC_2
0  ABC  DEF    GHI      JKL      MNO    PQR
1  STU  VWX     YZ      ABC      DEF    GHI
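
A self-contained variation on the same cumcount idea, not part of the original answer, may be useful when the DataFrame should not be re-indexed. It is only a sketch under assumptions: the frame is rebuilt by hand from the sample data in the question, and suffix_duplicates is a hypothetical helper name. It adds a suffix only to names that occur more than once, so Gene and NAME can stay as ordinary columns.

```python
import pandas as pd

# Sample data from the question, built from rows because a dict
# cannot hold duplicate keys.
gene_exp = pd.DataFrame(
    [["ABC", "DEF", "GHI", "JKL", "MNO", "PQR"],
     ["STU", "VWX", "YZ", "ABC", "DEF", "GHI"]],
    columns=["Gene", "NAME", "KIRC", "normal", "normal", "KIRC"],
)


def suffix_duplicates(columns):
    """Hypothetical helper: append _1, _2, ... to names that occur more than once."""
    names = pd.Series(list(columns))
    is_dup = names.duplicated(keep=False)          # True for every repeated name
    counter = names.groupby(names).cumcount() + 1  # 1-based running count per name
    suffixed = names + "_" + counter.astype(str)
    # keep unique names as they are, use the suffixed form for repeated ones
    return names.where(~is_dup, suffixed).tolist()


gene_exp.columns = suffix_duplicates(gene_exp.columns)
print(gene_exp.columns.tolist())
# ['Gene', 'NAME', 'KIRC_1', 'normal_1', 'normal_2', 'KIRC_2']
```

Because only the column labels are replaced, the cell values and the column order are left untouched.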

Answered By - Naveed

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
