Tuesday, December 5, 2023

[FIXED] Pandas operation to obtain a matrix from a dataframe


Issue

I am fairly new to Python and very new to pandas. I am trying to do a matrix operation on a dataframe called sample_df, which looks like this:

      Set1    Set2  %overlap  %unique for Set1  %unique for Set2
0    S 077  S2 077    98.790              0.01              0.02
1    S 080  S2 080    99.165              0.01              0.01
2    S 023  S2 023    98.490              0.01              0.02
3    S 080  S2 115    97.760              0.02              0.03
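
For anyone who wants to reproduce the question, the frame above can be rebuilt roughly as follows. This is only a sketch: the dtypes (plain strings and floats) are an assumption based on the printed output.

```python
import pandas as pd

# Rebuild the sample frame from the printed table (dtypes are assumed).
sample_df = pd.DataFrame({
    "Set1": ["S 077", "S 080", "S 023", "S 080"],
    "Set2": ["S2 077", "S2 080", "S2 023", "S2 115"],
    "%overlap": [98.790, 99.165, 98.490, 97.760],
    "%unique for Set1": [0.01, 0.01, 0.01, 0.02],
    "%unique for Set2": [0.02, 0.01, 0.02, 0.03],
})
print(sample_df)
```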

Set1 has 3 distinct values ('S 080' appears twice) and Set2 has 4.

I am trying to create a 5x5 matrix that has the values of Set2 as column names and the values of Set1 as row labels.

The extra row directly below the column names needs to contain the '%unique for Set2' value corresponding to each value of 'Set2'. Similarly, the extra column next to each value of 'Set1' needs to contain the corresponding '%unique for Set1' value. The rest of the matrix is filled along the diagonal with the corresponding values from '%overlap'.

The resulting df needs to look like this:

                 S2 077    S2 080    S2 023    S2 115
                 0.02      0.01      0.02      0.03
S 077    0.01    98.790
S 080    0.01              99.165
S 023    0.01                        98.490
S 080    0.02                                  97.760

So far, I created a new dataframe by pivoting the sample_df:

sub_df = sample_df.pivot(index='Set1', columns='Set2', values='%overlap')

But this gives me a dataframe where 'S 080' appears only once, with two values (under 'S2 080' and 'S2 115') in the same row. I want them in different rows.
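
The behaviour described here can be reproduced with a small sketch. The frame is rebuilt from the question's table, so its dtypes are an assumption; the pivot call is the one from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (dtypes are assumed).
sample_df = pd.DataFrame({
    "Set1": ["S 077", "S 080", "S 023", "S 080"],
    "Set2": ["S2 077", "S2 080", "S2 023", "S2 115"],
    "%overlap": [98.790, 99.165, 98.490, 97.760],
    "%unique for Set1": [0.01, 0.01, 0.01, 0.02],
    "%unique for Set2": [0.02, 0.01, 0.02, 0.03],
})

# Pivoting on 'Set1' alone gives one row per distinct label,
# so the two 'S 080' records end up in the same row.
sub_df = sample_df.pivot(index="Set1", columns="Set2", values="%overlap")
print(sub_df.shape)
print(sub_df.loc["S 080"].dropna())
```

The two 'S 080' rows differ only in their '%unique for Set1' value, so any fix has to include that column in the row key.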

I can insert an empty row and column into the matrix, but I'm not sure how to fill in the values, as I don't think I can use sub_df.pivot for this (or maybe I am not using it right). Is there a simple way to do this?

Solution

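Use DataFrame.pivot with each label column paired with its '%unique' column as the index and columns, then call DataFrame.reindex with MultiIndexes built by MultiIndex.from_frame to restore the original row and column order: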

import pandas as pd

# Pair each label with its '%unique' value so the two 'S 080' rows stay distinct
cols1 = ['Set1', '%unique for Set1']
cols2 = ['Set2', '%unique for Set2']
# MultiIndexes in the original row order, with the level names dropped
mux1 = pd.MultiIndex.from_frame(sample_df[cols1], names=(None, None))
mux2 = pd.MultiIndex.from_frame(sample_df[cols2], names=(None, None))

out = (sample_df.pivot(index=cols1,
                       columns=cols2,
                       values='%overlap')
                .reindex(index=mux1, columns=mux2))
print(out)
           S2 077  S2 080 S2 023 S2 115
             0.02    0.01   0.02   0.03
S 077 0.01  98.79     NaN    NaN    NaN
S 080 0.01    NaN  99.165    NaN    NaN
S 023 0.01    NaN     NaN  98.49    NaN
S 080 0.02    NaN     NaN    NaN  97.76
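
One detail that may be worth spelling out: pivot returns its row and column labels in sorted order, which is presumably why the reindex step is needed to get the values back onto the diagonal. The sketch below prints the row labels before and after reindexing. The frame is rebuilt from the question's table (dtypes assumed), and the name pivoted is introduced here only for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (dtypes are assumed).
sample_df = pd.DataFrame({
    "Set1": ["S 077", "S 080", "S 023", "S 080"],
    "Set2": ["S2 077", "S2 080", "S2 023", "S2 115"],
    "%overlap": [98.790, 99.165, 98.490, 97.760],
    "%unique for Set1": [0.01, 0.01, 0.01, 0.02],
    "%unique for Set2": [0.02, 0.01, 0.02, 0.03],
})

cols1 = ["Set1", "%unique for Set1"]
cols2 = ["Set2", "%unique for Set2"]

# 'pivoted' is a hypothetical intermediate name: the result before reindexing.
pivoted = sample_df.pivot(index=cols1, columns=cols2, values="%overlap")
print(pivoted.index.tolist())  # row labels in sorted order

# Reindexing by the original (label, %unique) pairs restores the input order.
mux1 = pd.MultiIndex.from_frame(sample_df[cols1], names=(None, None))
mux2 = pd.MultiIndex.from_frame(sample_df[cols2], names=(None, None))
out = pivoted.reindex(index=mux1, columns=mux2)
print(out.index.tolist())      # row labels in the original order
```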

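Alternatively, build a square array filled with NaN, write the '%overlap' values onto its diagonal with np.fill_diagonal, and pass it to the DataFrame constructor with the same MultiIndexes: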

import numpy as np
import pandas as pd

cols1 = ['Set1', '%unique for Set1']
cols2 = ['Set2', '%unique for Set2']
mux1 = pd.MultiIndex.from_frame(sample_df[cols1], names=(None, None))
mux2 = pd.MultiIndex.from_frame(sample_df[cols2], names=(None, None))

# Square NaN matrix with one row and one column per record
mat = np.full((sample_df.shape[0], sample_df.shape[0]), np.nan)
# Put each record's '%overlap' value on the diagonal
np.fill_diagonal(mat, sample_df['%overlap'])

out = pd.DataFrame(mat, index=mux1, columns=mux2)
print(out)
           S2 077  S2 080 S2 023 S2 115
             0.02    0.01   0.02   0.03
S 077 0.01  98.79     NaN    NaN    NaN
S 080 0.01    NaN  99.165    NaN    NaN
S 023 0.01    NaN     NaN  98.49    NaN
S 080 0.02    NaN     NaN    NaN  97.76
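
As a sanity check, the two approaches can be compared on the sample data. This is a sketch under the same assumptions as before: the frame is rebuilt from the question's table, and out_pivot, out_diag and same are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (dtypes are assumed).
sample_df = pd.DataFrame({
    "Set1": ["S 077", "S 080", "S 023", "S 080"],
    "Set2": ["S2 077", "S2 080", "S2 023", "S2 115"],
    "%overlap": [98.790, 99.165, 98.490, 97.760],
    "%unique for Set1": [0.01, 0.01, 0.01, 0.02],
    "%unique for Set2": [0.02, 0.01, 0.02, 0.03],
})

cols1 = ["Set1", "%unique for Set1"]
cols2 = ["Set2", "%unique for Set2"]
mux1 = pd.MultiIndex.from_frame(sample_df[cols1], names=(None, None))
mux2 = pd.MultiIndex.from_frame(sample_df[cols2], names=(None, None))

# Approach 1: pivot on the paired columns, then restore the original order.
out_pivot = (sample_df.pivot(index=cols1, columns=cols2, values="%overlap")
                      .reindex(index=mux1, columns=mux2))

# Approach 2: NaN matrix with the '%overlap' values on the diagonal.
mat = np.full((len(sample_df), len(sample_df)), np.nan)
np.fill_diagonal(mat, sample_df["%overlap"].to_numpy())
out_diag = pd.DataFrame(mat, index=mux1, columns=mux2)

# Compare the underlying values, treating NaN positions as equal.
same = np.allclose(out_pivot.to_numpy(), out_diag.to_numpy(), equal_nan=True)
print(same)
```

The diagonal version relies on row i belonging to column i, which holds here because each record contributes exactly one row and one column. The pivot version instead needs every (row key, column key) pair to be unique.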

Answered By - jezrael

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
