Issue
I am fairly new to Python and very new to pandas. I am trying to do a matrix operation. I have a DataFrame called sample_df that looks like this:
    Set1    Set2  %overlap  %unique for Set1  %unique for Set2
0  S 077  S2 077    98.790              0.01              0.02
1  S 080  S2 080    99.165              0.01              0.01
2  S 023  S2 023    98.490              0.01              0.02
3  S 080  S2 115    97.760              0.02              0.03
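For anyone who wants to reproduce the problem, the frame above can be rebuilt roughly as follows. This is a sketch that assumes the columns are plain strings and floats, as the printed table suggests.

```python
import pandas as pd

# Rebuild the sample frame from the table in the question.
sample_df = pd.DataFrame({
    "Set1": ["S 077", "S 080", "S 023", "S 080"],
    "Set2": ["S2 077", "S2 080", "S2 023", "S2 115"],
    "%overlap": [98.790, 99.165, 98.490, 97.760],
    "%unique for Set1": [0.01, 0.01, 0.01, 0.02],
    "%unique for Set2": [0.02, 0.01, 0.02, 0.03],
})
print(sample_df)
```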
There are 3 distinct values in Set1 and 4 in Set2.
I am trying to create a 5x5 matrix that has the values of Set2 as column names. The extra row right below the column names needs to contain the '%unique for Set2' value corresponding to each value in 'Set2'. Similarly, the extra column right next to each value of 'Set1' needs to contain the corresponding '%unique for Set1' value. The rest of the matrix is filled diagonally with the corresponding values from '%overlap'.
The resulting df needs to look like this:
             S2 077  S2 080  S2 023  S2 115
               0.02    0.01    0.02    0.03
S 077  0.01  98.790
S 080  0.01          99.165
S 023  0.01                  98.490
S 080  0.02                          97.760
So far, I have created a new DataFrame by pivoting sample_df:
sub_df = sample_df.pivot(index='Set1', columns='Set2', values='%overlap')
But this gives me a DataFrame where 'S 080' appears only once and has two values, against 'S2 080' and 'S2 115', in the same row. I want them in different rows.
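The collapse described here can be seen in a small sketch. It rebuilds the sample data (assumed to match the table in the question) and shows that pivoting on 'Set1' alone yields one row per distinct label, so both 'S 080' entries share a row.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Set1": ["S 077", "S 080", "S 023", "S 080"],
    "Set2": ["S2 077", "S2 080", "S2 023", "S2 115"],
    "%overlap": [98.790, 99.165, 98.490, 97.760],
    "%unique for Set1": [0.01, 0.01, 0.01, 0.02],
    "%unique for Set2": [0.02, 0.01, 0.02, 0.03],
})

# Pivoting on 'Set1' alone keeps one row per distinct Set1 label.
sub_df = sample_df.pivot(index="Set1", columns="Set2", values="%overlap")
print(sub_df.shape)

# Both overlaps for 'S 080' end up in the same row.
print(sub_df.loc["S 080"].dropna())
```

Adding '%unique for Set1' to the index makes the two 'S 080' rows distinguishable, which is what the answer relies on.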
I can insert an empty row and column into the matrix, but I'm not sure how to fill in the values, as I don't think I can use sub_df.pivot for this (or maybe I am not using it right). Is there a simple way to do this?
Solution
Use DataFrame.pivot with DataFrame.reindex to restore the original row and column order, taking that order from MultiIndexes created by MultiIndex.from_frame:
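import pandas as pd

# Pair each label with its '%unique' value so duplicate labels stay distinct.
cols1 = ['Set1', '%unique for Set1']
cols2 = ['Set2', '%unique for Set2']

# MultiIndexes in the original row order; names are blanked for a clean header.
mux1 = pd.MultiIndex.from_frame(sample_df[cols1], names=(None, None))
mux2 = pd.MultiIndex.from_frame(sample_df[cols2], names=(None, None))

# Passing lists to index/columns requires pandas 1.1 or later.
out = (sample_df.pivot(index=cols1,
                       columns=cols2,
                       values='%overlap')
                .reindex(index=mux1, columns=mux2))
print(out)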
            S2 077  S2 080  S2 023  S2 115
              0.02    0.01    0.02    0.03
S 077 0.01   98.79     NaN     NaN     NaN
S 080 0.01     NaN  99.165     NaN     NaN
S 023 0.01     NaN     NaN   98.49     NaN
S 080 0.02     NaN     NaN     NaN   97.76
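The reindex step matters because pivot returns its row and column labels in sorted order, not in the order of the input rows. The sketch below assumes the sample data from the question and pandas 1.1 or later; `raw` is a name introduced here for illustration. It compares the label order before and after reindexing.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Set1": ["S 077", "S 080", "S 023", "S 080"],
    "Set2": ["S2 077", "S2 080", "S2 023", "S2 115"],
    "%overlap": [98.790, 99.165, 98.490, 97.760],
    "%unique for Set1": [0.01, 0.01, 0.01, 0.02],
    "%unique for Set2": [0.02, 0.01, 0.02, 0.03],
})
cols1 = ["Set1", "%unique for Set1"]
cols2 = ["Set2", "%unique for Set2"]

# Without reindex: labels come back sorted, so values are off the diagonal.
raw = sample_df.pivot(index=cols1, columns=cols2, values="%overlap")
print(list(raw.index))

# With reindex: rows and columns follow the original row order again.
mux1 = pd.MultiIndex.from_frame(sample_df[cols1], names=(None, None))
mux2 = pd.MultiIndex.from_frame(sample_df[cols2], names=(None, None))
out = raw.reindex(index=mux1, columns=mux2)
print(list(out.index))
```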
Or use the DataFrame constructor on a NaN-filled square array whose diagonal is filled from the '%overlap' Series:
import numpy as np
import pandas as pd

cols1 = ['Set1', '%unique for Set1']
cols2 = ['Set2', '%unique for Set2']
mux1 = pd.MultiIndex.from_frame(sample_df[cols1], names=(None, None))
mux2 = pd.MultiIndex.from_frame(sample_df[cols2], names=(None, None))

# Square NaN array with one row and one column per input row.
mat = np.full((sample_df.shape[0], sample_df.shape[0]), np.nan)
# Write the overlaps onto the diagonal in place.
np.fill_diagonal(mat, sample_df['%overlap'].to_numpy())

out = pd.DataFrame(mat, index=mux1, columns=mux2)
print(out)
            S2 077  S2 080  S2 023  S2 115
              0.02    0.01    0.02    0.03
S 077 0.01   98.79     NaN     NaN     NaN
S 080 0.01     NaN  99.165     NaN     NaN
S 023 0.01     NaN     NaN   98.49     NaN
S 080 0.02     NaN     NaN     NaN   97.76
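As a sanity check, the two approaches can be compared directly. This is a sketch under the same assumptions as above (sample data from the question, pandas 1.1 or later); `out_pivot`, `out_diag` and `same` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame({
    "Set1": ["S 077", "S 080", "S 023", "S 080"],
    "Set2": ["S2 077", "S2 080", "S2 023", "S2 115"],
    "%overlap": [98.790, 99.165, 98.490, 97.760],
    "%unique for Set1": [0.01, 0.01, 0.01, 0.02],
    "%unique for Set2": [0.02, 0.01, 0.02, 0.03],
})
cols1 = ["Set1", "%unique for Set1"]
cols2 = ["Set2", "%unique for Set2"]
mux1 = pd.MultiIndex.from_frame(sample_df[cols1], names=(None, None))
mux2 = pd.MultiIndex.from_frame(sample_df[cols2], names=(None, None))

# Approach 1: pivot, then restore the original order.
out_pivot = (sample_df.pivot(index=cols1, columns=cols2, values="%overlap")
                      .reindex(index=mux1, columns=mux2))

# Approach 2: NaN matrix with the overlaps written on the diagonal.
n = len(sample_df)
mat = np.full((n, n), np.nan)
np.fill_diagonal(mat, sample_df["%overlap"].to_numpy())
out_diag = pd.DataFrame(mat, index=mux1, columns=mux2)

# Compare values, treating NaN as equal to NaN.
same = np.allclose(out_pivot.to_numpy(), out_diag.to_numpy(), equal_nan=True)
print(same)
```

The second approach does not depend on the label pairs being unique, whereas pivot raises an error if the same (index, columns) pair occurs twice.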
Answered By - jezrael