Issue
I have a special dataframe called df
here is how it looks like
RepID +Col01 +Col02 +Col03 -Col01 +Col04 +Col05 -Col03 -Col04 +Col06 -Col07
1 5 7 9 8 3 8 1 9 4 6
2 1 3 3 3 1 2 2 3 6 0
3 9 8 0 9 4 9 5 1 2 0
4 3 1 0 5 8 7 1 0 9 2
5 0 7 1 2 0 0 2 9 2 1
They are all positive numbers in the data
but if you notice the column name it is a column name that is either with + or with -
Some of these columns have + with no columns with the - (Such as +Col06)
Some of these columns have - with no columns with the + (Such as -Col07)
Some other have both (Such as +Col01 and -Col01)
I want to make this dataset normalised by subtracting ting the value in the - columns from the + columns, and chnage the column name to a name with no + or - in the begining of the name, so the end table will look like this
RepID Col01 Col02 Col03 Col04 Col05 Col06 Col07
1 -3 7 8 -6 8 4 -6
2 -2 3 1 -2 2 6 -0
3 0 8 -5 3 9 2 0
4 -2 1 -1 8 7 9 -2
5 -2 7 -1 -9 0 2 -1
Is there anyway I can do that
Solution
No need for groupby
, just extract the +
and -
columns separately, subtract with fill_value=0
, concat
to the ID:
out = pd.concat([df[['RepID']],
df.filter(regex='^\+')
.rename(columns=lambda x: x[1:])
.sub(df.filter(regex='^-')
.rename(columns=lambda x: x[1:]),
fill_value=0)
], axis=1)
Output:
RepID Col01 Col02 Col03 Col04 Col05 Col06 Col07
0 1 -3 7.0 8 -6 8.0 4.0 -6.0
1 2 -2 3.0 1 -2 2.0 6.0 0.0
2 3 0 8.0 -5 3 9.0 2.0 0.0
3 4 -2 1.0 -1 8 7.0 9.0 -2.0
4 5 -2 7.0 -1 -9 0.0 2.0 -1.0
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.