Issue
I am looking to subtract a specific element of each data frame column from every element in that column. I am presently doing that by converting each column to a numpy array, which is not ideal.
As an example,
import numpy as np
import pandas as pd
data = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
# Existing dataframe
df = pd.DataFrame(data, columns=['column1', 'column2'])
a = np.array([2, 4])  # row index of the reference element for each column: 2 is for column 1, 4 is for column 2.
# In column 1, find the element at index 2 and subtract it from all the elements in column 1.
# Similarly, in column 2, find the element at index 4 and subtract it from all the elements in column 2.
# Required Output dataframe
data2 = [[-2, -40], [-1, -30], [0, -20],[1, -10],[2, 0]]
df2 = pd.DataFrame(data2, columns=['column1', 'column2'])
Output
Existing data frame:
column1 column2
0 1 10
1 2 20
2 3 30
3 4 40
4 5 50
Required output data frame:
column1 column2
0 -2 -40
1 -1 -30
2 0 -20
3 1 -10
4 2 0
Solution
We can use numpy indexing to select the values from the DataFrame after converting it with DataFrame.to_numpy, then subtract them:
output = df - df.to_numpy()[a, np.arange(df.columns.size)]
Or with DataFrame.sub:
output = df.sub(df.to_numpy()[a, np.arange(df.columns.size)], axis='columns')
output:
column1 column2
0 -2 -40
1 -1 -30
2 0 -20
3 1 -10
4 2 0
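As a quick sanity check, the two forms above can be compared against each other and against the required output from the question. This is only a sketch that rebuilds the question's data; the names reference, with_operator, with_sub and expected are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]],
    columns=['column1', 'column2'],
)
a = np.array([2, 4])  # row position of the reference element per column

# One reference value per column, selected by (row, column) pairs.
reference = df.to_numpy()[a, np.arange(df.columns.size)]

with_operator = df - reference                # plain operator form
with_sub = df.sub(reference, axis='columns')  # explicit DataFrame.sub form

expected = pd.DataFrame(
    [[-2, -40], [-1, -30], [0, -20], [1, -10], [2, 0]],
    columns=['column1', 'column2'],
)

# Compare the two forms with each other, then compare values with the required output.
same_as_each_other = with_operator.equals(with_sub)
same_as_expected = bool((with_operator.to_numpy() == expected.to_numpy()).all())
print(same_as_each_other, same_as_expected)
```

Both forms rely on the 1-D array of reference values being lined up with the columns, so it must contain exactly one value per column.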
Values are selected using the row indexes in a:
a = np.array([2, 4])
# [2, 4]
An array of column positions, one per column, is created using np.arange and Index.size:
col_index = np.arange(df.columns.size)
# [0 1]
These indices can be used together to select values from the DataFrame:
df.to_numpy()[a, col_index]
# [ 3 50]
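Putting the steps together, the whole approach might be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the function name subtract_reference_rows and its parameter row_positions are hypothetical, and it assumes the values in row_positions are positional row numbers (as with the default index in the question), not index labels.

```python
import numpy as np
import pandas as pd


def subtract_reference_rows(df, row_positions):
    """Subtract from each column the value at that column's given row position.

    Hypothetical helper: row_positions must hold one positional row
    number per column, in column order.
    """
    row_positions = np.asarray(row_positions)
    col_positions = np.arange(df.columns.size)
    # Paired integer indexing picks element (row_positions[i], col_positions[i]).
    reference = df.to_numpy()[row_positions, col_positions]
    # The 1-D reference array is matched against the columns, one value each.
    return df - reference


df = pd.DataFrame(
    [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]],
    columns=['column1', 'column2'],
)
result = subtract_reference_rows(df, [2, 4])
print(result)
```

Because df.to_numpy() works on positions, a DataFrame with a non-default index would need its labels translated to positions first if a held labels instead.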
Answered By - Henry Ecker