Issue
I have a large dataframe and a list of many locations I need to set to a certain value. Currently I'm iterating over the locations to set the values one by one:
import pandas as pd
import numpy as np
#large dataframe
column_names = np.array(range(100))
np.random.shuffle(column_names)
row_names = np.array(range(100))
np.random.shuffle(row_names)
df = pd.DataFrame(columns=column_names, index=row_names)
#index values to be set
ix = np.random.randint(0, 100,1000)
#column values to be set
iy = np.random.randint(0, 100,1000)
#setting the specified locations to 1, one by one
for k in range(len(ix)):
    df.loc[ix[k], iy[k]] = 1
This appears to be prohibitively slow. For the above example, on my machine, the last for loop takes 0.35 seconds. On the other hand, if I do
df.loc[ix, iy]=1
it takes only 0.035 seconds, which is ten times faster. Unfortunately, it does not give the correct result, because it sets every combination of elements of ix and iy to 1. Is there a similarly fast way to set values at many locations at once, without iterating over the locations?
Solution
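You can access the underlying NumPy array with .values and index it by position after converting the labels to positions. This relies on .values returning a writable view of the frame's data; that is typically the case for a single-dtype frame like this one, but it is not guaranteed for mixed dtypes or with Copy-on-Write enabled.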
cols = pd.Series(range(df.shape[1]), index=df.columns)
idx = pd.Series(range(df.shape[0]), index=df.index)
df.values[idx.reindex(ix), cols.reindex(iy)] = 1
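A variant of the same idea, as a minimal sketch rather than the answer's own method: `Index.get_indexer` does the label-to-position conversion directly, and the assignment is done on an explicit copy of the data, so it does not depend on whether `.values` is a writable view. The helper name `set_label_pairs` is hypothetical, and the sketch assumes the index and columns hold unique labels.

```python
import numpy as np
import pandas as pd

def set_label_pairs(df, row_labels, col_labels, value):
    """Return a copy of df with value set at each paired (row, column) label."""
    # Translate labels into integer positions; -1 marks a label that is absent.
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    if (rows < 0).any() or (cols < 0).any():
        raise KeyError("some labels are missing from the index or columns")
    # Work on an explicit copy so the original frame is left untouched.
    arr = df.to_numpy(copy=True)
    arr[rows, cols] = value  # paired (pointwise) fancy indexing
    return pd.DataFrame(arr, index=df.index, columns=df.columns)

# Same setup as in the question: shuffled labels, 1000 random locations.
rng = np.random.default_rng(0)
df = pd.DataFrame(index=rng.permutation(100), columns=rng.permutation(100))
ix = rng.integers(0, 100, 1000)
iy = rng.integers(0, 100, 1000)
result = set_label_pairs(df, ix, iy, 1)
```

Rebuilding the frame costs one extra copy of the data, which is the price of not mutating `df` through its underlying array.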
Minimal example:
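# input
df = pd.DataFrame(index=['a', 'b', 'c'],
                  columns=['A', 'B', 'C'])
ix = ['a', 'b', 'c']
iy = ['A', 'C', 'A']
# map labels to positions, then set the paired locations
cols = pd.Series(range(df.shape[1]), index=df.columns)
idx = pd.Series(range(df.shape[0]), index=df.index)
df.values[idx.reindex(ix), cols.reindex(iy)] = 1
# output
     A    B    C
a    1  NaN  NaN
b  NaN  NaN    1
c    1  NaN  NaN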
Previous answer (works only when the row and column labels are the integer positions 0..n-1 in order):
df.values[ix, iy] = 1
Minimal example:
df = pd.DataFrame(index=range(5), columns=range(5))
ix = [1, 2, 4]
iy = [1, 3, 2]
df.values[ix, iy] = 1
Output:
     0    1    2    3    4
0  NaN  NaN  NaN  NaN  NaN
1  NaN    1  NaN  NaN  NaN
2  NaN  NaN  NaN    1  NaN
3  NaN  NaN  NaN  NaN  NaN
4  NaN  NaN    1  NaN  NaN
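To illustrate why indexing the array with two lists behaves differently from the df.loc[ix, iy] call in the question, here is a small NumPy-only sketch. Two index lists passed directly are matched element by element, while `np.ix_` builds the cross product, which is the behaviour the question observed with .loc. The array names `paired` and `crossed` are only for illustration.

```python
import numpy as np

ix = [1, 2, 4]
iy = [1, 3, 2]

# Paired indexing: sets exactly (1, 1), (2, 3) and (4, 2).
paired = np.zeros((5, 5), dtype=int)
paired[ix, iy] = 1

# Cross-product indexing: sets every row in ix combined with every column in iy.
crossed = np.zeros((5, 5), dtype=int)
crossed[np.ix_(ix, iy)] = 1

print(paired.sum())   # 3
print(crossed.sum())  # 9
```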
Answered By - mozway