Issue
I have a pandas DataFrame with a datetime index, a range column, and a data column.
The shape of the df is (4000, 3).
I take the data column out as a NumPy array, reshape it to a 1000 by 4 matrix, and then drop the rows that contain NaN. Say 22 rows contain NaN, so I have (1000 - 22) rows left.
Then I apply a function to this matrix; the function always returns data with the same dimensions as its input. I want to insert these output values into a new column of the original df. So I would need to reshape the matrix, fill in NaN where the rows were dropped, and then insert it into the new column.
However, I can't seem to find a good way of doing it, and it needs to be really quick, as I am deploying it on thousands of data frames with a lot more data than in this example.
Solution
This should do what you need it to with a fair amount of efficiency:
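import numpy as np
import pandas as pd
# Build a 1000 x 4 example frame; the value 19 stands in for missing data.
c1 = np.random.choice(range(1, 20), 1000)
c2 = np.random.choice(range(1, 20), 1000)
c3 = np.random.choice(range(1, 20), 1000)
c4 = np.random.choice(range(1, 20), 1000)
df = pd.DataFrame({'col1': c1, 'col2': c2, 'col3': c3, 'col4': c4})
df = df.replace(19, np.nan)
arr = np.array(df)
###### Functional Portion #######
naMask = np.isnan(arr)  # True where a value is missing
arr1 = arr[~naMask]  # or however you are dropping NaN values; this gives a flat 1-D array
### Apply your function to arr1, yielding arr2 with the same length ###
arr2 = arr1 * 2  # placeholder only; substitute your own function here
np.place(arr, naMask, [np.nan])  # masked positions stay NaN
np.place(arr, ~naMask, arr2)  # write the results back, in order, into the other positions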
You can ignore the beginning as it is just an attempt to approximate your array.
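The code above masks individual elements, whereas the question describes dropping whole rows of the reshaped matrix. A minimal sketch of that row-wise variant follows. The helper name `apply_rowwise_with_nan`, the example frame, and the doubling function are all hypothetical stand-ins, and the sketch assumes the function preserves the shape of its input and that the column length is a multiple of the row width.

```python
import numpy as np
import pandas as pd


def apply_rowwise_with_nan(values, func, width=4):
    """Apply func to the complete rows of values reshaped to (-1, width).

    Rows containing any NaN are skipped and come back as all-NaN rows.
    Returns a flat array with the same length as values.
    """
    mat = np.asarray(values, dtype=float).reshape(-1, width)
    keep = ~np.isnan(mat).any(axis=1)   # True for rows without any NaN
    out = np.full(mat.shape, np.nan)    # start from an all-NaN result
    out[keep] = func(mat[keep])         # func must preserve the shape
    return out.ravel()


# Hypothetical stand-in for the 4000-row frame described in the question.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=4000, freq="D")
df = pd.DataFrame(
    {"range": np.arange(4000), "data": rng.normal(size=4000)}, index=idx
)
df.loc[df.index[[5, 6, 401]], "data"] = np.nan  # inject a few missing values

# Placeholder function (doubling); substitute any shape-preserving function.
df["result"] = apply_rowwise_with_nan(df["data"].to_numpy(), lambda m: m * 2)
```

Because the result starts as an all-NaN array and only the kept rows are overwritten, no separate "fill in the gaps" step is needed, and everything stays in vectorized NumPy operations.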
Answered By - j__carlson