Tuesday, February 8, 2022

[FIXED] Iterating over numpy array, np.ndindex making a mess

Labels: numpy, numpy-ndarray, numpy-slicing, python, python-3.x

Issue

I have a dataframe that I need to run many calculations on, so I figured I'd give NumPy a try. I'm just learning how to use it. This is my dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['z', 'x', 'c', 'v', 'b', 'n'], 'col2': [100, 200, 300, 400, 500, 600]})
df1 = pd.DataFrame({'col1': ['z', 'x', 'c', 'v', 'b', 'n'], 'col2': [100, 212, 300, 405, 552, 641]})
df['col3'] = np.empty((len(df), 0)).tolist()
df1['col3'] = np.empty((len(df1), 0)).tolist()

df2 = df.merge(df1, on='col1', how='outer')

What I want to do is append col2_y - col2_x - sum(col3_y) to the list in col3_y wherever col2_y != col2_x. I tried this:

df2 = df2.to_numpy()
df = [df2[x, 3:4] - df2[x, 1:2] for x in np.ndindex(len(df2))]
df2 = [np.where(df2[x, 1:2] != df2[x, 3:4],
                np.append(df2[x, 4:5], (df2[x, 3:4] - df2[x, 1:2]) - (df2[x, 4:5].sum())),
                df2[x, 4:5]) for x in np.ndindex(len(df2))]

But somehow, starting from this:

[['z' 100 list([]) 100 list([])]
 ['x' 200 list([]) 212 list([])]
 ['c' 300 list([]) 300 list([])]
 ['v' 400 list([]) 405 list([])]
 ['b' 500 list([]) 552 list([])]
 ['n' 600 list([]) 641 list([])]]

it turns into this:

[array([[0]], dtype=object),
 array([[12]], dtype=object),
 array([[0]], dtype=object),
 array([[5]], dtype=object),
 array([[52]], dtype=object),
 array([[41]], dtype=object)]

[array([[list([])]], dtype=object),
 array([[list([])]], dtype=object),
 array([[list([])]], dtype=object),
 array([[list([])]], dtype=object),
 array([[list([])]], dtype=object),
 array([[list([])]], dtype=object)]

Am I not using np.ndindex correctly? Is the slicing correct at least?

Do I even need it or is there a better way to accomplish what I'm trying to do?

I appreciate any suggestions!

Solution

Your dataframe:

In [43]: df2
Out[43]: 
  col1  col2_x col3_x  col2_y col3_y
0    z     100     []     100     []
1    x     200     []     212     []
2    c     300     []     300     []
3    v     400     []     405     []
4    b     500     []     552     []
5    n     600     []     641     []

and the array derived from it (note the object dtype):

In [44]: arr = df2.to_numpy()
In [45]: arr
Out[45]: 
array([['z', 100, list([]), 100, list([])],
       ['x', 200, list([]), 212, list([])],
       ['c', 300, list([]), 300, list([])],
       ['v', 400, list([]), 405, list([])],
       ['b', 500, list([]), 552, list([])],
       ['n', 600, list([]), 641, list([])]], dtype=object)

Your iterative difference actually produces a list of (1, 1) arrays, not a single array:

In [46]: arr1 = [arr[x, 3:4] - arr[x, 1:2] for x in np.ndindex(len(arr))]
In [47]: arr1
Out[47]: 
[array([[0]], dtype=object),
 array([[12]], dtype=object),
 array([[0]], dtype=object),
 array([[5]], dtype=object),
 array([[52]], dtype=object),
 array([[41]], dtype=object)]
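
The (1, 1) shape of each element comes from how the rows are indexed. Here is a minimal sketch on a small hypothetical three-row array (not the asker's full data): np.ndindex yields one-element tuples rather than integers, and a tuple combined with a 3:4-style slice keeps both dimensions.

```python
import numpy as np

# np.ndindex(n) yields 1-element tuples, not plain integers.
indices = list(np.ndindex(3))
print(indices)  # [(0,), (1,), (2,)]

# Small illustrative object array: label, old value, new value.
arr = np.array([['z', 100, 100],
                ['x', 200, 212],
                ['c', 300, 300]], dtype=object)

# A tuple used as a row index is treated as an array-like (advanced) index,
# and 1:2 is a slice, so the result keeps two dimensions.
by_tuple = arr[(1,), 1:2]
print(by_tuple.shape)  # (1, 1)

# Plain integer indices return the element itself.
by_int = arr[1, 1]
print(by_int)  # 200

# range() gives plain integers, so this loop produces scalars.
diffs = [arr[i, 2] - arr[i, 1] for i in range(len(arr))]
print(diffs)  # [0, 12, 0]
```

For a simple row loop like this, range(len(arr)) or iterating over arr directly is usually enough; np.ndindex is mainly useful for walking multi-dimensional index spaces.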

The same difference computed on the Series, without a loop:

In [48]: df2['col2_y']-df2['col2_x']
Out[48]: 
0     0
1    12
2     0
3     5
4    52
5    41
dtype: int64

and the array column difference, also without iteration. Note that object dtype math is still slower than numeric:

In [50]: arr[:,3]-arr[:,1]
Out[50]: array([0, 12, 0, 5, 52, 41], dtype=object)

A numpy integer dtype version:

In [51]: df2['col2_y'].to_numpy()-df2['col2_x'].to_numpy()
Out[51]: array([ 0, 12,  0,  5, 52, 41])

I'm not sure I want to tackle the following line:

[np.where(df2[x, 1:2] != df2[x, 3:4],
                              np.append(df2[x, 4:5], (df2[x, 3:4] - df2[x, 1:2]) - (df2[x, 4:5].sum())),
                              df2[x, 4:5]) for x in np.ndindex(len(df2))]

It can be cleaned up by iterating over the rows of arr directly:

[np.where(x[1] != x[3],
          np.append(x[4], (x[3] - x[1]) - sum(x[4])),
          x[4]) 
 for x in arr]

Since all the x[4] values are empty lists, this produces a list of empty arrays:

[array([], dtype=float64),
 ...
 array([], dtype=float64)]
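
One likely reason the appended value disappears is broadcasting inside np.where. The sketch below uses the illustrative value 12 (the difference for row 'x'): np.where broadcasts its three arguments together, and a shape (1,) array broadcast against a shape (0,) array gives shape (0,), so the result is empty even when the condition is True.

```python
import numpy as np

appended = np.append([], 12)  # array([12.]), shape (1,)
empty = np.asarray([])        # shape (0,)

# np.where broadcasts condition, x and y to a common shape;
# (1,) broadcast with (0,) is (0,), so nothing survives.
result = np.where(True, appended, empty)
print(result.shape)  # (0,)

# A plain conditional expression picks one branch without broadcasting.
chosen = appended if 212 != 200 else empty
print(chosen)  # [12.]
```

np.where is an element-wise selector for arrays of compatible shapes; for a per-row either/or decision on variable-length lists, an ordinary if/else is the simpler tool.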

Oops, somewhere in my fiddling I've added values to the col3_y lists:

In [65]: df2
Out[65]: 
  col1  col2_x col3_x  col2_y col3_y
0    z     100     []     100    [0]
1    x     200     []     212   [12]
2    c     300     []     300    [0]
3    v     400     []     405    [5]
4    b     500     []     552   [52]
5    n     600     []     641   [41]
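
For the stated goal (append col2_y - col2_x - sum(col3_y) to col3_y where the two col2 values differ), a plain Python loop over the columns may be all that is needed, since the lists have variable length and do not vectorize well. The following is only a sketch under assumptions: it uses a shortened three-row version of the question's data, builds the empty lists with a list comprehension, and the names history and result are introduced here for illustration.

```python
import pandas as pd

# Shortened, illustrative version of the question's frames.
df = pd.DataFrame({'col1': ['z', 'x', 'c'], 'col2': [100, 200, 300]})
df1 = pd.DataFrame({'col1': ['z', 'x', 'c'], 'col2': [100, 212, 300]})
df['col3'] = [[] for _ in range(len(df))]
df1['col3'] = [[] for _ in range(len(df1))]

df2 = df.merge(df1, on='col1', how='outer')

# Mutate each col3_y list in place when the two col2 values differ.
for x, y, history in zip(df2['col2_x'], df2['col2_y'], df2['col3_y']):
    if x != y:
        history.append(int(y - x) - sum(history))

# Key the lists by col1 so the check does not depend on merge row order.
result = dict(zip(df2['col1'], df2['col3_y']))
print(result['x'])  # [12]
```

Storing lists inside a DataFrame keeps the column at object dtype, so this gives up most of the speed advantage of pandas and NumPy; it is shown only because it matches the layout used in the question.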

Answered By - hpaulj

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
