Issue
import pandas as pd
import numpy as np
s = [ "S" + str(i) for i in range(1,101)]
c = [ "C" + str(i) for i in range(1,51)]
arr1 = np.random.randn(len(c),len(s))
arr2 = np.random.randn(len(c),len(s))
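How can I create and fill a pandas DataFrame df with 100 * 50 = 5000 rows, one for each possible (s, c) pair, such that arr1_col holds arr1[s, c] and arr2_col holds arr2[s, c]?
Pseudocode of what I am after (not valid as written, since NumPy arrays cannot be indexed with lists of string labels):
df = pd.DataFrame({'S':s, 'C':c, 'arr1_col':arr1[s,c] , 'arr2_col':arr2[s,c]})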
Solution
Assuming this 4x3 reproducible input, with both arrays shaped (len(s), len(c)) so that arr1[i, j] corresponds to s[i] and c[j]:
import pandas as pd
import numpy as np
s = [ "S" + str(i) for i in range(1,4+1)]
c = [ "C" + str(i) for i in range(1,3+1)]
arr1 = np.arange(len(c)*len(s)).reshape(len(s), len(c))
arr2 = np.arange(len(c)*len(s)).reshape(len(s), len(c))*10
I imagine that you want to use numpy.repeat, numpy.tile and numpy.ravel:
# numpy_1
df = pd.DataFrame({'S': np.repeat(s, len(c)),
                   'C': np.tile(c, len(s)),
                   'arr1': arr1.ravel(),
                   'arr2': arr2.ravel(),
                  })
Or:
# pandas_1
df = (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                 'arr2': pd.DataFrame(arr2, index=s, columns=c),},
                axis=1)
        .stack().rename_axis(['S', 'C']).reset_index()
     )
Output:
     S   C  arr1  arr2
0   S1  C1     0     0
1   S1  C2     1    10
2   S1  C3     2    20
3   S2  C1     3    30
4   S2  C2     4    40
5   S2  C3     5    50
6   S3  C1     6    60
7   S3  C2     7    70
8   S3  C3     8    80
9   S4  C1     9    90
10  S4  C2    10   100
11  S4  C3    11   110
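To see why the three calls line up, here is a minimal sketch that rebuilds the arr1 column on the same 4x3 input and checks every row against direct positional indexing. The names s_labels, c_labels, flat, df_check, s_pos, c_pos and rows_match are illustrative only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

s = ["S" + str(i) for i in range(1, 4 + 1)]
c = ["C" + str(i) for i in range(1, 3 + 1)]
arr1 = np.arange(len(s) * len(c)).reshape(len(s), len(c))

# Each S label is repeated once per C label: S1 S1 S1 S2 S2 S2 ...
s_labels = np.repeat(s, len(c))
# The whole C sequence is repeated once per S label: C1 C2 C3 C1 C2 C3 ...
c_labels = np.tile(c, len(s))
# Default (row-major) flattening walks each row of arr1 from left to right
flat = arr1.ravel()

df_check = pd.DataFrame({"S": s_labels, "C": c_labels, "arr1": flat})

# Map each label back to its position (illustrative helper dicts)
s_pos = {label: i for i, label in enumerate(s)}
c_pos = {label: j for j, label in enumerate(c)}

# Every row of the long frame should equal arr1[i, j] for its (S, C) pair
rows_match = all(
    arr1[s_pos[row.S], c_pos[row.C]] == row.arr1
    for row in df_check.itertuples(index=False)
)
print(rows_match)  # True
```

The pairing works because repeat varies the S label slowly, tile varies the C label quickly, and the default ravel reads the array in exactly that row-by-row order.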
Ordering by C first:
# numpy_2
df = pd.DataFrame({'S': np.tile(s, len(c)),
                   'C': np.repeat(c, len(s)),
                   'arr1': arr1.ravel(order='F'),
                   'arr2': arr2.ravel(order='F'),
                  })
Or:
# pandas_2
df = (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                 'arr2': pd.DataFrame(arr2, index=s, columns=c),},
                axis=0)
        .T.stack().rename_axis(['C', 'S']).reset_index()
     )
Output (as produced by numpy_2; pandas_2 gives the same rows with the C column placed before S):
     S   C  arr1  arr2
0   S1  C1     0     0
1   S2  C1     3    30
2   S3  C1     6    60
3   S4  C1     9    90
4   S1  C2     1    10
5   S2  C2     4    40
6   S3  C2     7    70
7   S4  C2    10   100
8   S1  C3     2    20
9   S2  C3     5    50
10  S3  C3     8    80
11  S4  C3    11   110
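The only change needed for the C-first ordering is the flattening order. A small sketch on an array with the same layout as arr1 (the names arr, row_major, col_major and same_as_transpose are illustrative) shows the difference between the default and order='F':

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)  # same layout as arr1 in the example

row_major = arr.ravel()            # default order='C': row by row
col_major = arr.ravel(order="F")   # order='F': column by column

print(row_major.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(col_major.tolist())  # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]

# Flattening column by column is the same as flattening the transpose row by row
same_as_transpose = np.array_equal(col_major, arr.T.ravel())
print(same_as_transpose)  # True
```

Swapping repeat and tile between the S and C columns keeps the labels in step with this column-by-column order.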
comparison of timings
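One way to reproduce such a comparison yourself is sketched below with timeit on the 100x50 size from the question. The function names numpy_1 and pandas_1 mirror the comment labels above. The seeded random generator, the array shape (len(s), len(c)) and the repetition count are assumptions made for this sketch. No timings are quoted here, since they depend on the machine and the library versions.

```python
import timeit

import numpy as np
import pandas as pd

s = ["S" + str(i) for i in range(1, 101)]
c = ["C" + str(i) for i in range(1, 51)]

# Assumption: arrays are shaped (len(s), len(c)), as in the answer's example
rng = np.random.default_rng(0)
arr1 = rng.standard_normal((len(s), len(c)))
arr2 = rng.standard_normal((len(s), len(c)))


def numpy_1():
    # Long format built directly from flattened arrays
    return pd.DataFrame({
        "S": np.repeat(s, len(c)),
        "C": np.tile(c, len(s)),
        "arr1": arr1.ravel(),
        "arr2": arr2.ravel(),
    })


def pandas_1():
    # Wide frames concatenated side by side, then stacked to long format
    wide = pd.concat(
        {"arr1": pd.DataFrame(arr1, index=s, columns=c),
         "arr2": pd.DataFrame(arr2, index=s, columns=c)},
        axis=1,
    )
    return wide.stack().rename_axis(["S", "C"]).reset_index()


# Both approaches should yield one row per (S, C) pair
assert len(numpy_1()) == len(pandas_1()) == len(s) * len(c)

n_runs = 10  # arbitrary repetition count for this sketch
for func in (numpy_1, pandas_1):
    total = timeit.timeit(func, number=n_runs)
    print(f"{func.__name__}: {total / n_runs * 1e3:.3f} ms per call")
```

Depending on the pandas version, stack may order the C labels differently from numpy_1 and may emit a FutureWarning, so sort both frames by S and C before comparing their contents.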
Answered By - mozway