Issue
I want to build a pandas DataFrame with one column per input array, where the rows are all the unique combinations of elements from those arrays (their Cartesian product). What is the most efficient, Pythonic way to do this?
For example, I have the following numpy arrays as input:
import numpy as np
a = np.array(["a1"])
b = np.array(["b1", "b2"])
c = np.array(["c1", "c2", "c3"])
This is what I want the resulting pandas DataFrame to look like:
    a   b   c
0  a1  b1  c1
1  a1  b1  c2
2  a1  b1  c3
3  a1  b2  c1
4  a1  b2  c2
5  a1  b2  c3
Below is the code that I am using:
import pandas as pd
hashmap = {"a":[], "b":[], "c":[]}
for a_elem in a:
    for b_elem in b:
        for c_elem in c:
            hashmap["a"] += [a_elem]
            hashmap["b"] += [b_elem]
            hashmap["c"] += [c_elem]
df = pd.DataFrame.from_dict(hashmap)
How can I make this code more efficient?
Solution
Use itertools.product:
from itertools import product
df = pd.DataFrame(product(a, b, c), columns=['a', 'b', 'c'])
Output:
    a   b   c
0  a1  b1  c1
1  a1  b1  c2
2  a1  b1  c3
3  a1  b2  c1
4  a1  b2  c2
5  a1  b2  c3
Alternative with MultiIndex.from_product:
df = pd.MultiIndex.from_product(
    [a, b, c], names=['a', 'b', 'c']
).to_frame(index=False)
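If the number of arrays is not fixed, both approaches can be driven from a dict that maps column names to arrays. The following is only a sketch under that assumption: the helper name cartesian_frame and the arrays dict are hypothetical and not part of the original answer. It builds the frame with itertools.product and then checks that the MultiIndex.from_product route yields the same rows.

```python
from itertools import product

import numpy as np
import pandas as pd


def cartesian_frame(arrays):
    """Hypothetical helper: one row per combination of the input arrays."""
    # Dict keys become column names; dict order fixes the column order.
    rows = list(product(*arrays.values()))
    return pd.DataFrame(rows, columns=list(arrays))


# Same inputs as in the question, keyed by the desired column name.
arrays = {
    "a": np.array(["a1"]),
    "b": np.array(["b1", "b2"]),
    "c": np.array(["c1", "c2", "c3"]),
}

df_product = cartesian_frame(arrays)

# The MultiIndex route, written for the same dict of arrays.
df_index = pd.MultiIndex.from_product(
    list(arrays.values()), names=list(arrays)
).to_frame(index=False)

# Both frames should contain the same 1 * 2 * 3 = 6 rows in the same order.
same_rows = df_product.values.tolist() == df_index.values.tolist()
print(df_product.shape, same_rows)
```

Keeping the inputs in a dict means adding a fourth array only requires a new key, with no extra loop or column name to edit by hand.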
Answered By - mozway