Issue
I have a DataFrame in which two of the columns, "Zeroes" and "Ones", contain only zeroes and only ones, respectively. If I combine them into one column, I get all the zeroes first, then all the ones.
I want to interleave them instead, taking one element from each column in turn rather than all elements of the first column followed by all elements of the second. So the result should be [0, 1, 0, 1, 0, 1], not [0, 0, 0, 1, 1, 1].
I process 100K+ rows of data. What is the fastest way to achieve this? Thanks in advance!
Solution
Select the two columns, convert them to a NumPy array, and flatten it in row-major order:
import pandas as pd
df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1], "some_other": list("abc")})
res = df[["zeroes", "ones"]].to_numpy().ravel(order="C")
print(res)
Output
[0 1 0 1 0 1]
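Why this interleaves: to_numpy() returns a 2-D array with one row per DataFrame row, and ravel(order="C") walks that array row by row, so the two values of each row end up next to each other. A minimal sketch contrasting it with column-major order (the names arr, row_major and col_major are illustrative, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1]})
arr = df[["zeroes", "ones"]].to_numpy()  # 2-D array of shape (3, 2)

row_major = arr.ravel(order="C")  # reads row by row: values are interleaved
col_major = arr.ravel(order="F")  # reads column by column: values are concatenated

print(row_major)
print(col_major)
```

Column-major order reproduces the "all zeroes, then all ones" result the question wants to avoid, which is why order="C" is the one to use here.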
Micro-Benchmarks
import pandas as pd
from itertools import chain
df = pd.DataFrame({"zeroes": [0] * 10_000, "ones": [1] * 10_000})
%timeit df[["zeroes", "ones"]].to_numpy().ravel(order="C").tolist()
672 µs ± 8.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [v for vs in zip(df["zeroes"], df["ones"]) for v in vs]
2.57 ms ± 54 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit list(chain.from_iterable(zip(df["zeroes"], df["ones"])))
2.11 ms ± 73 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
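The %timeit lines above only work in IPython or Jupyter. To repeat the comparison in a plain Python script, something like the following sketch should do, using the standard timeit module. The function names and the repeat counts are my own choices, and the timings will differ from machine to machine.

```python
import timeit
from itertools import chain

import pandas as pd

df = pd.DataFrame({"zeroes": [0] * 10_000, "ones": [1] * 10_000})


def with_ravel():
    return df[["zeroes", "ones"]].to_numpy().ravel(order="C").tolist()


def with_comprehension():
    return [v for vs in zip(df["zeroes"], df["ones"]) for v in vs]


def with_chain():
    return list(chain.from_iterable(zip(df["zeroes"], df["ones"])))


# All three approaches should agree before their timings are compared.
assert with_ravel() == with_comprehension() == with_chain()

for fn in (with_ravel, with_comprehension, with_chain):
    # Best of 5 runs, each run calling the function 20 times.
    best = min(timeit.repeat(fn, number=20, repeat=5)) / 20
    print(f"{fn.__name__}: {best * 1e6:.0f} µs per call")
```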
Answered By - Dani Mesejo