Issue
I have a data frame with an index column and a column containing a list of values (the lists can have different lengths):
df2 = pl.DataFrame({'x': [1, 2, 3], 'y': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})
shape: (3, 2)
┌─────┬───────────────────┐
│ x   ┆ y                 │
│ --- ┆ ---               │
│ i64 ┆ list[str]         │
╞═════╪═══════════════════╡
│ 1   ┆ ["a", "b", "c"]   │
│ 2   ┆ ["d", "e", … "g"] │
│ 3   ┆ ["h", "i", "j"]   │
└─────┴───────────────────┘
I'm trying to turn each list element into its own row while retaining the index, so that the resulting data frame looks like:
┌─────┬─────┐
│ x   ┆ yp  │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ "a" │
│ 1   ┆ "b" │
│ 1   ┆ "c" │
│ 2   ┆ "d" │
│ 2   ┆ "e" │
│ 2   ┆ "f" │
│ 2   ┆ "g" │
│ 3   ┆ "h" │
│ ... ┆ ... │
└─────┴─────┘
I could probably iterate through the data frame, but I don't think that would be the most efficient approach. Any help would be appreciated.
Solution
import polars as pl
df2 = pl.DataFrame({'x': [1, 2, 3], 'y': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})
# Explode the 'y' list column: one row per list element, with the 'x' value repeated
df_unnested = df2.explode('y')
# Print the resulting DataFrame
print(df_unnested)
Answered By - Sha tha