Issue
I have a problem that involves converting time series from one representation to another. Each item in the time series has attributes "time", "id", and "value" (think of it as a measurement at "time" for sensor "id"). I'm storing all the items in a Pandas dataframe with columns named by the attributes.
The set of "time"s is a small set of integers (say, 32), but some of the "id"s are missing "time"s/"value"s. What I want to construct is an output dataframe with the form:
id time0 time1 ... timeN
val0 val1 ... valN
where the missing "value"s are represented by NaNs.
For example, suppose the input looks like the following:
time id value
0 0 13
2 0 15
3 0 20
2 1 10
3 1 12
Then, assuming the set of possible times is 0, 2, and 3, the desired output is:
id time0 time1 time2 time3
0 13 NaN 15 20
1 NaN NaN 10 12
I'm looking for a Pythonic way to do this since there are several million rows in the input and around 1/4 million groups.
Solution
You can transform your table with a pivot. If you need to handle duplicate values for index/column pairs, you can use the more general pivot_table.
For your example, the simple pivot is sufficient:
>>> df = df.pivot(index="id", columns="time", values="value")
time 0 2 3
id
0 13.0 15.0 20.0
1 NaN 10.0 12.0
To get the exact result from your question, you could reindex the columns to fill in the empty values, and rename the column index like this:
# add missing time columns, fill with NaNs
df = df.reindex(range(df.columns.max() + 1), axis=1)
# name them "time#"
df.columns = "time" + df.columns.astype(str)
# remove the column index name "time"
df = df.rename_axis(None, axis=1)
Final df
:
time0 time1 time2 time3
id
0 13.0 NaN 15.0 20.0
1 NaN NaN 10.0 12.0
Answered By - w-m
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.