Issue
I have the following Pandas data frame (called df
).
+--------+--------+------+--------+
| Person | Animal | Year | Number |
+--------+--------+------+--------+
| John | Dogs | 2000 | 2 |
| John | Dogs | 2001 | 2 |
| John | Dogs | 2002 | 2 |
| John | Dogs | 2003 | 2 |
| John | Dogs | 2004 | 2 |
| John | Dogs | 2005 | 2 |
| John | Cats | 2000 | 1 |
| John | Cats | 2001 | NaN |
| John | Cats | 2002 | NaN |
| John | Cats | 2003 | 4 |
| John | Cats | 2004 | 5 |
| John | Cats | 2005 | 6 |
| Peter | Dogs | 2000 | NaN |
| Peter | Dogs | 2001 | 1 |
| Peter | Dogs | 2002 | NaN |
| Peter | Dogs | 2003 | 5 |
| Peter | Dogs | 2004 | 5 |
| Peter | Dogs | 2005 | 5 |
| Peter | Cats | 2000 | NaN |
| Peter | Cats | 2001 | 4 |
| Peter | Cats | 2002 | 4 |
| Peter | Cats | 2003 | 4 |
| Peter | Cats | 2004 | 4 |
| Peter | Cats | 2005 | 4 |
+--------+--------+------+--------+
My target is to get the following, which means using the interpolate method to fill the NaN
values, but based on the other column value. In other words, it should
- partition the df using the
Person
andAnimal
columns - order by
Year
(asc) - apply the interpolate method
.
+--------+--------+------+--------+
| Person | Animal | Year | Number |
+--------+--------+------+--------+
| John | Dogs | 2000 | 2 |
| John | Dogs | 2001 | 2 |
| John | Dogs | 2002 | 2 |
| John | Dogs | 2003 | 2 |
| John | Dogs | 2004 | 2 |
| John | Dogs | 2005 | 2 |
| John | Cats | 2000 | 1 |
| John | Cats | 2001 | 2 |
| John | Cats | 2002 | 3 |
| John | Cats | 2003 | 4 |
| John | Cats | 2004 | 5 |
| John | Cats | 2005 | 6 |
| Peter | Dogs | 2000 | NaN |
| Peter | Dogs | 2001 | 1 |
| Peter | Dogs | 2002 | 3 |
| Peter | Dogs | 2003 | 5 |
| Peter | Dogs | 2004 | 5 |
| Peter | Dogs | 2005 | 5 |
| Peter | Cats | 2000 | NaN |
| Peter | Cats | 2001 | 4 |
| Peter | Cats | 2002 | 4 |
| Peter | Cats | 2003 | 4 |
| Peter | Cats | 2004 | 4 |
| Peter | Cats | 2005 | 4 |
+--------+--------+------+--------+
What I have done
I can filter for each Person and each Animal and then apply the interpolate methods. Finally, merge all together, but this sounds dull and long if you have many columns.
Solution
You can try:
df['Number'] = (df.sort_values('Year', ascending=True)
.groupby(['Person', 'Animal'])['Number']
.transform(lambda x: x.interpolate()))
print(df)
# Output
Person Animal Year Number
0 John Dogs 2000 2.0
1 John Dogs 2001 2.0
2 John Dogs 2002 2.0
3 John Dogs 2003 2.0
4 John Dogs 2004 2.0
5 John Dogs 2005 2.0
6 John Cats 2000 1.0
7 John Cats 2001 2.0 # interpolate
8 John Cats 2002 3.0 # interpolate
9 John Cats 2003 4.0
10 John Cats 2004 5.0
11 John Cats 2005 6.0
12 Peter Dogs 2000 NaN
13 Peter Dogs 2001 1.0
14 Peter Dogs 2002 3.0
15 Peter Dogs 2003 5.0
16 Peter Dogs 2004 5.0
17 Peter Dogs 2005 5.0
18 Peter Cats 2000 NaN
19 Peter Cats 2001 4.0
20 Peter Cats 2002 4.0
21 Peter Cats 2003 4.0
22 Peter Cats 2004 4.0
23 Peter Cats 2005 4.0
For multiple columns, just use the same operation:
cols = ['Number'] # list of columns
df[cols] = (df.sort_values('Year', ascending=True)
.groupby(['Person', 'Animal'])[cols]
.transform(lambda x: x.interpolate()))
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.