Issue
I have a DataFrame like this one:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Sl.No.': {0: np.nan, 1: 1.0, 2: 2.0, 3: 3.0,
                              4: np.nan, 5: 4.0, 6: 5.0, 7: 6.0},
                   'Location': {0: 'AREA 1', 1: 'Loc A', 2: 'Loc B', 3: 'Loc C',
                                4: 'AREA 2', 5: 'Loc D', 6: 'Loc E', 7: 'Loc F'},
                   'Total Depth': {0: np.nan, 1: 5.6, 2: 6.8, 3: 2.9,
                                   4: np.nan, 5: 4.3, 6: 6.0, 7: 5.3},
                   'Jan': {0: np.nan, 1: 2.3, 2: 4.2, 3: 2.1,
                           4: np.nan, 5: 2.9, 6: 5.2, 7: 4.3},
                   'Feb': {0: np.nan, 1: 1.9, 2: 3.8, 3: 2.0,
                           4: np.nan, 5: 2.5, 6: 4.6, 7: 3.2}})
This produces:
| | Sl.No. | Location | Total Depth | Jan | Feb |
---|---|---|---|---|---|
0 | nan | AREA 1 | nan | nan | nan |
1 | 1 | Loc A | 5.6 | 2.3 | 1.9 |
2 | 2 | Loc B | 6.8 | 4.2 | 3.8 |
3 | 3 | Loc C | 2.9 | 2.1 | 2 |
4 | nan | AREA 2 | nan | nan | nan |
5 | 4 | Loc D | 4.3 | 2.9 | 2.5 |
6 | 5 | Loc E | 6 | 5.2 | 4.6 |
7 | 6 | Loc F | 5.3 | 4.3 | 3.2 |
In the above table, AREA 1 and AREA 2 act as headings for the rows with Sl.No. 1-3 and 4-6 respectively.
I wish to add another column called 'Area' to the DataFrame and then delete these two heading rows. That is, I want the following table:
| | Sl.No. | Area | Location | Total Depth | Jan | Feb |
---|---|---|---|---|---|---|
0 | 1 | AREA 1 | Loc A | 5.6 | 2.3 | 1.9 |
1 | 2 | AREA 1 | Loc B | 6.8 | 4.2 | 3.8 |
2 | 3 | AREA 1 | Loc C | 2.9 | 2.1 | 2 |
3 | 4 | AREA 2 | Loc D | 4.3 | 2.9 | 2.5 |
4 | 5 | AREA 2 | Loc E | 6 | 5.2 | 4.6 |
5 | 6 | AREA 2 | Loc F | 5.3 | 4.3 | 3.2 |
What is the best way to do this?
Solution
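You can use whether the "Sl.No." column is NaN as a mask on the "Location" column with where, which keeps the area names on the heading rows and leaves NaN everywhere else. Then use ffill to carry each area name down to the rows below it. Finally, drop the rows with NaN values (the two heading rows):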
df['Area'] = df['Location'].where(df['Sl.No.'].isna()).ffill()
df = df.dropna()
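The steps above can be sketched end to end as follows. This is a minimal variant rather than the exact answer: it uses a shortened copy of the question's data, filters on the mask instead of calling dropna() (so a data row that happens to contain a NaN reading would not be dropped by accident), and, as an assumption about the desired layout, inserts 'Area' right after 'Sl.No.' and renumbers the index. The names is_header, area, and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's data (only one measurement column kept).
df = pd.DataFrame({
    'Sl.No.': [np.nan, 1.0, 2.0, 3.0, np.nan, 4.0, 5.0, 6.0],
    'Location': ['AREA 1', 'Loc A', 'Loc B', 'Loc C',
                 'AREA 2', 'Loc D', 'Loc E', 'Loc F'],
    'Total Depth': [np.nan, 5.6, 6.8, 2.9, np.nan, 4.3, 6.0, 5.3],
})

# Heading rows are the ones without a serial number.
is_header = df['Sl.No.'].isna()

# Keep Location only on heading rows (NaN elsewhere), then carry it downward.
area = df['Location'].where(is_header).ffill()

# Put 'Area' right after 'Sl.No.', drop the heading rows, renumber the index.
result = df.copy()
result.insert(1, 'Area', area)
result = result[~is_header].reset_index(drop=True)

print(result)
```

Filtering with the mask and calling dropna() give the same rows for the data in the question, since only the heading rows contain NaN there.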
You can also create an area identifier using str.contains + cumsum: every row whose Location contains 'AREA' starts a new group. Then groupby that identifier and apply a lambda that creates a new column "Area" from the first item in the Location column of each group, dropping the heading row afterwards:
out = (df.groupby(df['Location'].str.contains('AREA').cumsum())
         .apply(lambda g: g.assign(Area=g['Location'].iat[0]).dropna())
         .reset_index(drop=True))
Output:
   Sl.No. Location  Total Depth  Jan  Feb    Area
0     1.0    Loc A          5.6  2.3  1.9  AREA 1
1     2.0    Loc B          6.8  4.2  3.8  AREA 1
2     3.0    Loc C          2.9  2.1  2.0  AREA 1
3     4.0    Loc D          4.3  2.9  2.5  AREA 2
4     5.0    Loc E          6.0  5.2  4.6  AREA 2
5     6.0    Loc F          5.3  4.3  3.2  AREA 2
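To see why str.contains + cumsum works as a group identifier, it may help to look at the intermediate values on the Location column alone. The sketch below also shows a variant that avoids apply by using transform('first'); that variant is an alternative offered for illustration, not the answer's own code, and the names is_header, group_id, and area are illustrative.

```python
import pandas as pd

locations = pd.Series(['AREA 1', 'Loc A', 'Loc B', 'Loc C',
                       'AREA 2', 'Loc D', 'Loc E', 'Loc F'], name='Location')

# True on heading rows; cumsum bumps the counter at each True,
# so every heading row starts a new group id.
is_header = locations.str.contains('AREA')
group_id = is_header.cumsum()
print(group_id.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2]

# Variant without apply: broadcast each group's first Location to all its rows.
area = locations.groupby(group_id).transform('first')

# Heading rows excluded, the remaining rows carry their area name.
print(area[~is_header].tolist())
```

This relies on each heading row being the first row of its group, which holds for the data in the question.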
Answered By - enke