Issue
I am working with a DataFrame that has a newly created column named season, filled with np.nan. Another column, match_id, numbers the matches: match 1 has match_id 1, match 2 has match_id 2, ..., match n has match_id n. It is a cricket dataset (cricket is similar to baseball), so the data is ball by ball. A match has at most 20 + 20 overs, and each over has 6 balls, so match_id 1 spans roughly index 0 to 240, match_id 2 roughly index 241 to 480, and so on. The data is organized ball by ball (1 row per ball), match by match (approx. 240 rows per match), and season by season (approx. 14160 rows per season).
My condition is: if match_id is between 1 and 59, put 2017 in the season column for those rows.
The match_id and other columns already existed in my dataset. I created the season column filled with np.nan, and now I want to fill it.
My data looks like this:
In[]: df_raw.head(6)
out[]:
   season  match_id  inning         batting_team                 bowling_team  over  ball
0     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     1
1     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     2
2     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     3
3     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     4
4     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     5
5     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     6
Solution
Alternatively, use the loc indexer with a boolean mask (df here stands for your DataFrame, df_raw in the question):
df.loc[(df['match_id']<=59) & (df['match_id']>=1), 'season'] = 2017
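The same assignment can be checked on a small frame. The sketch below is only an illustration: the six-row DataFrame is made up to mimic the question's columns, and Series.between (inclusive on both ends by default) is used as a shorter way to write the two comparisons.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: only the two relevant columns.
df = pd.DataFrame({
    "season": np.nan,                  # newly created, all NaN
    "match_id": [1, 1, 59, 59, 60, 61],
})

# between(1, 59) is equivalent to (match_id >= 1) & (match_id <= 59).
in_2017 = df["match_id"].between(1, 59)

# Assign only to the rows selected by the mask; other rows stay NaN.
df.loc[in_2017, "season"] = 2017

print(df)
```

Rows with match_id 60 and 61 are left as NaN, so the same pattern can be repeated with other ranges for other seasons.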
Note that since the season column contains NaNs, it is stored as floating-point numbers. Once you have finished filling in the season values, you can convert them to integers:
df['season'] = df['season'].astype('int')
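One caveat, shown as a small sketch on a made-up Series: astype('int') raises an error if any NaN is still left in the column. If some rows may remain unfilled, pandas' nullable "Int64" dtype (capital I) is one possible alternative, since it keeps missing values as <NA>.

```python
import numpy as np
import pandas as pd

# Hypothetical season column where one row has not been filled yet.
season = pd.Series([2017.0, 2017.0, np.nan])

# Plain integer conversion fails while NaN is present.
try:
    season.astype("int")
    raised = False
except (ValueError, TypeError):
    raised = True

# The nullable integer dtype converts the filled rows and keeps the gap as <NA>.
nullable = season.astype("Int64")

print(raised)
print(nullable)
```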
Answered By - Dmitri Chubarov