Issue
I want to select, for each year, the row for the highest league level a player appeared in. I have listed the levels I want to compare, ordered from lowest to highest, in the levels list in the code below.
import pandas as pd
prospect = pd.read_html('https://www.baseball-reference.com/register/player.fcgi?id=bishop000hun')[0]
levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']
prospect = prospect[['Year', 'Tm', 'Lg', 'Lev', 'PA']][prospect['Lev'].isin(levels)]
prospect = prospect.sort_values('Lev', ascending = False).groupby(['Year']).tail(1)
However, this produces the following output:
    Year            Tm    Lg Lev   PA
6   2019  Salem-Keizer  NORW  A-  117
15  2022        Eugene  NORW  A+  358
11  2021      San Jose   LAW   A    9
I expected the 2021 row to be the one at the A+ level rather than the A level. How can I fix this? Thanks in advance.
Solution
Starting from your example:
import pandas as pd
prospect = pd.read_html('https://www.baseball-reference.com/register/player.fcgi?id=bishop000hun')[0]
levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']
prospect = prospect[['Year', 'Tm', 'Lg', 'Lev', 'PA']][prospect['Lev'].isin(levels)]
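Sorting on 'Lev' directly compares the strings alphabetically ('A' < 'A+' < 'A-' < 'AA' < 'AAA' < 'MLB' < 'Rk') rather than by their position in levels, and with ascending=False, tail(1) keeps the alphabetically smallest level in each year. Use sort_values with a key that maps each level to its rank in levels: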
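# Map each level to its position in levels, e.g. 'Rk' -> 0, 'MLB' -> 6
m = {j: i for i, j in enumerate(levels)}
# Sort ascending by that rank, then keep the last (highest) row of each year
out = prospect.sort_values('Lev', key=lambda x: x.map(m)).groupby(['Year']).tail(1)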
out:
    Year            Tm    Lg Lev   PA
6   2019  Salem-Keizer  NORW  A-  117
10  2021        Eugene   HAW  A+   15
15  2022        Eugene  NORW  A+  358
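As a possible alternative to the key mapping, the sketch below converts 'Lev' to an ordered categorical so that pandas compares levels by their position in levels. It is not part of the original answer. The sample frame and its team names and PA values are hypothetical stand-ins for the scraped table, used only so the example runs without network access.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped table (not real data).
sample = pd.DataFrame({
    'Year': [2019, 2021, 2021, 2022],
    'Tm': ['Team A', 'Team B', 'Team C', 'Team D'],
    'Lev': ['A-', 'A', 'A+', 'A+'],
    'PA': [100, 10, 20, 300],
})

levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']

# Plain string sorting is alphabetical, not lowest-to-highest level.
print(sorted(levels))

# An ordered categorical makes pandas sort by position in `levels` instead.
sample['Lev'] = pd.Categorical(sample['Lev'], categories=levels, ordered=True)

# Sort ascending by level, keep the last (highest) row per year,
# then restore the original row order.
highest = sample.sort_values('Lev').groupby('Year').tail(1).sort_index()
print(highest)
```

With the categorical in place, any later sort or comparison on 'Lev' follows the same level order, so no key argument has to be repeated.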
Answered By - Panda Kim