Issue
I am trying to scrape the names of players from this page: https://www.espncricinfo.com/series/england-in-pakistan-2022-1327226/pakistan-vs-england-1st-t20i-1327228/full-scorecard
To do that I first get the tables containing the batting scorecards:
batting_scorecard = response.xpath("//table[@class='ds-w-full ds-table ds-table-md ds-table-auto ci-scorecard-table']")
Then I try to get the player names:
batting_scorecard.xpath("//a[contains(@href,'/player/')]/span/span/text()").getall()
This returns a list that contains all the player names (as well as some rubbish to be parsed) but it also contains names of players/umpires/referees who are not in the specified tables.
In the list below, 'Luke Wood' (last occurrence), 'Aleem Dar', 'Asif Yaqoob', 'Ahsan Raza', 'Rashid Riaz' and 'Muhammad Javed' should not be returned, as they are in a different table. The batting_scorecard tables have class "ds-w-full ds-table ds-table-md ds-table-auto ci-scorecard-table", whereas this data is in a table with class "ds-w-full ds-table ds-table-sm ds-table-auto ".
Can anyone see what the problem is?
['Mohammad Rizwan',
'\xa0',
'Babar Azam',
'\xa0',
'Haider Ali',
'\xa0',
'Shan Masood',
'\xa0',
'Iftikhar Ahmed',
'\xa0',
'Mohammad Nawaz',
'\xa0',
'Khushdil Shah',
'\xa0',
'Naseem Shah',
'\xa0',
'Usman Qadir',
'\xa0',
'Haris Rauf',
',',
'\xa0',
'Shahnawaz Dahani',
'\xa0',
'Phil Salt',
'\xa0',
'Alex Hales',
'\xa0',
'Dawid Malan',
'\xa0',
'Ben Duckett',
'\xa0',
'Harry Brook',
'\xa0',
'Moeen Ali',
'\xa0',
'Sam Curran',
',',
'\xa0',
'David Willey',
',',
'\xa0',
'Adil Rashid',
',',
'\xa0',
'Luke Wood',
',',
'\xa0',
'Richard Gleeson',
'\xa0',
'Luke Wood',
'Aleem Dar',
'Asif Yaqoob',
'Ahsan Raza',
'Rashid Riaz',
'Muhammad Javed',
'Mohammad Rizwan',
'\xa0',
'Babar Azam',
'\xa0',
'Haider Ali',
'\xa0',
'Shan Masood',
'\xa0',
'Iftikhar Ahmed',
'\xa0',
'Mohammad Nawaz',
'\xa0',
'Khushdil Shah',
'\xa0',
'Naseem Shah',
'\xa0',
'Usman Qadir',
'\xa0',
'Haris Rauf',
',',
'\xa0',
'Shahnawaz Dahani',
'\xa0',
'Phil Salt',
'\xa0',
'Alex Hales',
'\xa0',
'Dawid Malan',
'\xa0',
'Ben Duckett',
'\xa0',
'Harry Brook',
'\xa0',
'Moeen Ali',
'\xa0',
'Sam Curran',
',',
'\xa0',
'David Willey',
',',
'\xa0',
'Adil Rashid',
',',
'\xa0',
'Luke Wood',
',',
'\xa0',
'Richard Gleeson',
'\xa0',
'Luke Wood',
'Aleem Dar',
'Asif Yaqoob',
'Ahsan Raza',
'Rashid Riaz',
'Muhammad Javed']
Solution
Change your selector to:
batting_scorecard.xpath(".//a[contains(@href,'/player/')]/span/span/text()").getall()
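Adding a dot at the start of the expression makes the XPath relative, so it searches only within each selected table element instead of the full page. An expression that starts with // is always evaluated from the document root, even when it is called on a sub-selector. That is likely also why the whole list appears twice in your output: the page-wide query runs once for each selected table.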
Answered By - Barry the Platipus