Issue
I am scraping a number of pages that contain statistics in lists. Everything works except for one minor issue I cannot resolve: I find each data field by its label text, and when one label is very similar to another, it picks up the wrong value. Does anyone know how to correct for this?
HTML looks like this:
<li><span class="bp3-tag p p-50">50</span> <span class="some explaining words.">Positioning</span>
<li><span class="bp3-tag p p-14">14</span> <span class="some other explaining words.">BB Positioning</span>
The code looks like this. It returns 14 for both values, when it should return 50 for Positioning and 14 for BB Positioning:
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

fifa_stats = ['Positioning', 'BB Positioning']
url = urlopen(req)
soups = bs(url, 'lxml')

def statistics(soups):
    data = {}
    divs_without_skill = soups[1].find_all('div', {'class': 'col-3'})
    more_lis = [div.find_all('li') for div in divs_without_skill]
    lis = soups[0].find_all('li') + more_lis[0]
    for li in lis:
        for stat in fifa_stats:
            if stat in li.text:
                data[stat.replace(' ', '_').lower()] = str(
                    (li.text.split(' ')[0]).replace('\n', ''))
    return data
Any help greatly appreciated.
Solution
import requests
from bs4 import BeautifulSoup
from pprint import pp


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    goal = {x.h5.text: [i.text for i in x.select(
        '.bp3-tag')] for x in soup.select('div.column.col-3')[7:-1]}
    pp(goal)


main('https://sofifa.com/player/244042/moussa-djitte/210049')
Output:
{'Attacking': ['56', '71', '64', '62', '53'],
 'Skill': ['72', '46', '29', '36', '70'],
 'Movement': ['78', '79', '83', '65', '74'],
 'Power': ['67', '77', '74', '70', '59'],
 'Mentality': ['51', '29', '69', '57', '65', '55'],
 'Defending': ['33', '14', '16'],
 'Goalkeeping': ['8', '8', '6', '15', '13']}
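The output above groups the values by category but does not attach a label to each one. The wrong value in the question most likely comes from the substring test: 'Positioning' is contained in 'BB Positioning', so the second list item also matches the shorter name and overwrites its value with 14. A minimal sketch of one possible fix is to compare the label span's text for an exact match instead. The sample markup is copied from the question (with closing tags added), the helper name stats_by_label is hypothetical, and 'html.parser' is used here only to keep the example free of extra dependencies; the live page layout may differ.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question; the closing </li> tags and the
# surrounding <ul> are added here so the snippet is well-formed.
HTML = """
<ul>
<li><span class="bp3-tag p p-50">50</span> <span class="some explaining words.">Positioning</span></li>
<li><span class="bp3-tag p p-14">14</span> <span class="some other explaining words.">BB Positioning</span></li>
</ul>
"""


def stats_by_label(soup, wanted):
    """Return {label: value} for list items whose label is exactly in `wanted`.

    Hypothetical helper: it assumes each <li> holds a value span with the
    class 'bp3-tag' followed by a sibling span containing the label.
    """
    data = {}
    for li in soup.find_all('li'):
        value = li.select_one('span.bp3-tag')
        if value is None:
            continue  # not a statistics row
        label = value.find_next_sibling('span')
        if label is None:
            continue  # value without a label span
        name = label.get_text(strip=True)
        # Exact membership test, so 'Positioning' no longer matches
        # the longer 'BB Positioning'.
        if name in wanted:
            data[name.replace(' ', '_').lower()] = value.get_text(strip=True)
    return data


soup = BeautifulSoup(HTML, 'html.parser')
result = stats_by_label(soup, {'Positioning', 'BB Positioning'})
print(result)
```

Passing the wanted names as a set keeps the lookup an equality check rather than a substring search, which is the only behavioural change from the code in the question.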
Answered By - αԋɱҽԃ αмєяιcαη