Issue
We want to scrape some content from this webpage. The element we are interested in matches the selector div.white-bg-border-radius-kousik.shadow-kousik-effect.mb-2.
For this, we are trying to use this selector in BeautifulSoup (Python). It does not work. I tried three or four variants, and they did not work either. The HTML shows that this element is present 36 times in the page, but the selectors return either an empty set or 2-3 results, so I am obviously missing something. I need to find out the right way of doing it.
from bs4 import BeautifulSoup
import os
import urllib.request
url = "https://bankcodesfinder.com/world-postal-codes/india"
with urllib.request.urlopen(url) as response:
    html = str(response.read())
soup = BeautifulSoup(html, 'html.parser')
elements = soup.find_all('div.white-bg-border-radius-kousik.shadow-kousik-effect.mb-2') # This returns blank set
elements2 = soup.findAll('div', class_=['shadow-kousik-effect', 'mb-2']) #returns just 3 elements, whereas this is a subset class search of the original list of 3 classes, so this should return at least 36 elements
elements3 = soup.select('div.shadow-kousik-effect') # returns just 3 results
Solution
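I think it has to do with your response, which on my machine gives tags containing a literal \r\n. A likely cause is str(response.read()): calling str() on a bytes object returns its repr, so every line break in the page becomes the literal characters \r\n and the parser no longer sees a clean div tag.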
<div\r\n class="white-bg-border-radius-kousik shadow-kousik-effect mb-2">
<a \r\n="" class="nounderline" href="/world...>
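The effect can be reproduced offline. The sketch below is only an illustration: it uses the standard library's html.parser rather than BeautifulSoup, and the raw sample, DivCounter and count_divs are made-up names, not part of the original code. It compares str() on the bytes with a proper decode, assuming the page is UTF-8.

```python
from html.parser import HTMLParser

# Hypothetical sample mimicking the page: line breaks inside the tags.
raw = (b'<div\r\n class="white-bg-border-radius-kousik shadow-kousik-effect mb-2">'
       b'\r\n<a\r\n class="nounderline" href="#">ASSAM</a>\r\n</div>')


class DivCounter(HTMLParser):
    """Counts <div> start tags that carry the shadow-kousik-effect class."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "shadow-kousik-effect" in classes:
            self.count += 1


def count_divs(markup):
    """Parse markup and return how many matching divs were seen."""
    parser = DivCounter()
    parser.feed(markup)
    parser.close()
    return parser.count


as_repr = str(raw)             # "b'<div\\r\\n class=...": backslashes are literal text
as_text = raw.decode("utf-8")  # real CR/LF characters, treated as whitespace

print(count_divs(as_repr))  # the tag name is no longer a plain "div"
print(count_divs(as_text))  # the div is recognised
```

If this is indeed the cause, replacing str(response.read()) with response.read().decode("utf-8") in the urllib version should behave like the requests version, provided the page really is UTF-8 encoded.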
Using requests, your CSS selector returns the 35 elements (search box excluded).
import requests
url = "https://bankcodesfinder.com/world-postal-codes/india"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
css = "div.white-bg-border-radius-kousik.shadow-kousik-effect.mb-2"
regions = [list(tag.stripped_strings) for tag in soup.select(css)]
Output:
# len(regions) # 35
[
['ANDAMAN & NICOBAR ISLANDS', '102 Branches'],
['ANDHRA PRADESH', '10493 Branches'],
['ARUNACHAL PRADESH', '302 Branches'],
['ASSAM', '4022 Branches'],
['BIHAR', '9113 Branches'],
...
]
Answered By - Timeless