Issue
I am trying to scrape data from a web page generated by a search engine. To do this, I am studying the structure of the document using a sample query and its output. The output contains the string "blah-blah-blah", which is the value of the field that I need to scrape. I have this simple code:
soup = BeautifulSoup(response.text, 'html.parser')
soup.find_all(string=re.compile("blah-blah-blah"))
But the output is pretty useless.
['blah-blah-blah', 'blah-blah-blah', 'blah-blah-blah']
It only tells me that there are three occurrences of this string.
How can I find the location of those strings? I mean the corresponding tags, elements, fields, etc., which would let me find this string without knowing its value. This would later help me scrape the value of the corresponding tag or attribute using soup.select() or soup.find().
Solution
Each match returned by find_all(string=...) is a NavigableString, so you can call find_parent() on it to get the enclosing tag:
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
matching_strings = soup.find_all(string=re.compile("blah-blah-blah"))
for string in matching_strings:
    # Each match is a NavigableString; find_parent() returns the tag that encloses it
    parent_tag = string.find_parent()
    print(f"Parent Tag: {parent_tag.name}")
    print(f"Full Parent Tag: {parent_tag}")
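Knowing the parent tag is only the first step toward a reusable selector. As a rough sketch (not part of the original answer), the parent chain can be walked to build a CSS-style path that soup.select() accepts, and tag attributes can be scanned too, since a value may sit in an attribute rather than in text. The sample HTML, the css_path helper, and the variable names below are hypothetical and stand in for response.text and your own code.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for response.text
html = """
<html><body>
  <div id="results">
    <span class="title">blah-blah-blah</span>
    <a href="/item/1" title="blah-blah-blah">link</a>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("blah-blah-blah")


def css_path(tag):
    """Build a rough CSS selector from the document root down to `tag`."""
    parts = []
    for node in [tag, *tag.parents]:
        if node.name == "[document]":  # the BeautifulSoup object itself
            break
        part = node.name
        if node.get("id"):
            part += "#" + node["id"]
        elif node.get("class"):
            part += "." + ".".join(node["class"])
        parts.append(part)
    return " > ".join(reversed(parts))


# Where the value appears as text: selector of each enclosing tag
text_paths = [css_path(s.parent) for s in soup.find_all(string=pattern)]

# Where the value appears inside an attribute: (selector, attribute name)
attr_hits = [
    (css_path(tag), name)
    for tag in soup.find_all(True)
    for name, value in tag.attrs.items()
    if pattern.search(" ".join(value) if isinstance(value, list) else str(value))
]

print(text_paths)
print(attr_hits)
```

A path produced this way can be passed back to soup.select() or soup.select_one() to fetch the value without knowing it in advance. The selector is only as stable as the page's ids and classes, so it is worth checking it against a second sample query before relying on it.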
Answered By - Hetvi