Issue
Hey, so I am using Beautiful Soup to build a scraper that extracts the id of an app searched for on the Play Store. The code:
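import requests
from bs4 import BeautifulSoup

def linkgen(name):
    base = "https://play.google.com/store/search?q="
    # Fetch the Play Store search results page for the given app name
    req = requests.get(base + name)
    soup = BeautifulSoup(req.content, "html.parser")
    # Find the first element whose class attribute is exactly "Si6A0c Gy4nib"
    soup2 = soup.find(class_="Si6A0c Gy4nib")
    print(soup2)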
The output generated:
<a class="Si6A0c Gy4nib" href="/store/apps/details?id=com.facebook.katana" jslog="38003; 1:575|CBSqARUKEwjwyfy+1fj6AhXGZI4KHfF0AA8=; track:click,impression"><div class="Shbxxd"><img alt="Screenshot image" aria-hidden="true" class="T75of jpDEN" loading="lazy" src="https://play-lh.googleusercontent.com/9s-9zONYk4NZvLlHVMIF5cGCzrx7PjZYQ3uow5P8Rj2Mt_XHWygV3gOt75_iI1YtTg=w416-h235" srcset="https://play-lh.googleusercontent.com/9s-9zONYk4NZvLlHVMIF5cGCzrx7PjZYQ3uow5P8Rj2Mt_XHWygV3gOt75_iI1YtTg=w832-h470 2x"/></div><div class="j2FCNc"><img alt="Thumbnail image" aria-hidden="true" class="T75of stzEZd" loading="lazy" src="https://play-lh.googleusercontent.com/ccWDU4A7fX1R24v-vvT480ySh26AYp97g1VrIB_FIdjRcuQB2JP2WdY7h_wVVAeSpg=s64" srcset="https://play-lh.googleusercontent.com/ccWDU4A7fX1R24v-vvT480ySh26AYp97g1VrIB_FIdjRcuQB2JP2WdY7h_wVVAeSpg=s128 2x"/><div class="cXFu1"><div class="ubGTjb"><span class="DdYX5">Facebook</span></div><div class="ubGTjb"><span class="wMUdtb">Meta Platforms, Inc.</span></div><div class="ubGTjb"><div aria-label="Rated 3.2 stars out of five stars" style="display: inline-flex; align-items: center;"><span class="w2kbF">3.2</span><span class="Q4fJQd"><i aria-hidden="true" class="google-material-icons Yvy3Fd">star</i></span></div></div></div></div></a>
From this output I want to extract the id in the href link (in this case, "com.facebook.katana"). I have tried searching for href in the a tag and tried using a regex as well, but couldn't get any output. Any ideas?
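In Beautiful Soup, an attribute of a tag is read like a dictionary key, so `soup2["href"]` (or `soup2.get("href")`, which returns None instead of raising if the attribute is missing) gives the string "/store/apps/details?id=com.facebook.katana". A minimal sketch of pulling the id out of that string with the standard library's `urllib.parse` rather than a regex is below; the helper name `app_id_from_href` is hypothetical, and the approach assumes the id is always passed as the `id` query parameter, as it is in the output above.

```python
from urllib.parse import parse_qs, urlparse


def app_id_from_href(href):
    """Return the `id` query parameter of a Play Store details link, or None.

    `app_id_from_href` is a hypothetical helper name; `href` is the string
    read from the tag, e.g. `soup2.get("href")` in the question's code.
    """
    query = urlparse(href).query        # "id=com.facebook.katana"
    values = parse_qs(query).get("id")  # ["com.facebook.katana"], or None if absent
    return values[0] if values else None


# The href value taken from the output shown above.
print(app_id_from_href("/store/apps/details?id=com.facebook.katana"))
# com.facebook.katana
```

Parsing the query string this way also copes with extra parameters (for example "?id=com.whatsapp&hl=en" still yields "com.whatsapp"), which a pattern that reads up to the next quote would not.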
Thank you
Solution
To get only the id from the href attribute, you can use this regex in your Python code:
r"(?<=id=)(.*?)(\")"
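The full match ends with the closing quote, so remove that last character from it (or take group 1 of the match, which already excludes the quote). You can try the regex out in an online regex tester :)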
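A minimal sketch of applying the answer's pattern with Python's `re` module, assuming you run it on the tag's markup as a string (for example `str(soup2)`); the helper name `app_id_from_tag` is hypothetical and the input below is a trimmed copy of the tag from the question.

```python
import re

# The answer's pattern: a lookbehind for "id=", a lazy group for the id,
# then the closing double quote captured as a second group.
APP_ID_PATTERN = re.compile(r"(?<=id=)(.*?)(\")")


def app_id_from_tag(tag_html):
    """Return the first app id found in the tag's markup, or None.

    `app_id_from_tag` is a hypothetical helper name; pass it `str(soup2)`.
    """
    match = APP_ID_PATTERN.search(tag_html)
    if match is None:
        return None
    # match.group(0) is 'com.facebook.katana"' (quote included), which is why
    # the answer trims the last character; group(1) already excludes it.
    return match.group(1)


tag_html = '<a class="Si6A0c Gy4nib" href="/store/apps/details?id=com.facebook.katana" jslog="38003">'
print(app_id_from_tag(tag_html))  # com.facebook.katana
```

Note that the pattern stops at the first quote after the first "id=" anywhere in the markup, so it relies on the href being where "id=" first appears, and any extra query parameters after the id would be included in the result.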
Hope this helps! Have a nice day.
Answered By - jontec