Issue
This is the code I run for most of the applications I want to parse:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://play.google.com/store/apps/details?id=com.pocketly")
soup = BeautifulSoup(r.text, "html.parser")
The result I get most of the time is:
<!DOCTYPE html>
<html><head><meta content="text/html;charset=utf-8" http-equiv="content-type"/><meta content="width=device-width, initial-scale=1" name="viewport"/><link href="//www.gstatic.com/android/market_images/web/favicon_v3.ico" rel="shortcut icon"/><title>Not Found</title>
<style nonce="o8Z-lUeTEzblbdo5fMv2Ew">
body {
font-family: arial,sans-serif;
margin: 50px 10px;
padding: 0;
text-align: center;
}
img {
border: 0
}
.rounded {
-webkit-border-radius: 5px;
-moz-border-radius: 5px;
border-radius: 5px;
}
#content {
margin: 0 auto;
width: 750px;
}
#error-section {
background-color: #d2e3fb;
border: 1px solid #a1b4d9;
color: #666;
font-weight: bold;
padding: 12px 0;
}
#search-section {
border: 1px solid #a1b4d9;
margin: 10px 0;
}
#play-logo {
float: left;
margin: 17px;
}
#search-box {
float: left;
margin: 20px;
}
#debug {
margin-top: 50px;
text-align:left;
}
</style>
</head><body bgcolor="#ffffff" dir="ltr" text="#000000"><div id="content"><div class="uaxL4e" id="error-section">We're sorry, the requested URL was not found on this server.</div><div class="uaxL4e" id="search-section"><a href="/store"><img alt="Google Play" id="play-logo" src="//www.gstatic.com/android/market_images/web/play_prism_hlock_v2_1x.png" srcset="//www.gstatic.com/android/market_images/web/play_prism_hlock_v2_2x.png 2x"/></a><form action="/store/search" id="search-box" method="get" style="margin: 32px 10px;"><input name="q" type="text" value=""/><input type="submit" value="Search"/></form><div style="clear:both"></div></div></div></body></html>
Some of the applications return normal results with the full page information; however, most of them return a page like the one above.
What could be the problem? Please help.
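The error page above can be recognised by its title, Not Found. As a minimal sketch (the helper name is_not_found_page is hypothetical, and checking the title is only a heuristic based on the response shown above, not a documented behaviour of Google Play), the two kinds of responses could be told apart like this:

```python
from bs4 import BeautifulSoup

def is_not_found_page(html: str) -> bool:
    # Heuristic (assumption): the error page shown above has <title>Not Found</title>,
    # while a normal app details page does not.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title  # first <title> tag, or None if the document has none
    return title is not None and title.get_text(strip=True) == "Not Found"

# Usage with the response from the question:
# if is_not_found_page(r.text): print("Got the error page instead of the app details")
```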
Solution
It seems you need to supply a User-Agent HTTP header with your request to get the right information back:
import requests
from bs4 import BeautifulSoup
url = 'https://play.google.com/store/apps/details?id=com.pocketly'
# Send a browser-like User-Agent so Google Play serves the full details page
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
# The app description is inside the element marked data-g-id="description"
desc = soup.select_one('[data-g-id="description"]').text
print(desc)
Prints:
Pocketly – Your Go-To Personal Loan App for Instant LoansExample | Repayment Time | APR | Amounts | LendersProcessing fees of INR 20 to INR 120 or 3%-7%. GST extra as applicable.
...
Answered By - Andrej Kesely
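Since the goal is to parse many applications, a requests.Session can send the same User-Agent with every request. The sketch below is one possible way to do it and is not part of the original answer: the names make_session, extract_description and fetch_description are hypothetical, the 10-second timeout is an arbitrary choice, and raise_for_status() only helps if the error page is served with a 4xx status code (otherwise extract_description returns None because the description element is missing).

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, copied from the answer above.
USER_AGENT = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'
DETAILS_URL = 'https://play.google.com/store/apps/details'

def make_session() -> requests.Session:
    # Every request made through this session carries the User-Agent header.
    session = requests.Session()
    session.headers.update({'User-Agent': USER_AGENT})
    return session

def extract_description(html: str) -> Optional[str]:
    # Return the description text, or None if the element is not in the page.
    soup = BeautifulSoup(html, 'html.parser')
    node = soup.select_one('[data-g-id="description"]')
    return node.get_text() if node is not None else None

def fetch_description(session: requests.Session, app_id: str) -> Optional[str]:
    # Builds .../details?id=<app_id>, downloads it and extracts the description.
    response = session.get(DETAILS_URL, params={'id': app_id}, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return extract_description(response.text)
```

For example, create the session once with session = make_session() and then call fetch_description(session, 'com.pocketly') for each app id you want to parse.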