Issue
I'm trying to extract the user id from this link, https://www.instagram.com/design.kaf/, using bs4 and regex.
I found a JSON key called "profile_id" inside a script tag, but I can't even search that script tag.
You can find my regex attempt here.
I also can't find anything I could use to pull out this particular <script> tag.
My code:
url = "https://www.instagram.com/design.kaf/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36'
}
response = requests.request("GET", url, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
a = str(soup.findall("script"))
x = re.findall('profile_id":"-?\d+"', a)
id = int(x[0])
print(id)
Solution
You can try this code. It takes a different approach: a loop combined with a plain string search.
import requests
from bs4 import BeautifulSoup

url = 'https://www.instagram.com/design.kaf/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36'
}
# pass the headers so the request carries the browser User-Agent
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
s = str(soup.find_all('script'))  # all script tags as one string

# our required string format: "profile_id":"0123456789....",
str_to_find = '"profile_id":"'
index_p = s.find(str_to_find)  # index of the first character, i.e. the opening double quote
id_str, counter = '', 0
if index_p != -1:  # find() returns -1 when the key is not in the page
    # the first digit of the id starts at index_p + length of the searched string
    start = index_p + len(str_to_find)
    while s[start + counter] != '"':  # stop when we reach the closing double quote
        id_str += s[start + counter]
        counter += 1
print(id_str)  # prints 5172989370 in this case
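Since the question asked about regex, the same lookup can also be sketched with a capture group instead of a character-by-character loop. This is only an illustration under assumptions: extract_profile_id is a hypothetical helper name, the sample string stands in for the downloaded page text, and the pattern assumes the key appears exactly as "profile_id":"<digits>", as described above. In the question's code, int(x[0]) would fail because each match still includes the key name and quotes; capturing only the digits avoids that.

```python
import re
from typing import Optional

# Assumed format, taken from the key quoted in the question: "profile_id":"<digits>"
PROFILE_ID_RE = re.compile(r'"profile_id":"(\d+)"')


def extract_profile_id(html: str) -> Optional[int]:
    """Return the first profile id found in the text, or None if the key is absent.

    Hypothetical helper for illustration; it works on any string, such as
    response.text from requests.
    """
    match = PROFILE_ID_RE.search(html)
    # group(1) holds only the digits, so int() can convert it directly
    return int(match.group(1)) if match else None


# Stand-in for the page source; a real page would come from the request.
sample = '<script>{"profile_id":"5172989370","other":"x"}</script>'
print(extract_profile_id(sample))
```

With a real response you would call extract_profile_id(r.text). Returning None for a missing key lets the caller tell "page did not contain the id" apart from a crash.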
Answered By - iamawesome