Issue
My code reads a list of URLs from an xlsx sheet (e.g. stackoverflow.com).
It then requests each URL and checks whether an Instagram account is linked on the home page. If so, it writes that link into the adjacent column.
However, some sites link to it in several places (header, footer) or embed a feed, so multiple results end up in the cell.
Is there a way to return just a single result?
for cell in sheet[col][1:]:
    try:
        url = cell.value
        r = requests.get(url)
        ig_get = ['instagram.com']
        ig_get_present = []
        soup = BeautifulSoup(r.content, 'html5lib')
        all_links = soup.find_all('a', href=True)
        print(cell.value)
        for ig_get in ig_get:
            for link in all_links:
                if ig_get in link.attrs['href']:
                    ig_get_present.append(link.attrs['href'])
        ig_got = str(ig_get_present)
        print(ig_got)
        sheet.cell(cell.row, col2).value = ig_got
    except requests.exceptions.ConnectionError:
        pass
    except requests.exceptions.TooManyRedirects:
        pass
    except requests.exceptions.MissingSchema:
        pass
Edit for clarity:
Some domains have multiple links to their social media pages (e.g. one in the header, one in the footer, one in the navigation bar) or a mirror of their social media feed. In these cases, the same link is written to the cell several times:
['https://instagram.com/xxx', 'https://instagram.com/xxx', 'https://instagram.com/xxx']
I only want one of these, not all of them.
Solution
If you only want to write the first match to the cell, all you need is a break statement placed right after the code that handles that match.
For example:
...
...
url = cell.value
res = requests.get(url)
domain = 'instagram.com'
urls = []
soup = BeautifulSoup(res.content, 'html5lib')
all_links = soup.find_all('a', href=True)
for link in all_links:
    if domain in link['href']:
        url = link['href']
        urls.append(url)
        # write only this first matching link to the adjacent cell
        sheet.cell(cell.row, col2).value = url
        # stop looking at the remaining links on the page
        break
...
...
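To try the first-match logic without a spreadsheet or a network request, it can be pulled into a small helper. This is only a sketch: the function name first_link_containing and the sample hrefs are made up for illustration, and the plain list of strings stands in for the href values that soup.find_all('a', href=True) would give you.

```python
def first_link_containing(hrefs, domain):
    """Return the first href that contains `domain`, or None if there is none."""
    found = None
    for href in hrefs:
        if domain in href:
            found = href
            break  # stop scanning as soon as one match is found
    return found


# Hypothetical hrefs, standing in for the links scraped from a page
hrefs = [
    "https://example.com/about",
    "https://instagram.com/xxx",
    "https://instagram.com/xxx",
]
print(first_link_containing(hrefs, "instagram.com"))
```

Returning None when nothing matches lets the caller decide whether to leave the cell empty.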
The break statement in Python is a control flow statement that immediately exits the innermost loop enclosing it.
You can read more about it in the Python docs: https://docs.python.org/3/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops
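One detail matters if you keep the nested loops from the question: break only leaves the loop it sits in. The sketch below assumes a hypothetical second domain (twitter.com) and made-up hrefs, purely to show that the outer loop keeps running and you get one link per domain.

```python
# Hypothetical list of domains to look for; only instagram.com is in the question
domains = ["instagram.com", "twitter.com"]

# Hypothetical hrefs, standing in for the links scraped from a page
hrefs = [
    "https://instagram.com/xxx",
    "https://twitter.com/xxx",
    "https://instagram.com/xxx",
    "https://twitter.com/xxx",
]

first_matches = {}
for domain in domains:      # outer loop: not affected by the break below
    for href in hrefs:      # inner loop: the one that break exits
        if domain in href:
            first_matches[domain] = href
            break           # leaves only the inner loop

print(first_matches)
```

Each domain ends up with exactly one link, because the inner loop stops at its first match while the outer loop moves on to the next domain.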
Answered By - alexpdev