Issue
I'm writing new code and having trouble getting the desired output. The code reads an HTML file and finds the <a> tags, but it outputs only the URL. I add extra code to complete the link: I'm trying to insert the URL twice in the string, once as the href and once as the link text.
####### Parse for <a> tags and save ############
import re
import time
from bs4 import BeautifulSoup

with open("page1.html", 'r') as htmlb:
    soup2 = BeautifulSoup(htmlb, 'lxml')
    links = []
    for link in soup2.findAll('a', attrs={'href': re.compile("^https://")}):
        links.append('<a href="'+link.get('href')+'">'"{link}"'</a><br>')
        time.sleep(.1)

with open("page-2.html", 'w') as html:
    html.write('{links}\n'.format(links=links))
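The snippet above has two separate problems. `"{link}"` is an ordinary string literal, not an f-string, so the braces are written out verbatim instead of being replaced. Formatting the whole `links` list into the file also writes the list's repr, brackets and quotes included. The sketch below reproduces both with plain strings, so no HTML file or BeautifulSoup is needed; the URLs are made-up placeholders. It relies on the fact that a format field such as `{0}` can be referenced more than once in the same template.

```python
# Hypothetical URLs standing in for the hrefs BeautifulSoup would return.
urls = ["https://example.com/a", "https://example.org/b"]

# Original pattern: '{link}' is a plain string literal (adjacent literals are
# simply concatenated), so the braces end up in the output unchanged.
broken = '<a href="' + urls[0] + '">' '{link}' '</a><br>'

# Fix: the positional field {0} is used twice, so the same URL fills
# both the href attribute and the link text.
template = '<a href="{0}">{0}</a><br>'
links = [template.format(url) for url in urls]

# Formatting the list itself would write its repr (brackets, quotes, commas);
# joining the items writes one link per line instead.
output = "\n".join(links) + "\n"

print(broken)
print(output, end="")
```

The same `"\n".join(links)` string can be passed to `html.write(...)` in place of `'{links}\n'.format(links=links)`.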
Solution
This should give you the desired HTML output file:
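import re
from bs4 import BeautifulSoup

with open("page1.html", 'r') as htmlb:
    soup2 = BeautifulSoup(htmlb, 'lxml')

with open("page2.html", 'w') as h:
    # Keep only <a> tags whose href starts with https://, as in the question;
    # this also skips anchors without an href, which would otherwise write "None".
    for link in soup2.find_all('a', href=re.compile("^https://")):
        url = link.get('href')
        # Write the URL twice: once as the href attribute, once as the link text.
        h.write('<a href="{0}">{0}</a><br>\n'.format(url))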
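Neither version escapes the URL before inserting it into the markup. If the hrefs may contain characters such as `&` or `"`, one option (an assumption on my part, not something the question requires) is to run each URL through the standard library's `html.escape` first. A minimal sketch with a hypothetical `make_link` helper:

```python
import html


def make_link(url: str) -> str:
    """Return an <a> element whose href and link text are both `url`.

    html.escape(..., quote=True) converts &, <, > and both quote characters,
    so a URL containing '&' or '"' cannot break the generated markup.
    """
    safe = html.escape(url, quote=True)
    return f'<a href="{safe}">{safe}</a><br>'


if __name__ == "__main__":
    print(make_link("https://example.com/search?q=a&page=2"))
```

Inside the loop from the answer, `h.write(make_link(link.get('href')) + "\n")` would replace the hand-built string.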
Answered By - dejanualex