Issue
import pandas as pd
import glob
import csv
import re
from bs4 import BeautifulSoup
links_with_text = []
textfile = open("a_file.txt", "w")
for filename in glob.iglob('*.html'):
    with open(filename) as f:
        soup = BeautifulSoup(f)
        links_with_text = [a['href'] for a in soup.find_all('a', href=True) if a.text]
        print(links_with_text)
        for element in links_with_text:
            textfile.write(element + "\n")
Sample output:
file name:
- link1
- link2
- link3
file name2:
- link1
- link2
- link3
file name3:
- link1
- link2
- link3
I found a somewhat related post, but it writes the output to multiple text files. Here I would like the file names and their links in a single text file.
BeautifulSoup on multiple .html files
Please suggest. Thank you in advance.
Solution
To have the filename at the top of each block, just add another .write()
line as follows:
from bs4 import BeautifulSoup
import glob

with open("a_file.txt", "w") as textfile:
    for filename in glob.iglob('*.html'):
        # Header line for this file's block
        textfile.write(f"{filename}:\n")
        with open(filename) as f:
            # Name the parser explicitly so bs4 does not warn about guessing one
            soup = BeautifulSoup(f, "html.parser")
        # Keep only links that have an href and non-empty text
        links_with_text = [a['href'] for a in soup.find_all('a', href=True) if a.text]
        for element in links_with_text:
            textfile.write(f" {element}\n")
Answered By - Martin Evans
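If the report should match the sample output in the question exactly (each link prefixed with "- "), the same idea can be split into two small functions. This is only a sketch, not part of the original answer: the names collect_links and write_link_report are made up for illustration, UTF-8 encoding is an assumption about the input files, and sorting the file names is an added choice so the blocks come out in a predictable order.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def collect_links(html_path):
    """Return the href of every <a> tag in the file that has non-empty text."""
    # Assumption: the HTML files are UTF-8 encoded.
    with open(html_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True) if a.text]


def write_link_report(folder, report_path):
    """Write one 'filename:' block per .html file in folder, one '- link' line per link."""
    with open(report_path, "w", encoding="utf-8") as report:
        # sorted() gives a stable block order; glob order is otherwise arbitrary.
        for html_path in sorted(Path(folder).glob("*.html")):
            report.write(f"{html_path.name}:\n")
            for link in collect_links(html_path):
                report.write(f"- {link}\n")
```

Calling write_link_report(".", "a_file.txt") would then produce the single text file asked for, covering every .html file in the current directory.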