Issue
I have a directory on my drive that contains many .html files. Each file shows text when opened in a browser. I have the following code to convert one .html file into a .txt file. How can I iterate over all the files and save each one as a .txt file with its original name?
Thank you in advance.
from bs4 import BeautifulSoup
markup = open("/content/drive/MyDrive/arc_Articlesww0c5e.html")
soup = BeautifulSoup(markup.read())
markup.close()
f = open("arc_Articlesww0c5e.txt", "w")
f.write(soup.get_text())
f.close()
Solution
This might give you an idea of how to proceed:
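import os
from bs4 import BeautifulSoup

your_dir = "/content/drive/MyDrive"
for file in os.listdir(your_dir):
    # Only process HTML files
    if file.endswith((".htm", ".html")):
        with open(os.path.join(your_dir, file)) as markup:
            # Name the parser explicitly so bs4 does not have to guess one
            soup = BeautifulSoup(markup.read(), "html.parser")
        # splitext drops only the final extension, so dots inside the name survive
        txt_name = os.path.splitext(file)[0] + ".txt"
        # The .txt file is written to the current working directory
        with open(txt_name, "w") as f:
            f.write(soup.get_text())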
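If you would rather have the .txt files land next to the .html files (or in a folder of your choice), the same loop can be sketched with pathlib. This is only one possible variant, not the answer's original code: the names convert_html_dir, html_to_text, dst_dir and extract are made up for illustration, and UTF-8 is assumed for reading and writing.

```python
from pathlib import Path


def html_to_text(html):
    """Default extractor: strip the markup with BeautifulSoup."""
    # Imported here so the directory loop itself does not depend on bs4.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html, "html.parser").get_text()


def convert_html_dir(src_dir, dst_dir=None, extract=html_to_text, encoding="utf-8"):
    """Write a .txt file for every .htm/.html file found directly in src_dir.

    Hypothetical helper: dst_dir defaults to src_dir, and extract is any
    function that maps an HTML string to plain text.
    """
    src = Path(src_dir)
    dst = Path(dst_dir) if dst_dir else src
    dst.mkdir(parents=True, exist_ok=True)

    written = []
    # sorted() builds the full list first, so new .txt files do not disturb the loop
    for html_path in sorted(src.iterdir()):
        if not html_path.is_file():
            continue
        if html_path.suffix.lower() not in {".htm", ".html"}:
            continue
        # errors="replace" is an assumption: undecodable bytes become U+FFFD
        html = html_path.read_text(encoding=encoding, errors="replace")
        # .stem keeps the original name and drops only the final extension
        txt_path = dst / (html_path.stem + ".txt")
        txt_path.write_text(extract(html), encoding=encoding)
        written.append(txt_path)
    return written
```

Calling convert_html_dir("/content/drive/MyDrive") would then process the folder from the question and return the list of paths it wrote. Subfolders are not searched; swapping src.iterdir() for src.rglob("*") would be one way to include them.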
Answered By - user56700