Issue
Hello, I would like to add each link separately to the database. When I print out "new_lst" it displays every link, so I think it is putting the whole outcome into one row instead of separate rows. My code:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import mysql.connector
mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="webscraper"
)
req = Request("https://google.com")
html_page = urlopen(req)
main_link = "https://google.com"
soup = BeautifulSoup(html_page, "html.parser")
links = []
for link in soup.findAll('a'):
    links.append(link.get('href'))
new_lst = ('"'.join(links))
mycursor = mydb.cursor()
sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
val = (main_link, new_lst)
mycursor.execute(sql, val)
mydb.commit()
Solution
You are already iterating over the links with a for loop. Yes, it is putting the whole outcome into one row because you combine the links with new_lst = ('"'.join(links)). This can be avoided by inserting one item at a time inside the loop you already have. Note that this approach does not do any checking or validation before writing to the database; I would add some extra checks before executing the SQL command if need be (see the sketch after the corrected code below).
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import mysql.connector
mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="webscraper"
)
req = Request("https://google.com")
html_page = urlopen(req)
main_link = "https://google.com"
soup = BeautifulSoup(html_page, "html.parser")
mycursor = mydb.cursor()
for link in soup.findAll('a'):
    sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
    val = (main_link, link.get('href'))  # insert the href string, not the whole <a> tag
    mycursor.execute(sql, val)
mydb.commit()
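If you do want those extra checks, a minimal sketch could look like the one below. It reuses the same connection and links table as above; the specific filtering rules (skipping anchors without an href and non-HTTP schemes) and the use of executemany for a single batch insert are my own assumptions, not part of the original answer.

from urllib.parse import urljoin, urlparse

rows = []
for link in soup.findAll('a'):
    href = link.get('href')
    if not href:  # skip <a> tags that have no href attribute
        continue
    absolute = urljoin(main_link, href)  # resolve relative links against the page URL
    if urlparse(absolute).scheme not in ("http", "https"):
        continue  # skip mailto:, javascript:, fragments, etc.
    rows.append((main_link, absolute))

sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
mycursor.executemany(sql, rows)  # one batch insert instead of one execute per link
mydb.commit()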
Answered By - Warkaz