Thursday, October 20, 2022

[FIXED] Looping through the page numbers with Python BeautifulSoup

October 20, 2022 beautifulsoup, python

Issue

I'm attempting to update my script so that it searches not only the URL provided but all of the pages in the range 1-3, and adds their listings to the CSV. Can anyone spot why my current code isn't working? Pages after page 1 are addressed with a suffix in the format page-2.

from bs4 import BeautifulSoup 
import requests 
from csv import writer
from random import randint
from time import sleep

#example of second page url: https://www.propertypal.com/property-for-sale/ballymena-area/page-2

url= "https://www.propertypal.com/property-for-sale/ballymena-area/"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}

for page in range(1, 4):
    req = requests.get(url + 'page-' + str(page), headers=headers)
    # print(page)

    soup = BeautifulSoup(req.content, 'html.parser')
    lists = soup.find_all('li', class_="pp-property-box")

    with open('ballymena.csv', 'w', encoding='utf8', newline='') as f:
        thewriter = writer(f)
        header = ['Address', 'Price']
        thewriter.writerow(header)

        for list in lists:
            title = list.find('h2').text
            price = list.find('p', class_="pp-property-price").text

            info = [title, price]
            thewriter.writerow(info)

sleep(randint(2,10))

Solution

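You are overwriting req on every pass and end up analyzing only the results of the last page requested. Move the parsing and writing inside your loop, but open the CSV file once, before the loop: mode 'w' truncates the file every time it is opened, so reopening it for each page leaves only the last page's rows. Edit: also, the upper limit in range() is not included, so you probably want for page in range(1, 4): to get the first three pages.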
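A quick offline check of why reopening the output file per page loses rows: opening a file with mode 'w' truncates it, so only the last write survives, and range(1, 4) stops before 4. This is a minimal sketch; FAKE_PAGES and its rows are made-up stand-ins for scraped listings (no network access), and the helper function names are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the rows scraped from each page (no network access).
FAKE_PAGES = {
    1: [["1 Main St", "100,000"]],
    2: [["2 High St", "150,000"]],
    3: [["3 Mill Rd", "200,000"]],
}

def write_reopening_each_page(path):
    """Mimics the question: the file is reopened in 'w' mode on every iteration."""
    for page in range(1, 4):  # range() excludes the stop value: pages 1, 2, 3
        with open(path, "w", encoding="utf8", newline="") as f:
            w = csv.writer(f)
            w.writerow(["Address", "Price"])
            w.writerows(FAKE_PAGES[page])

def write_once(path):
    """Mimics the answer: the file is opened once, before the loop."""
    with open(path, "w", encoding="utf8", newline="") as f:
        w = csv.writer(f)
        w.writerow(["Address", "Price"])
        for page in range(1, 4):
            w.writerows(FAKE_PAGES[page])

def read_rows(path):
    with open(path, encoding="utf8", newline="") as f:
        return list(csv.reader(f))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    write_reopening_each_page(path)
    reopened = read_rows(path)
    write_once(path)
    opened_once = read_rows(path)

print(list(range(1, 4)))  # [1, 2, 3]
print(reopened)     # only the header and page 3's row survive
print(opened_once)  # header plus one row from each of pages 1-3
```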

Edit: full example:

from bs4 import BeautifulSoup
import requests
from csv import writer


# The page number is appended to this base URL: ...page-1, ...page-2, ...
url = "https://www.propertypal.com/property-for-sale/ballymena-area/page-"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}

# Open the CSV once, before the loop, so rows from every page accumulate
with open('ballymena.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Address', 'Price']
    thewriter.writerow(header)

    # range() excludes its upper bound, so this requests pages 1, 2 and 3
    for page in range(1, 4):
        req = requests.get(f"{url}{page}", headers=headers)
        req.raise_for_status()  # stop on a 4xx/5xx response instead of parsing an error page
        soup = BeautifulSoup(req.content, 'html.parser')

        # Each listing is an <li class="pp-property-box"> element
        for li in soup.find_all('li', class_="pp-property-box"):
            title = li.find('h2').text
            price = li.find('p', class_="pp-property-price").text

            info = [title, price]
            thewriter.writerow(info)
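The question's script called sleep(randint(2,10)) once, after the loop had finished, so it never paused between requests, and the full example above drops the pause entirely. If you want to keep a randomized delay, it belongs inside the loop, between page requests. The following is a minimal sketch of that idea; page_urls and crawl are hypothetical helper names, and the fetch function is passed in so the loop can be exercised without network access.

```python
import random
import time

BASE_URL = "https://www.propertypal.com/property-for-sale/ballymena-area/page-"

def page_urls(base, first, last):
    """Yield one URL per page; range() stops before its second argument, hence last + 1."""
    for page in range(first, last + 1):
        yield f"{base}{page}"

def crawl(urls, fetch, min_delay=2, max_delay=10, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random whole number of seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no need to wait before the first request
            sleep(random.randint(min_delay, max_delay))  # randint includes both ends
        results.append(fetch(url))
    return results
```

With the answer's code, fetch could be lambda u: requests.get(u, headers=headers).content, and each returned page would then go through the same BeautifulSoup parsing and CSV writing.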

Answered By - bitflip

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
