Friday, January 5, 2024

[FIXED] Unable to print the title and rating using python

Labels: beautifulsoup, python, python-3.x, python-requests

Issue

I want to scrape the title and rating of every movie on the page, but the code below prints only a single movie. How can I retrieve all the titles and ratings?

import requests
from bs4 import BeautifulSoup
import pandas as pd

url_imdb = "https://assets-datascientest.s3.eu-west-1.amazonaws.com/IMDB_en.html"

# Retrieving the HTML code of the page
response = requests.get(url_imdb)
page_imdb = response.content

# Creating a BeautifulSoup object
bs_imdb = BeautifulSoup(page_imdb, 'html.parser')

# Retrieving HTML code containing all the movie elements
films_imdb = bs_imdb.findAll('tbody', {'class': 'lister-list'})

# Creating an empty list to store the data
data = []

# Looping through each movie element and extracting relevant information
for flims in films_imdb:
    movie_title = flims.find('td', class_='titleColumn').get_text().replace('1', '').split()
    title = ' '.join(movie_title[:-1])
    release_year = movie_title[-1].replace('(', '').replace(')', '')
    rating= flims.find('td',class_='ratingColumn imdbRating').get_text()
    print(title,rating,release_year)

Solution

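The original loop runs only once: findAll('tbody', ...) returns a list holding the single 'lister-list' table body, and calling find('td', ...) on that body returns only the first matching cell, so only the first movie is printed. To parse every movie in the 'lister-list' table, extract its 'tr' elements and iterate over them.

Use the 'find' function to locate the table body and the 'find_all' function to retrieve each of its rows.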
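The difference between the two approaches can be sketched offline. The HTML string below is a hypothetical two-row miniature of the page (the movie names, years, and ratings are placeholders, not real data); only the tag and class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page structure, for illustration only.
html = """
<table>
  <tbody class="lister-list">
    <tr>
      <td class="titleColumn">1. <a href="#">Movie A</a> <span class="secondaryInfo">(2001)</span></td>
      <td class="ratingColumn imdbRating"><strong>8.1</strong></td>
    </tr>
    <tr>
      <td class="titleColumn">2. <a href="#">Movie B</a> <span class="secondaryInfo">(2002)</span></td>
      <td class="ratingColumn imdbRating"><strong>7.4</strong></td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Approach from the question: find_all on the tbody yields a one-element list,
# and find() inside that tbody returns only the first title cell.
bodies = soup.find_all("tbody", class_="lister-list")
first_titles = [body.find("td", class_="titleColumn").a.get_text() for body in bodies]

# Approach from the answer: locate the tbody once, then iterate over its rows.
rows = soup.find("tbody", class_="lister-list").find_all("tr")
all_titles = [row.find("td", class_="titleColumn").a.get_text() for row in rows]

print(len(bodies), first_titles)
print(len(rows), all_titles)
```

Iterating over the 'tr' elements gives one loop pass per movie, whereas iterating over the 'tbody' matches gives one pass for the whole table.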

The full script below applies this to the page and extracts each movie's title, release year, and rating.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url_imdb = "https://assets-datascientest.s3.eu-west-1.amazonaws.com/IMDB_en.html"

# Retrieving the HTML code of the page
response = requests.get(url_imdb)
page_imdb = response.content

# Creating a BeautifulSoup object
bs_imdb = BeautifulSoup(page_imdb, 'html.parser')

# Retrieving the table body that contains all the movie rows
films_imdb = bs_imdb.find('tbody', {'class': 'lister-list'})
# Retrieving all the rows of the table (each row contains one movie)
rows = films_imdb.find_all('tr')

# Creating an empty list to store the data
data = []

# Looping through each row and extracting the relevant information
for film in rows:
    # Cell containing both the title link and the release year
    title_cell = film.find('td', class_='titleColumn')
    title = title_cell.find('a').get_text()
    release_year = title_cell.find('span', class_='secondaryInfo').get_text()
    # Rating text, with the surrounding newline characters removed
    rating = film.find('td', class_='ratingColumn imdbRating').get_text().replace('\n', '')
    # Adding the movie's data to the list containing all the movies
    data.append([title, release_year, rating])

# Print rows of data
for i in data:
    print(i)

Output

['Glass Onion: une histoire à couteaux tirés', '(2022)', '7,3']
["Avatar: la voie de l'eau", '(2022)', '7,9']
['À couteaux tirés', '(2019)', '7,9']
['Babylon', '(2022)', '7,4']
['Avatar', '(2009)', '7,9']
["Les Banshees d'Inisherin", '(2022)', '7,9']

Answered By - Gireada

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
