Issue
I want to create a DataFrame that shows each movie title and its gross total. I have already managed to scrape both values, but when I create the DataFrame, the gross values are shown inside brackets [ ]. I want to remove those brackets.
Here is the code I have so far:
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.imdb.com/list/ls024149810/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# Extract Titles
movies = [i.find('a', href = True).get_text()
          for i in soup.find_all('h3',{'class':'lister-item-header'})]
print(movies)
# Extract Gross
gross_total = soup.find_all('span', attrs = {'name':'nv'})[1::2]
print(gross_total)
# Create list
list(zip(movies, gross_total))
# Make and print data frame table
df = pd.DataFrame(list(zip(movies, gross_total)),
                  columns = ['Title', 'Gross Total'])
print(df)
In the output, every value in the Gross Total column is wrapped in brackets [ ], which I want to remove.
Solution
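The brackets most likely appear because gross_total holds BeautifulSoup Tag objects rather than strings, and pandas displays each Tag as a list of its contents. Extracting the text of each tag avoids this. You could use the :-soup-contains pseudo-class to target the span containing "Gross:" and then the adjacent sibling combinator (+) to move to the next span, which holds the value. Reading .text from that span gives a plain string.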
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.imdb.com/list/ls024149810/"
page = requests.get(url, headers = {'user-agent':'mozilla/5.0'})
soup = BeautifulSoup(page.content, "html.parser")
df = pd.DataFrame(
    [
        {
            i.select_one("h3 > a").text : i.select_one('span:-soup-contains("Gross:") + span').text
            for i in soup.select(".lister-item")
        }
    ]
).T.reset_index()
df.columns = ["Title", "Gross"]
df
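As a variation, here is a minimal offline sketch of the same selector idea that also tolerates entries without a Gross value. The SAMPLE_HTML markup is a hypothetical stand-in that only imitates the structure the selectors above assume; it is not real IMDb output, and the live page may differ. It also assumes a soupsieve version recent enough to support :-soup-contains.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup imitating the list page's structure (an assumption,
# not copied from IMDb). The second item deliberately has no "Gross:" span.
SAMPLE_HTML = """
<div class="lister-item">
  <h3 class="lister-item-header"><a href="/title/a/">Movie A</a></h3>
  <p><span>Votes:</span> <span name="nv">1,000</span>
     <span>|</span> <span>Gross:</span> <span name="nv">$100.00M</span></p>
</div>
<div class="lister-item">
  <h3 class="lister-item-header"><a href="/title/b/">Movie B</a></h3>
  <p><span>Votes:</span> <span name="nv">2,000</span></p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for item in soup.select(".lister-item"):
    title = item.select_one("h3 > a").get_text(strip=True)
    # The span right after the one containing "Gross:" holds the value.
    gross_tag = item.select_one('span:-soup-contains("Gross:") + span')
    # get_text() returns a plain string; select_one() returns None when
    # nothing matches, so guard against items without a gross figure.
    gross = gross_tag.get_text(strip=True) if gross_tag is not None else None
    rows.append({"Title": title, "Gross": gross})

df = pd.DataFrame(rows, columns=["Title", "Gross"])
print(df)
```

Building one dictionary per row, rather than a single dictionary keyed by title, keeps movies that share a title from overwriting each other. To run this against the live page, replace SAMPLE_HTML with page.content from the request shown above.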
Answered By - QHarr