Issue
I'm trying to learn web scraping with BeautifulSoup and have written the code below. Only the movie titles are written to the CSV file, not the genres, even though both are retrieved.
URL: http://www.imdb.com/search/title?sort=num_votes,desc&start=1&title_type=feature&year=1950,2012
import csv
import requests
from bs4 import BeautifulSoup

f = csv.writer(open('movie-names.csv', 'w'))
f.writerow(['Title', 'Genre'])
pages = []
for i in range(1, 2):
    url = 'http://www.imdb.com/search/title?sort=num_votes,desc&start=1&title_type=feature&year=1950,2012'
    pages.append(url)
for item in pages:
    page = requests.get(item)
    soup = BeautifulSoup(page.text, 'html.parser')
    movie_titles = soup.find_all(class_='lister-item-content')
    for movie_title in movie_titles:
        title = movie_title.find('a').contents[0]
        genre = movie_title.find_all(class_='genre')[0].get_text()
        print(genre)
        f.writerow([title, genre])
Solution
Use pandas; it makes exporting data to CSV much easier. The code below also removes the newline characters from the genre text before storing it.
from bs4 import BeautifulSoup
import requests
import pandas as pd

pages = []
for i in range(1, 2):
    url = 'http://www.imdb.com/search/title?sort=num_votes,desc&start=1&title_type=feature&year=1950,2012'
    pages.append(url)

Movie_title = []
Movie_genre = []
for item in pages:
    page = requests.get(item)
    soup = BeautifulSoup(page.text, 'html.parser')
    movie_titles = soup.select('.lister-item-content')
    for movie_title in movie_titles:
        # The first link inside each item holds the title.
        title = movie_title.select_one('a').text
        Movie_title.append(title)
        # Remove the newline characters from the genre text.
        genre = movie_title.select_one('.genre').text.replace('\n', '')
        Movie_genre.append(genre)

# Build the table once all pages are processed, then export it.
df = pd.DataFrame({"Movie_title": Movie_title, "Movie_genre": Movie_genre})
df.to_csv("movie-names.csv", index=False)
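One plausible reason the genre column looked empty in the original attempt is that the scraped genre text contains a newline, which the solution removes with `.replace('\n','')`. The following is a minimal sketch of that effect using only the standard `csv` module. The sample title and genre strings and the `to_csv` helper are hypothetical, made up for illustration; the real page text may differ.

```python
import csv
import io

# Hypothetical values shaped like scraped text: the genre string starts
# with a newline and ends with padding spaces (an assumption, not real data).
title = "The Shawshank Redemption"
raw_genre = "\nDrama            "


def to_csv(rows):
    """Write rows to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Unstripped: the embedded newline forces a quoted, multi-line field,
# so the genre ends up on the line below the title.
raw_output = to_csv([["Title", "Genre"], [title, raw_genre]])

# Stripped: each record stays on a single line.
clean_output = to_csv([["Title", "Genre"], [title, raw_genre.strip()]])

print(repr(raw_output))
print(repr(clean_output))
```

If this is indeed the cause, calling `.strip()` on the genre before `f.writerow([title, genre])` may be enough to fix the original `csv` version without switching to pandas.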
Answered By - KunduK