Issue
I have a large Netflix dataset, and I am trying to find which actors appeared in the most movies/TV shows, specifically in America. First, I created a list of unique actors from the dataset (list3). Then I wrote a nested for loop that goes through each name in list3 and checks every row of df3 (the filtered dataset, with 2000+ rows) to see whether the cast column contains the current actor's name. I believe using iterrows takes too long:
myDict1 = {}
for name in list3:
    if name not in myDict1:
        myDict1[name] = 0
    for index, row in df3.iterrows():
        if name in row["cast"]:
            myDict1[name] += 1
myDict1
Title | cast |
---|---|
Movie1 | Robert De Niro, Al Pacino, Tarantino |
Movie2 | Tom Hanks, Robert De Niro, Tom Cruise |
Movie3 | Tom Cruise, Zendaya, Seth Rogen |
I want my output to be like this:
Name | Count |
---|---|
Robert De Niro | 2 |
Tom Cruise | 2 |
Solution
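Split each cast string on ', ', explode the resulting lists into one row per actor, and count the values (here df is the filtered DataFrame, df3 in the question):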
out = df['cast'].str.split(', ').explode().value_counts()
out = pd.DataFrame({'Name': out.index, 'Count': out.values})
>>> out
             Name  Count
0      Tom Cruise      2
1  Robert De Niro      2
2         Zendaya      1
3      Seth Rogen      1
4       Tarantino      1
5       Al Pacino      1
6       Tom Hanks      1
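For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the sample table from the question. It is not part of the original answer: the `dropna()` and `str.strip()` steps are additions that assume the real data may contain missing cast values or inconsistent spacing around commas, and `df` stands in for the filtered `df3`.

```python
import pandas as pd

# Sample data copied from the question; "df" stands in for the filtered frame (df3).
df = pd.DataFrame({
    "Title": ["Movie1", "Movie2", "Movie3"],
    "cast": [
        "Robert De Niro, Al Pacino, Tarantino",
        "Tom Hanks, Robert De Niro, Tom Cruise",
        "Tom Cruise, Zendaya, Seth Rogen",
    ],
})

counts = (
    df["cast"]
    .dropna()          # assumption: some titles may have no cast listed
    .str.split(",")    # one list of names per title
    .explode()         # one row per (title, actor) pair
    .str.strip()       # assumption: spacing around commas may be inconsistent
    .value_counts()    # occurrences of each exact name, most frequent first
)

# Turn the counts into a two-column frame like the one requested in the question.
out = counts.rename_axis("Name").reset_index(name="Count")
print(out)
```

The order of names with equal counts may differ from the output shown above, depending on the pandas version.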
Answered By - Amit Vikram Singh
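If you would rather stay close to the dictionary approach from the question, a single pass with `collections.Counter` is another option. This is only a sketch and not from the answer above; `cast_column` is a hypothetical stand-in for something like `df3["cast"].dropna()`. It also counts exact names, whereas `name in row["cast"]` is a substring test, so a name such as "Tom" would match both "Tom Hanks" and "Tom Cruise".

```python
from collections import Counter

# Cast strings copied from the question's sample table.
# With a real DataFrame this could be df3["cast"].dropna() (assumed column name).
cast_column = [
    "Robert De Niro, Al Pacino, Tarantino",
    "Tom Hanks, Robert De Niro, Tom Cruise",
    "Tom Cruise, Zendaya, Seth Rogen",
]

counts = Counter()
for cast in cast_column:
    # Split each title's cast once and count every exact name,
    # instead of rescanning all rows for every actor.
    counts.update(name.strip() for name in cast.split(","))

# The two most frequent actors as (name, count) pairs.
top = counts.most_common(2)
print(top)
```

This visits each row once, so the work grows with the number of cast entries rather than with the number of actors times the number of rows.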