Issue
Brand new to learning Python, but very familiar with Google Sheets -- I'm essentially trying to mimic the "filter" function but cannot find anything on it.
The goal of my script is to pull the social media tags of NBA players (from the URLs).
I have it working to pull all links, but I want to clean up my code by adding an if statement so that only links containing "https://www.facebook.com", "https://www.twitter.com" or "https://www.instagram.com" are pulled.
Right now, the output is every link on the page.
It isn't the end of the world, because I can paste the results into a Google Sheet and clean them up there, but it would be really nice to learn how to do this in Python.
from bs4 import BeautifulSoup
import requests

def get_profile(url):
    profiles = []
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    # attrs takes a dict mapping attribute name to value
    container = soup.find('div', attrs={'class': 'main-container'})
    # Collect the href of every link inside the container
    for profile in container.find_all('a'):
        profiles.append(profile.get('href'))
    for profile in profiles:
        print(profile)

get_profile('https://basketball.realgm.com/player/Carmelo-Anthony/Summary/452')
get_profile('https://basketball.realgm.com/player/LeBron-James/Summary/250')
Solution
You can use the in keyword to search for substrings. In your case, you could check each profile like so:
if "https://www.facebook.com" in profile:
    print(profile)
The in operator returns True if it finds the substring.
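To cover all three sites from the question in one pass, the same substring check can be combined with any(). The following is only a minimal sketch: the names SOCIAL_PREFIXES and filter_social and the sample list are made up for illustration, and the sample stands in for whatever profile.get('href') returns on the real page. It also skips None, since get('href') returns None for a link tag that has no href.

```python
# Prefixes taken from the question; extend the tuple to match other sites.
SOCIAL_PREFIXES = (
    "https://www.facebook.com",
    "https://www.twitter.com",
    "https://www.instagram.com",
)

def filter_social(links):
    """Return only the links that contain one of the social media prefixes."""
    social = []
    for link in links:
        # get('href') gives None when an <a> tag has no href; skip those
        if link is None:
            continue
        # True as soon as any one prefix is found inside the link
        if any(prefix in link for prefix in SOCIAL_PREFIXES):
            social.append(link)
    return social

# Hypothetical sample data, not real output from the page
sample = [
    "/player/LeBron-James/Summary/250",
    "https://www.twitter.com/KingJames",
    None,
    "https://www.instagram.com/kingjames",
    "https://basketball.realgm.com/nba",
]

for link in filter_social(sample):
    print(link)
```

In the original script, the filter would be applied to the profiles list before printing. Note that a link written without "www." (for example "https://twitter.com/...") would not match these exact strings, so the prefixes may need adjusting to what the page actually contains.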
Answered By - blueharen