Issue
I'm using BeautifulSoup to scrape an HTML page where the information I need is stored in markup like this:
<a class=" l00_PR_lTitleonContext" href="site0.html"> Title 0 </a>
<a class=" l01_PR_lTitleonContext" href="site1.html"> Title 1 </a>
<a class=" l02_PR_lTitleonContext" href="site2.html"> Title 2 </a>
[...]
I'd like to get "Title 0", "Title 1" and "Title 2", but the class name changes for each item, so I'm using a regex like this:
titles = soup.findAll("a", attrs={"class": re.compile('^TitleonContext.*')})
for title in titles:
    print(title)
But it's not working (nothing is printed). What am I doing wrong?
Solution
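Try re.compile(r'.*TitleonContext') (or, equivalently, re.compile('.*TitleonContext')) instead. Your pattern starts with ^, which requires the class value to begin with TitleonContext, but every class name here begins with a changing prefix such as l00_PR_l, so nothing matches.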
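The effect of the ^ anchor can be checked without BeautifulSoup at all. The sketch below is only an illustration, not part of the original answer: it runs both patterns over the class names from the question using re's search(), which is the method the BeautifulSoup documentation says it uses for compiled patterns. The variable names are made up for the example, and the leading space in each class attribute is omitted.

```python
import re

# Class names copied from the question (leading space omitted here).
class_names = [
    "l00_PR_lTitleonContext",
    "l01_PR_lTitleonContext",
    "l02_PR_lTitleonContext",
]

anchored = re.compile(r"^TitleonContext.*")   # pattern from the question
unanchored = re.compile(r".*TitleonContext")  # pattern from the answer

# search() scans the whole string, but ^ still pins the match to position 0,
# so the anchored pattern fails on every name that has a prefix.
anchored_hits = [name for name in class_names if anchored.search(name)]
unanchored_hits = [name for name in class_names if unanchored.search(name)]

print(anchored_hits)
print(unanchored_hits)
```

Since search() already looks anywhere in the string, the leading .* is optional: re.compile('TitleonContext') should behave the same way, and re.compile('TitleonContext$') is a stricter variant if the class must end with that suffix.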
Answered By - EvgenyKolyakov