Issue
I want to find a tag that contains two keywords. For example, I want to find a tag whose text includes both 'Yankee' AND 'duck'. My code is below:
elif len(keywords) == 2:
    keyword1 = keywords[0]
    keyword2 = keywords[1]
    print("Searching for product...")
    keywordLinkFound = False
    while keywordLinkFound is False:
        html = self.driver.page_source
        soup = BeautifulSoup(html, 'lxml')
        try:
            keywordLink = soup.find('loc', text=re.compile(keyword1 + keyword2)).text
            return keywordLink
        except AttributeError:
            print("Product not found on site, retrying...")
            time.sleep(monitorDelay)
            self.driver.refresh()
            break
And here is the XML I am searching:
<url>
  <loc>
    https://packershoes.com/products/copy-of-382-new-balance-m999jtc-1
  </loc>
  <lastmod>2018-12-04T21:49:25-05:00</lastmod>
  <changefreq>daily</changefreq>
  <image:image>
    <image:loc>
      https://cdn.shopify.com/s/files/1/0208/5268/products/NB999JTC-2_4391df07-a3a2-4c82-87b3-49d776096473.jpg?v=1543851653
    </image:loc>
    <image:title>NEW BALANCE M999JTC "MADE IN USA"</image:title>
  </image:image>
</url>
<url>
  <loc>
    https://packershoes.com/products/copy-of-382-packer-x-new-era-new-york-yankee-duck-canvas-1
  </loc>
  <lastmod>2018-12-06T14:39:37-05:00</lastmod>
  <changefreq>daily</changefreq>
  <image:image>
    <image:title>
      NEW ERA JAPAN 59FIFTY NEW YORK YANKEES "DUCK CANVAS"
    </image:title>
  </image:image>
</url>
Solution
keyword1 + keyword2 is the string yankeeduck, so you're searching for exactly that string, and it won't match when the two words aren't directly adjacent (the URL contains yankee-duck, with a hyphen between them). You need to allow anything between the words, and also recognize them in the opposite order. So the regexp should be:
yankee.*duck|duck.*yankee
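A minimal sketch using only the standard re module illustrates the difference. As an assumption for illustration, the text of the second <loc> element from the XML above is hard-coded here (including the newlines the XML layout puts inside the tag); in the question's code that text comes from the parsed page instead.

```python
import re

# Text of the second <loc> element above, with the surrounding newlines.
loc_text = "\nhttps://packershoes.com/products/copy-of-382-packer-x-new-era-new-york-yankee-duck-canvas-1\n"

keyword1, keyword2 = "yankee", "duck"

# keyword1 + keyword2 is "yankeeduck", which never appears in the URL.
concatenated = re.compile(keyword1 + keyword2)
print(concatenated.search(loc_text))  # None

# Allow anything between the words, in either order.
alternation = re.compile("%s.*%s|%s.*%s" % (keyword1, keyword2, keyword2, keyword1))
print(alternation.search(loc_text).group())  # yankee-duck

# Swapping the keyword order still matches, thanks to the second alternative.
reversed_order = re.compile("%s.*%s|%s.*%s" % (keyword2, keyword1, keyword1, keyword2))
print(reversed_order.search(loc_text).group())  # yankee-duck
```

Note that . does not match a newline by default; that is fine here because both words sit on the same line of the URL.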
Therefore, the code should be:
regexp = "%s.*%s|%s.*%s" % (keyword1, keyword2, keyword2, keyword1)
keywordLink = soup.find('loc', text=re.compile(regexp)).text
And if the keywords might contain characters that have a special meaning in a regexp, you should escape them:
keyword1 = re.escape(keywords[0])
keyword2 = re.escape(keywords[1])
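Putting the escaping and the alternation together, a small helper could look like the sketch below. The function name both_keywords_regex and its ignore_case parameter are hypothetical, not part of the original answer. Case-insensitive matching is an added assumption: the question spells the keyword 'Yankee' while the URL is lowercase, and re.compile is case-sensitive by default.

```python
import re


def both_keywords_regex(keyword1, keyword2, ignore_case=True):
    """Hypothetical helper: compile a regex that matches text containing
    both keywords, in either order, with anything in between."""
    # Escape so characters such as '.', '+' or '(' are matched literally.
    k1 = re.escape(keyword1)
    k2 = re.escape(keyword2)
    pattern = "%s.*%s|%s.*%s" % (k1, k2, k2, k1)
    # Assumption: case-insensitive by default, so 'Yankee' matches 'yankee'.
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(pattern, flags)


url = "https://packershoes.com/products/copy-of-382-packer-x-new-era-new-york-yankee-duck-canvas-1"

print(bool(both_keywords_regex("Yankee", "duck").search(url)))  # True
print(bool(both_keywords_regex("Yankee", "duck", ignore_case=False).search(url)))  # False

# With re.escape, '.' is a literal dot, so "v1.0" does not match "v1x0".
print(bool(both_keywords_regex("v1.0", "duck").search("v1x0 duck")))  # False
print(bool(both_keywords_regex("v1.0", "duck").search("v1.0 duck")))  # True
```

The compiled pattern can be passed to BeautifulSoup just like before, e.g. soup.find('loc', text=both_keywords_regex(keyword1, keyword2)); newer BeautifulSoup versions also accept string= as the name for that argument.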
Answered By - Barmar