Issue
This is what the HTML looks like:
<div class="full-news none">
Demo: <a href="https://www.lolinez.com/?https://www.makemytrip.com"
rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
<br/>
How can I remove the https://www.lolinez.com/? prefix from the href, so that the final output looks like this:
<div class="full-news none">
Demo: <a href="https://www.makemytrip.com"
rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
<br/>
I have tried using the decompose function of Beautiful Soup, but it removes the entire tag. How can this be fixed?
Solution
Note: Without additional context, I would narrow it down to the following approaches.
Option#1
Replace the substring in the string that you pass to the BeautifulSoup constructor:
soup = BeautifulSoup(YOUR_STRING.replace('https://www.lolinez.com/?',''), 'lxml')
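As a rough, self-contained sketch of Option#1, the snippet below applies the replacement to the markup from the question before parsing. The names PREFIX, raw_html and cleaned_html are made up for illustration, and html.parser is used only so the sketch does not depend on lxml being installed. Keep in mind that str.replace touches every occurrence in the document, including visible text, not just href values.

```python
from bs4 import BeautifulSoup

# Redirect prefix taken from the question.
PREFIX = "https://www.lolinez.com/?"

# Markup from the question, closed off so it is a complete fragment.
raw_html = (
    '<div class="full-news none">Demo: '
    '<a href="https://www.lolinez.com/?https://www.makemytrip.com" '
    'rel="external noopener noreferrer" target="_blank">'
    'https://www.makemytrip.com</a><br/></div>'
)

# Strip the prefix from the raw string, then parse the cleaned markup.
cleaned_html = raw_html.replace(PREFIX, "")
soup = BeautifulSoup(cleaned_html, "html.parser")

link = soup.find("a")
print(link["href"])  # https://www.makemytrip.com
```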
Option#2
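Replace the substring in your soup. You can select all the <a> elements whose href contains www.lolinez.com and replace the value of the href attribute. Assigning to x['href'] changes only that attribute, whereas decompose() removes the whole tag from the tree: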
for x in soup.select('a[href*="www.lolinez.com"]'):
    x['href'] = x['href'].replace('https://www.lolinez.com/?','')
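The loop above matches any href that contains www.lolinez.com and removes the prefix wherever it appears in the value. A slightly stricter variation, sketched below, only rewrites links whose href actually starts with the prefix. The helper name strip_redirect and the constant PREFIX are hypothetical, and html.parser is again chosen only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Redirect prefix taken from the question.
PREFIX = "https://www.lolinez.com/?"


def strip_redirect(soup, prefix=PREFIX):
    """Remove a leading prefix from href values in place; return how many links changed."""
    changed = 0
    # href=True skips <a> tags that have no href attribute at all.
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.startswith(prefix):
            # Only the attribute is edited; the tag and its text stay in the tree.
            a["href"] = href[len(prefix):]
            changed += 1
    return changed


html = """
<a href="https://www.lolinez.com/?https://www.makemytrip.com">https://www.makemytrip.com</a>
<a href="https://www.makemytrip.com">https://www.makemytrip.com</a>
<a href="https://www.lolinez.com/?https://www.makemytrip.com">https://www.makemytrip.com</a>
"""

soup = BeautifulSoup(html, "html.parser")
count = strip_redirect(soup)
hrefs = [a["href"] for a in soup.find_all("a")]
print(count, hrefs)
```

Returning the count makes it easy to check whether anything on a page was actually rewritten.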
Example
from bs4 import BeautifulSoup
html='''
<a href="https://www.lolinez.com/?https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
<a href="https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
<a href="https://www.lolinez.com/?https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
'''
soup = BeautifulSoup(html, 'lxml')
for x in soup.select('a[href*="www.lolinez.com"]'):
    x['href'] = x['href'].replace('https://www.lolinez.com/?','')
soup
Output
<html><body><a href="https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a><a href="https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a><a href="https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a></body></html>
Answered By - HedgeHog