Issue
I'm new to all of this, so I need a little help. For a uni project I am trying to extract ingredients from a website. In general the code works as it should, but I don't know how to get "Bärlauch" instead of "B%C3%A4rlauch" in the end.
I used BeautifulSoup with the following code:
URL = [...]
links = []
for url in range(0, 10):
    req = requests.get(URL[url])
    soup = bs(req.content, 'html.parser')
    for link in soup.findAll('a'):
        links.append(str(link.get('href')))
I don't get why it doesn't work as it should, even though the encoding is already UTF-8. Maybe someone knows better.
Thanks!
Solution
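The href values are URL-encoded (percent-encoded), so "ä" shows up as its UTF-8 bytes "%C3%A4". Decode them with urllib.parse.unquote. Also, the result of requests.get is a response, not a req(uest), so the variable should be named accordingly.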
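import urllib.parse

import requests
from bs4 import BeautifulSoup as bs

URLS = [...]
links = []
for url in URLS:
    response = requests.get(url)
    soup = bs(response.content, 'html.parser')
    # href=True skips <a> tags without an href, which would otherwise pass None to unquote
    for link in soup.find_all('a', href=True):
        links.append(urllib.parse.unquote(link.get('href')))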
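To see the decoding step in isolation, here is a minimal sketch that uses only the standard library. The variable names are just for illustration, and the string is the one from the question:

```python
from urllib.parse import quote, unquote

# The percent-encoded form as it appears in the scraped href
encoded = "B%C3%A4rlauch"

# unquote turns each %XX escape back into a byte and decodes
# the result as UTF-8 (its default encoding)
decoded = unquote(encoded)
print(decoded)  # Bärlauch

# quote does the reverse, which is why the link looked "wrong" in the first place
assert quote(decoded) == encoded
```

So the page encoding being UTF-8 is not the issue: the percent-escapes are part of the URL itself and have to be decoded separately.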
Answered By - Daniel