my URL is this
This works well for selecting all the links from A to Z.
link = s.get(url)
link_soup = BeautifulSoup(link.text, 'lxml')
links = (
.find_all("a", href=True)
But when I try to select_one #0-9
.find_all("a", href=True)
I get this error
SelectorSyntaxError: Malformed id selector at position 0
line 1:
How can I select only the links under "#0-9" and A-Z? I know I could use a for loop with re to change the ending of the URL and scrape the links from each page manually, but is there a way to get the same result using select or bs4?
Thanks again for the help.
To answer the direct question: you can use an attribute = value CSS selector to specify the id attribute and its value. The digits sit inside quotes, so they do not pose an issue to the parser. The selector is '[id="0-9"]'.
Or escape the leading digit using its Unicode code point (no following space is needed in this case, and the escape can be abbreviated to \30). Written as a Python string, the selector is '#\\30-9'.
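As a rough illustration of the two selectors above, here is a minimal sketch run against a small inline HTML fragment. The fragment, the variable names, and the use of the built-in html.parser are assumptions made for the example; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: headings whose inner span carries the section id.
html = """
<h2><span id="0-9">0-9</span></h2>
<ul><li><a href="/wiki/1">1</a></li></ul>
<h2><span id="A">A</span></h2>
<ul><li><a href="/wiki/a">a</a></li></ul>
"""

# html.parser is used here only to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")

# Attribute = value selector: the id is quoted, so the leading digit is fine.
by_attribute = soup.select_one('[id="0-9"]')

# Escaped id selector: \30 is the code point for "0"; the backslash is
# doubled because this is a regular (non-raw) Python string.
by_escape = soup.select_one("#\\30-9")

print(by_attribute.get("id"), by_escape.get("id"))
```

Both calls should return the same element; the unescaped '#0-9' form is the one that raises SelectorSyntaxError.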
However, you could use a single selector to extract all the links in one go, without the extra walking up and down the DOM.
links = ['' + i['href'] for i in link_soup.select('h2:not(:has(#See_also)) + ul a')]
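To show how that single selector behaves, here is a minimal sketch on a made-up fragment. BASE_URL is a hypothetical placeholder (not the URL from the question), the HTML is an assumed page structure, and urljoin is swapped in for plain string concatenation.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical placeholder, not the URL from the question.
BASE_URL = "https://example.com"

# Hypothetical markup: each section is an h2 followed directly by a ul.
html = """
<h2><span id="0-9">0-9</span></h2>
<ul><li><a href="/wiki/1">1</a></li></ul>
<h2><span id="A">A</span></h2>
<ul><li><a href="/wiki/a">a</a></li></ul>
<h2><span id="See_also">See also</span></h2>
<ul><li><a href="/wiki/other">other</a></li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Every h2 that does not contain #See_also, then the ul immediately
# after it, then all anchors inside that ul.
anchors = soup.select("h2:not(:has(#See_also)) + ul a")

# Build absolute URLs from the relative href values.
links = [urljoin(BASE_URL, a["href"]) for a in anchors]
print(links)
```

In this fragment the "See also" list is skipped, while the 0-9 and A sections are both collected in document order.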
Answered By - QHarr