Issue
I am trying to select all <table> elements on some web pages with BeautifulSoup. The table elements do not have specific classes or ids.
import bs4
import requests


def get_keycode_soup(url):
    res = requests.get(url)
    res.raise_for_status()
    return bs4.BeautifulSoup(res.text, features="html.parser")


def parse_qmk_soup():
    qmk_soup = get_keycode_soup("https://docs.qmk.fm/#/keycodes")
    tables = qmk_soup.select("table")
    # pass line for breakpoint
    pass


def main():
    parse_qmk_soup()


if __name__ == "__main__":
    main()
I have also tried selecting all the different table elements with
tables = qmk_soup.find_all("table")
# and
table_rows = qmk_soup.find_all("tr")
Whenever I pause the debugger on the pass line, tables is always None.
I have tried some similar methods to this post and this post, but since there do not appear to be any other descriptive tags on the tables I'm trying to select, iterating feels inefficient.
Is there a way to simply select all the <table> elements on their own?
Edit: it appears that the page requires JS to load the tables, as suggested by @DeepSpace below. Additionally, see the answer from @MendelG about tracing where the data is loaded from, in case you can obtain the data directly from the source.
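One way to confirm that the tables are injected by JavaScript is to count <table> tags in the raw HTML that the server returns, before any script runs. The sketch below uses only the standard library; the names TagCounter and count_tags, and both sample HTML strings, are hypothetical stand-ins and not taken from the QMK site.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times a given start tag appears in raw HTML."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == self.tag:
            self.count += 1


def count_tags(html, tag):
    """Return the number of <tag> start tags found in the html string."""
    counter = TagCounter(tag)
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical stand-in for a JS-rendered page: the server sends only a shell.
shell_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
# Hypothetical stand-in for a page whose tables are present in the static HTML.
static_html = "<html><body><table><tr><td>KC_A</td></tr></table></body></html>"

print(count_tags(shell_html, "table"))
print(count_tags(static_html, "table"))
```

If the count is zero for the text returned by requests.get but the tables are visible in the browser, the content is most likely rendered client-side, and no BeautifulSoup selector will find it in that response.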
Solution
If you open the Network tab in your browser's developer tools and look at the HTTP requests, you'll see that the data is loaded from a different URL:
https://docs.qmk.fm/keycodes.md?cache-bust=1706627991267
Note that this is a Markdown file (.md), not HTML, so there isn't really any HTML to parse. On the other hand, you obtain the original source file, which is already in a readable format.
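Since the source is Markdown, the tables can be pulled out with plain string handling instead of an HTML parser. The following is a minimal sketch, not a full Markdown parser: the function names parse_markdown_tables and fetch_keycodes_markdown are hypothetical, the sample text is made up for illustration, and dropping the cache-bust query parameter from the URL is an assumption. It also does not handle escaped pipes inside cells.

```python
def parse_markdown_tables(markdown_text):
    """Return a list of tables; each table is a list of rows (lists of cell strings)."""
    tables = []
    current = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            # Drop the outer pipes, then split the row into trimmed cells.
            cells = [cell.strip() for cell in stripped.strip("|").split("|")]
            # Skip the header separator row, e.g. |---|:---:|
            if all(cell and set(cell) <= set("-:") for cell in cells):
                continue
            current.append(cells)
        elif current:
            # A non-table line ends the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


def fetch_keycodes_markdown(url="https://docs.qmk.fm/keycodes.md"):
    """Download the Markdown source (assumed URL, without the cache-bust parameter)."""
    import requests  # imported here so the parser works without requests installed

    res = requests.get(url, timeout=10)
    res.raise_for_status()
    return res.text


# Made-up sample in the pipe-table style; not copied from the real file.
sample = """\
## Basic Keycodes

|Key   |Aliases|Description|
|------|-------|-----------|
|`KC_A`|       |`a` and `A`|
|`KC_B`|       |`b` and `B`|

Some text.
"""

tables = parse_markdown_tables(sample)
print(tables[0][0])
```

To run it against the live file, something like parse_markdown_tables(fetch_keycodes_markdown()) should work, provided the URL still serves the Markdown source.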
Answered By - MendelG