Issue
I'm pretty new to Python, and I have a function that's supposed to fetch the HTML of a Wikipedia page (https://en.wikipedia.org/wiki/List_of_largest_cities_of_U.S._states_and_territories_by_population) and, for the purposes of this question, print the HTML of the first column in each row. I'm using requests and BeautifulSoup4.
import requests
from bs4 import BeautifulSoup

def getStates():
    page = requests.get("https://en.wikipedia.org/wiki/List_of_largest_cities_of_U.S._states_and_territories_by_population")
    soup = BeautifulSoup(page.text, "html.parser")
    table = soup.find("tbody")
    rows = table.findAll("tr")
    for row in rows:
        columns = row.findAll("td")
        print(columns[0])
The "columns" variable should be a list, which I know because:
print(columns) gives me multiple lists of HTML (one per iteration of the for loop), each enclosed in square brackets with its items separated by commas.
print(len(columns)) prints "9", meaning there are 9 columns in each row, which can be confirmed by counting the columns on the Wikipedia page.
The findAll() function returns a list, as shown in the BS4 documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
However, if I do print(columns[0]), or use any other index, I get the following error: IndexError: list index out of range. Can someone give me any idea as to what I'm doing wrong? I feel like I'm making an obvious mistake here, but searching for this problem didn't yield any results.
Solution
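I had a list of lists, but as @juanpa.arrivillaga pointed out, I didn't realize that the first list (at index 0) was empty. The first row of the table is most likely the header row, which uses th cells rather than td cells, so row.findAll("td") returns an empty list for it and columns[0] raises the IndexError on the very first iteration of the loop.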
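A minimal sketch of one way to guard against the empty list, assuming the cause is a header row with no td cells. This is not the original poster's final code: the SAMPLE_HTML table and the first_cells helper are made up for illustration, and a small inline table stands in for the Wikipedia page so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature table standing in for the Wikipedia page.
# The first row uses <th> cells only, like a typical header row.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><th>State</th><th>Largest city</th></tr>
    <tr><td>State A</td><td>City A</td></tr>
    <tr><td>State B</td><td>City B</td></tr>
  </tbody>
</table>
"""


def first_cells(html):
    """Return the text of the first <td> in every row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("tbody")
    cells = []
    for row in table.find_all("tr"):
        columns = row.find_all("td")
        if not columns:
            # No <td> cells in this row (e.g. a header row), so
            # columns[0] would raise IndexError. Skip the row instead.
            continue
        cells.append(columns[0].get_text(strip=True))
    return cells


print(first_cells(SAMPLE_HTML))
```

The same check (if not columns: continue) can be dropped into the loop in getStates(). find_all is the current spelling of findAll in BeautifulSoup 4; both return a list-like result that is empty when nothing matches.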
Answered By - sping