Issue
I am trying to take the text of three sets of Selenium elements scraped from a website and put it in a table format, so that I have an array of 3 columns by 25 rows.
Example of desired end output:
Country | Date | Election Type |
---|---|---|
Zambia | August 12, 2021 (All day) | General |
Current code below:
# Installing selenium in Jupyter notebook
!pip install selenium
#checking my file path since we'll need to make sure our webdriver is in the same path
import os
import sys
os.path.dirname(sys.executable)
#Opens webbrowser chrome
from selenium import webdriver
#Manually directing the webdriver path and specifying the Chrome browser
browser = webdriver.Chrome('/Users/peterschoffelen/Documents/Copulus/chromedriver')
type(browser)
#Pulls up the webpage for the elections calendar
browser.get('https://www.ndi.org/elections-calendar')
#Clicking load more button
#tools needed
import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#Finding the link
linkElem = browser.find_element_by_link_text('LOAD MORE')
type(linkElem)
#linkElem.click() #Clicks the load more button to make sure all available links are shown
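while True:
    try:
        #Look the button up again on every pass, since the page re-renders after each click
        loadMoreButton = browser.find_element_by_link_text('LOAD MORE')
        time.sleep(2)
        loadMoreButton.click()
        time.sleep(5)
    except Exception as e:
        #Once the button can no longer be found or clicked, stop looping
        print(e)
        break
print("Complete")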
time.sleep(10)
#creating an object for each of the elements I am interested in
election_date = browser.find_elements_by_css_selector('span.date-display-single')
election_country = browser.find_elements_by_css_selector('div.election-title')
election_type = browser.find_elements_by_css_selector('div.election-type')
#checking the type for the created objects and looking at the first element in each object
print(type(election_date))
print(type(election_date[0]))
print(election_date[0])
#Our above checking showed us that we need to specifically extract the text from the elements we scraped
print(election_date[0].text)
print(election_country[0].text)
print(election_type[0].text)
#With this number being greater than 10, we can be confident we have gotten all the elections available
print(len(election_date))
#Creating that list as an object
elections = []
for d, t, c in zip(election_date, election_type, election_country):
    date_text = d.text
    type_text = t.text
    country_text = c.text
    elections.append(country_text)
    elections.append(type_text)
    elections.append(date_text)
#Show us the whole table we have now created of three columns
print(elections)
#
print(elections[0:3])
import numpy as np
democracy = np.array(elections)
#Shows that I have just created a single-column, 75-row array, which is not what I want :(
print(democracy.shape)
Thanks in advance for any help!
Solution
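If I replace your three append lines with elections.append([country_text, type_text, date_text]), each election is stored as its own three-item row rather than as three separate entries in one flat list.
With that change I get a 25 by 3 array, which I think is what you wanted. The output looks like: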
[['ZAMBIA', 'GENERAL', 'AUGUST 12, 2021 (ALL DAY)'], ['MOROCCO', 'GENERAL', 'SEPTEMBER 8, 2021 (ALL DAY)'], ['RUSSIA', 'LEGISLATIVE', 'SEPTEMBER 19, 2021 (ALL DAY)'], ['HAITI', 'PRESIDENTIAL AND PARLIAMENTARY', 'SEPTEMBER 26, 2021 (ALL DAY)'], ['SOMALIA', 'PRESIDENTIAL', 'OCTOBER 10, 2021 (ALL DAY)'], ['IRAQ', 'PARLIAMENTARY', 'OCTOBER 10, 2021 (ALL DAY)'], ['BULGARIA', 'PRESIDENTIAL', 'OCTOBER 20, 2021 (ALL DAY)'], ['CHAD', 'LEGISLATIVE', 'OCTOBER 24, 2021 (ALL DAY)'], ['UZBEKISTAN', 'GENERAL', 'OCTOBER 24, 2021 (ALL DAY)'], ['CAPE VERDE', 'PRESIDENTIAL', 'OCTOBER 2021'], ['MALI', 'REFERENDUM', 'OCTOBER 31, 2021 (ALL DAY)'], ['NICARAGUA', 'GENERAL', 'NOVEMBER 7, 2021 (ALL DAY)'], ['ARGENTINA', 'LEGISLATIVE', 'NOVEMBER 14, 2021 (ALL DAY)'], ['VENEZUELA', 'MUNICIPAL, REGIONAL', 'NOVEMBER 21, 2021 (ALL DAY)'], ['CHILE', 'GENERAL', 'NOVEMBER 21, 2021 (ALL DAY)'], ['HONDURAS', 'GENERAL', 'NOVEMBER 28, 2021 (ALL DAY)'], ['THE GAMBIA', 'PRESIDENTIAL', 'DECEMBER 4, 2021 (ALL DAY)'], ['HONG KONG', 'LEGISLATIVE', 'DECEMBER 19, 2021 (ALL DAY)'], ['COSTA RICA', 'GENERAL', 'FEBRUARY 6, 2022 (ALL DAY)'], ['HONG KONG', 'EXECUTIVE', 'MARCH 27, 2022 (ALL DAY)'], ['COLOMBIA', 'PRESIDENTIAL', 'MAY 29, 2022 (ALL DAY)'], ['INDIA', 'PRESIDENTIAL', 'JULY 2022'], ['KENYA', 'GENERAL', 'AUGUST 9, 2022 (ALL DAY)'], ['BRAZIL', 'GENERAL', 'OCTOBER 2, 2022 (ALL DAY)'], ['NIGERIA', 'GENERAL', 'FEBRUARY 18, 2023 (ALL DAY)']]
The first 3 items are: [['ZAMBIA', 'GENERAL', 'AUGUST 12, 2021 (ALL DAY)'], ['MOROCCO', 'GENERAL', 'SEPTEMBER 8, 2021 (ALL DAY)'], ['RUSSIA', 'LEGISLATIVE', 'SEPTEMBER 19, 2021 (ALL DAY)']] and the dimensions are: (25, 3)
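To see the difference without running the scraper, here is a minimal sketch. The three sample rows and the names countries, types, dates, flat and nested are hypothetical stand-ins for the scraped .text values, not part of the original code. It also shows that a flat list, if you already have one, can be turned into rows with reshape, assuming its length is a multiple of 3.

```python
import numpy as np

# Hypothetical sample values standing in for the scraped .text strings
countries = ["ZAMBIA", "MOROCCO", "RUSSIA"]
types = ["GENERAL", "GENERAL", "LEGISLATIVE"]
dates = [
    "AUGUST 12, 2021 (ALL DAY)",
    "SEPTEMBER 8, 2021 (ALL DAY)",
    "SEPTEMBER 19, 2021 (ALL DAY)",
]

flat = []    # what three separate append calls produce
nested = []  # what appending one list per election produces
for c, t, d in zip(countries, types, dates):
    flat.extend([c, t, d])    # same result as three append calls
    nested.append([c, t, d])  # one row of three columns

flat_array = np.array(flat)
nested_array = np.array(nested)

# A flat array can still be regrouped into rows of 3 after the fact
reshaped = flat_array.reshape(-1, 3)

print(flat_array.shape)    # (9,)
print(nested_array.shape)  # (3, 3)
print(reshaped.shape)      # (3, 3)
```

Appending a list per row is probably the cleaner fix, since the row structure is built where the data is collected. The reshape route is mainly useful when the flat list already exists.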
Answered By - Jeremy Kahan