Issue
I'm writing a Python script to extract the records of all people listed on a site using Selenium, BeautifulSoup and pandas. The difficulty is that the site only returns results after a search is performed, so for testing I pass a search value and submit the form via Selenium. When I run the code line by line in an IPython shell I get the desired results, but the same code throws an error when saved to a file and run with the python command.
Code
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
import pandas as pd
import re

br = webdriver.Chrome()          # the original snippet omitted the driver setup
url = 'https://osp.nckenya.com'  # site named in the answer below; exact search page assumed

br.get(url)
sleep(2)

# type the search value and submit the form
sName = br.find_element_by_xpath("/html/body/div[1]/div[2]/section/div[2]/div/div/div/div/div/div/div[2]/form/div[1]/div/div/input")
sName.send_keys("martin")
br.find_element_by_xpath("//*[@id='provider']/div[1]/div/div/div/button").click()
sleep(3)

# parse the page only after the results have loaded; grabbing page_source
# before the search returns a page that has no results table yet
soup = BeautifulSoup(br.page_source, 'lxml')
table = soup.find('table')
body = table.find_all('tr')      # header row plus data rows

# get column heads
head = body[0]
body_rows = body[1:]
headings = []
for item in head.find_all('th'):
    headings.append(item.text.rstrip("\n"))
print(headings)

# declare an empty list for holding all records
all_rows = []
# loop through all table rows to get all table data
for body_row in body_rows:
    row = []
    for row_item in body_row.find_all('td'):
        # strip non-breaking spaces, newlines and commas
        row.append(re.sub(r"(\xa0)|(\n)|,", "", row_item.text))
    all_rows.append(row)

# match each record to its field name
# cols = ['name', 'license', 'xxx', 'xxxx']
df = pd.DataFrame(data=all_rows, columns=headings)
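If the Selenium route is kept, an explicit wait is more reliable than fixed sleep calls. A minimal sketch, assuming Selenium 3 and the page structure above, that blocks until the results table exists before parsing:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for a results table to appear after the search
WebDriverWait(br, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, 'table'))
)
soup = BeautifulSoup(br.page_source, 'lxml')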
Solution
You don't need the overhead of a browser or to worry about waits. You can simply mimic the POST request the page makes:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

# the same form fields the page submits when the search button is clicked
data = {'search_register': '1', 'search_text': 'Martin'}
r = requests.post('https://osp.nckenya.com/ajax/public', data=data)
soup = bs(r.content, 'lxml')
# read_html returns a list of DataFrames, one per table found
results = pd.read_html(str(soup.select_one('#datatable2')))
print(results)
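Since results is a list (one DataFrame per table matched), the table itself is results[0]. A short follow-up sketch (the CSV filename is just an example):

df = results[0]
print(df.head())
df.to_csv('results.csv', index=False)  # example output path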
Answered By - QHarr