Issue
I'm trying to scrape a JavaScript-rendered website, https://e-consulta.sunat.gob.pe/cl-at-ittipcam/tcS01Alias, using Selenium and BeautifulSoup 4.
However, when I try to retrieve a child element (a sub-branch) from the tree, I get this error:
bloquefecha=bloque.find('div[@class="date"]').text
AttributeError: 'NoneType' object has no attribute 'text'
I'm attaching a snapshot of my code and the developer console for illustrative purposes.
Here is my code:
import time

from bs4 import BeautifulSoup
from selenium import webdriver


def beautifulseleniumsunat2():
    navegador = webdriver.Chrome()
    navegador.get("https://e-consulta.sunat.gob.pe/cl-at-ittipcam/tcS01Alias")
    time.sleep(7)  # wait 7 seconds for the page to load
    pagsunat = navegador.page_source
    soup = BeautifulSoup(pagsunat, "html.parser")
    print (soup.prettify())
    bloquesdias2 = soup.select('td[class*="table-bordered calendar-day current"]')
    listafecha = []
    listacompra=[]
    listaventa=[]
    for bloque in bloquesdias2:
        # In each of the three lookups below I ALSO tried findall and iterating
        # with a for loop over each element, but the error says it's not iterable
        bloquefecha=bloque.find('div[@class="date"]')
        listafecha.append(bloquefecha.text)
        bloquecompra=bloque.find('div[@class="event normal-all-day begin end"]')
        listacompra.append(bloquecompra.text)
        bloqueventa = bloque.find('div[@class="event pap-all-day begin end"]')
        listaventa.append(bloquecompra.text)
    listafinal=[listacompra,listaventa,listafecha]
    print (listafinal)
Solution
What happens?
As Aziz Sonawalla mentioned, you have to pass the class as a separate argument to find(), but that won't fix all your issues. If an element is not available, for example when a day has no compra / venta entry, the code will raise the same error again.
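To make the difference concrete, here is a minimal sketch on a hand-written HTML fragment. The markup is an assumption made up for illustration, not the real SUNAT page. find() treats its first positional argument as a tag name, so an XPath-like string matches nothing and returns None, while passing the class through class_ finds the element.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, loosely modelled on one calendar cell
html = '<td class="calendar-day current"><div class="date">5</div></td>'
cell = BeautifulSoup(html, "html.parser").td

# The XPath-like string is interpreted as a tag name, so nothing matches
wrong = cell.find('div[@class="date"]')

# Tag name and class passed as separate arguments
right = cell.find("div", class_="date")

print(wrong)       # None
print(right.text)  # 5
```

Calling .text on the first result is what produces the AttributeError on 'NoneType'.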
How to fix that?
You have to catch the error: the try block gives you the result if there is no error, and the except block sets the result to an empty string.
try:
    bloquecompra = day.select_one('div[class*="normal-all-day"]').get_text().split()[1]
except (AttributeError, IndexError):
    # element missing (None has no get_text) or text has no second token
    bloquecompra = ''
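An alternative sketch avoids exception handling altogether by checking for the missing element explicitly. The helper name second_token and the sample markup (including the "Compra 3.618" text) are assumptions for illustration; they are not part of BeautifulSoup or taken from the real page.

```python
from bs4 import BeautifulSoup


def second_token(parent, selector):
    """Return the second whitespace-separated token of the matched
    element's text, or '' if the element or the token is missing."""
    tag = parent.select_one(selector)
    if tag is None:
        return ""
    parts = tag.get_text().split()
    return parts[1] if len(parts) > 1 else ""


# Hypothetical cells: one with a rate entry, one without
html = (
    '<table><tr>'
    '<td id="a"><div class="event normal-all-day begin end">Compra 3.618</div></td>'
    '<td id="b"></td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

with_rate = second_token(soup.select_one("#a"), 'div[class*="normal-all-day"]')
without_rate = second_token(soup.select_one("#b"), 'div[class*="normal-all-day"]')
print(repr(with_rate), repr(without_rate))
```

Which style to use is a matter of taste; the explicit check makes it clearer which situations are expected to yield an empty string.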
Example
You can replace all your code after print (soup.prettify()) with:
data = []
# iterate over the cells of the current month in the calendar table
for day in soup.select('table.calendar-table.table.table-condensed > tbody td[class*="current"]'):
    bloquefecha = day.select_one('div.date').get_text()
    try:
        bloquecompra = day.select_one('div[class*="normal-all-day"]').get_text().split()[1]
    except (AttributeError, IndexError):
        bloquecompra = ''  # no compra entry for this day
    try:
        bloqueventa = day.select_one('div[class*="pap-all-day"]').get_text().split()[1]
    except (AttributeError, IndexError):
        bloqueventa = ''  # no venta entry for this day
    data.append(';'.join([bloquefecha,bloquecompra,bloqueventa]))
data
Output
['1;3.618;3.624',
'2;;',
'3;;',
'4;;',
'5;3.624;3.628',
'6;3.627;3.631',
'7;3.625;3.630',
'8;3.620;3.623',
'9;3.610;3.615',
'10;;',
'11;;',
'12;3.615;3.618',
'13;3.606;3.608',
'14;3.610;3.615',
'15;3.610;3.613',
'16;3.610;3.614',
'17;;',
'18;;',
'19;3.609;3.617',
'20;3.611;3.615',
'21;3.612;3.615',
'22;3.618;3.622',
'23;;',
'24;;',
'25;;',
'26;;',
'27;;',
'28;;',
'29;;',
'30;;',
'31;;']
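If the semicolon-joined strings need further processing, they can be split back into fields. A minimal sketch, assuming the day;compra;venta layout shown in the output; the function name parse_rows and the dictionary keys are illustrative choices, not part of the original answer.

```python
def parse_rows(rows):
    """Turn 'day;compra;venta' strings into dicts; missing rates become None."""
    records = []
    for row in rows:
        day, compra, venta = row.split(";")
        records.append({
            "day": int(day),
            "compra": float(compra) if compra else None,
            "venta": float(venta) if venta else None,
        })
    return records


# A few rows copied from the output
sample = ['1;3.618;3.624', '2;;', '5;3.624;3.628']
records = parse_rows(sample)
print(records[0])
print(records[1])
```

Days without a published rate keep their place in the list, so the record index still lines up with the calendar.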
Answered By - HedgeHog