Issue
response:
['<td class="V2ligneB" valign="top">\r\n LINAIA\r\n </td>',
'<td class="V2ligneB" valign="top" title="[email protected]">\r\n PAILLEREAU Florent \r\n
</td>',
'<td class="V2ligneB" valign="top">\r\n 35000 RENNES\r\n </td>',
'<td class="V2ligneB" valign="top">\r\n \r\n </td>',
'<td class="V2ligneB" valign="top" align="center">\r\n \n <a href="javascript:void(0)" onclick="window.open(\'index.cfm?fuseaction=mEnt.ficheEntAW&uuid=2f89094e-4da1-4e1b-9ada-c16cea5e25f9&affDoc=false\',\'ficheEntreprise\',\'scrollbars=yes,width=700,height=750\')">Fiche</a>\n \r\n </td>']
I want to extract the value "[email protected]" from the title attribute of the second td.
I am using the CSS selector below:
email = response.css('td::attr(title)')[1].get()
but it does not work. I get the error below and I don't understand why:
IndexError Traceback (most recent call last)
Input In [43], in <cell line: 1>()
----> 1 all.css('td::attr(title)')[1].get().strip()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\parsel\selector.py:70, in SelectorList.__getitem__(self, pos)
69 def __getitem__(self, pos):
---> 70 o = super(SelectorList, self).__getitem__(pos)
71 return self.__class__(o) if isinstance(pos, slice) else o
IndexError: list index out of range
Solution
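The structure of your HTML is unusual, but I recreated your problem and used Python with BeautifulSoup to get an answer. The IndexError most likely occurs because only one td carries a title attribute, so the selector list holds a single match and index 1 is out of range. The loop below parses each fragment and uses try/except to find the tag that has a 'title' attribute: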
from bs4 import BeautifulSoup

# The fragments from the question, one <td> per list entry
resp = ['<td class="V2ligneB" valign="top">\r\n LINAIA\r\n </td>',
'''<td class="V2ligneB" valign="top" title="[email protected]">\r\n PAILLEREAU Florent \r\n
</td>''',
'<td class="V2ligneB" valign="top">\r\n 35000 RENNES\r\n </td>',
'<td class="V2ligneB" valign="top">\r\n \r\n </td>',
'<td class="V2ligneB" valign="top" align="center">\r\n \n <a href="javascript:void(0)" onclick="window.open(\'index.cfm?fuseaction=mEnt.ficheEntAW&uuid=2f89094e-4da1-4e1b-9ada-c16cea5e25f9&affDoc=false\',\'ficheEntreprise\',\'scrollbars=yes,width=700,height=750\')">Fiche</a>\n \r\n </td>']

for row, html in enumerate(resp):
    soup = BeautifulSoup(html, 'html.parser')
    try:
        # Indexing a tag by attribute name raises KeyError when it is absent
        email = soup.find('td')['title']
        print(email)
    except KeyError:
        print(f'Not found in row: {row}')
Answered By - childnick