Issue
I am currently facing difficulty retrieving the name 'Terence Crawford' from an HTML segment. The challenge lies in excluding the span element, which is present within the same parent element.
<td colspan="3" style="position:relative;" class="defaultTitleAlign">
<h1 style="display:inline-block;margin-right:5px;line-height:30px;">
<span style="font-weight:bold;"><i class="fas fa-crown" style="color:#f6b501 !important;"></i></span>
"Terence Crawford"
</h1>
<div style="width:100%;position:relative;margin-top:5px;">
</div>
</td>
I attempted to retrieve the name by specifying both the class attribute 'defaultTitleAlign' and the style attribute 'display:inline-block;margin-right:5px;line-height:30px;', but the query only returns '\n ' instead of the actual name. Even when I target the entire content of the h1 element, the name is not returned.
In [9]: response.xpath("//td[@class='defaultTitleAlign']/h1/text()").get()
Out[9]: '\n '
Solution
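The first text node inside the h1 is the whitespace that precedes the span, so get(), which returns only the first match, gives you '\n '. You can use the getall() method to collect all of the text() nodes from the given selector, then pick the one you are looking for out of the returned list.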
For example:
In [1]: from scrapy.selector import Selector
In [2]: html = """<td colspan="3" style="position:relative;" class="defaultTitleAlign">
...: <h1 style="display:inline-block;margin-right:5px;line-height:30px;">
...: <span style="font-weight:bold;"><i class="fas fa-crown" style="color:#f6b501 !important;"></i></span>
...: "Terence Crawford"
...: </h1>
...: <div style="width:100%;position:relative;margin-top:5px;">
...: </div>
...: </td>"""
In [4]: response = Selector(text=html)
In [5]: text_list = response.xpath("//td[@class='defaultTitleAlign']/h1//text()").getall()
In [6]: text = text_list[1].strip()
In [7]: text
Out[7]: '"Terence Crawford"'
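Picking index 1 by hand works for this page, but it depends on exactly where the whitespace-only text nodes fall. As a minimal sketch, the list returned by getall() could instead be filtered for the first node that still has content after stripping. The helper name first_nonblank_text and the sample list below are assumptions written for illustration; they are not part of Scrapy, and the list is typed by hand to resemble the text nodes in the session above.

```python
def first_nonblank_text(text_nodes, strip_chars='"'):
    """Return the first text node that is not only whitespace, or None.

    text_nodes is a plain list of strings, such as the result of .getall().
    strip_chars lists extra characters to trim from both ends (here, the
    literal double quotes that appear in the page text).
    """
    for node in text_nodes:
        cleaned = node.strip()  # drop surrounding newlines and spaces
        if cleaned:
            return cleaned.strip(strip_chars)
    return None


# Hand-written sample resembling the text nodes of the h1 element
text_list = ['\n ', '\n "Terence Crawford"\n ']
name = first_nonblank_text(text_list)
print(name)  # prints: Terence Crawford
```

The same filtering can probably be pushed into the query itself with an XPath predicate such as text()[normalize-space()], which keeps only text nodes that are non-empty after whitespace normalization, so get() would then return the name directly.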
Answered By - Alexander