Wednesday, February 7, 2024

[FIXED] How to Extract the Name 'Terence Crawford' from an HTML Segment, Excluding the Span Element?

Labels: css-selectors, scrapy, web-scraping, xpath

Issue

I am currently facing difficulty retrieving the name 'Terence Crawford' from an HTML segment. The challenge lies in excluding the span element, which is present within the same parent element.

<td colspan="3" style="position:relative;" class="defaultTitleAlign">
<h1 style="display:inline-block;margin-right:5px;line-height:30px;">
                        <span style="font-weight:bold;"><i class="fas fa-crown" style="color:#f6b501 !important;"></i></span>
                    "Terence Crawford"
    </h1>
<div style="width:100%;position:relative;margin-top:5px;">
</div>
</td>

I attempted to retrieve the name by specifying both the class attribute 'defaultTitleAlign' and the style attribute 'display:inline-block;margin-right:5px;line-height:30px;', but it only returns '\n' followed by whitespace instead of the actual name. Even when targeting the entire content of the h1 element, the name is not returned.

In [9]: response.xpath("//td[@class='defaultTitleAlign']/h1/text()").get()
Out[9]: '\n                        '

Solution

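The h1 element has two direct text nodes: the whitespace before the span, and the name after it. get() returns only the first match, which is the whitespace. Use the getall() method instead to collect every text() node under the selector, then pick the entry you are looking for from the returned list.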

For example:

In [1]: from scrapy.selector import Selector

In [2]: html = """<td colspan="3" style="position:relative;" class="defaultTitleAlign">
   ...: <h1 style="display:inline-block;margin-right:5px;line-height:30px;">
   ...:                         <span style="font-weight:bold;"><i class="fas fa-crown" style="color:#f6b501 !important;"></i></span>
   ...:                     "Terence Crawford"
   ...:     </h1>
   ...: <div style="width:100%;position:relative;margin-top:5px;">
   ...: </div>
   ...: </td>"""

In [4]: response = Selector(text=html)

In [5]: text_list = response.xpath("//td[@class='defaultTitleAlign']/h1//text()").getall()

In [6]: text = text_list[1].strip()

In [7]: text
Out[7]: '"Terence Crawford"'

Answered By - Alexander

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
