Wednesday, November 15, 2023

[FIXED] Scrapy - Unable to get the right data from the table


Issue

I am trying to pull the data from a particular table on this page: https://www.moneycontrol.com/mutual-funds/canara-robeco-blue-chip-equity-fund-direct-plan/portfolio-holdings/MCA212

The table ID in the HTML is equityCompleteHoldingTable. Please help me get the stock data from this table as a dictionary.

Thanks.

What I tried

In the Scrapy shell, I am running the following commands:

scrapy shell 'https://www.moneycontrol.com/mutual-funds/canara-robeco-blue-chip-equity-fund-direct-plan/portfolio-holdings/MCA212'

table = response.xpath('//*[@id="equityCompleteHoldingTable"]')
rows = table.xpath('//tr')
row =  rows[2]
row.xpath('td//text()')[0].extract()

This returns "No. of Stocks", which comes from a different table on the same page.

I have found that the class this table uses is also used by other tables, and one of those tables is actually the one returning "No. of Stocks".

What I expected

I expected the data to come from the equityCompleteHoldingTable table.

Solution

Your primary problem is that you are not using relative XPath expressions.

For example, rows = table.xpath("//tr") is an absolute XPath expression. Absolute paths are evaluated from the root of the document, regardless of how deeply nested the selector you call them on is, so this query returns every tr on the page, including rows from other tables.

A relative path query starts from the current selector element. To make an XPath expression relative, you only need to add a . as the very first character, similar to relative filesystem paths. For example: rows = table.xpath(".//tr")

With that in mind you will probably have more luck with the following:

>>> table = response.xpath('//*[@id="equityCompleteHoldingTable"]')
>>> rows = table.xpath('.//tr')
>>> row = rows[2]
>>> row.xpath('.//td/text()').extract()[3:]
['Banks', '30.99', '8247.9', '9.34%', '0.14%', '9.69% ', '7.66% ', '86.56 L', '0.00 ', 'Large Cap', '75.79']
>>>

Using .//td//text() instead also picks up text nested inside child elements of each cell, such as the stock name; the whitespace-only entries can then be stripped:

In [1]: table = response.xpath('//*[@id="equityCompleteHoldingTable"]')

In [2]: rows = table.xpath('.//tr')

In [3]: row = rows[2]

In [4]: row.xpath('.//td//text()').getall()
Out[4]:
['\n                                                                                                            ',
 '\n                                                                                                        ',
 'ICICI Bank Ltd. ',
 '\n                                                ',
 'Banks',
 '30.99',
 '8247.9',
 '9.34%',
 '0.14%',
 '9.69% ',
 '(Aug 2022)',
 '7.66% ',
 '(Dec 2021)',
 '86.56 L',
 '0.00 ',
 'Large Cap',
 '75.79']

In [5]: cells = row.xpath('.//td//text()').getall()

In [6]: [i.strip() for i in cells]
Out[6]:
['',
 '',
 'ICICI Bank Ltd.',
 '',
 'Banks',
 '30.99',
 '8247.9',
 '9.34%',
 '0.14%',
 '9.69%',
 '(Aug 2022)',
 '7.66%',
 '(Dec 2021)',
 '86.56 L',
 '0.00',
 'Large Cap',
 '75.79']

Answered By - Alexander

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
