Issue
I am trying to pull the data from a particular table on this page: https://www.moneycontrol.com/mutual-funds/canara-robeco-blue-chip-equity-fund-direct-plan/portfolio-holdings/MCA212
The table ID in the HTML is equityCompleteHoldingTable. Please refer to the screenshot above, and help me get the stock data from that table as a dictionary.
Thanks.
What I tried
In the Scrapy shell, I ran the following commands:
scrapy shell 'https://www.moneycontrol.com/mutual-funds/canara-robeco-blue-chip-equity-fund-direct-plan/portfolio-holdings/MCA212'
table = response.xpath('//*[@id="equityCompleteHoldingTable"]')
rows = table.xpath('//tr')
row = rows[2]
row.xpath('td//text()')[0].extract()
This returns "No. of Stocks". The extracted data is coming from a different table on the same page.
I have found that the class used by this table is also used by other tables, and one of those tables is actually the one returning "No. of Stocks".
What I expected
I expected the data to come from the equityCompleteHoldingTable table (screenshot above).
Solution
Your primary problem is that you are not using relative XPath expressions.
For example, rows = table.xpath("//tr") is an absolute XPath expression. Absolute expressions are evaluated from the root of the document, regardless of which selector you call them on, so this one matches every tr on the page, including rows that belong to other tables.
A relative expression is evaluated starting from the current selector's element. To make an expression relative you only need to add a . as the very first character, similar to relative filesystem paths. For example: rows = table.xpath(".//tr")
With that in mind you will probably have more luck with the following:
>>> table = response.xpath('//*[@id="equityCompleteHoldingTable"]')
>>> rows = table.xpath('.//tr')
>>> row = rows[2]
>>> row.xpath('.//td/text()').extract()[3:]
['Banks', '30.99', '8247.9', '9.34%', '0.14%', '9.69% ', '7.66% ', '86.56 L', '0.00 ', 'Large Cap', '75.79']
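The [3:] slice above skips the leading name and whitespace nodes. To see every text node in the row, including text inside nested elements, use .//td//text() with getall() and then strip each value: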
In [1]: table = response.xpath('//*[@id="equityCompleteHoldingTable"]')
In [2]: rows = table.xpath('.//tr')
In [3]: row = rows[2]
In [4]: row.xpath('.//td//text()').getall()
Out[4]:
['\n ',
'\n ',
'ICICI Bank Ltd. ',
'\n ',
'Banks',
'30.99',
'8247.9',
'9.34%',
'0.14%',
'9.69% ',
'(Aug 2022)',
'7.66% ',
'(Dec 2021)',
'86.56 L',
'0.00 ',
'Large Cap',
'75.79']
In [5]: cells = row.xpath('.//td//text()').getall()
In [6]: [i.strip() for i in cells]
Out[6]:
['',
'',
'ICICI Bank Ltd.',
'',
'Banks',
'30.99',
'8247.9',
'9.34%',
'0.14%',
'9.69%',
'(Aug 2022)',
'7.66%',
'(Dec 2021)',
'86.56 L',
'0.00',
'Large Cap',
'75.79']
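Since the question asks for a dictionary, one possible follow-up is sketched below. It is plain Python with no Scrapy dependency. The helper names clean_cells and row_to_entry are hypothetical, and keying the dictionary by the first non-empty cell (the stock name) is an assumption about the desired shape. The sample input is copied from the shell output above.

```python
def clean_cells(cells):
    """Strip whitespace from each text node and drop the ones left empty."""
    stripped = (text.strip() for text in cells)
    return [text for text in stripped if text]


def row_to_entry(cells):
    """Return (stock_name, remaining_values) for one row, or None if the row has no text."""
    values = clean_cells(cells)
    if not values:
        return None  # e.g. a header or spacer row without td text
    return values[0], values[1:]


# Text nodes of one row, as returned by row.xpath('.//td//text()').getall()
sample_cells = [
    '\n ', '\n ', 'ICICI Bank Ltd. ', '\n ', 'Banks', '30.99', '8247.9',
    '9.34%', '0.14%', '9.69% ', '(Aug 2022)', '7.66% ', '(Dec 2021)',
    '86.56 L', '0.00 ', 'Large Cap', '75.79',
]

holdings = {}
entry = row_to_entry(sample_cells)
if entry is not None:
    name, values = entry
    holdings[name] = values

print(holdings)
```

In the Scrapy shell the same helper could be applied in a loop over rows, passing row.xpath('.//td//text()').getall() for each row. Mapping the values to column names would require reading the header cells from the table itself, which is not shown here.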
Answered By - Alexander