Issue
Suppose there's a website that has a list of details of some companies, for example, name, HQ area, turnover, etc. How do I scrape that data and fill it into different columns (like name, turnover) with each row having the details of a separate company?
Solution
Google Sheets lets you import HTML tables or lists with the IMPORTHTML(url, query, index) function.
Take the Wikipedia page List of largest companies by revenue as an example.
We want the data from the main table, so the first step is to find out which index it occupies on the page. To do this, open the browser's developer console and run document.querySelectorAll('table')
or $$('table')
. As the result shows, the table we want is at position 5 of the array. Keep in mind that IMPORTHTML's index starts at 1, while the array shown in the console is zero-based, so adjust by one if the import returns a neighbouring table. Inside our Google Sheet we can use:
=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_largest_companies_by_revenue","table",5)
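Counting tables by eye in the console output can be error-prone, so here is a minimal sketch of the index lookup described above. The helper name summarizeTables and the 60-character preview are my own choices, not part of the original answer. It also assumes that the tables in the rendered page match those in the HTML source that Google Sheets fetches, which may not hold for pages that build tables with scripts.

```javascript
// Summarize each table so the right one is easy to spot.
// `tables` is any array-like of table elements (e.g. a NodeList).
function summarizeTables(tables) {
  return Array.from(tables, (table, i) => ({
    // IMPORTHTML counts from 1, array positions count from 0.
    importHtmlIndex: i + 1,
    rows: table.rows.length,
    // First 60 characters of the table text, whitespace collapsed.
    preview: table.textContent.replace(/\s+/g, ' ').trim().slice(0, 60),
  }));
}

// Only runs in a browser console, where `document` exists.
if (typeof document !== 'undefined') {
  console.table(summarizeTables(document.querySelectorAll('table')));
}
```

Pasting this into the console on the target page prints one row per table. The importHtmlIndex column is the value to try as the third argument of IMPORTHTML.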
If the data you need is in an HTML list rather than a table, change the query parameter to "list"
and find the list's index on the page using the method described above. Alternatively, you can always use IMPORTXML(url, xpath_query)
: if you know the XPath of the information, you can build a similar solution.
Answered By - Emel