Issue
I'm trying to scrape an HTML table with bs4 (BeautifulSoup), but my code isn't working. I'd like to get the data from each row's td cells so that I can write it to a CSV file. This is the HTML:
<table class="sc-jAaTju bVEWLO">
<thead>
<tr>
<td width="10%">Rank</td>
<td>Trending Topic</td>
<td width="30%">Tweet Volume</td>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td><a href="http:///example.com/search?q=%23One" target="_blank" without="true" rel="noopener noreferrer">#One</a></td>
<td>1006.4K tweets</td>
</tr>
<tr>
<td>2</td>
<td><a href="http:///example.com/search?q=%23Two" target="_blank" without="true" rel="noopener noreferrer">#Two</a></td>
<td>1028.7K tweets</td>
</tr>
<tr>
<td>3</td>
<td><a href="http:///example.com/search?q=%23Three" target="_blank" without="true" rel="noopener noreferrer">#Three</a></td>
<td>Less than 10K tweets</td>
</tr>
</tbody>
</table>
This is my first try:
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.exportdata.io/trends/italy/2020-01-01/0")
soup = BeautifulSoup(response.text, "html.parser")
table = soup.find_all("table", attrs={"class": "sc-jAaTju bVEWLO"})
And my second one:
tables = soup.find_all('table')
for table in tables:
    cells = [td.text.strip() for td in table.find_all('td')]
Neither attempt works. What am I missing? Thank you.
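For comparison, the parsing itself works once BeautifulSoup is given markup that actually contains the table, for example the HTML above copied from the browser's developer tools after the page has rendered. The following is a minimal sketch, assuming the markup is already available as a string; TABLE_HTML (a trimmed copy of the snippet above), table_rows and write_csv are illustrative names, not part of bs4. It collects the text of every td in each row and writes the rows with Python's csv module.

```python
import csv
import io

from bs4 import BeautifulSoup

# Illustrative, trimmed copy of the table markup from the question
TABLE_HTML = """
<table class="sc-jAaTju bVEWLO">
<thead><tr><td width="10%">Rank</td><td>Trending Topic</td><td width="30%">Tweet Volume</td></tr></thead>
<tbody>
<tr><td>1</td><td><a href="http:///example.com/search?q=%23One">#One</a></td><td>1006.4K tweets</td></tr>
<tr><td>2</td><td><a href="http:///example.com/search?q=%23Two">#Two</a></td><td>1028.7K tweets</td></tr>
</tbody>
</table>
"""


def table_rows(html):
    """Return the header row and the body rows as lists of cell text (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # A CSS selector matches a table that carries both classes
    table = soup.select_one("table.sc-jAaTju.bVEWLO")
    header = [td.get_text(strip=True) for td in table.thead.find_all("td")]
    body = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.tbody.find_all("tr")
    ]
    return header, body


def write_csv(header, rows, file):
    """Write the header and rows to an open text file (illustrative helper)."""
    writer = csv.writer(file)
    writer.writerow(header)
    writer.writerows(rows)


header, rows = table_rows(TABLE_HTML)
buffer = io.StringIO()
write_csv(header, rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with newline="" as the csv module documentation recommends, e.g. with open("trends.csv", "w", newline="", encoding="utf-8") as f: write_csv(header, rows, f), where the file name is hypothetical. This only helps when the table is in the HTML you parse, which is not the case for the page fetched with requests.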
Solution
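The table is loaded dynamically by JavaScript, so it isn't in the HTML that requests downloads and BeautifulSoup has nothing to find. Instead, find the API request the page makes (for example, in the Network tab of your browser's developer tools) and substitute the date and hour into its URL: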
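import requests
import pandas as pd

# JSON endpoint the page calls; change date and hour to query other snapshots
url = "https://api.exportdata.io/trends/locations/it?date=2020-01-01&hour=0"
response = requests.get(url)

# Trends without a reported volume become NaN; label them the way the site does
df = pd.DataFrame(response.json()).fillna('Less than 10K tweets')
print(df.to_string(columns=['name', 'tweet_volume']))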
Output:
                 name          tweet_volume
0      #lannocheverra  Less than 10K tweets
1      Happy New Year             4948992.0
2           Buon 2020               18359.0
3         #Mattarella               19304.0
4         #skamfrance  Less than 10K tweets
5        Mariah Carey               36853.0
6     #GliAristogatti  Less than 10K tweets
7       Orietta Berti  Less than 10K tweets
8      Gigi D'Alessio  Less than 10K tweets
9          Auguriiiii  Less than 10K tweets
10           #NewYear              163253.0
11       Welcome 2020              101403.0
12       Romina Power  Less than 10K tweets
13      Auguri Matteo  Less than 10K tweets
14            Al Bano  Less than 10K tweets
15      fabrizio moro  Less than 10K tweets
16          Panicucci  Less than 10K tweets
17        John Boyega               78097.0
18             Inizio  Less than 10K tweets
19      Auguri Silvia  Less than 10K tweets
20       Auguri Marco  Less than 10K tweets
21      #Ghostbusters  Less than 10K tweets
22  #thebluesbrothers  Less than 10K tweets
23   #FeliceAnnoNuovo  Less than 10K tweets
24  #bottidicapodanno  Less than 10K tweets
25        #ventiventi  Less than 10K tweets
26         #quirinale  Less than 10K tweets
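Since the original goal was a CSV file, the same JSON can go straight to DataFrame.to_csv. A minimal sketch, assuming the endpoint accepts other date and hour values in the same format as the URL above (not verified here); trends_url and records_to_csv are hypothetical helper names. The network call is left in comments so the helpers can be tried on their own.

```python
from urllib.parse import urlencode

import pandas as pd

# Endpoint taken from the answer; other date/hour values are assumed to follow the same format
BASE_URL = "https://api.exportdata.io/trends/locations/it"


def trends_url(date, hour):
    """Build the API URL for one date (YYYY-MM-DD) and hour (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'date': date, 'hour': hour})}"


def records_to_csv(records, path=None):
    """Write a list of trend dicts to CSV (hypothetical helper).

    Missing tweet volumes are labelled the way the site shows them. Like
    DataFrame.to_csv, this returns the CSV text when path is None.
    """
    df = pd.DataFrame(records).fillna("Less than 10K tweets")
    return df.to_csv(path, columns=["name", "tweet_volume"], index=False)


# Usage (requires network access):
# import requests
# response = requests.get(trends_url("2020-01-01", 0))
# response.raise_for_status()
# records_to_csv(response.json(), "trends_2020-01-01_00.csv")
```

Passing index=False keeps the pandas row index (0, 1, 2, ...) out of the file, so the CSV holds only the name and tweet_volume columns.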
Answered By - Sergey K