Issue
I am new to Python and Scrapy. I have a project to fetch data from ASPX web pages and store the values. There are 1000 pages like this whose values I need to fetch every day. It seems easy, but I couldn't do it.
This is the web page I need to scrape: http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35366681030756042. I looked at the code in Chrome.
The red box is updated every day. This is the page source:
<div class="aspNetHidden">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="" />
</div>
<div id="tabs" class="InsInfoTab"></div>
<div id="MainBox" class="box1 zFull silver">
<div id="MainContent" class="tabcontent content">
<div id="TopBox"></div>
All the data I need is inside TopBox.
The Network tab in Chrome shows the data being transferred, but I can't find the information I want to fetch there.
I prefer to use Scrapy since I have many pages to fetch every day. I also tried scrapy.FormRequest,
but I didn't get the contents of <div id="TopBox" ...>
that I want. The data I get back is something else. For example, one of the values I need is 28688000000,
but the code below returns something different.
import scrapy


class SpidyQuotes(scrapy.Spider):
    name = 'spidyquotes-viewstate'
    start_urls = ['http://tsetmc.com/Loader.aspx?ParTree=151311&i=35366681030756042']
    download_delay = 1.5

    def parse(self, response):
        self.log('A response from %s just arrived!' % response.url)
        return scrapy.FormRequest(
            'http://www.tsetmc.com/tsev2/data/instinfofast.aspx?i=35366681030756042&c=23%20',
        )
Solution
I didn't set up a scrapy project for this, but could you try it anyhow?
import requests
headers = {
    "referer": "http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35366681030756042",
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
print(requests.get("http://www.tsetmc.com/tsev2/data/instinfofast.aspx?i=35366681030756042&c=23%20", headers=headers).text)
Output:
12:30:00,A ,22860,22500,22600,23040,23500,21890,29025,107770816,2424554373560,1,20201111,123000;;44@121810@22850@22850@254506@60,2@2400@22840@22860@4652@6,3@36529@22810@22880@629@1,;95523,610991,381092;;;;0;
Which kinda looks like your data.
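The response is one delimited string, so it can be taken apart with plain string splitting. Below is a minimal sketch, assuming that `;` separates sections, `,` separates entries within a section, and `@` separates the cells of a table-like entry. That reading comes only from the shape of the output above. The function name `split_response` is made up for illustration, and what each position means is not documented here, so compare the result with the page before relying on any index.

```python
# Sample copied from the output above, split across lines for readability.
SAMPLE = (
    "12:30:00,A ,22860,22500,22600,23040,23500,21890,29025,107770816,"
    "2424554373560,1,20201111,123000;;"
    "44@121810@22850@22850@254506@60,2@2400@22840@22860@4652@6,"
    "3@36529@22810@22880@629@1,;95523,610991,381092;;;;0;"
)


def split_response(text):
    """Split the raw response into sections, entries, and cells.

    Assumed layout (inferred from the sample, not from any documentation):
    ';' between sections, ',' between entries, '@' between cells.
    All values are returned as strings.
    """
    sections = []
    for section in text.split(";"):
        # Drop empty entries produced by trailing commas.
        entries = [entry for entry in section.split(",") if entry]
        # Entries that contain '@' become lists of cells; others stay strings.
        sections.append(
            [entry.split("@") if "@" in entry else entry for entry in entries]
        )
    return sections


sections = split_response(SAMPLE)
print(sections[0])  # first section: a flat list of values
print(sections[2])  # third section: rows of '@'-separated cells
```

Empty sections come back as empty lists, so the section positions stay stable even when the server leaves some of them blank, as it does in this sample.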
EDIT:
I'm pretty sure this is the data you're after. Some parts of the table might be static, but I'm not 100% sure. However, the response depends on when you make the request. Try it a few times and you'll get different results.
For example:
12:29:59,A ,23620,23540,23100,22500,23620,23100,6812,32872444,773767679280,0,20201114,122959;99/8/24 13:31:11,F,1247591.37,<div class='pn'>26516.85</div> 2.17%,47421383985982830,5039085007,61918112976496,730494,P,2229923171,42233712134879,761621,P,2247810,86754663785,3726,;1980@12346584@23620@23660@434@1,11@7181@23610@23700@4000@2,28@61104@23600@23790@2000@1,;95579,611671,381092;31872444,1000000,0,20603084,12269360,2242,1,0,3575,8;;;0;
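Since the question mentions around 1000 pages, the same request can be repeated per instrument id. This is only a sketch under a few assumptions: that every instrument uses the same `ParTree=151311` referer and the same `c=23%20` parameter as the one above (the meaning of `c` is not known here), and that the body decodes as UTF-8. It uses the standard library's `urllib` instead of `requests` so it runs without extra packages. The helper names `build_data_url`, `build_headers`, and `fetch_all` are hypothetical.

```python
import time
import urllib.request
from urllib.parse import quote, urlencode

DATA_URL = "http://www.tsetmc.com/tsev2/data/instinfofast.aspx"
PAGE_URL = "http://www.tsetmc.com/Loader.aspx"
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
)


def build_data_url(instrument_id):
    """Return the data endpoint URL for one instrument id."""
    # "c=23 " is copied from the request above; quote() encodes the space as %20.
    query = urlencode({"i": instrument_id, "c": "23 "}, quote_via=quote)
    return f"{DATA_URL}?{query}"


def build_headers(instrument_id):
    """Return the same three headers as above, with a per-instrument referer."""
    # Assumption: ParTree=151311 is valid for every instrument page.
    query = urlencode({"ParTree": "151311", "i": instrument_id})
    return {
        "Referer": f"{PAGE_URL}?{query}",
        "User-Agent": USER_AGENT,
        "X-Requested-With": "XMLHttpRequest",
    }


def fetch_all(instrument_ids, delay=1.5):
    """Fetch the raw response text for each id, pausing between requests."""
    results = {}
    for instrument_id in instrument_ids:
        request = urllib.request.Request(
            build_data_url(instrument_id),
            headers=build_headers(instrument_id),
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            # Assumption: the body is UTF-8 text; undecodable bytes are replaced.
            results[instrument_id] = response.read().decode("utf-8", errors="replace")
        time.sleep(delay)  # same delay as download_delay in the question
    return results
```

Calling `fetch_all(["35366681030756042"])` would return a dict with one raw response string. Nothing here handles failed requests, so a real daily job would probably want retries and error logging around the `urlopen` call.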
Also, you can get the entire .csv file with historical data. Here's how:
import requests
headers = {
    "referer": "http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35366681030756042",
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
export_url = "http://tsetmc.com/tsev2/data/Export-txt.aspx?t=i&a=1&b=0&i=35366681030756042"
export = requests.get(export_url, headers=headers).text
with open("oil_data.csv", "w") as f:
    f.write(export)
And this is what you get back:
Answered By - baduker