Thursday, January 13, 2022

[FIXED] What is the trick to Scrape the NSE page?

January 13, 2022 python, python-3.x No comments

Issue

Basically, I am trying to scrape this link https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=BANKNIFTY

This following code is the solution -

import requests
headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'DNT': '1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36',
    'Sec-Fetch-User': '?1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-Mode': 'navigate',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8',
}

response = requests.get('https://www1.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY', headers=headers, verify=True,timeout=(5, 14))
print(response.content)

It works perfectly on my laptop but it is not working on Google Collab or Heroku server or digital ocean as well as in motion hosting.

What is the catch here?

Solution

After testing in various environments, NSE blocks python-request as well as normal access to this website from most of the reputed cloud servers.

However, CURL works in DigitalOcean Bangalore and Amazon AWS Mumbai server. But if you convert that CURL data to Python request, that is blocked.

So here is a pythonic solution that will work in those cloud servers. It looks lame but I'm using it without digging deep -

import subprocess
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))

subprocess.Popen('curl "https://www.nseindia.com/api/quote-derivative?symbol=BANKNIFTY" -H "authority: beta.nseindia.com" -H "cache-control: max-age=0" -H "dnt: 1" -H "upgrade-insecure-requests: 1" -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36" -H "sec-fetch-user: ?1" -H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "sec-fetch-site: none" -H "sec-fetch-mode: navigate" -H "accept-encoding: gzip, deflate, br" -H "accept-language: en-US,en;q=0.9,hi;q=0.8" --compressed  -o maxpain.txt', shell=True)

f=open("maxpain.txt","r")
var=f.read()
print(var)

It basically runs the curl function and sends the output to a file and read the file back. That's it.

Answered By - Amit Ghosh

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Thursday, January 13, 2022

[FIXED] What is the trick to Scrape the NSE page?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels