Issue
I'm trying to collect the cookies a domain sets by using the browser_cookie3 library, and it seems to work fine. My one problem is that I can't find a way to supply a proxy to this library so that the cookies reflect the proxy's location.
For example, if I use the domain www.nordstrom.com with that library and execute the script below:
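import browser_cookie3

# Load the cookies Chrome has stored locally for this domain.
cj = browser_cookie3.chrome(domain_name='www.nordstrom.com')
cookie = None  # stays None if no matching cookie is found
for item in cj:
    if 'internationalshippref' not in item.name:
        continue
    cookie = f'{item.name}={item.value}'
    break
print(cookie)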
I always get the following result as my current location is Bangladesh:
internationalshippref=preferredcountry=BD&preferredcurrency=BDT&preferredcountryname=Bangladesh
How can I get cookies from the above site through a proxy, using browser_cookie3 or any other library?
Solution
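browser_cookie3 only reads cookies that your locally installed browser has already stored, so it has no request of its own to route through a proxy. The website also has some basic protection against scraping, but with Playwright I was able to load the site and get the cookies without much hassle. The following small sample starts the browser with a proxy enabled and returns the cookies: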
from playwright.sync_api import sync_playwright


def get_proxy(server, user=None, password=None):
    # Build the proxy settings dict passed to launch(); credentials are optional.
    if user and password:
        return {'server': server, 'username': user, 'password': password}
    else:
        return {'server': server}


def get_cookies(user_agent, proxy=None):
    with sync_playwright() as p:
        browser = p.firefox.launch(headless=True, proxy=proxy)
        context = browser.new_context(no_viewport=True, user_agent=user_agent)
        page = context.new_page()
        page.goto("https://www.nordstrom.com")
        # Wait for the JavaScript redirect to finish (see the note below).
        with page.expect_navigation(url="https://www.nordstrom.com/", wait_until='load'):
            pass
        cookies = context.cookies()
        browser.close()
        return cookies


proxy = get_proxy(server='http://my.server.com:8282', user='optional', password='optional')
print(get_cookies('my useragent', proxy))
Output
[{'name': 'rfx-forex-rate', 'value': 'currencyCode=USD&exchangeRate=1&quoteId=0', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1656083650, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'internationalshippref', 'value': 'preferredcountry=US&preferredcurrency=USD&preferredcountryname=United%20States', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971440050, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'no-track', 'value': 'ccpa=false', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971440050, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'nordstrom', 'value': 'bagcount=0&firstname=&ispinned=False&isSocial=False&shopperattr=||0|False|-1&shopperid=c38c25da4c2542fd873e7a88d0ba163f&USERNAME=', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971440050, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'nui', 'value': 'firstVisit=2022-06-24T14%3A14%3A10.457Z&geoLocation=&isModified=false&lme=false', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971440050, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'session', 'value': 'FILTERSTATE=&RESULTBACK=&RETURNURL=http%3A%2F%2Fshop.nordstrom.com&SEARCHRETURNURL=http%3A%2F%2Fshop.nordstrom.com&FLSEmployeeNumber=&FLSRegisterNumber=&FLSStoreNumber=&FLSPOSType=&gctoken=&CookieDomain=&IsStoreModeActive=0', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': -1, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'shoppertoken', 'value': 'shopperToken=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJjMzhjMjVkYTRjMjU0MmZkODczZTdhODhkMGJhMTYzZiIsImF1ZCI6Imd1ZXN0IiwiaXNzIjoibm9yZHN0cm9tLWd1ZXN0LWF1dGgiLCJleHAiOjE5NzE2OTkyNTAsInJlZnJlc2giOjE2NTYwOTQ0NTAsImp0aSI6IjA2YmE5M2I3LTA3ZjMtNDRkMi1iYjc4LTQwYzBjYWFiZWI4MSIsImlhdCI6MTY1NjA4MDA1MH0.jyAHIejISNgu_cGmZh7k9R7iiB7HEwwDLc9g5ek79fz71yQn34kuwERAG4lZf3laZPUXJgakl3L-DScPLJ4FJ9j_kNUxjuw2Eg4rk7hIPvZ35kIqwtwbkrO8XjyhjgxTeXyAV5HCZa8QFO263REuI0gA1y9-MFA2fyGME3uWQruwB_q_6hfeR-Nyq8epBOuBRRqttLY6sV0sXACzRyPciqR3ykochm90DwG3H2PU4cYts6OO0wFqrnM_LhcMzD2AmiK7XegdwwKBlwzJcRqoiXu_OZFoMHPI2_eW3FFfED8A93jPyGYKmFm_Hm4RpItibGG27TJJRY0HmaO_BvqxKA', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971397793, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'usersession', 'value': 'CookieDomain=nordstrom.com&SessionId=1029b2c9-bbc5-45db-8454-202c6271ad8f', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': -1, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'experiments', 'value': 'ExperimentId=789ff94f-d13c-4ebb-9303-433a542f3ae8', 'domain': '.nordstrom.com', 'path': '/', 'expires': 1971699250, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'Ad34bsY56', 'value': 'AxkjEJaBAQAAfjgIHurZtpHYD2QEPO5pusibS79jQ7brx8HiJfld2cp5Ie3MAUjwyyWcuJMswH8AAEB3AAAAAA|1|1|8421240d3766a87fc796cc577ffbc7cd05a87826', 'domain': '.nordstrom.com', 'path': '/', 'expires': 3233927649, 'httpOnly': False, 'secure': True, 'sameSite': 'None'},
 {'name': 'Bd34bsY56', 'value': 'A6AoEJaBAQAANCRRvsL-3aoOFIk1xtv3Y6fYMRV0SY7IjL4nIEPc1ebkqh6SAUjwyyWcuJMswH8AAEB3AAAAAA==', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1687637001, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}]
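The question only needs the internationalshippref cookie, so the list returned by get_cookies still has to be filtered. A minimal sketch of one way to do that, assuming the cookie value stays in the query-string form shown in the output; find_cookie and cookie_fields are hypothetical helper names, not part of Playwright:

```python
from urllib.parse import parse_qsl


def find_cookie(cookies, name):
    """Return the first cookie dict whose 'name' matches, or None (hypothetical helper)."""
    return next((c for c in cookies if c['name'] == name), None)


def cookie_fields(cookie):
    """Split a query-string style cookie value into a dict of URL-decoded fields."""
    return dict(parse_qsl(cookie['value']))


# Trimmed copy of one entry from the output above, standing in for get_cookies(...).
sample = [{
    'name': 'internationalshippref',
    'value': 'preferredcountry=US&preferredcurrency=USD&preferredcountryname=United%20States',
    'domain': 'www.nordstrom.com',
}]

pref = find_cookie(sample, 'internationalshippref')
print(cookie_fields(pref)['preferredcountryname'])  # United States
```

With a real run you would pass the result of get_cookies instead of sample, and check for None before reading fields in case the site did not set the cookie.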
Keep in mind that calling get_cookies repeatedly is very inefficient, since it launches a resource-heavy browser every time. If you need to get cookies repeatedly, I would suggest using something like multiprocessing to spawn a separate process that keeps the browser alive and serves cookie requests through queues.
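As a rough sketch of that idea, the worker below launches the browser once and then answers requests from a queue until it receives None. The names open_browser, serve_cookies and start_worker are hypothetical, the wait for the redirect from the sample above is left out for brevity, and the Playwright import is deferred so that only the worker process needs it:

```python
import multiprocessing as mp
from contextlib import contextmanager


@contextmanager
def open_browser(proxy=None):
    """Launch one Firefox instance and yield a function that fetches cookies with it."""
    # Imported lazily: only the process that runs the browser needs Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=True, proxy=proxy)

        def fetch(url, user_agent):
            # A fresh context per request keeps cookies from leaking between requests.
            context = browser.new_context(user_agent=user_agent)
            try:
                context.new_page().goto(url)
                return context.cookies()
            finally:
                context.close()

        try:
            yield fetch
        finally:
            browser.close()


def serve_cookies(requests, responses, opener=open_browser, proxy=None):
    """Answer (url, user_agent) requests until a None sentinel arrives."""
    with opener(proxy) as fetch:
        for url, user_agent in iter(requests.get, None):
            try:
                responses.put(("ok", fetch(url, user_agent)))
            except Exception as exc:
                # Report the failure to the caller instead of killing the worker.
                responses.put(("error", str(exc)))


def start_worker(proxy=None):
    """Start the worker process; returns (process, requests, responses)."""
    requests, responses = mp.Queue(), mp.Queue()
    worker = mp.Process(
        target=serve_cookies,
        args=(requests, responses, open_browser, proxy),
        daemon=True,
    )
    worker.start()
    return worker, requests, responses
```

To use it, call start_worker(proxy) under an if __name__ == "__main__": guard, put (url, user_agent) tuples on requests, and read ("ok", cookies) or ("error", message) tuples from responses, ideally with a timeout on responses.get in case the worker died. Putting None on requests shuts the worker down cleanly.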
Note:
About this line:
with page.expect_navigation(url="https://www.nordstrom.com/", wait_until='load'):
    pass
This is needed because the website redirects automatically through JavaScript when you visit it without the appropriate headers and cookies. So after entering the site for the first time, we wait for that redirect to finish; once it does, the cookies we want have been set.
Update: As per the comments, I updated the code above to accept an additional user_agent parameter.
Answered By - Charchit