Issue
A few months ago I followed this Scrapy shell method to scrape a real estate listings webpage and it worked perfectly.
I pulled my cookie and user-agent text from Firefox (Developer tools -> Headers) when the target URL is loaded, and I would get a successful response (200) and be able to pull items from response.xpath.
For example:
url = 'https://www.realtor.com/realestateandhomes-search/McLean_VA/type-single-family-home/pg-1?pos=39.126499,-77.43902,38.685678,-76.779841,11&qdm=true'
cookie = '__fp=7387663eca6ba5161d1c58711dd65164; split=n; split_tcv=105; __vst=742f3db3-c514-4032-8650-21d4ccfdd85f; __ssn=a0587b1b-bc15-4e3d-8738-3fc8757071ab; __ssnstarttime=1656813474; criteria=pg%3D1%26sprefix%3D%252Frealestateandhomes-search%26typ%3D1%26area_type%3Dcity%26search_type%3Dcity%26city%3DMcLean%26state_code%3DVA%26state_id%3DVA%26lat%3D38.9435449%26long%3D-77.1929134%26county_fips%3D51059%26county_fips_multi%3D51059%26loc%3DMcLean%252C%2520VA%26locSlug%3DMcLean_VA%26county_needed_for_uniq%3Dfalse%26p…; _gid=GA1.2.260165497.1656813481; AMCV_AMCV_8853394255142B6A0A4C98A4%40AdobeOrg=-1124106680%7CMCMID%7C79412848632605408421861417717111497169%7CMCIDTS%7C19177%7CMCOPTOUT-1656820680s%7CNONE%7CvVersion%7C5.2.0; _fbp=fb.1.1656813480720.1479123934; AMCVS_AMCV_8853394255142B6A0A4C98A4%40AdobeOrg=1; adcloud={%22_les_v%22:%22y%2Crealtor.com%2C1656815313%22}; _clck=d7aq33|1|f2u|0; _clsk=sechzn|1656871412641|1|0|n.clarity.ms/collect; _uetsid=c56bc620fa7111ecb58a6f841dcc81b4; _uetvid=c56bcb70fa7111eca88d1bd5d241568e'
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Firefox/102.0'
(fetch(scrapy.Request(url=url, headers={'cookie': cookie, 'user-agent': user_agent})), response)
listings = json.loads(response.xpath('/html/body/script[1]/text()').getall()[0])['props']['pageProps']['searchResults']['home_search']['results']
Now I'm trying again a few months later (with an updated cookie) and I'm getting a 403 error -- the server understands the request but refuses to authorize it:
In [7]: (fetch(scrapy.Request(url=url, headers={'cookie': cookie, 'user-agent':
...: user_agent})), response)
2022-07-03 14:14:43 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.realtor.com/realestateandhomes-search/McLean_VA/type-single-family-home?pos=39.069149,-77.355927,38.742653,-76.862935,11&qdm=true&view=map> (referer: None)
Out[7]: (None, None)
Any thoughts on what I might try to get this working again? Thanks.
Solution
The cookie is not what's causing the problem (see below). I think the issue is that with 'view=map' in the URL, the site is looking for a 'referer' key in the request headers (in addition to other header keys). I would suggest adding a key/value pair such as 'referer': url to your headers. Alternatively, you can try a lighter approach:
import requests
from bs4 import BeautifulSoup
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'DNT': '1',
'Upgrade-Insecure-Requests': '1',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'none',
'Sec-Fetch-User': '?1',
'Connection': 'keep-alive',
'If-None-Match': '"f0267-ybK8wNq/yADu0m5N1CYhPqrXfaY"',
}
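# Fetch the search results page with browser-like headers (no cookie is sent)
response = requests.get('https://www.realtor.com/realestateandhomes-search/McLean_VA/type-single-family-home/pg-1?pos=39.126499,-77.43902,38.685678,-76.779841,11&qdm=true', headers=headers)
# Parse the HTML and collect every listing card on the page
sp = BeautifulSoup(response.text,'lxml')
results = sp.find_all('li',{'data-testid':'result-card'})
# Show the raw HTML of the first card
print(results[0])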
output:
<li class="jsx-1881802087 component_property-card" data-testid="result-card"><div class="jsx-2775064451 fallBackImgWrap"><img alt="" class="jsx-2775064451 fallBackImg" src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBzdGFuZGFsb25lPSJubyI/Pgo8IURPQ1RZUEUgc3ZnIFBVQkxJQyAiLS8vVzNDLy9EVEQgU1ZHIDIwMDEwOTA0Ly9FTiIKICJodHRwOi8vd3d3LnczLm9yZy9UUi8yMDAxL1JFQy1TVkctMjAwMTA5MDQvRFREL3N2ZzEwLmR0ZCI+CjxzdmcgdmVyc2lvbj0iMS4wIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciCiB3aWR0aD0iNTEuMDAwMDAwcHQiIGhlaWdodD0iNTEuMDAwMDAwcHQiIHZpZXdCb3g9IjAgMCA1MS4wMDAwMDAgNTEuMDAwMDAwIgogcHJlc2VydmVBc3BlY3RSYXRpbz0ieE1pZFlNaWQgbWVldCI+Cgo8ZyB0cmFuc2Zvcm09InRyYW5zbGF0ZSgwLjAwMDAwMCw1MS4wMDAwMDApIHNjYWxlKDAuMTAwMDAwLC0wLjEwMDAwMCkiCmZpbGw9IiMwMDAwMDAiIHN0cm9rZT0ibm9uZSI+CjwvZz4KPC9zdmc+Cg=="/><div class="jsx-1030820614 pre-card-wrap pre-card-redesign"><div class="jsx-1030820614 broker-info"><div class="jsx-1030820614 ellipsis"><span class="jsx-1030820614">Brokered by<!-- --> </span><span class="jsx-1030820614" data-label="pc-brokered">Maram Realty, LLC</span></div></div></div><div class="jsx-11645185 card-box type-srp-result" data-id="6566177482" data-label="property-card" data-testid="property-card"><div class="jsx-11645185 photo-wrap" data-testid="pc-photo-wrap" id="6566177482"><a aria-label="Navigate to 2533 Flint Hill Rd Listing Detail Page" class="jsx-1534613990 card-anchor" data-testid="property-anchor" href="/realestateandhomes-detail/2533-Flint-Hill-Rd_Vienna_VA_22181_M65661-77482" rel="noopener" target="_self"><picture class="rui__mnm7tm-0 izqcFw" data-lazy="force-loaded"><source data-testid="img-webp" height="100%" srcset="https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.webp, https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.webp 2x" type="image/webp" width="100%"/><img alt="2533 Flint Hill Rd, Vienna, VA 22181" class="fade top" data-atf="true" data-fmp="true" data-label="pc-photo" 
data-src="https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.jpg" height="100%" itemprop="image" src="https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.jpg" srcset="https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.jpg, https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.jpg 2x" width="100%"/></picture></a><div class="jsx-11645185 save-wrap"><button class="rui__sc-1n2ow0n-0 eXhJxR save-btn" data-testid="save-button" type="button"></button></div></div><div class="jsx-2684807014 pc-top-left"><div class="jsx-2934195181 pc-mt-8"><div class="jsx-3121580332 label-wrap"><span class="rui__prshsf-0 jwtbrs property-label-margin" data-label="pc-new"><span>New - 4 hours ago</span></span></div></div></div><div class="jsx-2191194021 pc-bottom-left-multilisting"><div class="jsx-2934195181 pc-mt-8"></div></div><div class="jsx-11645185 detail-wrap fixed-wrapper-ldp-redesign has-cta" data-testid="property-detail"><div class="jsx-3853574337"><div class="jsx-3853574337 statusLabelSection" data-testid="forsale"><span class="jsx-3853574337 statusIcon solidIconGreen"></span><span class="jsx-3853574337 statusText">For Sale</span></div></div><div class="jsx-11645185 summary-wrap"><div class="jsx-11645185 property-wrap"><div class="jsx-11645185 ldp-redesign-price srp-page-price" data-label="pc-price-wrapper"><span class="rui__x3geed-0 kitA-dS" data-label="pc-price">$1,025,000</span></div><div class="jsx-11645185 prop-meta"><div class="jsx-946479843 meta-section srp_listMeta" data-testid="property-meta-container"><ul class="jsx-946479843 property-meta list-unstyled property-meta-srpPage"><li class="jsx-946479843 prop-meta srp_list" data-label="pc-meta-beds"><span class="jsx-946479843 meta-value" data-label="meta-value">3</span><span class="jsx-946479843 meta-label" data-label="meta-label">bed</span></li><li class="jsx-946479843 prop-meta srp_list" 
data-label="pc-meta-baths"><span class="jsx-946479843 meta-value" data-label="meta-value">2.5</span><span class="jsx-946479843 meta-label" data-label="meta-label">bath</span></li><li class="jsx-946479843 prop-meta srp_list" data-label="pc-meta-sqft"><span class="jsx-946479843 meta-value" data-label="meta-value">1,221</span><span class="jsx-946479843 meta-label" data-label="meta-label">sqft</span></li><li class="jsx-946479843 prop-meta srp_list" data-label="pc-meta-sqftlot"><span class="jsx-946479843 meta-value" data-label="meta-value">0.52</span><span class="jsx-946479843 meta-label" data-label="meta-label">acre lot</span></li></ul></div></div><div class="jsx-11645185 card-bottom"><div class="jsx-11645185 address ellipsis srp-page-address srp-address-redesign" data-label="pc-address">2533 Flint Hill Rd<!-- -->, <div class="jsx-11645185 address-second ellipsis" data-label="pc-address-second">Vienna<!-- -->, VA<!-- --> <!-- -->22181</div></div><div class="jsx-11645185 cta-wrap cta-wrap-redesign"><button aria-label="Email agent for 2533 Flint Hill Rd, Vienna, VA 22181" class="rui__ermeke-1 iIBjTV" data-testid="cta-button" data-toggle="modal" type="button">Email agent</button></div></div></div></div></div></div></div></li>
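For the first suggestion in this answer (adding a 'referer' key to the headers), a minimal sketch might look like the following. The helper name build_headers is hypothetical and not part of Scrapy or requests, and pointing the referer at the target URL itself is an assumption. Whether this alone avoids the 403 on 'view=map' URLs has not been verified here.

```python
from typing import Dict, Optional


def build_headers(url: str, user_agent: str, cookie: Optional[str] = None) -> Dict[str, str]:
    """Build request headers that include a referer (hypothetical helper)."""
    headers = {
        "user-agent": user_agent,
        # Assumption: the site checks for a referer on some URLs, so we
        # point it at the page being requested.
        "referer": url,
    }
    # The cookie is optional; the answer above suggests it is not the cause.
    if cookie:
        headers["cookie"] = cookie
    return headers


url = (
    "https://www.realtor.com/realestateandhomes-search/McLean_VA/"
    "type-single-family-home/pg-1"
    "?pos=39.126499,-77.43902,38.685678,-76.779841,11&qdm=true"
)
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) "
    "Gecko/20100101 Firefox/102.0"
)
headers = build_headers(url, user_agent)

# In the Scrapy shell, these headers would be passed as in the question:
#   fetch(scrapy.Request(url=url, headers=headers))
```

The same dict can also be passed to requests.get(url, headers=headers) if you prefer the lighter approach.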
Answered By - 1extralime