Issue
Any help is appreciated. New info as of 10.5.2022.
I need help using Selenium to scrape a list of cars from the CarMax site. The URL is 'https://www.carmax.com/cars?includenontransferables=False&year=2018-2023&mileage=30000&price=18000-30000'
Outside of Selenium, I am able to submit the URL (in Chrome on a Mac) and then click "SEE MORE MATCHES" multiple times. Each click adds 22 car tiles. I want to get all 228 cars that match the filter.
When I use Selenium, I get the initial page with its list of 22 tiles (cars). But when I manually click "SEE MORE MATCHES" inside the Selenium browser, I get "We're sorry, an error occurred".
So in the Selenium browser window I manually pasted the URL and got this message:
Access Denied
You don't have permission to access "http://www.carmax.com/cars?" on this server.
Reference #18.61f1eb8.1664947333.87596fdb
Below is the code I am trying to run to loop through all the pages and see all 228 car tiles.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
# The following works and I see a list of cars
browser = webdriver.Chrome()
browser.get('https://www.carmax.com/cars?includenontransferables=False&year=2018-2023&mileage=30000&price=18000-30000')
# The following works: the "SEE MORE MATCHES" link at the bottom is displayed in the browser
e = browser.find_element(By.ID, "see-more")
eBut = e.find_element(By.XPATH, ".//a")
print(eBut.text)
# The following works: the button lights up in blue on hover
hover = ActionChains(browser).move_to_element(eBut)
hover.perform()
# The following causes the error "We're sorry, an error occurred in your search."
eBut.click()
time.sleep(3)
I checked the network log in Chrome. This is the request sent when I click the button manually (note the visitorID in the Request URL):
> General
Request URL: https://www.carmax.com/cars/api/search/run?uri=%2Fcars%2Fcrossovers%3Fyear%3D2018-2023%26mileage%3D30000%26price%3D18000-32000&skip=48&take=24&zipCode=76210&radius=radius-nationwide&shipping=-1&sort=lowest-price&scoringProfile=segment_4&visitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263
Request Method: GET
Status Code: 200
Remote Address: [2600:1404:6400:1988::1c4e]:443
Referrer Policy: strict-origin-when-cross-origin
> Response Headers
cache-control: public,max-age=120
content-encoding: gzip
content-length: 24290
content-security-policy: upgrade-insecure-requests
content-type: application/json; charset=utf-8
date: Thu, 06 Oct 2022 03:05:31 GMT
request-context: appId=cid-v1:43e71566-b7e7-4ca6-b692-9f3f68fd9719
server: Microsoft-IIS/10.0
server-timing: cdn-cache; desc=MISS
server-timing: edge; dur=65
server-timing: origin; dur=546
set-cookie: KmxSession_0=SessionId=ef0ffdc3-143d-4dde-9e1c-d16c6ec16e2e&logOdds=0.16263300000000003&logOddsA=-1.103987916&logOddsI=0.8484898; domain=.carmax.com; path=/; expires=Thu, 06-Oct-2022 03:35:31 GMT
set-cookie: KmxVisitor_0=StoreId=6095&Zip=76210&Lat=33.1508&Lon=-97.094&ZipConfirmed=True&ZipDate=10/6/2022 3:05:31 AM&VisitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263&IsFirstVisit=False&UsingStoreProxy=false&AdCode=SEMGAAB&AdCodeDate=10/3/2022 2:54 PM&DistanceShippingTestBucket=2&sRadius=radius-nationwide&LastSearch=638006222940089089&Sort=lowest-price&Shipping=-1; domain=.carmax.com; path=/; expires=Fri, 06-Oct-2023 03:05:31 GMT
set-cookie: bm_sv=A2572EFA5D77E5B9E212CF1F5E3EA1AA~YAAQjDgvF0lum4KDAQAAWV9BqxHO/mA6UGF3uH6Sqq7uQkZArnVAbp5XVaBvnRCWuL1zIgva6mSQmfTX1laMRUXpfsxv1+r/RI7NmAocHADTrGEH5s2EmRWsYB7OXs/nDyx7KiaT+F6qzTLnrAhFKv5hAnT3cfDY2QrducB3BpE3+x/2qCUG7FXEHZZ8Y4vFob+917bdn4LW9rRUjPBvHheQ4eu2Po9mQ8fTtCEQfoTz+em4VRXDYFgmVwWsDpUkeA==~1; Domain=.carmax.com; Path=/; Expires=Thu, 06 Oct 2022 05:02:07 GMT; Max-Age=6996; Secure
strict-transport-security: max-age=31536000
timing-allow-origin: *
vary: Accept-Encoding
x-frame-options: sameorigin
x-powered-by: ASP.NE
> REQUEST HEADERS
:authority: www.carmax.com
:method: GET
:path: /cars/api/search/run?uri=%2Fcars%2Fcrossovers%3Fyear%3D2018-2023%26mileage%3D30000%26price%3D18000-32000&skip=48&take=24&zipCode=76210&radius=radius-nationwide&shipping=-1&sort=lowest-price&scoringProfile=segment_4&visitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
content-type: application/json
cookie: kndctr_0C1038B35278345B0A490D4C_AdobeOrg_identity=CiY2NDEyMjEzMzg1MTI4Njg1NTY5MTUzOTg1ODIwODUzMjcxNzEzN1IOCI-D3PK5MBgBKgNPUjLwAY-D3PK5MA==; _fbp=fb.1.1664808847128.839021144; _gcl_au=1.1.1062470335.1664808847; _gcl_aw=GCL.1664808848.Cj0KCQjwkOqZBhDNARIsAACsbfLBnIzuFAqQwL3--e31KfdmgSD6rJHg3lUTFwSJ8tfceih1AymJoW8aAutBEALw_wcB; _gcl_dc=GCL.1664808848.Cj0KCQjwkOqZBhDNARIsAACsbfLBnIzuFAqQwL3--e31KfdmgSD6rJHg3lUTFwSJ8tfceih1AymJoW8aAutBEALw_wcB; s_fid=7D110C609D492208-3EDD85A763A86C1B; ai_user=e2RLFbYVOZHSuYmJMZxkXo|2022-10-03T14:54:11.532Z; _gid=GA1.2.802830010.1664808852; KmxBestMatch=Bucket=Test; KmxStore=StoreId=6095; at_check=true; AMCVS_0C1038B35278345B0A490D4C%40AdobeOrg=1; s_cc=true; fs_cid=1.0; _clck=vizc8f|1|f5h|0; AKA_A2=A; bm_sz=56BC4464F92D8D3854014390299384A2~YAAQjDgvF0BSm4KDAQAAJkM+qxGtUaRM9kgKs3OhjlPMND6oKDS9L9JrclpSJtoVlcFyP7frV8YD1xCVgcRdw5uFc4++0cxpEv6gpgWh/CigS4uh70WMMwMrSkDHPy2JNGg1vhMIhuwUamy/wLad5DGd71D+cRQicNKzDMPyWJX7e3B4sGONFIQ8VJgq+XW07Y6inJC5kDssxm2FpuI+AqIL/WKcCQ8EWJvk2sXe2r5V8u/oxKUCI3LZ5kcp5dm3m5c2EJ9mSSeGQ34mZPVilnXKDNdt/L5RwAs0lVuW5ogBrSs=~3421507~3224888; KmxSession_0=SessionId=ef0ffdc3-143d-4dde-9e1c-d16c6ec16e2e&logOdds=0.16263300000000003&logOddsA=-1.103987916&logOddsI=0.8484898; bm_mi=F58243EB46DA811B0A46D45132FFFD84~YAAQjDgvF1tSm4KDAQAAJUc+qxGQeUt5Cp1D7OyTa+nWNRnuzi/Ci2BmD4+4Qm0W1sHJA30Ap3m6mceXOzh5wfK03HRe2phSECTcw4RJ5uZBY5eLLkAQpQq3KKGKs0PPcJfrMvauuj9k38zru/2XffC0/Zu/RmhjOvGltYTXUom0lHni/1NId4QNlZH+Dinwy+dQRQsrngcHD/7oF26xgE4ud/TqHYs9HaEeRbP9eypGSng6pEs4oN4gD37JVHz9Uwv1AQaleut5m/tW4BejdCyks9j41mdfB8AqC4+0PlXptnrYyQa5n4cbidpZ7jM=~1; 
_abck=F2920DB117607824AC32F9ABD87E4CF0~0~YAAQjDgvF3pSm4KDAQAA7kk+qwiI2supb0Wj6jjIVZu5Js77gCQOYAS6Cz5QkS00G8u5W4qQbAInqHTLJ2F54vEUvjFBYsnudLSolWZQ2uSRIOV3FG4VffT+zR2NDBYn+mFGr9Oi0v9ioiaE6xsjOGOwk4UtEc1Y73ft9q9ut4Dl+b1rfqGo1hEUdPSp+Ie2mefY0fFQmhtEJ722KeKJSDg/AmiCQWxrOytVt4V4fLTaDNzByMwQmBxL0GOovHnOo8xxvFpYHV3YE3+nFOBsImR3jPdMqRx833/BKU+EL4g9W87VmtdGBp3/MmBqKBTFJjcx2j59QLbqOHDXG45fLpApfi1ducqf3j9++utrry4yhEQaAr7U3td+W0XHi2xi20UuAyLMuxzwA5iQFMQn1rDlyJhy~-1~-1~1665028912; mbox=PC#40b79aa81c9a40cbaa4d6bda16734a30.35_0#1728270130|session#29f54aa1f68e41e18055c862bd4f0314#1665027190; adobeTransID=9536546868bc6444b2c19840b7ac69c0; s_ppvl=Cars%2C94%2C21%2C3503%2C853%2C805%2C1792%2C1120%2C2%2CP; gpv_v4=Cars; s_visit=1; s_vnc365=1696561330104%26vn%3D7; s_ivc=true; cto_bundle=Lnd0i19yJTJCU3BkWldGeVZGT0lPSkE1dFU1TGJqWE55Q2RBa3BTbUttaHBPWmlGcG5MZyUyQjVOaktVNzQ4eHFndWlBVlVGeiUyQm9HMEVGUWtIeGo2ZzMwWSUyQlh1dXRha2trRGdiaHI5RXZUZHhCJTJCU1Y4SnVYcTl0U3Y5bmtXakFnUjNsVG5jTm42RSUyRlpBSyUyRlpZTGZoeE51UUVXeGk2QzNBcjNPamtoN3gxR25jeFhKSU5qQ2doWTF2eEgxVXFWbllqa2hFbDF6Mg; _uetsid=3a9d8740432b11edb0f42d600c354438; _uetvid=1eca7e007c5911ec859199f79f07ee47; _ga=GA1.1.2103228906.1664808852; fs_uid=#J90WC#5786631356157952:5474652164165632:::#/1687899589; AMCV_0C1038B35278345B0A490D4C%40AdobeOrg=-1124106680%7CMCMID%7C64122133851286855691539858208532717137%7CMCIDTS%7C19271%7CMCAID%7C304705C596F1394B-6000151B443909A6%7CMCOPTOUT-1665032530s%7CNONE%7CMCAAMLH-1665630130%7C7%7CMCAAMB-1665630130%7Cj8Odv6LonN4r3an7LhD3WZrU1bUpAkFkkiY1ncBR96t2PTI%7CvVersion%7C5.2.0; 
QSI_HistorySession=https%3A%2F%2Fwww.carmax.com%2Fcars%3Fincludenontransferables%3DFalse%26year%3D2018-2023%26mileage%3D30000%26price%3D18000-30000~1664946926834%7Chttps%3A%2F%2Fwww.carmax.com%2Fcars%3Fincludenontransferables%3DFalse%26year%3D2018-2023%26mileage%3D30000%26price%3D18000-32000~1665003893328%7Chttps%3A%2F%2Fwww.carmax.com%2Fcars%3Fincludenontransferables%3DFalse%26year%3D2018-2023%26mileage%3D30000%26price%3D18000-30000~1665006891845%7Chttps%3A%2F%2Fwww.carmax.com%2Fcars%3Furi%3D%2Fcars%2Fcrossovers%3Fincludenontransferables%3DFalse%26year%3D2018-2023%26mileage%3D30000%26price%3D18000-32000~1665025330891; ak_bmsc=AA0B2B31F438AA727AE20B131E3F04B4~000000000000000000000000000000~YAAQjDgvF5xTm4KDAQAAdGs+qxHUP44aCXjWPXfnud5T2nWxIt03lGiHJyxa7I1CCz0VpriGdiwkPRZafBeMrrGr74RZLcTRkJxxFXlJLHIlaDlNL2C9++bvBKZvHCMekKb+3tTkH2Ik4pG05Uas/qdjnLd33R1RHvJZukc/EuIZVOs/hl7IfzrrlRgUk/FYZxpAasr8WlhB5yM6MFWiDUihvTOX7kDu03ti4HoCgoabfB6hPvqRkOiG2e5OTKGKmR13ZHbi9egXov8opwXnOzbCqvvKRJdULfCH1htnsHyJwoIMKgWwE5dF2xpjdKX55g4XE4H7KdeZOhPeVzAj1ElUvFaSALv0RH+IHysLyMpPq+bGMi74nVjwTUf1rfJiw05MpVwD/oUPjsCWZxNtBx+3rFPgF44zEVJ+LFMTHy5zeWR3E48rJCBc41s4sM+Loj+7Ox8y9bSB7GfZCUoCKLIXv8883NvuNIzapUyGLnrXpLzOiMOAJZ2qlEpzhU1ZEgOelVa9; _clsk=1mpyjk0|1665025366793|3|0|m.clarity.ms/collect; _ga_NTWN6LKPPS=GS1.1.1665025330.7.1.1665025421.0.0.0; ai_session=dwGcZfAd9Et49laIoOjdG+|1665025342556|1665025492736; KmxVisitor_0=StoreId=6095&Zip=76210&Lat=33.1508&Lon=-97.094&ZipConfirmed=True&ZipDate=10/6/2022 3:03:45 AM&VisitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263&IsFirstVisit=False&UsingStoreProxy=false&AdCode=SEMGAAB&AdCodeDate=10/3/2022 2:54 PM&DistanceShippingTestBucket=2&sRadius=radius-nationwide&LastSearch=638006222940089089&Sort=lowest-price&Shipping=-1; 
bm_sv=A2572EFA5D77E5B9E212CF1F5E3EA1AA~YAAQjDgvFwJpm4KDAQAARsxAqxHE8+frQ64O+0FfncRNlVXCb+PpwuH3zPhQed95YyfQA7k6RmdSdyyRPy28Kh2w0pFvZqpnTi7tuolj+jSUtlS0Za3NunPBLI2e1cXOrd6kwLQ6YMOTBYeRZAvwwUxEFEm4gCa+BKfL6Wh5liEdEVPouU9MEqfK7EYrVfxPXPLNiK4yp40G3fAbZR01Tx+GgmagirDOo9fgoyGa2kjS7dQGnjESxyLKGBG6Dj8ywg==~1; s_ppv=Cars%2C99%2C22%2C9664%2C750%2C805%2C1792%2C1120%2C2%2CL; RT="z=1&dm=carmax.com&si=26ef3d4b-afa7-46e4-bc18-af86a66d0072&ss=l8wh3k9n&sl=4&tt=2jl&bcn=%2F%2F17de4c1c.akstat.io%2F&ld=1tbj&nu=9y8m6cy&cl=413k"; s_sq=carmaxadaptive%3D%2526c.%2526a.%2526activitymap.%2526page%253DCars%2526link%253DSEE%252520MORE%252520MATCHES%2526region%253Dsee-more%2526pageIDType%253D1%2526.activitymap%2526.a%2526.c%2526pid%253DCars%2526pidt%253D1%2526oid%253Dfunctionzr%252528%252529%25257B%25257D%2526oidt%253D2%2526ot%253DA
referer: https://www.carmax.com/cars/crossovers?year=2018-2023&mileage=30000&price=18000-32000
sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
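The logged request suggests that "SEE MORE MATCHES" calls a JSON search endpoint paged by skip and take, and that the visitorID query parameter repeats the VisitorID stored in the KmxVisitor_0 cookie. Note that this particular request is for /cars/crossovers with price 18000-32000, not the search in the original URL. The sketch below uses only urllib.parse to decode the request URL, read VisitorID from a shortened excerpt of that cookie, and rebuild one URL per page. The helper names (decode_search_request, visitor_id_from_cookie, page_urls) are hypothetical, reading skip as an offset and take as a page size is an assumption based on the values in the log, and whether carmax.com answers these URLs outside a real browser session is untested (the Access Denied page above suggests it may not).

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Request URL copied from the network log above.
REQUEST_URL = (
    "https://www.carmax.com/cars/api/search/run?uri=%2Fcars%2Fcrossovers%3Fyear%3D2018-2023"
    "%26mileage%3D30000%26price%3D18000-32000&skip=48&take=24&zipCode=76210"
    "&radius=radius-nationwide&shipping=-1&sort=lowest-price&scoringProfile=segment_4"
    "&visitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263"
)

# Shortened excerpt of the KmxVisitor_0 cookie value from the Set-Cookie header.
VISITOR_COOKIE = (
    "StoreId=6095&Zip=76210&VisitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263&Sort=lowest-price"
)


def decode_search_request(url):
    """Return the query parameters of the search URL, one decoded value per key."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


def visitor_id_from_cookie(cookie_value):
    """KmxVisitor_0 stores its fields as an &-separated key=value list."""
    return parse_qs(cookie_value)["VisitorID"][0]


def page_urls(url, total):
    """Rebuild the URL once per page, stepping skip by take (assumed offset/page size)."""
    parts = urlsplit(url)
    params = decode_search_request(url)
    take = int(params["take"])
    urls = []
    for skip in range(0, total, take):
        params["skip"] = str(skip)
        urls.append(urlunsplit(parts._replace(query=urlencode(params))))
    return urls


if __name__ == "__main__":
    params = decode_search_request(REQUEST_URL)
    print(params["uri"])  # the search path the page is filtering on
    print(visitor_id_from_cookie(VISITOR_COOKIE) == params["visitorID"])
    for page_url in page_urls(REQUEST_URL, 228):
        print(page_url)
```

With take=24, covering 228 matches needs skip values 0, 24, ..., 216, that is 10 requests; the logged skip=48 would then be the third page.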
Solution
You should use Options to remove all the hints that indicate you are an automated bot: the site's JavaScript checks these flags and freezes your session when it finds them. Initialize your driver with the following code and the "SEE MORE MATCHES" button should work:
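from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
# driver_path is the path to your chromedriver binary (Selenium 4 takes it via Service)
driver = webdriver.Chrome(service=Service(driver_path), options=options)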
The complete code would be:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import time
import bs4  # the "lxml" parser used below also needs the lxml package
# Spawn WebDriver:
options = Options()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
driver = webdriver.Chrome(service=Service("chromedriver.exe"), options=options)
# Go-To page:
driver.get("https://www.carmax.com/cars?includenontransferables=false&year=2018-2023&mileage=30000&price=18000-30000")
wait = WebDriverWait(driver, 600)
# Click on See More:
ef = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="see-more"]/div/a')))
time.sleep(2)
ef.click()
# Get the Page with Bs4:
soup = bs4.BeautifulSoup(driver.page_source, "lxml")
# Repeat the process...
Example of iterating through the pages until the end:
import re

total = None  # stays None if the button is already gone on the first pass
while True:
    if len(driver.find_elements(By.XPATH, '//*[@id="see-more"]/div/a')) > 0:
        # Click on See More:
        ef = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="see-more"]/div/a')))
        time.sleep(2)
        ef.click()
        # Read the counter text and keep only the digits of its first and last words
        see_more_text = bs4.BeautifulSoup(driver.page_source, "lxml").find("span", {"class": "see-more--blue"}).get_text()
        total = int(re.sub(r"\D", "", see_more_text.split(' ')[-1]))
        current = int(re.sub(r"\D", "", see_more_text.split(' ')[0]))
        print(f"Status: Currently Viewing {current} of {total} Matches")
    else:
        # The "SEE MORE MATCHES" link is gone, so every match has been loaded
        print(f"Status: Currently Viewing {total} of {total} Matches")
        break
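The loop above takes the first and last space-separated words of the counter text and assumes both contain the numbers. A slightly sturdier sketch pulls the first two numbers out with one regular expression and stops once the shown count reaches the total, rather than relying only on the button disappearing. The counter wording ("24 of 228") and the helper names parse_counter and should_click_again are assumptions; the post does not show the real text of the see-more--blue span.

```python
import re

# Assumption: the counter reads like "24 of 228" (thousands separators allowed).
COUNTER = re.compile(r"(\d[\d,]*)\D+(\d[\d,]*)")


def parse_counter(text):
    """Return (current, total) from the counter text, or None if two numbers are not found."""
    match = COUNTER.search(text)
    if match is None:
        return None
    current, total = (int(group.replace(",", "")) for group in match.groups())
    return current, total


def should_click_again(text):
    """Keep clicking while fewer tiles are shown than the total number of matches."""
    counts = parse_counter(text)
    return counts is not None and counts[0] < counts[1]


if __name__ == "__main__":
    print(parse_counter("24 of 228"))
    print(should_click_again("228 of 228"))
```

Inside the loop, a check such as `if not should_click_again(see_more_text): break` could sit right after see_more_text is read.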
Answered By - A259