Issue
I am trying to extract values from a web page, but I get an AttributeError. I am not sure why this error is raised; nothing in the code looks like it should cause it. In fact, the first value, plan, is extracted fine; the problem is with the second value, price. Can you see what I have done wrong? The failing line and the error message are:
price = plan.xpath('normalize-space(.//h4)').get()
AttributeError: 'str' object has no attribute 'xpath'
Below is my code.
import requests
from scrapy import Selector
headers = {
'authority': 'www.spectrum.com',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-language': 'en-PK,en;q=0.9,ur-PK;q=0.8,ur;q=0.7,en-GB;q=0.6,en-US;q=0.5,sv;q=0.4,it;q=0.3',
'cache-control': 'max-age=0',
'cookie': "akaas_AB-Testing=2147483647~rv=43~id=c941a85b15ef281f13281b8048cf4ae5; bm_sz=2632BBEE923E9AD7E1EA69FF928F5DBB~YAAQjr3XF/bDqsiBAQAACQuP0xCDh7v649Sa+PSXCzISjJqKSYw3+PEzlGEQ08DzO/hwCZjGgWULy/XTuertMLmrj9NhRZGG3f4suAnQfrfqZMrXfLmCdP7IvopupSVOQ2qe1ZStkhlKIbXRwa42EKQDoBeZ5kbnOv9YmFeNKCmuSAZARPgAhHrVHy/R0/2eELZ/6yNHbaBBkuqrZOpbgcSPALVHVmebbaU6TcUdtN9wWkvC04SsZP1cByGcljFlokraEKx73zOWRmTXFAf40kHHYngR+diPWYNgpn+25gtQv81EjA==~4277812~4339012; akacd_RWASP-default-phased-release=3834564575~rv=100~id=46a77af27ddb8f9add7bb7387dd9bda5; PIM-SESSION-ID=JcAqb8mZ1FcAPeQJ; domain=%22spectrum.com%22; omnitureId=%22b9eff532-7818-4db6-b867-ef1f30f6ff13%22; akamaiHeader=%7B%22georegion%22%3A%22167%22%2C%22country_code%22%3A%22PK%22%2C%22city%22%3A%22ISLAMABAD%22%2C%22lat%22%3A%2233.70%22%2C%22long%22%3A%2273.17%22%2C%22timezone%22%3A%22GMT%2B5%22%2C%22continent%22%3A%22AS%22%2C%22asnum%22%3A%2217557%22%2C%22throughput%22%3A%22vhigh%22%2C%22bw%22%3A%225000%22%2C%22client_ip%22%3A%2239.40.46.103%22%2C%22device_os%22%3A%22Windows%20NT%22%2C%22brand_name%22%3A%22Chrome%22%2C%22is_wireless%22%3A%22false%22%2C%22internal_corp_traffic%22%3A%22false%22%2C%22zip%22%3A%22%22%7D; 
spectrum-residential-user-profile=%7B%22zipcode%22%3A%22%22%2C%22city%22%3A%22%22%2C%22state%22%3A%22%22%2C%22serviceVendorName%22%3A%22%22%2C%22isSPP1%22%3A%22%22%2C%22isSPP2%22%3A%22%22%2C%22isSPP3%22%3A%22%22%2C%22isSPP4%22%3A%22%22%2C%22isSPP5%22%3A%22%22%2C%22isSPP6%22%3A%22%22%2C%22isSPP7%22%3A%22%22%2C%22isSPP8%22%3A%22%22%2C%22isNPP%22%3A%22%22%2C%22isTwcD3%22%3A%22%22%2C%22isTwcSTDA%22%3A%22%22%2C%22isTwcSTD%22%3A%22%22%2C%22isTwcSELA%22%3A%22%22%2C%22isTwcSEL%22%3A%22%22%2C%22isCharterD3%22%3A%22%22%2C%22isCharterD3NCS%22%3A%22%22%2C%22isCharterSTDS%22%3A%22%22%2C%22isCharterD3STL%22%3A%22%22%2C%22isCharterSELS%22%3A%22%22%2C%22isBhnSTD%22%3A%22%22%2C%22isBhnSEL%22%3A%22%22%2C%22isBackToSchool%22%3A%22%22%2C%22isBHNMultipleMSO%22%3A%22%22%2C%22isCharterMultipleMSO%22%3A%22%22%2C%22isTWCMultipleMSO%22%3A%22%22%2C%22isServiceableHawaii%22%3A%22%22%2C%22isNYCOutOfFootprint%22%3A%22%22%2C%22isResi30%22%3A%22%22%2C%22isResi60%22%3A%22%22%2C%22isResi100%22%3A%22%22%2C%22isResi200%22%3A%22%22%2C%22isResi400%22%3A%22%22%2C%22isResi940%22%3A%22%22%2C%22isNewWaveSwitch%22%3A%22%22%2C%22isCDELightbandSwitch%22%3A%22%22%2C%22isMicrologicSwitch%22%3A%22%22%2C%22isLocalTelSwitch%22%3A%22%22%2C%22isSpectrumInternetAssist%22%3A%22%22%2C%22isMINet%22%3A%22%22%2C%22isMontanaOpticom%22%3A%22%22%2C%22isSilverStarCommunications%22%3A%22%22%2C%22isTCTWest%22%3A%22%22%2C%22isTSC%22%3A%22%22%2C%22isCPWS%22%3A%22%22%2C%22isCitiLinks%22%3A%22%22%2C%22isATMC%22%3A%22%22%2C%22isVast%22%3A%22%22%2C%22isHorizon%22%3A%22%22%2C%22isClevelandOHZipcode%22%3A%22%22%2C%22isColumbusOHZipcode%22%3A%22%22%2C%22isEvansvilleINZipcode%22%3A%22%22%7D; SERVERID=pub19ncw_aem65; 
ak_bmsc=7CFCF89508757C6F483E18BBF380D8F1~000000000000000000000000000000~YAAQjr3XF9XEqsiBAQAAdRuP0xC1bQwnbConyIesU3jSyxFV3s4s0K86rrYUyh3i/scryjxBAa5tSYNpQd7MZiVfNUsXmFooX+ASzzQ9qsJNng1o+6iPzfYLJoWaN6wt/1CaWpqaJP0DhvHHnX0MfKYITq8DgsHVIwd1EKwaCB+g67hQlS8kB1N3yIXyauX2Gpni1NgGsyRzMoALxQ4VdZEJ6WYV97Wg1vtPz/Lwh+9pb/4HrP7yGwwMBIANj+pPuazwkyQ2TI7DstOeEVdmZkMcW4YI2l8agJm6gS4F9k6kipH8TGAUDwSWbyuJvxpWoqBXLUhSdGxx+38Di0hIvV4FNyzb5/+BR4rrl2MBDC7yU1BHk1JW+/sTntHKSWxcABHImErdMA1C+WDmYFNKx4kWj9nDNomgN+vdh0FoT5G3B2TSyahQsqfKDaxv39FO4hZ+DCFtF/XiA8aKkPrxtaGLagk+bPnzWm/TSAgWx0TiaBIhZQ1NVUsev5U=; bm_sv=045DB25F286F7DA3315AE8AE1E1C7E71~YAAQjr3XF//EqsiBAQAAgh2P0xDZIIr9HHEmWcpdNzKev8fZcqKHcP8rdJpPncyVd1SK6nw+n4lHEEe02QIQ/ksIcZ5ICEIgijODVMciTV6oKVE8qcGMRqiUKHFGb7aT3lmGH15wVUS1DTF75Axpil33ILOnZ9y3UskYq7Ii+TzXr2S8u8pKKFuzonfdXjgo0/omvKHVQTj/+zGBLoRxdWwdpVwQ7MhwJQpo6XxKyeHGsgDAD9sOktUKAdhh8rRURLw=~1; _abck=9B09AE84C627BD6C56ED5324C2377BFC~0~YAAQjr3XFxHFqsiBAQAAQx6P0wgnXELg9/IFkwYW75PtoKE9E6Jc32OQ2EjSzgj2xQBs6VFVawJUKc3KWHVLU0zfebAO8EU4QvFCRduo3iBXp/wX3XeB3IHiwhFU+XQCu6vgWvZkXXcN02TIRleJV7BrEFYB8oTAKVGYNvfSq4gtXd+EfUwCXeML71VlUpqg+ux6tv9DzUxMIzjR6phg3sJkwvJdRSgQC8sHBxMGqO/bceL+pQvP3ocsAvcXBUHqpffNXN29NqMRgZXl1LnDYENX7/pM38sCYbgCk1wM1SX6CPR2RBxS1zTx5DQH7j9rdxR79eGPTeVT8thZ+LES0hbMDnbrCM25xifVxEhmVGUAE8GMZjliHpL5y3fgS5qrt+32P5nhOxOoeO4nAC0O61SBQbnsR77Asno=~-1~||-1||~-1; _cls_v=fd065e19-d04d-4037-95f1-2915722ce6fa; _cls_s=c2d97766-d6a9-400e-9256-cc7f306d74ff:0; _fbp=fb.1.1657111781756.1149817254; _uetsid=1a9139f0fd2a11ecac7a8b3cd8b5e640; _uetvid=1a917710fd2a11eca2847d6bd10b6b3f; _gcl_au=1.1.730465435.1657111782; com.silverpop.iMAWebCookie=ea054721-7458-28b9-e149-b14adde1da9d; com.silverpop.iMA.session=ade0dd24-3755-e2c5-5a79-b3a6053a5ce5; com.silverpop.iMA.page_visit=1519396976:; sc.ASP.NET_SESSIONID=vx0d41gedc4quweaf0dopvwd; _tq_id.18365409-1.91a5=a49be617f522fe17.1657111783.0.1657111783..; sc.UserId=bbe2768c-41cc-45d2-8e4f-d5f20bf1d6e0; _clck=hx5wof|1|f2x|0; 
btIdentify=13899909-7620-426b-df11-185cd019a2f0; _bts=94b9da89-0af8-4fb8-c8cf-4cfb1d262628; _clsk=14hzxd1|1657111783980|1|0|l.clarity.ms/collect; AMCVS_97C902BE53295FC80A490D4C%40AdobeOrg=1; AMCV_97C902BE53295FC80A490D4C%40AdobeOrg=-330454231%7CMCMID%7C22421919196769338640237431338961795837%7CMCAAMLH-1657716584%7C3%7CMCAAMB-1657716584%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1657118984s%7CNONE%7CvVersion%7C3.1.2; s_cc=true; _bti=%7B%22app_id%22%3A%22charter%22%2C%22bsin%22%3A%22pEXKvtttqTjjtgQef5zYKUDXao8SkpfqlrDanE%2F2zNR9slbf6%2FhE2Gyr4IjNsJC9w5PwIyuprzJtPvIM%2F2KPhw%3D%3D%22%2C%22is_identified%22%3Afalse%7D; aam_uuid=12467149554957211440944457970446103725; akavpau_Global=1657112106~id=e008851a51085afc5fa869b5b4bcbf42; s_sess=%20s_ppv%3D23%3B%20s_prop20%3Dbrowse%3B%20search_prop17%3DNo%2520Information%2520Avaliable%3B; s_pers=%20s_vnum%3D1688647784115%2526vn%253D1%7C1688647784115%3B%20s_previousPage%3Dcom%253Ainternet%7C1657113615499%3B%20s_nr%3D1657111815507-New%7C1659703815507%3B%20s_invisit%3Dtrue%7C1657113615510%3B%20s_dayslastvisit%3D1657111815512%7C1751719815512%3B%20s_dayslastvisit_s%3DFirst%2520Visit%7C1657113615512%3B; utag_main=v_id:0181d38f1b81000e58ce096006f10506f002106700bd0{_sn:1$_ss:1$_pn:1%3Bexp-session$_st:1657113615521$ses_id:1657111780225%3Bexp-session$_ga:3936920691.1657111781$vapi_domain:spectrum.com$dcsyncran:1%3Bexp-session$dc_visit:1$dc_event:1%3Bexp-session$dc_region:eu-central-1%3Bexp-session$aam_load:true%3Bexp-session;} RT=\"z=1&dm=www.spectrum.com&si=b2dc65b1-5a7d-4a10-896d-0e0a7e16f4ef&ss=l59lkcgo&sl=1&tt=8t5&bcn=%2F%2F684d0d44.akstat.io%2F&ld=8t8&ul=z6k\"",
'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
r = requests.get('https://www.spectrum.com/internet', headers=headers)
response = Selector(text=r.text)
plans = response.xpath('(//div[@class="cardsContainer__body"])[1]/div')
for plan in plans:
    plan = plan.xpath('normalize-space(.//div[@class="iconcard__subheader"])').get()
    price = plan.xpath('normalize-space(.//h4)').get()
    data = {
        "Plan": plan,
        "Price": price
    }
    print(data)
Solution
https://docs.scrapy.org/en/latest/topics/selectors.html
The .xpath(...) method returns a SelectorList of Selector objects, so you can call .xpath on the result again. Calling .get() on it, however, returns a plain string.
So plan = plan.xpath('normalize-space(.//div[@class="iconcard__subheader"])').get() rebinds plan to a string, and you can't call .xpath on that string on the following line.
Try this:
r = requests.get('https://www.spectrum.com/internet', headers=headers)
response = Selector(text=r.text)
plans = response.xpath('(//div[@class="cardsContainer__body"])[1]/div')
for plan in plans:
    plan_selector = plan.xpath('normalize-space(.//div[@class="iconcard__subheader"])')
    plan = plan_selector.get()
    price = plan_selector.xpath('normalize-space(.//h4)').get()
    data = {
        "Plan": plan,
        "Price": price
    }
    print(data)
Update
Based on what the OP posted in their own answer, it sounds like the intent of the original line price = plan.xpath('normalize-space(.//h4)').get() was to use the original plan instance (the Selector from the loop), not the string that replaced it via plan = plan.xpath(...) on the line above.
So the code should be:
r = requests.get('https://www.spectrum.com/internet', headers=headers)
response = Selector(text=r.text)
plans = response.xpath('(//div[@class="cardsContainer__body"])[1]/div')
for plan in plans:
    plan_value = plan.xpath('normalize-space(.//div[@class="iconcard__subheader"])').get()
    price = plan.xpath('normalize-space(.//h4)').get()
    data = {
        "Plan": plan_value,
        "Price": price
    }
    print(data)
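The rebinding mistake can also be reproduced offline, without requesting the page. The sketch below is only an analogy: it swaps Scrapy's Selector for the standard library's xml.etree.ElementTree (an assumption, so that it runs without network access or Scrapy installed), with findtext standing in for .xpath(...).get(). The markup, plan names, and prices are hypothetical; only the two class names come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup that mimics the class names from the question.
# The plan names and prices are made up for illustration.
SAMPLE = """
<div class="cardsContainer__body">
  <div>
    <div class="iconcard__subheader">Plan A</div>
    <h4>$10</h4>
  </div>
  <div>
    <div class="iconcard__subheader">Plan B</div>
    <h4>$20</h4>
  </div>
</div>
"""

SUBHEADER = ".//div[@class='iconcard__subheader']"


def extract_broken(cards):
    """Rebinds the loop variable to a string, as in the question."""
    rows = []
    for plan in cards:
        plan = plan.findtext(SUBHEADER)  # plan is now a str; the element is lost
        price = plan.findtext(".//h4")   # raises AttributeError on the str
        rows.append({"Plan": plan, "Price": price})
    return rows


def extract_fixed(cards):
    """Keeps the element in `plan` and stores the text under a new name."""
    rows = []
    for plan in cards:
        plan_value = plan.findtext(SUBHEADER)
        price = plan.findtext(".//h4")   # plan is still the element
        rows.append({"Plan": plan_value, "Price": price})
    return rows


cards = ET.fromstring(SAMPLE).findall("./div")

error_message = None
try:
    extract_broken(cards)
except AttributeError as exc:
    error_message = str(exc)
    print("broken:", error_message)

rows = extract_fixed(cards)
print("fixed:", rows)
```

The only difference between the two functions is the name on the left-hand side of the first assignment, which is the same change as going from plan to plan_value in the Scrapy code.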
Answered By - Anentropic