Issue
Hey, there is a website I'm trying to scrape, and some of the values are inside inputs, so they don't come out as text, only as HTML, like this:
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
What I want is just the value (John Doe). I tried using .text, but that doesn't scrape it. This is the code:
soup = BeautifulSoup(r.content, 'lxml')
for name in soup.findAll('input', {'name': 'ctl00$ContentPlaceHolder1$EmpName'}):
    with io.open('x.txt', 'w', encoding="utf-8") as f:
        f.write(name.prettify())
Solution
The reason you are not getting a result when calling .text is that "John Doe" is not part of the tag's text; it is an HTML attribute: value="John Doe".
You can access an attribute the same way you would a key in a Python dictionary (dict), using tag[<attribute>]. (See the BeautifulSoup documentation on attributes.)
from bs4 import BeautifulSoup

html = """<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>"""
soup = BeautifulSoup(html, "lxml")
# find_all is the current name for findAll; both match tags by attribute
for name in soup.find_all("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"}):
    # Index the tag like a dict to read the "value" attribute
    print(name["value"])
Output:
John Doe
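If some inputs on the page might lack a value attribute (or the input might be missing entirely), tag["value"] raises a KeyError, so a more defensive variant may be useful. The following is only a sketch under stated assumptions: the helper name input_value and the second input named NoValue are hypothetical and not part of the original page, and the built-in "html.parser" is used here instead of lxml just to keep the example free of extra dependencies.

```python
from bs4 import BeautifulSoup

# Sample markup: the first input mirrors the one from the question;
# the second ("NoValue") is a made-up input without a value attribute.
html = (
    '<input name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>'
    '<input name="NoValue" type="text"/>'
)

soup = BeautifulSoup(html, "html.parser")


def input_value(soup, name):
    """Return the value attribute of the first <input> with this name, or None.

    Hypothetical helper for illustration only.
    """
    tag = soup.find("input", {"name": name})
    if tag is None:
        # No matching input on the page
        return None
    # Tag.get works like dict.get: it returns None when the attribute is absent
    return tag.get("value")


emp_name = input_value(soup, "ctl00$ContentPlaceHolder1$EmpName")
print(emp_name)
```

The returned string can then be written to a file with f.write(emp_name) in place of name.prettify(), which writes the whole tag as HTML.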
Answered By - MendelG