Issue
Is it possible to extract the embedded CSS properties that apply to an HTML tag? For instance, suppose I want to find out what the vertical-align property for "s5" is.
I'm currently using BeautifulSoup and have retrieved the span tag with tag=soup.find(class_="s5"). I've tried tag.attrs["class"], but that just gives me the class name, s5, with no way to link it to the embedded style. Is it possible to do this in Python? Every question of this sort that I've found involves parsing inline CSS styles.
<html>
<head>
<style type="text/css">
* {margin:0; padding:0; text-indent:0; }
.s5 {color: #000; font-family:Verdana, sans-serif;
font-style: normal; font-weight: normal;
text-decoration: none; font-size: 17.5pt;
vertical-align: 10pt;}
</style>
</head>
<body>
<p class="s1" style="padding-left: 7pt; text-indent: 0pt; text-align:left;">
This is a sample sentence. <span class="s5"> 1</span>
</p>
</body>
</html>
Solution
You can use a CSS parser such as cssutils. I don't know whether the package itself has a function for this kind of lookup (can someone comment on this?), but I wrote a custom function that does it.
from bs4 import BeautifulSoup
import cssutils
html='''
<html>
<head>
<style type="text/css">
* {margin:0; padding:0; text-indent:0; }
.s5 {color: #000; font-family:Verdana, sans-serif;
font-style: normal; font-weight: normal;
text-decoration: none; font-size: 17.5pt;
vertical-align: 10pt;}
</style>
</head>
<body>
<p class="s1" style="padding-left: 7pt; text-indent: 0pt; text-align:left;">
This is a sample sentence. <span class="s5"> 1</span>
</p>
</body>
</html>
'''
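def get_property(class_name, property_name):
    # Scan the parsed stylesheet (the module-level `sheet`) for a rule
    # whose selector is exactly ".<class_name>".
    for rule in sheet:
        # Only style rules have selectorText; skip comments and @-rules.
        if getattr(rule, 'selectorText', None) == '.' + class_name:
            for prop in rule.style:
                if prop.name == property_name:
                    return prop.value
    return None  # no matching rule or property

soup = BeautifulSoup(html, 'html.parser')
# .string returns the raw stylesheet text of the <style> tag
sheet = cssutils.parseString(soup.find('style').string)
vl = get_property('s5', 'vertical-align')
print(vl)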
Output
10pt
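This is not perfect: it only matches a rule whose selector is exactly .s5, so grouped or compound selectors (for example ".s4, .s5" or "span.s5") and inline style attributes are ignored, and it returns the first match instead of applying the cascade. Maybe you can improve upon it.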
Answered By - Bitto Bennichan