Issue
Here is the link to the page I am trying to scrape from: https://churchdwight.com/ingredient-disclosure/antiperspirant-deodorant/40002569-ultramax-clear-gel-cool-blast.aspx
Here is my code:
'''
# Scraping a data table from the CHD website
import pandas as pd
import requests
from bs4 import BeautifulSoup

# The function name is assumed; the original snippet shows only the body
def scrape_ingredients(current_url):
    # Load the CHD website HTML
    result = requests.get(current_url, verify=False, headers={'User-Agent': "Magic Browser"})
    # Check whether the page loaded successfully
    result_status = result.status_code
    if result_status == 200:
        # Extract the HTML and pass it through Beautiful Soup
        source = result.content
        document = BeautifulSoup(source, 'lxml')
        # Each page has one table per product, so search by the table tag
        check = 0
        table = document.find("table")
        while check <= 0:
            # Confirm this is the right table: its first span should read 'INGREDIENT NAME'
            if table.find("span").get_text() == "INGREDIENT NAME":
                check += 1
            else:
                table = table.find_next("table")
        # Collect the cell spans by matching their exact style attribute
        rows = table.find_all('span', style='font-size:13px;font-family:"Arial",sans-serif;')
        # Loop through the spans, skipping the header cells
        cells_names = []
        for row in rows[3:]:
            bar_text = row.get_text(strip=True)
            cells_names.append(bar_text)
        data_pandas = pd.DataFrame(cells_names, columns=['Ingredients'])
        return data_pandas
    else:
        # Print an error if the result status is not 200
        print("Status error" + " " + str(result_status) + " has occurred!")
'''
The lubricant/emulsifier entry is missing from my data frame, and I think it is because that span's style has an extra bit saying color:black;background:white
Any help would be much appreciated!!!!
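For reference, the exact-match behaviour suspected above can be reproduced on a small stand-in table. This is only a sketch: SAMPLE_HTML is a hypothetical fragment written for illustration (the real page's markup may differ), and it uses the built-in "html.parser" backend instead of lxml so it runs without extra parsers. Passing a plain string as style matches only spans whose style equals it exactly, whereas a compiled regex, or reading the td cells directly, also picks up the span with the extra colour rules.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the ingredient table; the live page may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td><span style='font-size:13px;font-family:"Arial",sans-serif;'>INGREDIENT NAME</span></td>
    <td><span style='font-size:13px;font-family:"Arial",sans-serif;'>FUNCTION</span></td>
  </tr>
  <tr>
    <td><span style='font-size:13px;font-family:"Arial",sans-serif;'>Water</span></td>
    <td><span style='font-size:13px;font-family:"Arial",sans-serif;'>Solvent</span></td>
  </tr>
  <tr>
    <td><span style='font-size:13px;font-family:"Arial",sans-serif;'>Cyclopentasiloxane</span></td>
    <td><span style='font-size:13px;font-family:"Arial",sans-serif;color:black;background:white'>Lubricant/emulsifier</span></td>
  </tr>
</table>
"""

EXACT_STYLE = 'font-size:13px;font-family:"Arial",sans-serif;'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")

# Exact string match: the span carrying the extra colour rules is skipped.
exact = [s.get_text(strip=True) for s in table.find_all("span", style=EXACT_STYLE)]

# Regex match: any style that starts with the shared prefix is accepted.
prefix = re.compile("^" + re.escape(EXACT_STYLE))
loose = [s.get_text(strip=True) for s in table.find_all("span", style=prefix)]

# Style-independent alternative: read the text of each cell, row by row.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
]

print(exact)
print(loose)
print(rows)
```

Reading cells by tr and td is probably the sturdier of the two options, since it does not depend on inline styling at all.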
Solution
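You can use pandas alone to grab the table data. pd.read_html returns a list of DataFrames, one per table found on the page, and the ingredient table is at index 2: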
import pandas as pd
df = pd.read_html('https://churchdwight.com/ingredient-disclosure/antiperspirant-deodorant/40002569-ultramax-clear-gel-cool-blast.aspx')[2]
print(df)
Output:
0  INGREDIENT NAME                            FUNCTION
1  Water                                      Solvent
2  Cyclopentasiloxane                         Lubricant/emulsifier
3  SD Alcohol 40                              Drying agent
4  Propylene glycol                           Humectant
5  Dimethicone                                Skin protectant
6  PEG/PPG-18/18 dimethicone                  Emulsifier
7  Sodium bicarbonate (baking soda)           Deodorizer
8  Fragrance                                  Fragrance
9  Aluminium zirconium tetrachlorohydrex gly  Active ingredient - antiperspirant
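In the output above, the header text lands in row 0 and the table is picked by a hard-coded position. A possible refinement, sketched here on a hypothetical inline fragment (SAMPLE_HTML is made up for illustration, not copied from the page), is to let read_html select the table by its text with match and promote the first row to column names with header=0. Like the original call, this assumes an HTML parser that pandas supports (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page: one unrelated table, then the ingredient table.
SAMPLE_HTML = """
<table><tr><td>Unrelated</td><td>layout table</td></tr></table>
<table>
  <tr><td>INGREDIENT NAME</td><td>FUNCTION</td></tr>
  <tr><td>Water</td><td>Solvent</td></tr>
  <tr><td>Cyclopentasiloxane</td><td>Lubricant/emulsifier</td></tr>
</table>
"""

# match keeps only tables whose text matches the given string or regex;
# header=0 uses the first row as the column names.
tables = pd.read_html(StringIO(SAMPLE_HTML), match="INGREDIENT NAME", header=0)
df = tables[0]
print(df)
```

On the live site, the URL could be passed in place of the StringIO object. If the server rejects the default request, one option is to fetch the page with requests (as in the question) and pass StringIO(result.text) instead.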
Answered By - F.Hoque