Issue
I have the following HTML. It comes from a menu called Knives with sub-types such as Bayonet, Classic Knife, etc.
<div class="group inline-block relative w-full lg:w-auto">
<button class="navbar-subitem-trigger text-left py-2 focus:outline-none hover:text-white w-full lg:w-auto block lg:inline-block lg:mr-2 xl:mr-4 text-blue-100" data-target="navbar-subitems-Knives" type="button">
Knives
</button>
<ul id="navbar-subitems-Knives" class="custom-scrollbar hidden bg-gray-700 rounded shadow-md text-blue-100 my-2 lg:my-0 overflow-hidden lg:overflow-y-auto lg:absolute lg:group-hover:block lg:max-h-[80vh]">
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/bayonet">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9iYXlvbmV0LjE3YmIyNWM3NTg2N2QwMzlmYTc1MjRlOGM1ZmE2MzEzNGI2MjQ1MzQucG5n/50/auto/85/notrim/8965bc48871767721ae4a2bd3762f460.webp" alt="Bayonet">
</div>
Bayonet
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/classic-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV9jc3MuMmZhNjZkMDEwMTMxYzA3NTMwYjA4ZTkwOTZlNGVmNGM4Y2NiODA4Ny5wbmc-/50/auto/85/notrim/8c546f8cb1f52b844f38fb4681c01dcb.webp" alt="Classic Knife">
</div>
Classic Knife
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/falchion-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV9mYWxjaGlvbi43MzM0OTBkY2Q0YjZiMTJmMTk1MTJiM2I5YTFhMDlkOTM1ZTZhYWVhLnBuZw--/50/auto/85/notrim/92b97f3404ee5c97178b6fa7bf45ab42.webp" alt="Falchion Knife">
</div>
Falchion Knife
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/flip-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV9mbGlwLjBlODhmNTZhODhlMTE1MjNhOGNhZGI2ZDcwMDNlZjMwOGM1MmZhYjkucG5n/50/auto/85/notrim/f831a3e8245eee182a4b3315c36f9df6.webp" alt="Flip Knife">
</div>
Flip Knife
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/gut-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV9ndXQuMWIwYTYyZjEwZDliYjcwZmRiMDY5NWU3MDI3NDI0ODZlNGNjZWJkZC5wbmc-/50/auto/85/notrim/78ce07820ddd2a888d6f20a2f39b46d0.webp" alt="Gut Knife">
</div>
Gut Knife
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/huntsman-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV90YWN0aWNhbC42YzI0NTM3ZjVlMzA2NGNmNDA4MTNlOTNmOWZjYmFkYzk5MjA1Y2ExLnBuZw--/50/auto/85/notrim/3520c4123a62eb9550ccc7f3745eb53f.webp" alt="Huntsman Knife">
</div>
Huntsman Knife
</a>
</li>
</ul>
</div>
With the following code I try to get the names of the different sub-types:
import requests
from bs4 import BeautifulSoup
import pandas as pd
# URL of the webpage to scrape
url = 'https://csgoskins.gg/' # Replace with the URL of the page you want to scrape
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
}
# Send a GET request to the URL
r = requests.get(url, headers = headers)
soup = BeautifulSoup(r.content, 'lxml')
print(r)
knives_section = soup.find("ul",{"id":"navbar-subitems-Knives"}).findAll("w-10 h-7 mr-1")
print(knives_section)
but it returns nothing. I tried to use elements from the following answer: Scraping from dropdown option value Python BeautifulSoup
What am I doing wrong?
Solution
The issue with your code is the findAll call you chain onto the ul element. findAll (the older camel-case alias of find_all) takes a tag name as its first positional argument, so "w-10 h-7 mr-1" is interpreted as the name of a tag. No such tag exists, so the call returns an empty list. To match on classes you would pass them through the class_ keyword argument or use a CSS selector such as "div.w-10.h-7.mr-1". Even then, those classes belong to the div that wraps the img tag, not to the knife names. The knife names are the text content of the a tags inside the li elements.
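To see the difference offline, here is a minimal sketch run against a trimmed copy of the markup from the question (two menu items only). It uses Python's built-in "html.parser" instead of lxml so nothing extra has to be installed, and the select() call relies on soupsieve, which is installed as a dependency of current beautifulsoup4 releases. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Knives menu from the question (two items kept)
html = """
<ul id="navbar-subitems-Knives">
  <li><a href="https://csgoskins.gg/weapons/bayonet">
    <div class="w-10 h-7 mr-1"><img alt="Bayonet"></div>
    Bayonet
  </a></li>
  <li><a href="https://csgoskins.gg/weapons/classic-knife">
    <div class="w-10 h-7 mr-1"><img alt="Classic Knife"></div>
    Classic Knife
  </a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
menu = soup.find("ul", id="navbar-subitems-Knives")

# The first positional argument of find_all() is a tag *name*, so this looks
# for <w-10 h-7 mr-1> tags, which do not exist: the result is an empty list.
wrong = menu.find_all("w-10 h-7 mr-1")

# Matching on the classes finds the icon <div>s, but they only wrap the <img>,
# so they contain no text.
icon_divs = menu.select("div.w-10.h-7.mr-1")

# The names are the text of the <a> tags (whitespace around them stripped).
names = [a.get_text(strip=True) for a in menu.find_all("a")]

# In this markup the <img> alt attributes happen to carry the same names.
alts = [img["alt"] for img in menu.find_all("img")]

print(wrong)           # []
print(len(icon_divs))  # 2
print(names)           # ['Bayonet', 'Classic Knife']
print(alts)            # ['Bayonet', 'Classic Knife']
```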
Here's how you can modify your code to correctly scrape the names of the different sub-types of knives:
import requests
from bs4 import BeautifulSoup
# URL of the webpage to scrape
url = 'https://csgoskins.gg/'  # Replace with the URL of the page you want to scrape
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
}
# Send a GET request to the URL and stop early on an HTTP error status
r = requests.get(url, headers=headers)
r.raise_for_status()
soup = BeautifulSoup(r.content, 'lxml')
# Find the knives section (find() returns None if the menu is not in the HTML)
knives_section = soup.find("ul", {"id": "navbar-subitems-Knives"})
if knives_section is None:
    raise SystemExit("Knives menu not found in the downloaded HTML")
# Find all knife entries (one li per knife)
knife_names = knives_section.find_all("li")
for knife in knife_names:
    # Extract and print the knife name
    name = knife.get_text(strip=True)
    print(name)
The above code finds the ul element with the ID navbar-subitems-Knives, then finds all li elements within it, and finally extracts the text of each li, which is the name of the knife. get_text(strip=True) strips leading and trailing whitespace from each piece of text in the element and joins the pieces, so the newlines and indentation around each name are dropped.
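If you prefer CSS selectors, the same ul > li > a walk can be written as a single select() call. The sketch below wraps it in a hypothetical helper, extract_knives(), that also collects each link's href and returns an empty list instead of raising AttributeError when the menu is missing. It assumes the markup shown in the question and uses the built-in "html.parser"; the sample string is two items copied from that markup.

```python
from bs4 import BeautifulSoup


def extract_knives(html):
    """Return (name, url) pairs from the Knives menu, or [] if it is absent.

    Hypothetical helper for illustration; assumes the structure
    ul#navbar-subitems-Knives > li > a shown in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    # ul with that id, then its direct li children, then their direct a children
    links = soup.select("ul#navbar-subitems-Knives > li > a")
    return [(a.get_text(strip=True), a.get("href")) for a in links]


sample = """
<ul id="navbar-subitems-Knives">
  <li><a href="https://csgoskins.gg/weapons/flip-knife">
    <div class="w-10 h-7 mr-1"><img alt="Flip Knife"></div>
    Flip Knife
  </a></li>
  <li><a href="https://csgoskins.gg/weapons/gut-knife">
    <div class="w-10 h-7 mr-1"><img alt="Gut Knife"></div>
    Gut Knife
  </a></li>
</ul>
"""

for name, link in extract_knives(sample):
    print(name, link)

# No menu in the page: an empty list rather than an exception
print(extract_knives("<html></html>"))  # []
```

Against the live page you would pass r.content from the requests.get call above; whether the menu is present in that response depends on what the site serves to your client.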
Answered By - Bilesh Ganguly