Issue
I'm having trouble extracting an image from a "Manga" website using Python. Below is an example of the element on the website:
- <img id="comic" class="loading" onerror="this.src='data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'; this.removeAttribute('onerror'); this.className = 'loaderror';" src="https://example_on_the_image.jpg">
I'm able to parse out the "src" link, and when the page is viewed in a normal browser the image has the following properties:
- Rendered size: 920 × 1301 px
- Rendered aspect ratio: 920∶1301
- Intrinsic size: 720 × 1018 px
- Intrinsic aspect ratio: 360∶509
- File size: 101 kB
- Current source: (url of the image)
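As a quick sanity check on the numbers above, the ratios can be reduced to lowest terms. This is only a small sketch; `aspect_ratio` is a hypothetical helper name, not something from the site or from any library.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


print(aspect_ratio(720, 1018))  # intrinsic size -> (360, 509)
print(aspect_ratio(920, 1301))  # rendered size  -> (920, 1301)
print(aspect_ratio(160, 160))   # a 160 x 160 file -> (1, 1)
```

The intrinsic and rendered ratios are nearly identical (about 0.707), while a 160 × 160 file is square. That suggests a square file is not a scaled copy of the page but a different image altogether.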
Yet the image that I downloaded is 160 × 160 px and its file size is smaller. I have tried BeautifulSoup, Selenium, etc., and still get the same result.
But if I use either of the following in a normal browser:
- right-click the image and choose "Save Image As"
- Inspect -> select the image element -> right-click -> Capture node screenshot
I am able to save the image at the "Rendered size" shown above. Why can't I get the correct aspect ratio when scraping with Python?
I hope somebody can guide me on this or point out where I went wrong, thanks.
Solution
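Here's my Playwright code:
import os
from playwright.sync_api import sync_playwright

manga_url = "the url that you are going to scrape"  # placeholder: page URL
dwn_path = "your_directory"  # placeholder: folder to save the screenshot in
os.chdir(dwn_path)

with sync_playwright() as p:
    # headless=False opens a visible browser; slow_mo delays each action by 500 ms
    browser = p.chromium.launch(headless=False, slow_mo=500)
    page = browser.new_page()
    page.goto(manga_url)
    # Screenshot only the <img id="comic"> element, as it is rendered on the page
    page.locator("#comic").screenshot(path="screenshot.png")
    print(page.title())
    browser.close()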
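The screenshot above captures the image as rendered on the page. If the original file is wanted instead, one possible variation is to request the image URL through the page's own request context, so the request shares the browser's cookies. This is only a sketch under assumptions: `filename_from_url` and `save_original_image` are hypothetical helper names, the `Referer` header is a guess at what the server might check, and none of this is confirmed to fix the 160 × 160 result on this particular site.

```python
from pathlib import Path
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "comic.jpg") -> str:
    """Return the last path segment of url, or default if there is none."""
    name = Path(urlparse(url).path).name
    return name or default


def save_original_image(page_url: str, out_dir: str = ".") -> Path:
    """Open page_url, read the #comic image URL and save the file's bytes."""
    # Imported here so filename_from_url works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(page_url)
        # el.src is the absolute URL the browser resolved for the image.
        src = page.locator("#comic").evaluate("el => el.currentSrc || el.src")
        # Assumption: the server wants the page's cookies and a Referer header.
        response = page.request.get(src, headers={"Referer": page_url})
        target = Path(out_dir) / filename_from_url(src)
        target.write_bytes(response.body())
        browser.close()
    return target
```

Calling `save_original_image(manga_url, dwn_path)` would write the file under its original name. If the site swaps in a placeholder, the saved file will still be the placeholder, so it is worth checking the dimensions afterwards.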
Answered By - Tang Chee Ming