Issue
I've been experimenting with web scraping using Scrapy, and I would like to retrieve all text messages from all chats on WhatsApp to use as training data for a machine learning project. I know some websites block web crawlers and scrapers, so I would like to know whether it is possible to use Scrapy to obtain these messages and, if it isn't, what alternatives I can use. I understand that I can click the "Email chat" option for each chat, but that is not feasible for a large amount of data, which would come not just from my own chats but also from other people who are willing to let me use their chats for the project.
Solution
I don't think WhatsApp blocks crawlers and scrapers as such, but you only have access to your own account at web.whatsapp.com. What you do with your own messages is up to you. When I wrote code to read and write WhatsApp messages, I used Selenium WebDriver, which can automate any browser action. It worked very reliably with WhatsApp. It was not fully automated, though, because of the QR code you have to scan to log in. If you press F12 and open the "Network" tab in the browser, you will see XHR requests with the messages inside. They show up when you load more messages by scrolling or when you open a contact's chat. The payload looks like binary data.
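Here is a minimal sketch of that Selenium approach, assuming Selenium 4 and a matching Chrome driver are installed. The names wait_until, visible_messages and scrape_open_chat are made up for illustration, and MESSAGE_SELECTOR is a hypothetical placeholder: WhatsApp Web's markup is not documented and changes over time, so you would have to look up a working selector with F12 yourself.

```python
import time

WHATSAPP_WEB_URL = "https://web.whatsapp.com"

# Hypothetical placeholder, NOT a real WhatsApp Web selector: inspect the
# page with F12 and substitute whatever currently matches message bubbles.
MESSAGE_SELECTOR = "div.message-text"


def wait_until(condition, timeout=120.0, poll=1.0):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met; was the QR code scanned?")
        time.sleep(poll)


def visible_messages(driver, selector=MESSAGE_SELECTOR):
    """Return the stripped, non-empty text of every element matching selector."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", selector)
    return [el.text.strip() for el in elements if el.text.strip()]


def scrape_open_chat(timeout=120.0):
    """Open WhatsApp Web, wait for the manual login, read the open chat."""
    # Imported here so the helpers above work without selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(WHATSAPP_WEB_URL)
        # The QR scan cannot be automated: scan it with your phone and open
        # a chat; this returns as soon as any message text becomes visible.
        return wait_until(lambda: visible_messages(driver), timeout=timeout)
    finally:
        driver.quit()
```

Calling scrape_open_chat() opens a browser window and blocks for up to two minutes while you scan the code and click on a chat. It only reads what is currently rendered, so collecting older messages would still need scrolling on top of this.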
Thanks to Mohit Jindal. You are right, there is a way to reuse a browser profile, like this:
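from selenium import webdriver

# Keep the Chrome profile (cookies, session data) in the local "selenium/" folder
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('user-data-dir=selenium/')
# Start Chrome with that profile so the login survives between runs
driver = webdriver.Chrome(options=chrome_options)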
This creates a Chrome profile in the "selenium/" folder, so you only have to log in with your phone the first time.
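If you want the script to tell you when a QR scan is likely to be needed, something like the following might work. It is only a sketch: profile_argument, needs_qr_scan and start_chrome are illustrative names, and checking for an empty folder is my own heuristic, not something Chrome or Selenium provides. A non-empty profile does not prove the WhatsApp session is still valid.

```python
from pathlib import Path


def profile_argument(profile_dir):
    """Build the Chrome switch pointing at a persistent profile folder."""
    return f"--user-data-dir={Path(profile_dir).resolve()}"


def needs_qr_scan(profile_dir):
    """Heuristic: a missing or empty profile folder means a first run."""
    path = Path(profile_dir)
    return not path.is_dir() or not any(path.iterdir())


def start_chrome(profile_dir="selenium"):
    """Start Chrome with the persistent profile stored in profile_dir."""
    # Imported here so the helpers above work without selenium installed.
    from selenium import webdriver

    if needs_qr_scan(profile_dir):
        print("No saved profile found: expect to scan the QR code once.")
    options = webdriver.ChromeOptions()
    options.add_argument(profile_argument(profile_dir))
    return webdriver.Chrome(options=options)
```

The path is resolved to an absolute one so the same profile is picked up regardless of the directory the script is started from.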
Answered By - Oleg T.