Issue
I'm trying to build a scraper in Python, using BeautifulSoup, that prints the name of the current daily LeetCode question.
import requests
import datetime
from bs4 import BeautifulSoup

LEETCODE_URL = "https://leetcode.com/problemset/all"

def main():
    response = requests.get(LEETCODE_URL)
    soup = BeautifulSoup(response.content, "html.parser")
    current_date = datetime.datetime.now()
    formatted_date = current_date.strftime("%Y-%m-%d")
    daily_question = soup.find_all(string=formatted_date)
    print(daily_question)

if __name__ == "__main__":
    main()
When I print soup, I can see the text "2023-07-14" and, a little after it, the name of the question. So I just need to get the index of that date and search from there for the title string.
But the code above prints an empty list, even though the string should be present in soup, and I do not understand why.
Any help will be appreciated.
Solution
As you may know, LeetCode loads its page data from JSON embedded in a script tag. You can load that JSON into Python and use it directly.
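The empty list in the question is consistent with how BeautifulSoup handles a plain string passed as string=: it is compared against each whole text node, while the date sits inside a much longer JSON string. The sketch below illustrates this on a made-up, heavily simplified page (SAMPLE_HTML and its contents are assumptions, not LeetCode's real markup). Passing a compiled regular expression matches on a substring instead.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the real page source.
SAMPLE_HTML = (
    "<html><body>"
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"date": "2023-07-14", "title": "Example Question"}'
    "</script>"
    "</body></html>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A plain string has to equal an entire text node, so the date alone
# does not match the long JSON string inside the script tag.
exact_matches = soup.find_all(string="2023-07-14")

# A compiled regex is applied as a search, so a substring is enough.
partial_matches = soup.find_all(string=re.compile("2023-07-14"))

print(len(exact_matches))
print(len(partial_matches))
```

A regex match still returns the whole JSON string rather than the title, so reading the JSON itself, as the script below does, avoids text matching altogether.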
import requests
import datetime
from bs4 import BeautifulSoup
import json

LEETCODE_URL = "https://leetcode.com/problemset/all"

def main():
    response = requests.get(LEETCODE_URL)
    soup = BeautifulSoup(response.content, "html.parser")
    # Today's date in the same YYYY-MM-DD format as challenge["date"]
    current_date = datetime.datetime.now()
    formatted_date = current_date.strftime("%Y-%m-%d")
    # The page data is JSON stored in the element with id "__NEXT_DATA__"
    json_text = soup.find(id='__NEXT_DATA__').text
    json_data = json.loads(json_text)
    # Index 2 is where the daily challenges sat when this was written
    for challenge in json_data["props"]["pageProps"]["dehydratedState"]["queries"][2]["state"]["data"]["dailyCodingChallengeV2"]["challenges"]:
        print(challenge["date"])
        print(challenge["userStatus"])
        print(challenge["link"])
        print(challenge["question"]["title"])
        print('*'*50)

if __name__ == "__main__":
    main()
Output:
2023-07-01
NotStart
/problems/fair-distribution-of-cookies/
Fair Distribution of Cookies
**************************************************
2023-07-02
NotStart
/problems/maximum-number-of-achievable-transfer-requests/
Maximum Number of Achievable Transfer Requests
**************************************************
2023-07-03
NotStart
/problems/buddy-strings/
Buddy Strings
**************************************************
2023-07-04
NotStart
/problems/single-number-ii/
Single Number II
**************************************************
2023-07-05
NotStart
/problems/longest-subarray-of-1s-after-deleting-one-element/
Longest Subarray of 1's After Deleting One Element
**************************************************
2023-07-06
NotStart
/problems/minimum-size-subarray-sum/
Minimum Size Subarray Sum
**************************************************
2023-07-07
NotStart
/problems/maximize-the-confusion-of-an-exam/
Maximize the Confusion of an Exam
**************************************************
2023-07-08
NotStart
/problems/put-marbles-in-bags/
Put Marbles in Bags
**************************************************
2023-07-09
NotStart
/problems/substring-with-largest-variance/
Substring With Largest Variance
**************************************************
2023-07-10
NotStart
/problems/minimum-depth-of-binary-tree/
Minimum Depth of Binary Tree
**************************************************
2023-07-11
NotStart
/problems/all-nodes-distance-k-in-binary-tree/
All Nodes Distance K in Binary Tree
**************************************************
2023-07-12
NotStart
/problems/find-eventual-safe-states/
Find Eventual Safe States
**************************************************
2023-07-13
NotStart
/problems/course-schedule/
Course Schedule
**************************************************
2023-07-14
NotStart
/problems/longest-arithmetic-subsequence-of-given-difference/
Longest Arithmetic Subsequence of Given Difference
**************************************************
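The listing above covers the whole month, while the question asks only for today's entry. A minimal sketch of that last step follows, assuming each challenge has the "date" and "question"/"title" keys shown in the output. SAMPLE_CHALLENGES and title_for_date are hypothetical names used here for illustration.

```python
import datetime

# Hypothetical sample shaped like the "challenges" entries printed above.
SAMPLE_CHALLENGES = [
    {
        "date": "2023-07-13",
        "link": "/problems/course-schedule/",
        "question": {"title": "Course Schedule"},
    },
    {
        "date": "2023-07-14",
        "link": "/problems/longest-arithmetic-subsequence-of-given-difference/",
        "question": {"title": "Longest Arithmetic Subsequence of Given Difference"},
    },
]


def title_for_date(challenges, date_str):
    """Return the title of the challenge dated date_str, or None if absent."""
    for challenge in challenges:
        if challenge["date"] == date_str:
            return challenge["question"]["title"]
    return None


# Same format as formatted_date in the scripts above.
today = datetime.date.today().strftime("%Y-%m-%d")
print(title_for_date(SAMPLE_CHALLENGES, today))
print(title_for_date(SAMPLE_CHALLENGES, "2023-07-14"))
```

In the real script the list passed in would be the "challenges" list from the parsed JSON. Near midnight the machine's local date may differ from the date in the page data, so a None result is worth handling.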
You can copy and paste the JSON into a console or https://codebeautify.org/jsonviewer to see its structure more clearly.
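Once the structure is visible, it may be safer to look the data up by key than by position, since the [2] index depends on the order of the queries in the page. The sketch below assumes the key path used in the script above. find_daily_challenges and SAMPLE_NEXT_DATA are hypothetical names, and the sample is made up to mimic that shape.

```python
def find_daily_challenges(next_data):
    """Return the "challenges" list from whichever query holds it, else []."""
    queries = next_data["props"]["pageProps"]["dehydratedState"]["queries"]
    for query in queries:
        state = query.get("state") or {}
        data = state.get("data")
        # Only some queries carry the daily challenge data.
        if isinstance(data, dict) and "dailyCodingChallengeV2" in data:
            return data["dailyCodingChallengeV2"]["challenges"]
    return []


# Hypothetical sample: the wanted query is deliberately not at index 2.
SAMPLE_NEXT_DATA = {
    "props": {"pageProps": {"dehydratedState": {"queries": [
        {"state": {"data": {"somethingElse": 1}}},
        {"state": {"data": {"dailyCodingChallengeV2": {"challenges": [
            {"date": "2023-07-14",
             "question": {"title": "Longest Arithmetic Subsequence of Given Difference"}},
        ]}}}},
    ]}}}
}

challenges = find_daily_challenges(SAMPLE_NEXT_DATA)
print(challenges[0]["question"]["title"])
```

The function returns an empty list when no query contains the key, so the caller can report that the page layout has changed instead of failing with a KeyError or IndexError.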
Answered By - Reyot