Issue
So this sounds very specific, but I'm not sure what seems to cause the issue. I've tried adding in tweaks like "tostring" and "str" throughout my code, but no luck (maybe I'm just not putting in the right place)...
Basically, I'm trying to take this script I was helped with yesterday (which prints website data):
import requests
from bs4 import BeautifulSoup as bs
import json
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'https://www.nba.com/game/bkn-vs-phi-0022100993'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
page_obj = soup.select_one('script#__NEXT_DATA__')
json_obj = json.loads(page_obj.text)
print('Title:', json_obj['props']['pageProps']['story']['header']['headline'])
print('Date:', json_obj['props']['pageProps']['story']['date'])
print('Content:', json_obj['props']['pageProps']['story']['content'])
and alter it so the information is pushed to a Google Sheet.
I went through and created the credentials, etc, and verified the connection between Python and this sheet is active (just by adding values, not the results of my script).. the broken script (in it's current state) is below:
import requests
from bs4 import BeautifulSoup as bs
import json
import gspread
gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('1NFrhsJT7T0zm3dRaP5J8OY0FryBHy5W_wEEGvwBg58I')
worksheet = sh.sheet1
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'https://www.nba.com/game/bkn-vs-phi-0022100993'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
page_obj = soup.select_one('script#__NEXT_DATA__')
json_obj = json.loads(page_obj.text)
title = (json_obj['props']['pageProps']
['story']['header']['headline'])
date = (json_obj['props']['pageProps']['story']['date'])
content = (json_obj['props']['pageProps']['story']['content'])
AddData = [title, date, content]
worksheet.append_row(AddData)
While the first script runs so great (pushes clean results in the terminal like below):
The same script (for the most part) alters the data, and leaves the Google Sheet BLANK (while like I mentioned it runs just adding in values)
This is what I'm seeing in the results here (many errors, and \n representing the line breaks, rather than the clean text).
Like I said, I thought maybe putting the variables as .string or .text would help - maybe I just wasn't doing it correctly.
Further testing - when I JUST print the DATE variable, it moves correctly to the sheet. Trying to figure out why the other variables won't.
Any thoughts are greatly appreciated!
Solution
You can try this option. It worked for me
content = str(json_obj['props']['pageProps']['story']['content'])
Answered By - jlik
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.