Issue
I am trying to parse some JSON from an API using Python. The results are paginated into groups of 100, with a NextPageLink entry in the JSON linking to the next page.
I have a class with a parser that should call itself on the new response when it reaches the NextPageLink, but it doesn't seem to work.
Can someone explain why?
import requests
from requests.exceptions import HTTPError

class MyParser():
    def __init__(self):
        try:
            self.response = requests.get("API end point url")
        except HTTPError as http_err:
            print("HTTP Error")
        except:
            print("Other Error")

    def parse(self):
        print("Called outer")
        for item in self.response.json()["Items"]:
            yield {
                item["Thing"]: item["Entry"]
            }
        next_page = self.response.json()["NextPageLink"]
        if next_page is not None:
            self.response = requests.get(next_page)
            print("about to call again")
            self.parse()
            print("Called")
It doesn't seem to work. Running this:
test = MyParser()
for i in test.parse():
    print(i)
Output
Called outer
list of things yielded
about to call again
Called
Solution
According to the answer to "What does the "yield" keyword do?":
The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each subsequent call will run another iteration of the loop you have written in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs without hitting yield. That can be because the loop has come to an end, or because you no longer satisfy an "if/else".
In other words, a function that contains yield returns a generator when it is called, and its body only runs while something iterates over that generator, pausing at each yield. In your code, the self.parse() call near the end creates a new generator, but nothing iterates over it, so its body never runs. That is why "Called outer" is printed only once and the outer for loop ends after the first page.
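A minimal sketch of that behaviour, independent of the API. The function names countdown and broken and the calls list are made up for illustration; the list only records whether a generator body actually started running.

```python
calls = []  # records each time a countdown body actually starts running

def countdown(n):
    calls.append(n)  # executes only once something iterates the generator
    while n > 0:
        yield n
        n -= 1

def broken(n):
    yield n
    # Creates a generator object and discards it; its body never runs,
    # just like the bare self.parse() call in the question.
    countdown(n - 1)

result = list(broken(3))
print(result)  # only the value yielded directly by broken()
print(calls)   # stays empty: countdown's body was never entered
```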
I tried yielding the call to self.parse(), but that just yields the generator object itself, which prints as something like "<generator object xxx at 0x7f3afa5287d0>".
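One way to keep the recursive shape is yield from (Python 3.3+), which iterates the inner generator and passes its values through instead of yielding the generator object. The sketch below is an assumption-laden stand-in, not the original class: PAGES and fetch are hypothetical replacements for the real API so the example runs offline.

```python
# Hypothetical in-memory stand-in for the paginated API.
PAGES = {
    "page1": {"Items": [{"Thing": "a", "Entry": 1}], "NextPageLink": "page2"},
    "page2": {"Items": [{"Thing": "b", "Entry": 2}], "NextPageLink": None},
}

def fetch(url):
    # Stand-in for requests.get(url).json()
    return PAGES[url]

def parse(url):
    page = fetch(url)
    for item in page["Items"]:
        yield {item["Thing"]: item["Entry"]}
    next_page = page["NextPageLink"]
    if next_page is not None:
        # Delegate to the generator for the next page and re-yield its items.
        yield from parse(next_page)

results = list(parse("page1"))
print(results)
```

Each page adds one level of recursion, so for an API with very many pages a loop may be the safer choice.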
I haven't tested it yet, but you can probably use a while loop:
def parse(self):
    print("Called outer")
    resp = self.response
    while resp is not None:
        data = resp.json()
        for item in data["Items"]:
            yield {
                item["Thing"]: item["Entry"]
            }
        # Follow the link from the page just parsed, not from the first page,
        # and stop when there is no next page.
        link = data["NextPageLink"]
        resp = requests.get(link) if link is not None else None
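The loop can also be written as a standalone generator, sketched here under some assumptions. The names iter_items, FakeResponse, FAKE_PAGES and fake_get are hypothetical, and the get parameter is injected only so the example runs without network access; in real use it could be requests.get. Note that requests only raises HTTPError when raise_for_status() is called, so the try/except around requests.get in the question would not catch a 4xx or 5xx response on its own.

```python
def iter_items(first_url, get):
    """Yield one {Thing: Entry} dict per item across all pages.

    `get` is any callable that behaves like requests.get (hypothetical
    parameter, injected so the function can be exercised offline).
    """
    url = first_url
    while url is not None:
        resp = get(url)
        resp.raise_for_status()  # turn 4xx/5xx responses into exceptions
        data = resp.json()
        for item in data["Items"]:
            yield {item["Thing"]: item["Entry"]}
        url = data["NextPageLink"]  # None on the last page ends the loop


class FakeResponse:
    """Minimal stand-in exposing the two response methods used above."""
    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # the fake never fails

    def json(self):
        return self._payload


FAKE_PAGES = {
    "p1": {"Items": [{"Thing": "x", "Entry": 10}], "NextPageLink": "p2"},
    "p2": {"Items": [{"Thing": "y", "Entry": 20}], "NextPageLink": None},
}

def fake_get(url):
    return FakeResponse(FAKE_PAGES[url])

collected = list(iter_items("p1", fake_get))
print(collected)
```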
Answered By - wu hoyt