Issue
I am trying to parse page A, download the files listed on it to local disk, replace the URLs in page A with the paths of the files I saved, and finally save page A to local disk.
I tried the files pipeline, but it does not work for me. The URLs in page A look like http:...php?id=1234, so the built-in file_path() returns an error. Overriding file_path() makes the pipeline stop working without any debug output.
So I found this post:
After applying it, I found that a change the callback makes to the data I passed in meta is not visible in the parsing function. My code looks like this:
    def ParseClientCaseNote(self, response):
        # Download all attachments and replace the URLs inside the page so they point to local files
        TestMeta = 'this is to test meta argu'
        for a in AttachmentList:
            yield scrapy.Request(a, callback=self.DownClientCaseNoteAttach, meta={'test': TestMeta})
        self.logger.info('ParseClientCaseNote: after call DownClientCaseNoteAttach, testmeta is: ' + TestMeta)
        return

    def DownClientCaseNoteAttach(self, response):
        TestArg = response.meta['test']
        self.logger.info('DownClientCaseNoteAttach: test meta')
        self.logger.info(TestArg)
        # This only rebinds the local name TestArg
        TestArg = 'this is revised from DownClientCaseNoteAttach'
        with open(AbsPath, 'wb') as f:
            f.write(response.body)
        return
I got the result below in the log:
    2018-09-29 09:26:13 [debug] INFO: ParseClientCaseNote: after call DownClientCaseNoteAttach, testmeta is: this is to test meta argu
    2018-09-29 09:26:17 [debug] INFO: DownClientCaseNoteAttach: test meta
    2018-09-29 09:26:17 [debug] INFO: this is to test meta argu
It seems the download callback is deferred: it runs only after the parsing function has already moved on. How can I get the result correctly?
Thanks
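The behavior in the log can be reproduced in plain Python without Scrapy. The sketch below is only an illustration under stated assumptions: the function names rebind_in_callback and mutate_in_callback and the shared dictionary are hypothetical, and a plain dict stands in for response.meta. It shows that assigning to a local name inside a callback never changes the caller's string, whereas mutating a shared mutable object is visible to the caller once the callback has actually run.

```python
def rebind_in_callback(meta):
    # Mirrors DownClientCaseNoteAttach: read the value, then rebind the local name.
    test_arg = meta['test']
    test_arg = 'this is revised from the callback'  # affects only this local name
    return test_arg


def mutate_in_callback(meta):
    # Hypothetical alternative: change a shared mutable object in place.
    meta['shared']['test'] = 'this is revised from the callback'


# Case 1: a string passed through meta; the caller's variable is unchanged.
test_meta = 'this is to test meta argu'
rebind_in_callback({'test': test_meta})
print(test_meta)

# Case 2: a mutable dict passed through meta; the caller sees the mutation,
# but only after the callback has been executed.
shared = {'test': 'this is to test meta argu'}
mutate_in_callback({'shared': shared})
print(shared['test'])
```

In a real spider the callback also runs later than the line that follows the yield, so even a shared object would still hold its old value at the point where the parsing function logs it.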
Solution
I used a workaround to address this. In page A, I read each file's name from the page, pass that name to my own download function, and change the URL to point to a local file with the same name. In the download function, I verify that the file name from response.headers['Content-Disposition'].decode(response.headers.encoding) is the same as the one I found on page A before saving the file.
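The verification step can be sketched roughly as follows. This is not the exact code from the answer: the helper names filename_from_content_disposition and attachment_name_matches are hypothetical, the header is assumed to arrive as bytes, and the standard library's email.message.Message is used here only as one convenient way to parse the header value.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    # Let the stdlib parse the header; returns None when no filename is present.
    msg = Message()
    msg['Content-Disposition'] = header_value
    return msg.get_filename()


def attachment_name_matches(header_bytes, expected_name, encoding='utf-8'):
    # header_bytes stands in for response.headers['Content-Disposition'];
    # encoding stands in for response.headers.encoding (assumed to be utf-8 here).
    header_value = header_bytes.decode(encoding)
    return filename_from_content_disposition(header_value) == expected_name


# Example header value (made up for illustration).
header = b'attachment; filename="note_1234.pdf"'
print(attachment_name_matches(header, 'note_1234.pdf'))
```

In the download callback, the file would be written only when the check passes, with the name found on page A used for the local path.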
Answered By - Hua Gong