Issue
I have a dataframe containing leads (names). I am trying to search the web for relevant data regarding those leads.
I am using BeautifulSoup and urllib to scrape the data. The URL is built like this:
url = u'https://www.website.com/SearchResults?query=' + quote(str(df['name']))
The problem is that I get exactly the same data for every lead: the data for the last lead in the dataframe for which anything was retrieved.
Whenever I use a literal string instead of str(df['name']), I get the right data for that specific lead. It looks like this:
url = u'https://www.website.com/SearchResults?query=' + quote('this+is+a+leads+name')
I think the problem is specifically related to str(df['name']) because whenever I remove it, I successfully acquire the data; with it, I get the same data for all 100,000 leads. The only problem is that I need str in order to use the leads from the dataframe.
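A likely cause, sketched here with hypothetical sample names: str(df['name']) converts the whole column to its printed form, so every request carries the same query text instead of one lead's name. The snippet below only builds URLs and sends no requests; the two sample leads are assumptions for illustration.

```python
import pandas as pd
from urllib.parse import quote

# Hypothetical sample data standing in for the real leads dataframe.
df = pd.DataFrame({"name": ["Alice Smith", "Bob Jones"]})

base = "https://www.website.com/SearchResults?query="

# str() on the whole column yields the printed Series (index, all names,
# dtype line), so this single URL is identical for every lead.
whole_column_url = base + quote(str(df["name"]))

# Converting one value at a time yields a distinct query per lead.
per_lead_urls = [base + quote(str(name)) for name in df["name"]]

print(whole_column_url)
print(per_lead_urls)
```

Printing both makes the difference visible: the first URL contains every name at once, while the list holds one URL per lead.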
Solution
Thank you everyone.
I tried what you suggested. What worked for me was indeed creating a function that gets the data from the web for a single name, and then using a for loop to call that function once per row with that row's lead name.
In a nutshell, this is what I did:
Function:
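from urllib.parse import quote

def getdata(name):
    # Build the search URL for a single lead name.
    url = u'https://www.website.com/search?q=' + quote(str(name))
    # ... (request and parsing steps omitted in the original)
    return data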
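Loop:
list1 = []  # collects one result per lead, in row order
for i, row in df.iterrows():
    # Pass a single name, not the whole column.
    leaddata = getdata(row['name'])
    list1.append(leaddata)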
I then proceeded to insert the list into the dataframe.
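The answer does not show how the list was inserted, so the following is only one possible sketch. It assumes the results are assigned as a new column, here hypothetically called leaddata, and it replaces the real scraper with a stand-in getdata that returns the URL without making a request.

```python
import pandas as pd
from urllib.parse import quote

def getdata(name):
    # Stand-in for the real scraper: returns the URL only, no web request.
    return "https://www.website.com/search?q=" + quote(str(name))

# Hypothetical sample data standing in for the real leads dataframe.
df = pd.DataFrame({"name": ["Alice Smith", "Bob Jones"]})

# One result per lead, collected in the same order as the rows.
list1 = [getdata(name) for name in df["name"]]

# Assigning a list creates a column; its length must match the row count.
df["leaddata"] = list1

print(df)
```

Because the list is built in row order, each result lines up with the lead it was fetched for.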
Answered By - Daniel Millionshik