Issue
I want to achieve a function that can fetch data in parallel.
The background is the information of 100 sites can be fetched from site A.
the same account can't be used more than once at a time, so I created 5 different accounts on site A that eanble me to fetch information with 5 accounts.
account info like
worker1 pawd
worker2 pawd
worker3 pawd
worker4 pawd
worker5 pawd
if you want to get information of site B from site A .
then you need to type cmd like get info for siteB_IP
on site A.
suppose there are 100 IPs are stored in a list names IPlist
how to fetch information of 100 IPs with 5 avaliable accounts in parallel by threading , and then all of the information can be sotored in a variable without conflict.
what I have tried is below , below codes can not be executed due to I have no way to achieve the solution:
import threading
user = 'root'
pwd = 'Changeme123'
# the first step is to logon with default account
rs = link.send_cmd(r':lognew:' + '"' + user + '","' + pwd + '"')
# then get all nebor ip from the logon site, the function parse_multi is used for parsing data
IPlist = parse_multi(link.send_cmd('get-IP-info:0xffff'))
def Fetchinfo(user, ip):
rs = link.send_cmd(r':lognew:' + '"' + user + '","' + pwd + '"')
areainfo = link.send_cmd('get info for ' + site_IP)
for ip in IPlist:
# how to handle 100 IPs in the situstion of 5 accounts avaliable ?
thread = threading.thread(target = Fetchinfo, args = [worker, ip]
Solution
Since you don't want calls from the same account id and passwords to happen concurrently, you can define a function that sequentially loops through a sub-list of IPs to fetch synchronously:
def fetch_data_for_ips(account_id, account_password, ips_to_fetch):
results = list()
for ip_to_fetch in ips_to_fetch:
# fetch with the account_id and password synchronously
result = ...
results.append(result)
return results // Added this
Then, use a thread pool, to run the different batches concurrently for each account:
from concurrent.futures import ThreadPoolExecutor, as_completed
# Split the workload for each account to fetch
num, remainder = divmod(len(ip_list), len(accounts))
num_ips_for_each_account = num + bool(remainder)
# This gives e.g. [[1,2,3], [4,5,6]], where each sublist is for each account to fetch
ip_lists_for_each_account = [ip_list[i: i + num_ips_for_each_account] for i in range(0, len(ip_list), num_ips_for_each_account)]
# You should only need number of threads = to the number of accounts you have
with ThreadPoolExecutor(len(accounts)) as executor:
# Feel free to use a set instead if you don't need to know which result came from which thread
futures = dict()
results = list()
for (account_id, account_password), ips_to_fetch in zip(accounts, ip_lists_for_each_account):
future = executor.submit(fetch_data_for_ips, account_id, account_password, ips_to_fetch)
futures[future] = account_id
for future in as_completed(futures):
result = future.result()
account_id = futures[future]
print(f'{account_id} fetched these:', result)
results.extend(result)
Answered By - rcshon
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.