Issue
I have a tuple (seedTuple in the code below) that itself contains 8 tuples, each of which holds a pair of 1D numpy arrays. A function, conflateDistributions, combines each tuple of arrays sequentially to yield a single pair of 1D numpy arrays as a result. The code below accomplishes the desired operations and gives the output in the desired format:
import numpy as np
from functools import reduce
numBins=100
rng=np.random.default_rng(1) #Random number generator object
probArray=rng.random((8,numBins)) #Random numbers over 8x100 array
probArray=probArray/np.sum(probArray,axis=1,keepdims=True) #Normalize each row so that probability sums to 1
valueArray=rng.integers(0,numBins,(8,numBins)) #Make array of values corresponding to probability array
seedTuple=tuple(zip(probArray,valueArray))
def conflateDistributions(tuple1,tuple2):
    #Each tuple should be of length 2, with the first entry being histogram count frequencies and the second being bin centers
    conflateProbability=np.multiply.outer(tuple1[0],tuple2[0])
    #conflateBinCenters takes the bin centers provided and adds every combination of them
    conflateBinCenters=np.add.outer(tuple1[1],tuple2[1])
    leftBinEdge1=np.amin(tuple1[1])
    leftBinEdge2=np.amin(tuple2[1])
    rightBinEdge1=np.amax(tuple1[1])
    rightBinEdge2=np.amax(tuple2[1])
    #Pad the histogram range slightly past the extreme summed bin centers so they fall inside the outermost bins
    rangePad=(rightBinEdge1-leftBinEdge1+rightBinEdge2-leftBinEdge2)/(2*(numBins-2))
    newProbs,binsConflated=np.histogram(conflateBinCenters,bins=numBins,
                                        range=(leftBinEdge1+leftBinEdge2-rangePad,
                                               rightBinEdge1+rightBinEdge2+rangePad),
                                        weights=conflateProbability)
    centersConflated=0.5*(binsConflated[:-1]+binsConflated[1:])
    return (newProbs,centersConflated)
combinedWeights,combinedBins=reduce(conflateDistributions,seedTuple)
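For reference, each reduced result is a pair of length-numBins arrays, and the total probability is preserved through the conflation; a quick sanity check (using the names from the code above):
print(combinedWeights.shape, combinedBins.shape) #(100,) (100,)
print(np.sum(combinedWeights)) #~1.0, since each input probability row was normalized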
Now the twist: instead of just 1 seedTuple, I actually have a list of ~2000 seedTuples on which I need to execute the same reduce(conflateDistributions) operation as above. I will be repeating this sequence of operations numerous times, so I am looking for an efficient non-for-loop approach to run the reduce(conflateDistributions) operation on all 2000 elements. I wanted to use something along the lines of:
actualDataSizeList=[seedTuple for ii in np.arange(2000)] #Example only, data in each seedTuple is not typically identical
overallCombinedWeights,overallCombinedBins=map(reduce(conflateDistributions),actualDataSizeList)
But I receive the error "TypeError: reduce expected at least 2 arguments, got 1". I understand why reduce is throwing that error, but I would love some help with how to correct this syntax such that the output is a size-2000 list or array with each element containing 2 arrays (combinedWeights & combinedBins).
Python version 3.9.7
Solution
functools.partial() is probably what you want.
import functools

iterator = map(functools.partial(reduce, conflateDistributions), actualDataSizeList)
for combinedWeights, combinedBins in iterator:
    # do stuff with each result
    ...
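If you want the full size-2000 list up front rather than a lazy iterator, you can materialize it directly; a minimal sketch reusing actualDataSizeList from the question:
allResults = list(map(functools.partial(reduce, conflateDistributions), actualDataSizeList))
combinedWeights0, combinedBins0 = allResults[0]  # each element is a (weights, centers) pair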
Note that this probably won't be much faster than just doing it in a for loop (Python still runs it serially), but it will use less memory than building the whole list immediately. If you want to do it in parallel, take a look at multiprocessing.Pool.
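A minimal parallel sketch along those lines, assuming conflateDistributions and actualDataSizeList are defined at module level (so the worker processes can pickle and import them):
import functools
import multiprocessing
from functools import reduce

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # Each worker process runs the full reduce chain for one seedTuple
        results = pool.map(functools.partial(reduce, conflateDistributions), actualDataSizeList)
    # results is a list of ~2000 (combinedWeights, combinedBins) pairs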
Answered By - yut23