Issue
I have a tuple (seedTuple in the code below) that itself contains 8 tuples, each of which holds a pair of 1D numpy arrays. A function, conflateDistributions, combines each tuple of arrays sequentially to yield a single pair of 1D numpy arrays as a result. The code below accomplishes the desired operations and gives the output in the desired format:
import numpy as np
from functools import reduce
numBins=100
rng=np.random.default_rng(1) #Random number generator object
probArray=rng.random((8,numBins)) #Random numbers over 8x100 array
probArray=probArray/np.sum(probArray,axis=1,keepdims=True) #Normalize each row so that probability sums to 1
valueArray=rng.integers(0,numBins,(8,numBins)) #Make array of values corresponding to probability array
seedTuple=tuple(zip(probArray,valueArray))
def conflateDistributions(tuple1,tuple2):
    #Each tuple should be of length 2, with the first entry being histogram count frequencies and the second being bin centers
    conflateProbability=np.multiply.outer(tuple1[0],tuple2[0])
    #conflateBinCenters takes the bin centers provided and adds every combination of them
    conflateBinCenters=np.add.outer(tuple1[1],tuple2[1])
    leftBinEdge1=np.amin(tuple1[1])
    leftBinEdge2=np.amin(tuple2[1])
    rightBinEdge1=np.amax(tuple1[1])
    rightBinEdge2=np.amax(tuple2[1])
    #Pad the histogram range slightly past the extreme summed bin centers so they fall inside the outermost bins
    rangePad=(rightBinEdge1-leftBinEdge1+rightBinEdge2-leftBinEdge2)/(2*(numBins-2))
    newProbs,binsConflated=np.histogram(conflateBinCenters,bins=numBins,
                                        range=(leftBinEdge1+leftBinEdge2-rangePad,
                                               rightBinEdge1+rightBinEdge2+rangePad),
                                        weights=conflateProbability)
    centersConflated=0.5*(binsConflated[:-1]+binsConflated[1:])
    return (newProbs,centersConflated)
combinedWeights,combinedBins=reduce(conflateDistributions,seedTuple)
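For reference, each reduced result is a pair of length-numBins arrays, and the total probability is preserved through the conflation; a quick sanity check (using the names from the code above):
print(combinedWeights.shape, combinedBins.shape) #(100,) (100,)
print(np.sum(combinedWeights)) #~1.0, since each input probability row was normalized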
Now the twist: instead of just 1 seedTuple, I actually have a list of ~2000 seedTuples on which I need to execute the same reduce(conflateDistributions) operation as above. I will be repeating this sequence of operations numerous times, so I am looking for an efficient non-for-loop approach to run the reduce(conflateDistributions) operation on all 2000 elements. I wanted to use something along the lines of:
actualDataSizeList=[seedTuple for ii in np.arange(2000)] #Example only, data in each seedTuple is not typically identical
overallCombinedWeights,overallCombinedBins=map(reduce(conflateDistributions),actualDataSizeList)
But I receive the error "TypeError: reduce expected at least 2 arguments, got 1". I understand why reduce is throwing that error, but I would love some help with how to correct this syntax such that the output is a size-2000 list or array with each element containing 2 arrays (combinedWeights & combinedBins).
Python version 3.9.7
Solution
functools.partial() is probably what you want.
import functools

iterator = map(functools.partial(reduce, conflateDistributions), actualDataSizeList)
for combinedWeights, combinedBins in iterator:
    # do stuff with each result
    ...
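If you want the full size-2000 list up front rather than a lazy iterator, you can materialize it directly; a minimal sketch reusing actualDataSizeList from the question:
allResults = list(map(functools.partial(reduce, conflateDistributions), actualDataSizeList))
combinedWeights0, combinedBins0 = allResults[0]  # each element is a (weights, centers) pair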
Note that this probably won't be much faster than just doing it in a for loop (Python still runs it serially), but it will use less memory than building the whole list immediately. If you want to do it in parallel, take a look at multiprocessing.Pool.
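A minimal parallel sketch along those lines, assuming conflateDistributions and actualDataSizeList are defined at module level (so the worker processes can pickle and import them):
import functools
import multiprocessing
from functools import reduce

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # Each worker process runs the full reduce chain for one seedTuple
        results = pool.map(functools.partial(reduce, conflateDistributions), actualDataSizeList)
    # results is a list of ~2000 (combinedWeights, combinedBins) pairs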
Answered By - yut23