Tuesday, February 8, 2022

[FIXED] multiprocessing issue with Windows


Issue

When using multiprocessing on Windows, the code that starts the processes must be protected by an if __name__ == '__main__': check. For example:

# Script1.py

import multiprocessing
class Test(object):
    def __init__(self, x):
        self.x = x

    def square(self, i, return_jobs, x):
        # Store the square of x under the job index i.
        return_jobs[i] = x**2

    def run(self):
        if __name__ == "__main__":
            manager = multiprocessing.Manager()
            return_jobs = manager.dict()
            jobs = []
            for i in range(len(self.x)):
                p = multiprocessing.Process(target = self.square, args=(i, return_jobs , self.x[i]) )
                jobs.append(p)
                p.start()
            for proc in jobs:
                print(proc)
                proc.join()
            print('result',return_jobs.values())

Test([2, 3]).run()

This simple example script runs fine and returns something like: result [4, 9]. However, if I have a different script that imports Script1 and uses Test, it won't work. That is,

# Script2.py

from Script1 import Test
class Test2(object):
    def __init__(self, y):
        self.y = y
        
    def run(self):
        z = Test(self.y).run()

will not invoke the function Test(self.y).run() at all. However, if I place the class Test2 in the same script as Test (Script1.py), then all is fine.

What is the best way to fix this? The Script1.py is a subprocess of the overall code. I don't want to have to combine these scripts together...

I should also note that I am using Spyder as well. This could be a problem.

Solution

First, in Script1.py, the if __name__ == "__main__": check is in the wrong place. It should not be inside the run method; it should guard the call at global scope, as follows:

if __name__ == "__main__":
    Test([2, 3]).run()

There are two reasons for this. First, when the new processes are created, any statements at global scope will be executed by these processes. If you do not place the check as I have above, you will be needlessly creating instances of Test in each new process. It's true that when run is invoked on these objects, it will return immediately because of where you did place the check, but why create the objects to begin with?

But the real reason for moving the check as I have done is that you only want to execute the statement Test([2, 3]).run() when you are executing Script1.py as the "main" script and not when it is being imported by some other script. By placing the check as I have done, when it is imported its name will not be "__main__" any more and therefore that statement will not be executed, which gives you more flexibility.

This now allows you in Script2.py to add your own if __name__ == '__main__': check as follows:

from Script1 import Test

class Test2(object):
    def __init__(self, y):
        self.y = y

    def run(self):
        z = Test(self.y).run()

if __name__ == '__main__':
    Test2([3, 6]).run()

Prints:

<Process name='Process-2' pid=9200 parent=4492 started>
<Process name='Process-3' pid=16428 parent=4492 started>
result [9, 36]

This way, when Script2.py is the "main" script being executed, you have control over which object gets created and run.

Explanation

The important thing to remember on Windows is that a newly launched process starts executing the source of the main script from the top, so all statements at global scope (import statements, function definitions, variable assignments, etc.) are executed again. You therefore want to keep anything that does not need to be there out of global scope: the new process will re-execute it, and if it is, for instance, a long calculation or the creation of a large data structure that the new process never uses, you have wasted CPU cycles or memory for nothing.

More importantly, you absolutely must not have any statement at global scope that, when executed, recursively re-creates the process you just created. That is why such statements must be wrapped in an if __name__ == "__main__": check (__name__ will not be "__main__" in the newly created process). There is no need for such a check in the run method, which is not at global scope. But in whatever script you run to start things off, you will need that check around any code at global scope that creates a process or invokes a function or method that creates a process.
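The re-execution described above can be imitated without starting any process. The following is a minimal sketch, not part of the original answer: it writes a hypothetical script (demo_script.py, an invented name) to a temporary directory and runs it with runpy.run_path under two different values of __name__. The name "__mp_main__" is what CPython's spawn start method appears to use for the re-executed main script; treat that as an assumption about the implementation rather than a documented guarantee.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical script source: it records which parts of the file executed.
SOURCE = '''
events = ["global scope executed"]

if __name__ == "__main__":
    # In a real script, this is where processes would be created.
    events.append("guarded block executed")
'''


def run_as(run_name):
    """Execute SOURCE from a temporary file with __name__ set to run_name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_script.py"
        path.write_text(SOURCE)
        namespace = runpy.run_path(str(path), run_name=run_name)
    return namespace["events"]


# The script started directly by the user.
as_main = run_as("__main__")
# The same script as re-executed in a newly spawned process (assumed name).
as_child = run_as("__mp_main__")

print(as_main)   # ['global scope executed', 'guarded block executed']
print(as_child)  # ['global scope executed']
```

In both runs the statements at global scope execute, but only the run named "__main__" enters the guarded block, which is why process creation placed there cannot recurse.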

Note that when Script2.py imports Script1.py, Script1.py is now a module and its __name__ value will be "Script1", so again the code Test([2, 3]).run() will not execute. That also explains why, when we create a module, we can place testing code within an if __name__ == "__main__": block: it will not be executed when the module is imported.
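The import case can be checked in the same spirit. This is only an illustrative sketch: demo_helper, double and self_test_ran are invented names, and the module is written to a temporary directory so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module with testing code inside the __main__ guard.
SOURCE = '''
def double(n):
    return 2 * n

self_test_ran = False

if __name__ == "__main__":
    # Testing code: runs only when this file is the main script.
    assert double(2) == 4
    self_test_ran = True
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_helper.py").write_text(SOURCE)
    sys.path.insert(0, tmp)  # make the temporary module importable
    try:
        demo_helper = importlib.import_module("demo_helper")
    finally:
        sys.path.remove(tmp)

print(demo_helper.__name__)       # demo_helper
print(demo_helper.self_test_ran)  # False
print(demo_helper.double(21))     # 42
```

The imported module's functions are available as usual, while the guarded testing code is skipped because __name__ is the module's name rather than "__main__".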

Answered By - Booboo

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
