Issue
I write a lot of CLI tools in Python. Most of my tools have something like this to get arguments:
```python
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')
args = parser.parse_args()
```
I write code that is half data science (we're bioinformaticians). This args bit normally lives in "main.py" and then gets passed to some sort of "run experiment" function or method, which often uses a multiprocessing Pool to break the task apart (by handing work off to other functions/classes). So most of the arguments from the command line need to be passed to the run function and then on to the new processes. This architecture is not up for debate, for various reasons.
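A minimal sketch of that layout, assuming hypothetical `--threshold`, `--scale` and `--processes` options, a made-up `process_chunk` worker and placeholder data: the whole `Namespace` travels from `main` into `run_experiment` and on to the workers through `functools.partial`. This works because an `argparse.Namespace` is an ordinary picklable object, as long as the values it holds are picklable.

```python
import argparse
from functools import partial
from multiprocessing import Pool


def process_chunk(args, chunk):
    # Hypothetical worker: keep values above args.threshold, scaled by args.scale.
    return [x * args.scale for x in chunk if x > args.threshold]


def run_experiment(args, chunks):
    # Bind the whole namespace to the worker. A partial of a module-level
    # function pickles cleanly, so Pool can send it to worker processes.
    worker = partial(process_chunk, args)
    if args.processes == 1:
        # Serial path, handy for debugging without starting subprocesses.
        return list(map(worker, chunks))
    with Pool(args.processes) as pool:
        return pool.map(worker, chunks)


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical experiment runner.")
    parser.add_argument("--threshold", type=float, default=0.0)
    parser.add_argument("--scale", type=float, default=1.0)
    parser.add_argument("--processes", type=int, default=4)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    chunks = [[1, 2, 3], [4, 5, 6]]  # placeholder data standing in for real input
    return run_experiment(args, chunks)
```

For example, `main(["--threshold", "2", "--scale", "10", "--processes", "1"])` returns `[[30.0], [40.0, 50.0, 60.0]]`. With more than one process on platforms that start workers with spawn (the default on Windows and macOS), the call to `main()` has to sit under an `if __name__ == "__main__":` guard.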
I'm aware of the advice in this style guide: https://python-docs.readthedocs.io/en/latest/writing/style.html
BAD
```python
def make_complex(*args):
    x, y = args
    return dict(**locals())
```
GOOD
```python
def make_complex(x, y):
    return {'x': x, 'y': y}
```
My tools often have 10-20 parameters stored in args, from argparse. So the question is: should I pass them around packaged up as args, or unpack them and pass them individually? If I explicitly pass each of them to every function/class they are meant to end up in, I end up with a lot of redundant code (at least one function will have a massive parameter list that is identical to args). Conversely, if I just pass args around, it goes against the Zen of Python ("Explicit is better than implicit").
Solution
Here's my opinion.
`args` is a `Namespace` object, a simple class that holds the values as attributes. It's easily converted to a `dict` with `vars`.
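As a quick illustration, reusing the question's own parser: `vars(args)` gives a plain `dict`, and `**` unpacking lets a function keep an explicit signature while the caller still hands over everything in one go. `accumulate_integers` is a hypothetical name, and the argument list is hard-coded so the snippet runs without a real command line.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Process some integers.")
    parser.add_argument("integers", metavar="N", type=int, nargs="+",
                        help="an integer for the accumulator")
    parser.add_argument("--sum", dest="accumulate", action="store_const",
                        const=sum, default=max,
                        help="sum the integers (default: find the max)")
    return parser


def accumulate_integers(integers, accumulate):
    # Explicit parameters: callable without any argparse objects at all.
    return accumulate(integers)


args = build_parser().parse_args(["--sum", "1", "2", "3"])
options = vars(args)                   # plain dict keyed by each argument's dest
print(accumulate_integers(**options))  # 6
```

The catch is that the parameter names must match the `dest` names exactly; any extra key in the dict raises `TypeError`, so this only suits functions whose signature mirrors the parser.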
Internally, `argparse` code uses a lot of `foo(*args, **kwargs)` signatures, especially for the `add_argument` method. This gives a lot of flexibility in what parameters it accepts, but also leaves things open for errors.
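The error-proneness is easy to demonstrate with two hypothetical functions (this is not argparse's own code): one collects options through `**kwargs`, the other declares a keyword-only parameter. A misspelled keyword is silently swallowed by the first and rejected by the second.

```python
def run_loose(**kwargs):
    # Accepts anything; a misspelled key is silently ignored and the default wins.
    return kwargs.get("threshold", 0.5)


def run_strict(*, threshold=0.5):
    # Keyword-only parameter: an unknown name raises TypeError at the call site.
    return threshold


print(run_loose(treshold=0.9))  # 0.5: the typo went unnoticed
try:
    run_strict(treshold=0.9)
except TypeError as exc:
    print("rejected:", exc)
```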
In the subcommands section, the `argparse` docs have an example of calling functions with `args.func(args)`. This allows each `func` to use different parameters.
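A minimal sketch of that subcommand pattern, assuming two made-up subcommands, `sum` and `greet`: each subparser stores its handler with `set_defaults(func=...)`, and the caller simply runs `args.func(args)`, so each handler reads only the attributes its own subparser defines. The `required=True` argument to `add_subparsers` needs Python 3.7 or later.

```python
import argparse


def cmd_sum(args):
    # Handler for "sum": only uses the attributes the "sum" subparser defines.
    return sum(args.integers)


def cmd_greet(args):
    # Handler for "greet": a different set of attributes.
    return f"hello, {args.name}"


def build_parser():
    parser = argparse.ArgumentParser(prog="tool")
    subparsers = parser.add_subparsers(dest="command", required=True)

    p_sum = subparsers.add_parser("sum")
    p_sum.add_argument("integers", type=int, nargs="+")
    p_sum.set_defaults(func=cmd_sum)

    p_greet = subparsers.add_parser("greet")
    p_greet.add_argument("name")
    p_greet.set_defaults(func=cmd_greet)
    return parser


parser = build_parser()
args = parser.parse_args(["sum", "1", "2", "3"])
print(args.func(args))  # 6
args = parser.parse_args(["greet", "world"])
print(args.func(args))  # hello, world
```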
Often the `args` contains control parameters, things like `debugging`, `logging`, etc. They aren't the primary parameters, but auxiliary ones that the function may, or may not, use. Passing those through several layers with `args` (or a `dict`) can be convenient.
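A sketch of threading an auxiliary flag through several layers, assuming a hypothetical `--debug` option and made-up `run`, `middle_layer` and `inner_step` functions: the middle layer never looks at the flag, it just forwards `args`, and only the logging setup and the innermost step read it.

```python
import argparse
import logging


def inner_step(values, args):
    # Only this innermost layer looks at the auxiliary --debug flag.
    if args.debug:
        logging.getLogger("pipeline").debug("inner_step received %d values", len(values))
    return [v * 2 for v in values]


def middle_layer(values, args):
    # Does not care about debugging or logging; just forwards the namespace.
    return inner_step(values, args)


def run(values, args):
    # Configure logging once, from the same namespace.
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.WARNING)
    return middle_layer(values, args)


parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true")
args = parser.parse_args(["--debug"])
print(run([1, 2, 3], args))  # [2, 4, 6]
```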
On the other hand, if a function might be called from other functions and other user interfaces, it may be a pain to have to create a `Namespace`-like object just to pass in parameters.
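To make that cost concrete, here is a hypothetical CLI-oriented function, `align_reads`, called from ordinary Python code: the caller has to build an `argparse.Namespace` by hand, and a forgotten attribute only surfaces as an `AttributeError` when the function reaches it.

```python
import argparse


def align_reads(args):
    # Hypothetical CLI-oriented function: expects every attribute the parser would set.
    return f"{args.reads} vs {args.reference} (min_score={args.min_score})"


# A caller that never touched the command line must fake a namespace...
fake_args = argparse.Namespace(reads="sample.fq", reference="ref.fa", min_score=30)
print(align_reads(fake_args))  # sample.fq vs ref.fa (min_score=30)

# ...and forgetting one attribute only fails when it is accessed.
try:
    align_reads(argparse.Namespace(reads="sample.fq", reference="ref.fa"))
except AttributeError as exc:
    print("missing:", exc)
```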
In short, if the function (or class) is written primarily for use by the CLI, passing the whole `args` can be convenient and reasonable. But if the function might be used with a different CLI, or other interfaces (such as in an imported module), more explicit positional and keyword parameters are better.
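One way to get both, sketched with a hypothetical `filter_scores` function and made-up options: keep the core logic behind an explicit signature and confine `argparse` to a thin `main(argv=None)` wrapper that unpacks `args`. Passing `argv` explicitly (`parse_args(None)` falls back to `sys.argv[1:]`) also makes the CLI path easy to test.

```python
import argparse


def filter_scores(scores, min_score=30, keep_ties=True):
    # Core logic with an explicit signature: usable from imports, notebooks, tests.
    if keep_ties:
        return [s for s in scores if s >= min_score]
    return [s for s in scores if s > min_score]


def main(argv=None):
    # Thin CLI layer: the only place that knows about argparse.
    parser = argparse.ArgumentParser(description="Hypothetical score filter.")
    parser.add_argument("scores", type=int, nargs="+")
    parser.add_argument("--min-score", type=int, default=30)
    parser.add_argument("--strict", action="store_true",
                        help="drop scores equal to --min-score")
    args = parser.parse_args(argv)
    return filter_scores(args.scores, min_score=args.min_score,
                         keep_ties=not args.strict)


print(filter_scores([10, 30, 40]))           # [30, 40]
print(main(["10", "30", "40", "--strict"]))  # [40]
```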
If your code is organized around classes, it may be reasonable to accept the `args` namespace, and then assign the desired attributes to instance variables. The instance doesn't use `args` except during initialization.
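A minimal sketch of that class-based approach, with a hypothetical `Experiment` class and made-up options: `__init__` copies the attributes it needs and keeps no reference to `args`, so the rest of the class never depends on argparse. Unrelated options such as `--debug` are simply ignored.

```python
import argparse


class Experiment:
    def __init__(self, args):
        # Copy only the attributes this class needs; args itself is not stored.
        self.input_path = args.input
        self.n_iterations = args.iterations
        self.seed = args.seed

    def describe(self):
        return f"{self.input_path}: {self.n_iterations} iterations, seed {self.seed}"


parser = argparse.ArgumentParser()
parser.add_argument("input")
parser.add_argument("--iterations", type=int, default=100)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--debug", action="store_true")  # not used by Experiment

exp = Experiment(parser.parse_args(["data.csv", "--seed", "7"]))
print(exp.describe())  # data.csv: 100 iterations, seed 7
```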
Answered By - hpaulj