Issue
I'm trying to run a simple Ada-boosted Decision Tree regressor on GCP Vertex AI. To parse hyperparams and other arguments I use Click for Python, a very simple CLI library. Here's the setup for my task function:
import logging

import click


@click.command()
@click.argument("input_path", type=str)
@click.option("--output-path", type=str, envvar='AIP_MODEL_DIR')
@click.option('--gcloud', is_flag=True, help='Run as if in Google Cloud Vertex AI Pipeline')
@click.option('--grid', is_flag=True, help='Perform a grid search instead of a single run. Ignored with --gcloud')
@click.option("--max_depth", type=int, default=4, help='Max depth of decision tree', show_default=True)
@click.option("--n_estimators", type=int, default=50, help='Number of AdaBoost boosts', show_default=True)
def click_main(input_path, output_path, gcloud, grid, max_depth, n_estimators):
    train_model(input_path, output_path, gcloud, grid, max_depth, n_estimators)


def train_model(input_path, output_path, gcloud, grid, max_depth, n_estimators):
    # Print the parsed values to check what Click actually received
    print(input_path, output_path, gcloud)
    logger = logging.getLogger(__name__)
    logger.info("training models from processed data")
    ...
When I run it locally as shown below, Click correctly picks up the parameters from both the console and the environment and proceeds with model training (AIP_MODEL_DIR is gs://(BUCKET_NAME)/models):
❯ python3 -m src.models.train_model gs://(BUCKET_NAME)/data/processed --gcloud
gs://(BUCKET_NAME)/data/processed gs://(BUCKET_NAME)/models True
However, when I put this code on the Vertex AI Pipeline, it throws an error, namely
FileNotFoundError: b/(BUCKET_NAME)/o/data%2Fprocessed%20%20--gcloud%2Fprocessed_features.csv
As can clearly be seen, Click takes both the argument and the --gcloud option and assigns them to input_path. The print statement before that confirms it, both by having one too many spaces and by --gcloud being parsed as False.
gs://(BUCKET_NAME)/data/processed --gcloud gs://(BUCKET_NAME)/models/1/model/ False
Has anyone here encountered this issue or have any idea how to solve it?
Solution
I think this is due to the way arguments and options interact. You are mixing arguments and options, and although it is not explicitly stated in the documentation, the argument seems to swallow the option that follows it. If nargs is not set, it defaults to 1, and in your case everything after the path, including --gcloud, appears to be treated as part of the same string.
nargs – the number of arguments to match. If not 1 the return value is a tuple instead of single value. The default for nargs is 1 (except if the type is a tuple, then it’s the arity of the tuple).
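To make the nargs behaviour concrete, here is a minimal sketch using Click's own test runner. It assumes a cut-down command named demo and a placeholder bucket name, neither of which is from the question. With nargs at its default of 1, the argument takes exactly one argv element, so the flag is only recognised when it arrives as its own element. If the path and the flag arrive fused into a single string, which is what the doubled space in the question's output suggests, the whole string ends up in input_path.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("input_path", type=str)  # nargs defaults to 1
@click.option("--gcloud", is_flag=True)
def demo(input_path, gcloud):
    # repr() makes any embedded spaces visible in the output
    click.echo(f"{input_path!r} {gcloud}")


runner = CliRunner()

# Path and flag as two separate argv elements: the flag is recognised.
separate = runner.invoke(demo, ["gs://example-bucket/data/processed", "--gcloud"])

# Path and flag fused into one argv element: the whole string becomes input_path.
fused = runner.invoke(demo, ["gs://example-bucket/data/processed  --gcloud"])

print(separate.output, end="")
print(fused.output, end="")
```

If the second case matches what you see on Vertex AI, it may be worth checking that the job's argument list passes the path and each flag as separate items rather than as one string.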
I think you should put the options first, followed by the argument, as displayed on the documentation page. Another approach is to group it under a command, as shown on this link.
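Both suggestions can be sketched together as follows. This is only an illustration: the group name cli, the subcommand name train, and the bucket name are hypothetical, and the body just echoes what was parsed instead of training anything. The options are placed before the argument in the invocation, and the command lives under a click.group.

```python
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Hypothetical top-level group; subcommands hang off it."""


@cli.command()
@click.option("--output-path", type=str, envvar="AIP_MODEL_DIR")
@click.option("--gcloud", is_flag=True)
@click.argument("input_path", type=str)
def train(output_path, gcloud, input_path):
    # Stand-in for train_model(...): only report the parsed values
    click.echo(f"{input_path} {output_path} {gcloud}")


runner = CliRunner()
result = runner.invoke(
    cli,
    # Options first, then the positional argument, each as its own element
    ["train", "--gcloud", "gs://example-bucket/data/processed"],
    env={"AIP_MODEL_DIR": "gs://example-bucket/models"},
)
print(result.output, end="")
```

Note that this ordering only helps if each token still reaches the program as a separate argv element.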
Answered By - Betjens