Issue
I have a problem training a computer vision model on Google Cloud, and I am sure the problem is related to GPUs. I know Google says that by default you have 1 GPU, but the training fails with this error message: "The request for 8 K80 accelerators exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators."
As you can see, I have 0 of every accelerator type.
Here is the full command I am trying to run:
gcloud ai-platform jobs submit training segmentation_maskrcnn_test_0 ^
--runtime-version 2.1 ^
--python-version 3.7 ^
--job-dir=gs://image-segmentation-b/training-process ^
--package-path ./object_detection ^
--module-name object_detection.model_main_tf2 ^
--region us-central1 ^
--scale-tier CUSTOM ^
--master-machine-type n1-highcpu-32 ^
--master-accelerator count=8,type=nvidia-tesla-k80 ^
-- ^
--model_dir=gs://image-segmentation-b/training-process ^
--pipeline_config_path=gs://image-segmentation-b/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8 - cloud.config
And here is the full error:
ERROR: (gcloud.ai-platform.jobs.submit.training) HttpError accessing <https://ml.googleapis.com/v1/projects/project id/jobs?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'content-encoding': 'gzip', 'date': 'Tue, 18 Jan 2022 11:12:39 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'transfer-encoding': 'chunked', 'status': 429}>, content <{
"error": {
"code": 429,
"message": "Quota failure for project project id. The request for 8 K80 accelerators exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators. To read more about Cloud ML Engine quota, see https://cloud.google.com/ml-engine/quotas.",
"status": "RESOURCE_EXHAUSTED",
"details": [
{
"@type": "type.googleapis.com/google.rpc.QuotaFailure",
"violations": [
{
"subject": "project id",
"description": "The request for 8 K80 accelerators exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators."
}
]
}
]
}
}
>
This may be due to network connectivity issues. Please check your network settings, and the status of the service you are trying to reach.
How can I fix this error? Do I have to go somewhere and enable GPU for the project?
Solution
You need to raise your GPU quota before you can train your model on GPUs.
The error reports an allowed maximum of 0 for every accelerator type, so either your project or your account does not have enough GPU quota to fulfill a request for 8 K80s.
You can check your quotas here: API Quotas
Answered By - Iñigo González
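As a small aid for reading this kind of error, here is a minimal Python sketch that pulls the requested and allowed accelerator counts out of the quota message. The function name `parse_quota_message` is hypothetical, and the message layout is assumed only from the single error shown in this post, not from any documented format, so treat the regular expressions as a best guess.

```python
import re


def parse_quota_message(message):
    """Extract (requested_count, requested_type, allowed) from a quota message.

    Assumption: the message looks like the one quoted in this post, e.g.
    "The request for 8 K80 accelerators exceeds the allowed maximum of
    0 A100, 0 K80, ... accelerators."
    """
    requested = re.search(r"request for (\d+) (\w+) accelerators", message)
    if requested is None:
        raise ValueError("not an accelerator quota message")

    # Collect every "<count> <type>" pair listed after "allowed maximum of".
    allowed = {}
    allowed_part = re.search(r"allowed maximum of (.*?) accelerators", message)
    if allowed_part is not None:
        for count, name in re.findall(r"(\d+) (\w+)", allowed_part.group(1)):
            allowed[name] = int(count)

    return int(requested.group(1)), requested.group(2), allowed


if __name__ == "__main__":
    msg = (
        "The request for 8 K80 accelerators exceeds the allowed maximum of "
        "0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, "
        "0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators."
    )
    count, kind, allowed = parse_quota_message(msg)
    # A shortfall above zero means the quota must be raised before resubmitting.
    shortfall = count - allowed.get(kind, 0)
    print(f"requested {count} {kind}, allowed {allowed.get(kind, 0)}, shortfall {shortfall}")
```

If the shortfall is greater than zero, as it is here, the job cannot start until the quota for that accelerator type is increased or the requested count is reduced to fit the quota.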