High failure rate when running GCP Gemini batch inference jobs #4485

Open
westonli-thu opened this issue Sep 29, 2024 · 0 comments
Comments

@westonli-thu

Hi, we are using batch inference with gemini-1.5-pro, and we found:

  1. When we submit more than one task, at most one task succeeds and the others fail (is this because of quota?)

  2. When we run a batch inference job (even when we only submit one task), the following error sometimes occurs and the whole task fails:

RESOURCE_EXHAUSTED error occurred: {"error": {"code": 429, "message": "Online prediction request quota exceeded for gemini-1.5-pro. Please try again later with backoff.", "status": "RESOURCE_EXHAUSTED"}}
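For reference, here is roughly how we submit and poll these jobs. This is a minimal sketch assuming the `vertexai.batch_prediction` module of the SDK; the project ID, the versioned model ID, and the `gs://` paths are placeholders:

```python
import time

import vertexai
from vertexai.batch_prediction import BatchPredictionJob

# Placeholders: substitute a real project and Cloud Storage paths.
vertexai.init(project="my-project", location="us-central1")

job = BatchPredictionJob.submit(
    source_model="gemini-1.5-pro-002",
    input_dataset="gs://my-bucket/batch_requests.jsonl",
    output_uri_prefix="gs://my-bucket/batch_output",
)

# Poll until the job reaches a terminal state.
while not job.has_ended:
    time.sleep(30)
    job.refresh()

if job.has_succeeded:
    print(f"Output written to: {job.output_location}")
else:
    # This is where the RESOURCE_EXHAUSTED error above surfaces.
    print(f"Job failed: {job.error}")
```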

Is there any way to:

have the job automatically wait and retry when the quota is exceeded;

keep the results of the samples that succeed when some samples fail, instead of failing the complete job? (A client-side sketch of both behaviors is included below.)
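
As a client-side workaround we are considering, here is a minimal sketch that sends each sample through the online `generate_content` API instead of the batch service, retries 429 / RESOURCE_EXHAUSTED errors with exponential backoff, and records failed samples rather than aborting the whole run. The project ID and prompts are placeholders:

```python
import time

import vertexai
from vertexai.generative_models import GenerativeModel
from google.api_core.exceptions import ResourceExhausted

vertexai.init(project="my-project", location="us-central1")  # placeholder
model = GenerativeModel("gemini-1.5-pro")


def generate_with_backoff(prompt, max_attempts=6, base_delay=2.0):
    """Retry generate_content on quota (429) errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return model.generate_content(prompt)
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # sleep 2s, 4s, 8s, ...


prompts = ["sample prompt 1", "sample prompt 2"]  # placeholder samples
results, failures = [], []
for i, prompt in enumerate(prompts):
    try:
        results.append((i, generate_with_backoff(prompt).text))
    except Exception as exc:
        # Keep the successful results; just record the failed sample.
        failures.append((i, repr(exc)))

print(f"{len(results)} succeeded, {len(failures)} failed")
```

This does not fix the batch service itself, but it gives the pending-and-retry and keep-partial-results behavior we are asking for.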

product-auto-label bot added the api: vertex-ai (Issues related to the googleapis/python-aiplatform API) label Sep 29, 2024