High failure rate when running GCP Gemini batch inference jobs #4485

Open
westonli-thu opened this issue Sep 29, 2024 · 0 comments
Comments

@westonli-thu

Hi, we are using batch inference with gemini-1.5-pro, and we found:

  1. When we submit more than one task, at most one task succeeds and the others fail (is this because of quota?)

  2. When we run a batch inference job (even when we only submit one task), the following error sometimes occurs and the whole task fails:

RESOURCE_EXHAUSTED error occurred: {"error": {"code": 429, "message": "Online prediction request quota exceeded for gemini-1.5-pro. Please try again later with backoff.", "status": "RESOURCE_EXHAUSTED"}}
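For reference, here is roughly how we submit and poll these jobs. This is a minimal sketch assuming the `vertexai.batch_prediction` module of the SDK; the project ID, the versioned model ID, and the `gs://` paths are placeholders:

```python
import time

import vertexai
from vertexai.batch_prediction import BatchPredictionJob

# Placeholders: substitute a real project and Cloud Storage paths.
vertexai.init(project="my-project", location="us-central1")

job = BatchPredictionJob.submit(
    source_model="gemini-1.5-pro-002",
    input_dataset="gs://my-bucket/batch_requests.jsonl",
    output_uri_prefix="gs://my-bucket/batch_output",
)

# Poll until the job reaches a terminal state.
while not job.has_ended:
    time.sleep(30)
    job.refresh()

if job.has_succeeded:
    print(f"Output written to: {job.output_location}")
else:
    # This is where the RESOURCE_EXHAUSTED error above surfaces.
    print(f"Job failed: {job.error}")
```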

Is there any way to:

have the job automatically wait and retry when the quota is exceeded;

keep the results of the samples that succeed when some samples fail, instead of failing the complete job? (A client-side sketch of both behaviors is included below.)
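
As a client-side workaround we are considering, here is a minimal sketch that sends each sample through the online `generate_content` API instead of the batch service, retries 429 / RESOURCE_EXHAUSTED errors with exponential backoff, and records failed samples rather than aborting the whole run. The project ID and prompts are placeholders:

```python
import time

import vertexai
from vertexai.generative_models import GenerativeModel
from google.api_core.exceptions import ResourceExhausted

vertexai.init(project="my-project", location="us-central1")  # placeholder
model = GenerativeModel("gemini-1.5-pro")


def generate_with_backoff(prompt, max_attempts=6, base_delay=2.0):
    """Retry generate_content on quota (429) errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return model.generate_content(prompt)
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # sleep 2s, 4s, 8s, ...


prompts = ["sample prompt 1", "sample prompt 2"]  # placeholder samples
results, failures = [], []
for i, prompt in enumerate(prompts):
    try:
        results.append((i, generate_with_backoff(prompt).text))
    except Exception as exc:
        # Keep the successful results; just record the failed sample.
        failures.append((i, repr(exc)))

print(f"{len(results)} succeeded, {len(failures)} failed")
```

This does not fix the batch service itself, but it gives the pending-and-retry and keep-partial-results behavior we are asking for.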

product-auto-label bot added the api: vertex-ai (Issues related to the googleapis/python-aiplatform API) label Sep 29, 2024