GeminiMultimodalLiveLLMService went wrong when use TEXT #1028

Closed
fatwang2 opened this issue Jan 17, 2025 · 7 comments

@fatwang2

It fails when you use TEXT mode and share your video or screen. It worked before. I suspect it is a Gemini bug, because it doesn't work in Google AI Studio now either.

@fatwang2
Author

Why does it fail when sharing video or screen? It worked before.

@fatwang2
Author

fatwang2 commented Jan 19, 2025

@aconchillo It doesn't work in v0.0.53 if I open the camera at the same time.

@fatwang2
Author

Here is the response:

2025-01-19 11:00:36.102 | ERROR    | pipecat.services.gemini_multimodal_live.gemini:_receive_task_handler:497 - SentryGeminiService#1 exception: received 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content ; then sent 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content 
2025-01-19 11:00:36.115 | ERROR    | pipecat.services.gemini_multimodal_live.gemini:_ws_send:459 - Error sending message to websocket: received 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content ; then sent 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content 
2025-01-19 11:00:36.119 | ERROR    | pipecat.pipeline.task:_process_up_queue:288 - Error running app: ErrorFrame#0(error: Error sending client event: received 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content ; then sent 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content , fatal: True)
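
For triage, the useful parts of these lines are the WebSocket close code (1007, "invalid frame payload data"), the request trace id, and the original server-side error. Below is a minimal sketch, using only Python's standard `re` module, of pulling those fields out of a log line in the format shown above. The function name `parse_close_error` and the regular expression are illustrative assumptions, not part of pipecat.

```python
import re

# One of the log lines from the report above (shortened after the first error).
LOG_LINE = (
    "2025-01-19 11:00:36.115 | ERROR    | "
    "pipecat.services.gemini_multimodal_live.gemini:_ws_send:459 - "
    "Error sending message to websocket: received 1007 (invalid frame payload data) "
    "Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: "
    "Image tensors read from serialized content ; then sent 1007 (invalid frame payload data)"
)

# Hypothetical pattern matching the "received <code> (<reason>) Request trace id: ..."
# shape seen in the logs; other log formats will not match.
CLOSE_RE = re.compile(
    r"received (?P<code>\d{4}) \((?P<reason>[^)]*)\) "
    r"Request trace id: (?P<trace_id>[0-9a-f]+), "
    r"\[ORIGINAL ERROR\] (?P<detail>.*?) ;"
)


def parse_close_error(line):
    """Return the close code, reason, trace id, and error detail, or None if absent."""
    match = CLOSE_RE.search(line)
    if match is None:
        return None
    return {
        "code": int(match["code"]),
        "reason": match["reason"],
        "trace_id": match["trace_id"],
        "detail": match["detail"].strip(),
    }


info = parse_close_error(LOG_LINE)
print(info)
```

The trace id is the value to include when reporting the failure upstream, since all three log lines above share it.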

@vipyne
Member

vipyne commented Jan 20, 2025

Hi @fatwang2 can you show what code you are running?

I was only able to reproduce this issue 1 time out of many, many runs. (I started with the examples/foundational/26c-gemini-multimodal-live-video.py demo, then added params=InputParams(modalities=GeminiMultimodalModalities.TEXT) to GeminiMultimodalLiveLLMService and added a separate TTS service.)

Looking at this issue, this may be a Gemini Multimodal Live bug.
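
The reproduction setup described in this comment can be sketched roughly as follows. This is an assumption-laden sketch rather than the exact code that was run: the import path is inferred from the module name in the error logs, `GOOGLE_API_KEY` is an assumed environment variable name, and `build_text_mode_llm` is a hypothetical helper. The rest of the pipeline (video transport, separate TTS service) is omitted.

```python
import os


def build_text_mode_llm():
    """Build a Gemini Multimodal Live service configured to respond with text only."""
    # Imported lazily so this sketch can be loaded without pipecat installed.
    # Assumption: the import path matches the module named in the error logs.
    from pipecat.services.gemini_multimodal_live.gemini import (
        GeminiMultimodalLiveLLMService,
        GeminiMultimodalModalities,
        InputParams,
    )

    return GeminiMultimodalLiveLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),  # assumed environment variable name
        # TEXT output means a separate TTS service is needed to produce audio.
        params=InputParams(modalities=GeminiMultimodalModalities.TEXT),
    )
```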

@fatwang2
Author

  1. Use TEXT mode for the output, not the input. Your demo also mentions this:
        # Optionally, you can set the response modalities via a function
        llm.set_model_modalities(
            GeminiMultimodalModalities.TEXT
        )
  2. It happens every time you share your camera or screen, and it is a Gemini bug. I have reported it to them as well; I just thought you should know too.

@fatwang2
Author

Google has fixed it.
