GeminiMultimodalLiveLLMService fails when using TEXT mode #1028

fatwang2 · 2025-01-17T13:40:43Z

It fails when you use TEXT mode and share your video or screen. It worked before. I suspect this is a Gemini bug, because it doesn't work in Google AI Studio right now either.

fatwang2 · 2025-01-17T14:12:46Z

Why does it fail when sharing video or screen? It worked before.

fatwang2 · 2025-01-19T01:37:33Z

@aconchillo It doesn't work in v0.0.53 if I turn on the camera at the same time.

fatwang2 · 2025-01-19T03:02:55Z

Here is the response:

2025-01-19 11:00:36.102 | ERROR    | pipecat.services.gemini_multimodal_live.gemini:_receive_task_handler:497 - SentryGeminiService#1 exception: received 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content ; then sent 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content 
2025-01-19 11:00:36.115 | ERROR    | pipecat.services.gemini_multimodal_live.gemini:_ws_send:459 - Error sending message to websocket: received 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content ; then sent 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content 
2025-01-19 11:00:36.119 | ERROR    | pipecat.pipeline.task:_process_up_queue:288 - Error running app: ErrorFrame#0(error: Error sending client event: received 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content ; then sent 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a, [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content , fatal: True)
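When reporting a failure like this upstream, the useful parts of each log line are the WebSocket close code, the request trace ID, and Google's original error. Close code 1007 is defined by RFC 6455 as "invalid frame payload data". The following is a minimal sketch for extracting those fields, assuming the exact message format shown in the lines above; `parse_gemini_close_error` and `GeminiCloseError` are hypothetical helpers, not part of pipecat.

```python
import re
from typing import NamedTuple, Optional


class GeminiCloseError(NamedTuple):
    """Fields pulled out of a Gemini Live WebSocket close message (hypothetical helper type)."""
    code: int
    reason: str
    trace_id: str
    original_error: str


# Assumes the format seen in the pipecat logs above, e.g.
# "received 1007 (invalid frame payload data) Request trace id: 7655f1e9a7fe132a,
#  [ORIGINAL ERROR] generic::invalid_argument: Image tensors read from serialized content ;"
_CLOSE_RE = re.compile(
    r"received (?P<code>\d{4}) \((?P<reason>[^)]*)\) "
    r"Request trace id: (?P<trace_id>[0-9a-f]+), "
    r"\[ORIGINAL ERROR\] (?P<original>.*?)\s*(?:;|$)"
)


def parse_gemini_close_error(line: str) -> Optional[GeminiCloseError]:
    """Return the first close-error record found in a log line, or None if there is none."""
    m = _CLOSE_RE.search(line)
    if m is None:
        return None
    return GeminiCloseError(
        code=int(m.group("code")),
        reason=m.group("reason"),
        trace_id=m.group("trace_id"),
        original_error=m.group("original"),
    )
```

Applied to the first ERROR line above, it returns code 1007, trace ID `7655f1e9a7fe132a`, and the original error `generic::invalid_argument: Image tensors read from serialized content`.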

vipyne · 2025-01-20T22:53:51Z

Hi @fatwang2, can you show the code you are running?

I was only able to reproduce this issue once out of many, many runs. (I started with the examples/foundational/26c-gemini-multimodal-live-video.py demo, then added params=InputParams(modalities=GeminiMultimodalModalities.TEXT) to GeminiMultimodalLiveLLMService and added a separate TTS service.)

Looking at this issue, it may be a Gemini Multimodal Live bug.

fatwang2 · 2025-01-20T23:53:08Z

Set TEXT mode not on the input but on the output; your demo also mentions that:

        # Optionally, you can set the response modalities via a function
        llm.set_model_modalities(
            GeminiMultimodalModalities.TEXT
        )

It happens every time you share your camera or screen, and it is a Gemini bug. I have reported it to Google as well; I just thought you should know about it too.
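To make the failing combination concrete: the reports here describe a session whose responses are restricted to TEXT while camera or screen frames are streamed in. The sketch below builds the two kinds of client messages involved as raw JSON. The field names (`setup`, `generation_config.response_modalities`, `realtime_input.media_chunks`, `mime_type`, `data`) are an assumption based on the Gemini Multimodal Live WebSocket API as documented around January 2025 and may have changed since; the helper names are hypothetical, and pipecat builds its own messages internally, so this is illustrative only.

```python
import base64
import json


def build_setup_message(model: str, response_modality: str) -> str:
    """Session setup restricting responses to one modality, e.g. "TEXT".

    Field names are assumed from the Live API docs circa January 2025.
    """
    return json.dumps({
        "setup": {
            "model": model,
            "generation_config": {"response_modalities": [response_modality]},
        }
    })


def build_image_chunk_message(jpeg_bytes: bytes) -> str:
    """One camera or screen frame sent as base64-encoded JPEG realtime input (assumed schema)."""
    return json.dumps({
        "realtime_input": {
            "media_chunks": [
                {
                    "mime_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode("ascii"),
                }
            ]
        }
    })
```

For example, `build_setup_message("models/gemini-2.0-flash-exp", "TEXT")` followed by a stream of `build_image_chunk_message(frame)` calls corresponds to the TEXT-output-plus-video setup that triggered the 1007 close above.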

fatwang2 · 2025-01-23T01:01:20Z

Google has fixed it.

chadbailey59 assigned vipyne Jan 20, 2025

fatwang2 closed this as completed Jan 23, 2025
