
Example Dataset for Inference with DrCaps_Zeroshot_Audio_Captioning #169

Closed
javanasse opened this issue Nov 8, 2024 · 4 comments

@javanasse

Can the developers provide an example JSONL file for running inference on unlabeled audio using DrCaps_Zeroshot_Audio_Captioning?

It appears that the dataset JSONL must have this form:

{"source": "/path/to/a_file.wav", "key": "", "target": "", "text": "", "similar_captions": ""}

but the content for each field is not clear to me. What should populate "target", "text", and "similar_captions"?

Thank you!

@ddlBoJack ddlBoJack assigned ddlBoJack and Andreas-Xi and unassigned ddlBoJack Nov 8, 2024
@ddlBoJack ddlBoJack mentioned this issue Nov 9, 2024
@ddlBoJack
Collaborator

Please refer to #170

@Andreas-Xi
Collaborator

Andreas-Xi commented Nov 9, 2024

Hi, thanks for following our work. We have uploaded example inference data for AudioCaps and Clotho in examples/drcap_zeroshot_aac/data_examples/; feel free to check it out. For each field: "target" is the ground-truth caption, and "text" is the caption fed to the CLAP text encoder during training. "text" and "target" are the same in the latest version, but we previously ran experiments that replaced certain words in the ground-truth captions to enhance model robustness, which is why both fields exist. "similar_captions" are captions similar to "target" (i.e., the ground-truth captions), used to perform RAG.
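Based on that description, a labeled (training/evaluation) record might look like the following. All caption values here are hypothetical, and the choice of "; " to join multiple similar captions is an assumption, not confirmed against the repo's loader:

```jsonl
{"source": "/path/to/a_file.wav", "key": "a_file", "target": "a dog barks while traffic passes", "text": "a dog barks while traffic passes", "similar_captions": "a dog barking near a busy road; vehicles pass as a dog barks"}
```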

@javanasse
Author

Thanks for your timely response. Is it possible to infer the caption for an audio file when "text" and "target" are unknown? If I have misunderstood, please correct me.

@Andreas-Xi
Collaborator

Yes, as long as you have the audio "source" and "similar_captions", it is possible to perform inference.
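Putting the thread together, an inference-only JSONL can leave "target" and "text" empty and supply only the audio path and retrieved captions. A minimal sketch for generating such a file, assuming the field names from the example above (the `make_inference_entry` helper and the "; " caption separator are my own, not part of the repo):

```python
import json

def make_inference_entry(wav_path, similar_captions, key=""):
    """Build one JSONL record for zero-shot captioning inference.

    "target" and "text" are left empty because no ground-truth caption
    exists at inference time; "similar_captions" carries the retrieved
    captions used for RAG. Field names follow the issue discussion and
    are not verified against the repo's dataset loader.
    """
    return {
        "source": wav_path,                    # path to the audio file
        "key": key,                            # optional identifier
        "target": "",                          # unknown ground truth
        "text": "",                            # unknown CLAP caption
        "similar_captions": similar_captions,  # retrieved captions for RAG
    }

# One JSON object per line, as in the example dataset files.
entries = [
    make_inference_entry(
        "/path/to/a_file.wav",
        "a dog barking near a busy road; vehicles pass as a dog barks",
        key="a_file",
    ),
]
with open("inference.jsonl", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```

The retrieval step that produces "similar_captions" (e.g., CLAP-based nearest-neighbor search over a caption datastore) is separate and not shown here.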
