LLaVA-Med Performance

Performance comparison of multimodal chat instruction-following abilities, measured by the relative score from language-only GPT-4 evaluation.

Example 1: comparison of medical visual chat. The language-only GPT-4 is treated as the performance upper bound, because the golden captions and inline mentions are fed into GPT-4 as context, so the model does not need to understand the raw image.
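As a rough illustration of how a relative score against a language-only GPT-4 reference could be computed, here is a minimal sketch. It assumes a judge has already assigned each answer a numeric score (for example on a 1 to 10 scale) and that the relative score is the candidate's total expressed as a percentage of the reference's total. The function name `relative_score` and the sample numbers are hypothetical and are not taken from the LLaVA-Med code or results.

```python
from typing import Sequence


def relative_score(candidate_scores: Sequence[float],
                   reference_scores: Sequence[float]) -> float:
    """Return the candidate's total score as a percentage of the reference's.

    Assumption: both sequences hold per-question judge scores, aligned by
    question, with the reference being the language-only GPT-4 answers.
    """
    if len(candidate_scores) != len(reference_scores):
        raise ValueError("score lists must be aligned per question")
    if not candidate_scores:
        raise ValueError("at least one scored question is required")

    reference_total = sum(reference_scores)
    if reference_total == 0:
        raise ValueError("reference total must be non-zero")

    return 100.0 * sum(candidate_scores) / reference_total


# Hypothetical judge scores for three questions (illustrative only).
candidate = [5, 6, 4]
reference = [10, 8, 7]
print(relative_score(candidate, reference))
```

Under this definition the reference scores 100 by construction, and a candidate that matches it on every question reaches the same value.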

Example 2: comparison of medical visual chat. LLaVA tends to hallucinate or to refuse to provide knowledgeable, domain-specific responses.

Performance comparison of fine-tuned LLaVA-Med on established medical VQA datasets.
