[Sync Features] add vila, add wildvision, add vibe-eval, add interleave bench #138

Merged: 35 commits, Jul 13, 2024

@Luodian (Contributor) commented Jul 9, 2024

Before you open a pull request, please check whether a similar issue already exists or has been closed.

When you open a pull request, please be sure to include the following:

  • A descriptive title: [xxx] XXXX
  • A detailed description

Thank you for your contributions!

kcz358 and others added 30 commits June 13, 2024 02:33
Update videomme task [w,w/o subtitle] and modified prompt for ablations
commit 050b2c3
Merge: 74facb4 ef30651
Author: Li Bo <[email protected]>
Date:   Tue Jun 18 13:13:38 2024 +0800

    Merge pull request #114 from zjysteven/add-tinyllava

    add tinyllava

commit ef30651
Author: Jingyang Zhang <[email protected]>
Date:   Mon Jun 17 17:57:02 2024 -0400

    fix typo

commit 9bab677
Merge: dbfb238 74facb4
Author: Jingyang Zhang <[email protected]>
Date:   Sun Jun 16 10:56:05 2024 -0400

    Merge branch 'EvolvingLMMs-Lab:main' into add-tinyllava

commit 74facb4
Merge: 8ba192f d5df72d
Author: Li Bo <[email protected]>
Date:   Sun Jun 16 17:59:19 2024 +0800

    Merge pull request #118 from teowu/main

    Fix the potential risk by PR #117

commit d5df72d
Merge: 5bf59ed 8ba192f
Author: Teo (Timothy) Wu Haoning <[email protected]>
Date:   Sun Jun 16 15:32:13 2024 +0800

    Merge branch 'EvolvingLMMs-Lab:main' into main

commit 5bf59ed
Author: teowu <[email protected]>
Date:   Sun Jun 16 07:27:28 2024 +0000

    fix #117, allow auto download with tar format videos

commit 98b3955
Merge: a056f11 be9dada
Author: teowu <[email protected]>
Date:   Sun Jun 16 07:25:07 2024 +0000

    Merge branch 'main' of https://github.com/teowu/lmms-eval into main

commit a056f11
Author: teowu <[email protected]>
Date:   Sun Jun 16 07:23:54 2024 +0000

    fix #117, allow auto download with tar format videos

commit 8ba192f
Merge: 7cc2890 be9dada
Author: Li Bo <[email protected]>
Date:   Sat Jun 15 17:30:59 2024 +0800

    Merge pull request #117 from teowu/main

    LongVideoBench for LMMs-Eval

commit be9dada
Merge: 62ea8ce 7cc2890
Author: Teo (Timothy) Wu Haoning <[email protected]>
Date:   Sat Jun 15 16:39:20 2024 +0800

    Merge pull request #1 from EvolvingLMMs-Lab/main

    Merge pull request #113 from teowu/main

commit 62ea8ce
Author: teowu <[email protected]>
Date:   Sat Jun 15 08:30:11 2024 +0000

    LongVideoBench support: image LMMs (idefics2, phi3) and video LMMs (LLaVA-Next-Video-34B)

commit 7cc2890
Merge: 4bc7224 ea14cd4
Author: Li Bo <[email protected]>
Date:   Sat Jun 15 14:10:22 2024 +0800

    Merge pull request #113 from teowu/main

    Q-Bench, Q-Bench2, A-Bench

commit dbfb238
Author: Jingyang <[email protected]>
Date:   Fri Jun 14 16:20:42 2024 -0400

    add tinyllava

commit ea14cd4
Author: teowu <[email protected]>
Date:   Fri Jun 14 15:01:52 2024 +0000

    Add qbench, qbench2, abench; fix phi3v as its current implementation does not support multi-image

commit 4bc7224
Merge: 2797987 bf14cb8
Author: Li Bo <[email protected]>
Date:   Fri Jun 14 02:14:43 2024 +0800

    Merge pull request #111 from XinrunDu/main

    add II-Bench

commit bf14cb8
Author: XinrunDu <[email protected]>
Date:   Thu Jun 13 09:37:02 2024 +0000

    fix dataset_path

commit 6248113
Author: XinrunDu <[email protected]>
Date:   Thu Jun 13 09:32:06 2024 +0000

    add II-Bench

commit 2797987
Merge: 63d82f1 66d4bb2
Author: Li Bo <[email protected]>
Date:   Thu Jun 13 11:14:47 2024 +0800

    Merge pull request #109 from EvolvingLMMs-Lab/pufanyi/update_version

    [Small Update] Update the version of LMMs-Eval

commit 66d4bb2
Author: Fanyi Pu <[email protected]>
Date:   Thu Jun 13 11:13:00 2024 +0800

    update version

commit 63d82f1
Author: Li Bo <[email protected]>
Date:   Thu Jun 13 11:04:32 2024 +0800

    Update README.md

commit 44a3379
Merge: 5ed0035 0ce46d0
Author: Li Bo <[email protected]>
Date:   Thu Jun 13 04:00:12 2024 +0800

    Merge pull request #105 from tianyu-z/main

    Include VCR

commit 0ce46d0
Author: Suyuchen <[email protected]>
Date:   Wed Jun 12 15:56:34 2024 -0400

    update README.md

commit 46a88d8
Merge: 47b13b9 5ed0035
Author: Suyuchen <[email protected]>
Date:   Wed Jun 12 15:50:26 2024 -0400

    merged readme.md

commit 47b13b9
Author: Suyuchen <[email protected]>
Date:   Wed Jun 12 15:30:52 2024 -0400

    update aggregation function for vcr_wiki

commit 5ed0035
Author: Li Bo <[email protected]>
Date:   Thu Jun 13 03:21:42 2024 +0800

    Update README.md

commit ed88068
Author: Li Bo <[email protected]>
Date:   Thu Jun 13 03:13:59 2024 +0800

    Update README.md

commit fea3806
Merge: d99a24a 05dc8e8
Author: Li Bo <[email protected]>
Date:   Thu Jun 13 03:11:49 2024 +0800

    Merge pull request #108 from EvolvingLMMs-Lab/internal_main_dev

    [Upgrade to v0.2] Embracing Video Evaluations with LMMs-Eval

commit 05dc8e8
Author: Bo Li <[email protected]>
Date:   Wed Jun 12 15:56:04 2024 +0000

    chore: Update lmms-eval to support video evaluations for LLaVA models

commit cbeee20
Author: Bo Li <[email protected]>
Date:   Wed Jun 12 15:50:30 2024 +0000

    chore: Update lmms-eval to support video evaluations for LLaVA models

commit f00d549
Author: Bo Li <[email protected]>
Date:   Wed Jun 12 15:46:33 2024 +0000

    Update image alignment in README.md

commit 3415633
Author: Bo Li <[email protected]>
Date:   Wed Jun 12 15:43:16 2024 +0000

    Update llava conv_template in lmms_eval/models/llava.py

commit 50575a9
Author: Bo Li <[email protected]>
Date:   Wed Jun 12 15:39:03 2024 +0000

    chore: Update lmms-eval to support video evaluations for LLaVA models

commit c9b2252
Author: Bo Li <[email protected]>
Date:   Wed Jun 12 15:33:48 2024 +0000

    Bump version to 0.2.0.dev0

commit 465bd42
Merge: e43bd84 d99a24a
Author: Bo Li <[email protected]>
Date:   Wed Jun 12 15:04:25 2024 +0000

    Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval into internal_main_dev

commit e43bd84
Author: Bo Li <[email protected]>
Date:   Wed Jun 12 14:54:06 2024 +0000

    chore: Remove unnecessary files and code related to live_bench and sft_eval tasks

commit d99a24a
Merge: 374590b a66003b
Author: Li Bo <[email protected]>
Date:   Wed Jun 12 19:45:57 2024 +0800

    Merge pull request #107 from AtsuMiyai/new_task/upd_update

    update gpt-3.5-turbo version

commit a66003b
Author: AtsuMiyai <[email protected]>
Date:   Wed Jun 12 17:05:17 2024 +0900

    update gpt-3.5-turbo version

commit ee91f27
Author: AtsuMiyai <[email protected]>
Date:   Wed Jun 12 16:50:53 2024 +0900

    update gpt-3.5-turbo version

commit 326b969
Author: tianyu-z <[email protected]>
Date:   Mon Jun 10 20:07:40 2024 -0400

    include std and confidence interval

commit cd050d4
Author: Suyuchen <[email protected]>
Date:   Mon Jun 10 18:49:47 2024 -0400

    update vcr_wiki tasks in README.md

commit 205721e
Author: Suyuchen <[email protected]>
Date:   Mon Jun 10 18:43:15 2024 -0400

    update vcr_wiki tasks

commit db8e718
Author: tianyu-z <[email protected]>
Date:   Mon Jun 10 16:13:58 2024 -0400

    include the try-except logic for spacy

commit 427dabb
Author: Suyuchen <[email protected]>
Date:   Mon Jun 10 15:51:05 2024 -0400

    add crossed_text to vcr_wiki output

commit 043b483
Author: tianyu-z <[email protected]>
Date:   Mon Jun 10 15:47:00 2024 -0400

    switch logic

commit e1f04db
Author: tianyu-z <[email protected]>
Date:   Mon Jun 10 02:38:21 2024 -0400

    modify the form of VCR

commit 96e8d98
Author: tianyu-z <[email protected]>
Date:   Mon Jun 10 00:10:30 2024 -0400

    init include vcr

commit 374590b
Merge: 504685e cb3b9ce
Author: Kaichen Zhang - NTU <[email protected]>
Date:   Fri Jun 7 20:25:48 2024 +0800

    Merge pull request #101 from Gumpest/main

    Update conbench in README

commit 504685e
Author: Li Bo <[email protected]>
Date:   Thu Jun 6 15:42:15 2024 +0800

    Update README.md

commit cb3b9ce
Merge: c9793b3 67b64ea
Author: Yuan Zhang <[email protected]>
Date:   Thu Jun 6 11:22:24 2024 +0800

    Merge branch 'EvolvingLMMs-Lab:main' into main

commit c9793b3
Author: Yuan Zhang <[email protected]>
Date:   Thu Jun 6 11:21:05 2024 +0800

    update README

commit 67b64ea
Merge: 8ee7848 5fd6845
Author: Li Bo <[email protected]>
Date:   Wed Jun 5 23:12:58 2024 +0800

    Merge pull request #100 from Gumpest/main

    add Conbench

commit 5fd6845
Author: Yuan Zhang <[email protected]>
Date:   Wed Jun 5 21:52:31 2024 +0800

    add conbench

commit 8ee7848
Merge: 747e197 6fefaf7
Author: Li Bo <[email protected]>
Date:   Tue Jun 4 17:09:33 2024 +0800

    Merge pull request #95 from AtsuMiyai/new_task/upd

    add MM-UPD

commit 747e197
Merge: 4854a34 0584307
Author: Li Bo <[email protected]>
Date:   Tue Jun 4 17:09:04 2024 +0800

    Merge pull request #97 from CaraJ7/update

    Add MathVerse in README.md

commit 6fefaf7
Author: AtsuMiyai <[email protected]>
Date:   Tue Jun 4 17:36:39 2024 +0900

    update utils.py for leaderboard submission

commit 5f4fe36
Author: AtsuMiyai <[email protected]>
Date:   Sun Jun 2 23:28:27 2024 +0900

    slightly change query_prompt for the reproduction

commit 0584307
Author: CaraJ7 <[email protected]>
Date:   Sun Jun 2 17:05:28 2024 +0800

    Add MathVerse in README.md

commit 0581ab3
Author: AtsuMiyai <[email protected]>
Date:   Fri May 31 16:09:45 2024 +0900

    merge model_specific_prompt_kwargs and dataset_name into each task yaml

commit 4854a34
Author: Pu Fanyi <[email protected]>
Date:   Sat May 4 19:23:39 2024 +0800

    Group MMMU images into one image (#83)

    * update

    * update font

    * Add matplotlib.font_manager import in utils.py

    * Refactor font handling in add_order_label function in utils.py

    * group mmmu

    ---------

    Co-authored-by: Li Bo <[email protected]>

commit d224794
Author: AtsuMiyai <[email protected]>
Date:   Wed May 29 15:15:59 2024 +0900

    add upd

commit 453e793
Author: AtsuMiyai <[email protected]>
Date:   Wed May 29 15:03:30 2024 +0900

    add upd

commit 909edd6
Author: AtsuMiyai <[email protected]>
Date:   Wed May 29 12:52:21 2024 +0900

    add upd

commit 7c1ac97
Author: AtsuMiyai <[email protected]>
Date:   Wed May 29 12:50:32 2024 +0900

    add upd

commit 811301c
Author: AtsuMiyai <[email protected]>
Date:   Wed May 29 12:46:58 2024 +0900

    add upd

commit 71401ba
Author: AtsuMiyai <[email protected]>
Date:   Wed May 29 12:41:21 2024 +0900

    add upd

commit 24dc435
Author: Bo Li <[email protected]>
Date:   Mon May 27 10:17:32 2024 +0000

    fix compatibility issue of older version llava

commit 616edf4
Author: Bo Li <[email protected]>
Date:   Mon May 27 09:32:26 2024 +0000

    [Fix] import issues of multilingual llava and olympiadbench

commit 4c5a99e
Merge: 45c05b2 b05c3e2
Author: Li Bo <[email protected]>
Date:   Mon May 27 14:19:53 2024 +0800

    Merge pull request #87 from vfragoso/vifragos/phi3v

    Adding microsoft/Phi-3-vision-128k-instruct model.

commit b05c3e2
Author: Victor Fragoso <[email protected]>
Date:   Fri May 24 16:36:37 2024 +0000

    Adding documentation of Phi3v class.

commit c200897
Author: Victor Fragoso <[email protected]>
Date:   Fri May 24 16:25:02 2024 +0000

    Adding prompt arguments for Phi3v on MathVista-TestMini

commit 7f9fb6b
Author: Victor Fragoso <[email protected]>
Date:   Fri May 24 13:24:16 2024 +0000

    Adding Phi3v model.

commit 45c05b2
Author: kcz358 <[email protected]>
Date:   Thu May 23 03:47:36 2024 +0000

    Set printing info for llava_hf to debug level

commit 53f013e
Author: kcz358 <[email protected]>
Date:   Thu May 23 03:41:39 2024 +0000

    Fix pope random name in pope full

commit 22520a9
Author: kcz358 <[email protected]>
Date:   Thu May 23 03:41:14 2024 +0000

    Add separated pope tasks by category

commit d1eefb1
Author: kcz358 <[email protected]>
Date:   Thu May 9 08:36:02 2024 +0000

    Update gitignore

commit b2b4dbd
Author: kcz358 <[email protected]>
Date:   Mon May 20 07:45:11 2024 +0000

    Comment out Spice in caption task so that don't need to download stanford nlp model

commit 662f05c
Author: kcz358 <[email protected]>
Date:   Mon May 20 03:13:13 2024 +0000

    Comment out parse result in xcomposer

commit 0932932
Author: kcz358 <[email protected]>
Date:   Thu May 16 03:55:39 2024 +0000

    Fix instructblip qformer size mismatch and multi-images problem

commit 557a6a3
Author: kcz358 <[email protected]>
Date:   Thu May 16 03:11:41 2024 +0000

    Remove redundant code in fuyu

commit 6aeb550
Author: kcz358 <[email protected]>
Date:   Thu May 16 01:45:24 2024 +0000

    Fix idefics2 llava in the wild bugs

commit aea80e6
Author: kcz358 <[email protected]>
Date:   Wed May 15 11:07:35 2024 +0000

    Better task list_with_num

commit 3c12a08
Author: Li Bo <[email protected]>
Date:   Sat May 18 02:35:52 2024 +0800

    Update LICENSE

commit 82317a6
Author: Li Bo <[email protected]>
Date:   Sat May 18 02:29:09 2024 +0800

    Update LICENSE

commit a8bba1c
Author: Li Bo <[email protected]>
Date:   Sat May 18 02:28:03 2024 +0800

    Create LICENSE

commit caa5893
Merge: c094448 423b006
Author: Li Bo <[email protected]>
Date:   Mon May 13 11:45:26 2024 +0800

    Merge pull request #73 from EvolvingLMMs-Lab/kc/qwen_vl_api

    [Feat] Add qwen vl api

commit c094448
Author: kcz358 <[email protected]>
Date:   Sat May 11 06:11:19 2024 +0000

    Fix llava_hf image tokens number issue

commit 64f07e4
Author: kcz358 <[email protected]>
Date:   Thu May 9 02:04:10 2024 +0000

    Fix endless warning for llava_hf generation

commit 8aaa828
Author: Bo Li <[email protected]>
Date:   Thu May 2 06:13:56 2024 +0000

    Add model_name parameter to Llava constructor

commit 7847dc4
Author: kcz358 <[email protected]>
Date:   Tue May 7 03:15:59 2024 +0000

    Parse result for llava_hf 1.6

commit 3e56b4f
Author: kcz358 <[email protected]>
Date:   Tue May 7 03:09:56 2024 +0000

    Fix llava_hf generation for 1.6

commit fa3ff92
Author: kcz358 <[email protected]>
Date:   Mon May 6 08:32:57 2024 +0000

    Fix llava conv template for llama3

commit 423b006
Author: kcz358 <[email protected]>
Date:   Sun May 5 07:54:52 2024 +0000

    Add qwen vl api

commit b7fd7a9
Merge: 986139a c5a130b
Author: Li Bo <[email protected]>
Date:   Sun May 5 13:19:48 2024 +0800

    Merge pull request #59 from EvolvingLMMs-Lab/add_idefics2

    add idefics2

commit 986139a
Merge: b46239c 8d3526c
Author: Li Bo <[email protected]>
Date:   Fri May 3 01:18:18 2024 +0800

    Merge pull request #36 from cocoshe/main

    [Fix] repr llava doc

commit b46239c
Merge: bc69a74 373265f
Author: Li Bo <[email protected]>
Date:   Fri May 3 01:17:34 2024 +0800

    Merge pull request #56 from gagan3012/main

    Multilingual LLava bench

commit bc69a74
Merge: eef3aeb 626e8a9
Author: Li Bo <[email protected]>
Date:   Fri May 3 01:12:14 2024 +0800

    Merge pull request #70 from hunterheiden/hsh/new_task/WebSRC

    Bugfix: WebSRC should be token-level F1 NOT character-level

commit 626e8a9
Author: Hunter Heidenreich <[email protected]>
Date:   Thu May 2 09:31:03 2024 -0400

    Bugfix: WebSRC should be token-level F1 NOT character-level

commit eef3aeb
Merge: c4e9dd9 9bca441
Author: Li Bo <[email protected]>
Date:   Thu May 2 14:38:17 2024 +0800

    Merge pull request #69 from hunterheiden/hsh/new_task/WebSRC

    [New Task] WebSRC (multimodal Q&A on web screenshots)

commit 9bca441
Author: Hunter Heidenreich <[email protected]>
Date:   Wed May 1 11:07:29 2024 -0400

    Add code to enable compilation of submission for WebSRC test split

commit 7687495
Author: Hunter Heidenreich <[email protected]>
Date:   Wed May 1 10:47:32 2024 -0400

    Draft and validate websrc eval on dev split

commit 4eebd3e
Author: Hunter Heidenreich <[email protected]>
Date:   Wed May 1 10:46:54 2024 -0400

    Update main README with new task names

commit 35fe80b
Author: Hunter Heidenreich <[email protected]>
Date:   Wed May 1 10:46:20 2024 -0400

    Draft README for WebSRC

commit 955bd06
Author: Hunter Heidenreich <[email protected]>
Date:   Tue Apr 30 10:16:21 2024 -0400

    Init webSRC

commit c4e9dd9
Merge: d8a3a99 319afcc
Author: Li Bo <[email protected]>
Date:   Fri Apr 26 14:37:22 2024 +0800

    Merge pull request #63 from hunterheiden/hsh/new_task/screenspot

    New Task: ScreenSpot - Grounding (REC) and instruction generation (REG) on screens

commit 319afcc
Author: Hunter Heidenreich <[email protected]>
Date:   Thu Apr 25 11:44:34 2024 -0400

    slight update

commit 2f3811c
Author: Hunter Heidenreich <[email protected]>
Date:   Thu Apr 25 11:41:04 2024 -0400

    Add README file specific to ScreenSpot

commit 28962cb
Author: Hunter Heidenreich <[email protected]>
Date:   Wed Apr 24 11:52:33 2024 -0400

    Update README to reflect new tasks

commit e457cfb
Author: Hunter Heidenreich <[email protected]>
Date:   Tue Apr 23 18:33:16 2024 -0400

    Create ScreenSpot on clean branch

commit d8a3a99
Merge: 3dcd015 ed17129
Author: Li Bo <[email protected]>
Date:   Tue Apr 23 10:34:03 2024 +0800

    Merge pull request #61 from tupini07/patch-1

    Fix typo in Qwen-VL that was causing "reference before assignment"

commit ed17129
Author: Andrea Tupini <[email protected]>
Date:   Mon Apr 22 14:56:41 2024 -0600

    refactor query construction for clarity

commit cd87420
Author: Andrea Tupini <[email protected]>
Date:   Mon Apr 22 14:54:29 2024 -0600

    convert contexts to list if necessary and remove unnecessary construction of `questions`

commit 8557367
Author: Andrea Tupini <[email protected]>
Date:   Mon Apr 22 14:47:33 2024 -0600

    Fix typo in qwen_vl that was causing "reference before assignment"

commit 3dcd015
Merge: 95df9fe 743673a
Author: Li Bo <[email protected]>
Date:   Sat Apr 20 22:03:16 2024 +0800

    Merge pull request #60 from CaraJ7/main

    Add MathVerse

commit 743673a
Merge: c1a5472 95df9fe
Author: CaraJ7 <[email protected]>
Date:   Sat Apr 20 21:49:02 2024 +0800

    Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval

commit c1a5472
Author: CaraJ7 <[email protected]>
Date:   Sat Apr 20 21:45:34 2024 +0800

    Add MathVerse

commit 373265f
Author: Gagan Bhatia <[email protected]>
Date:   Fri Apr 12 17:21:39 2024 -0700

    Add files via upload

commit d853051
Author: Gagan Bhatia <[email protected]>
Date:   Fri Apr 12 17:19:49 2024 -0700

    Create README.md

commit 8d3526c
Author: cocoshe <[email protected]>
Date:   Thu Mar 28 13:38:36 2024 +0800

    fix doc

commit 8f9d620
Author: Li Bo <[email protected]>
Date:   Sun Jun 23 14:02:25 2024 +0800

    Update pyproject.toml

commit 6341b7c
Merge: fce85f1 903b042
Author: Li Bo <[email protected]>
Date:   Sun Jun 23 14:02:02 2024 +0800

    Merge pull request #125 from EvolvingLMMs-Lab/dev/interleave

    [Model] aligned llava-interleave model results on video tasks

commit 903b042
Author: kcz358 <[email protected]>
Date:   Sat Jun 22 12:07:13 2024 +0000

    Remove unnecessary lines for video llava

commit d78ec86
Merge: ebe7217 fce85f1
Author: Li Bo <[email protected]>
Date:   Sat Jun 22 13:57:31 2024 +0800

    Merge branch 'main' into dev/interleave

commit ebe7217
Author: kcz358 <[email protected]>
Date:   Sat Jun 22 02:57:08 2024 +0000

    Delete unnecessary lines

commit 120c474
Author: kcz358 <[email protected]>
Date:   Fri Jun 21 08:38:41 2024 +0000

    Revise model registry for llava_hf and longva

commit 7d6201f
Author: kcz358 <[email protected]>
Date:   Fri Jun 21 08:38:24 2024 +0000

    Add longva

commit 12f4806
Author: kcz358 <[email protected]>
Date:   Fri Jun 21 08:35:39 2024 +0000

    Remove unnecessary lines since use batched visuals now in llava

commit 12cea76
Author: Bo Li <[email protected]>
Date:   Thu Jun 20 18:15:32 2024 +0000

    chore: Add loguru for logging in lmms_eval package

commit 8ef2474
Author: Bo Li <[email protected]>
Date:   Thu Jun 20 12:11:03 2024 +0000

    chore: Remove unused models from lmms_eval package

commit af38885
Author: Bo Li <[email protected]>
Date:   Thu Jun 20 12:07:09 2024 +0000

    chore: Handle ImportError when importing models

    Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.
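
The guarded-import pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual lmms_eval code: the `load_available` helper and the registry shape (module name mapped to class name) are assumptions made for the example.

```python
import importlib


def load_available(registry):
    """Import each registered model class, skipping those that fail to import.

    `registry` maps a module name to the class name expected inside it.
    Both the helper and the registry layout are hypothetical.
    """
    loaded = {}
    failed = {}
    for module_name, class_name in registry.items():
        try:
            module = importlib.import_module(module_name)
            loaded[module_name] = getattr(module, class_name)
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # optional dependency lands here instead of aborting the loop.
            failed[module_name] = str(exc)
            print(f"Failed to import {class_name} from {module_name}: {exc}")
    return loaded, failed


if __name__ == "__main__":
    # "json" ships with Python; the second module name is deliberately bogus.
    demo = {"json": "JSONDecoder", "no_such_module_for_demo": "Missing"}
    ok, bad = load_available(demo)
    print(sorted(ok), sorted(bad))
```

Catching the error per model means one missing optional dependency only disables that model, and the printed message names the import that failed.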

commit fce85f1
Merge: dbe6329 d94f83c
Author: Li Bo <[email protected]>
Date:   Thu Jun 20 20:02:12 2024 +0800

    Merge pull request #120 from EvolvingLMMs-Lab/pufanyi/hf_dataset_docs

    Add docs for datasets upload to HF

commit dbe6329
Author: choiszt <[email protected]>
Date:   Thu Jun 20 15:14:21 2024 +0800

    update ablation for videomme datasets

commit d94f83c
Author: Li Bo <[email protected]>
Date:   Thu Jun 20 13:30:59 2024 +0800

    Update README.md

commit cab8159
Author: Li Bo <[email protected]>
Date:   Thu Jun 20 13:30:29 2024 +0800

    Update README.md

commit 4587665
Author: kcz358 <[email protected]>
Date:   Thu Jun 20 03:55:30 2024 +0000

    Add llava_hf back to registry

commit 3463651
Author: kcz358 <[email protected]>
Date:   Thu Jun 20 03:54:33 2024 +0000

    Remove handling non-visual loop in llava

commit cb0d3f4
Author: Fanyi Pu <[email protected]>
Date:   Thu Jun 20 02:11:18 2024 +0800

    update readme

commit 813877b
Author: Fanyi Pu <[email protected]>
Date:   Wed Jun 19 15:37:52 2024 +0800

    to sh script

commit a14684b
Author: Fanyi Pu <[email protected]>
Date:   Wed Jun 19 15:37:04 2024 +0800

    lint

commit d0f8851
Author: Fanyi Pu <[email protected]>
Date:   Wed Jun 19 15:36:48 2024 +0800

    small fix

commit 63748e9
Author: Fanyi Pu <[email protected]>
Date:   Wed Jun 19 15:36:43 2024 +0800

    small fix

commit 7f1159a
Author: Fanyi Pu <[email protected]>
Date:   Wed Jun 19 15:35:05 2024 +0800

    update preparation

commit 19f9bd6
Author: Fanyi Pu <[email protected]>
Date:   Wed Jun 19 15:23:24 2024 +0800

    docs

commit ce6f889
Author: Fanyi Pu <[email protected]>
Date:   Wed Jun 19 15:04:16 2024 +0800

    tutorial

commit f513c52
Author: Bo Li <[email protected]>
Date:   Wed Jun 19 06:51:19 2024 +0000

    chore: Update dependencies to fix potential risks and improve compatibility

commit efb5295
Author: kcz358 <[email protected]>
Date:   Wed Jun 19 10:25:58 2024 +0800

    Release llava-wilder

commit 742651f
Author: Fanyi Pu <[email protected]>
Date:   Wed Jun 19 07:44:26 2024 +0800

    feat: Add support for auto downloading tar format videos

commit 511b625
Merge: 22a4958 050b2c3
Author: Bo Li <[email protected]>
Date:   Tue Jun 18 17:01:03 2024 +0000

    Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval

commit 45c05b2
Author: kcz358 <[email protected]>
Date:   Thu May 23 03:47:36 2024 +0000

    Set printing info for llava_hf to debug level

commit 53f013e
Author: kcz358 <[email protected]>
Date:   Thu May 23 03:41:39 2024 +0000

    Fix pope random name in pope full

commit 22520a9
Author: kcz358 <[email protected]>
Date:   Thu May 23 03:41:14 2024 +0000

    Add separated pope tasks by category

commit d1eefb1
Author: kcz358 <[email protected]>
Date:   Thu May 9 08:36:02 2024 +0000

    Update gitignore

commit b2b4dbd
Author: kcz358 <[email protected]>
Date:   Mon May 20 07:45:11 2024 +0000

    Comment out Spice in caption task so that the Stanford NLP model does not need to be downloaded

commit 662f05c
Author: kcz358 <[email protected]>
Date:   Mon May 20 03:13:13 2024 +0000

    Comment out parse result in xcomposer

commit 0932932
Author: kcz358 <[email protected]>
Date:   Thu May 16 03:55:39 2024 +0000

    Fix instructblip qformer size mismatch and multi-images problem

commit 557a6a3
Author: kcz358 <[email protected]>
Date:   Thu May 16 03:11:41 2024 +0000

    Remove redundant code in fuyu

commit 6aeb550
Author: kcz358 <[email protected]>
Date:   Thu May 16 01:45:24 2024 +0000

    Fix idefics2 llava in the wild bugs

commit aea80e6
Author: kcz358 <[email protected]>
Date:   Wed May 15 11:07:35 2024 +0000

    Better task list_with_num

commit 3c12a08
Author: Li Bo <[email protected]>
Date:   Sat May 18 02:35:52 2024 +0800

    Update LICENSE

commit 82317a6
Author: Li Bo <[email protected]>
Date:   Sat May 18 02:29:09 2024 +0800

    Update LICENSE

commit a8bba1c
Author: Li Bo <[email protected]>
Date:   Sat May 18 02:28:03 2024 +0800

    Create LICENSE

commit caa5893
Merge: c094448 423b006
Author: Li Bo <[email protected]>
Date:   Mon May 13 11:45:26 2024 +0800

    Merge pull request #73 from EvolvingLMMs-Lab/kc/qwen_vl_api

    [Feat] Add qwen vl api

commit c094448
Author: kcz358 <[email protected]>
Date:   Sat May 11 06:11:19 2024 +0000

    Fix llava_hf image tokens number issue

commit 64f07e4
Author: kcz358 <[email protected]>
Date:   Thu May 9 02:04:10 2024 +0000

    Fix endless warning for llava_hf generation

commit 8aaa828
Author: Bo Li <[email protected]>
Date:   Thu May 2 06:13:56 2024 +0000

    Add model_name parameter to Llava constructor

commit 7847dc4
Author: kcz358 <[email protected]>
Date:   Tue May 7 03:15:59 2024 +0000

    Parse result for llava_hf 1.6

commit 3e56b4f
Author: kcz358 <[email protected]>
Date:   Tue May 7 03:09:56 2024 +0000

    Fix llava_hf generation for 1.6

commit fa3ff92
Author: kcz358 <[email protected]>
Date:   Mon May 6 08:32:57 2024 +0000

    Fix llava conv template for llama3

commit 423b006
Author: kcz358 <[email protected]>
Date:   Sun May 5 07:54:52 2024 +0000

    Add qwen vl api

commit b7fd7a9
Merge: 986139a c5a130b
Author: Li Bo <[email protected]>
Date:   Sun May 5 13:19:48 2024 +0800

    Merge pull request #59 from EvolvingLMMs-Lab/add_idefics2

    add idefics2

commit 986139a
Merge: b46239c 8d3526c
Author: Li Bo <[email protected]>
Date:   Fri May 3 01:18:18 2024 +0800

    Merge pull request #36 from cocoshe/main

    [Fix] repr llava doc

commit b46239c
Merge: bc69a74 373265f
Author: Li Bo <[email protected]>
Date:   Fri May 3 01:17:34 2024 +0800

    Merge pull request #56 from gagan3012/main

    Multilingual LLava bench

commit bc69a74
Merge: eef3aeb 626e8a9
Author: Li Bo <[email protected]>
Date:   Fri May 3 01:12:14 2024 +0800

    Merge pull request #70 from hunterheiden/hsh/new_task/WebSRC

    Bugfix: WebSRC should be token-level F1 NOT character-level

commit 626e8a9
Author: Hunter Heidenreich <[email protected]>
Date:   Thu May 2 09:31:03 2024 -0400

    Bugfix: WebSRC should be token-level F1 NOT character-level
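
The difference between token-level and character-level F1 named in this bugfix can be illustrated with a minimal sketch. It is not the code from the WebSRC task; the function names, lowercasing, and whitespace tokenization are assumptions chosen for illustration, following the common SQuAD-style overlap formula.

```python
from collections import Counter
from typing import Sequence


def _overlap_f1(pred_items: Sequence[str], ref_items: Sequence[str]) -> float:
    """Harmonic mean of precision and recall over multiset overlap."""
    if not pred_items or not ref_items:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(len(pred_items) == len(ref_items))
    overlap = sum((Counter(pred_items) & Counter(ref_items)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_items)
    recall = overlap / len(ref_items)
    return 2 * precision * recall / (precision + recall)


def token_f1(prediction: str, reference: str) -> float:
    """F1 over lowercased, whitespace-separated tokens (hypothetical helper)."""
    return _overlap_f1(prediction.lower().split(), reference.lower().split())


def char_f1(prediction: str, reference: str) -> float:
    """Same formula over single characters, shown only for comparison."""
    return _overlap_f1(list(prediction.lower()), list(reference.lower()))


if __name__ == "__main__":
    # "cat" and "act" share every character but no token.
    print(token_f1("cat", "act"), char_f1("cat", "act"))
```

Character-level scoring rewards answers that merely share letters with the reference, which is why the two metrics can diverge sharply on short answers.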

commit eef3aeb
Merge: c4e9dd9 9bca441
Author: Li Bo <[email protected]>
Date:   Thu May 2 14:38:17 2024 +0800

    Merge pull request #69 from hunterheiden/hsh/new_task/WebSRC

    [New Task] WebSRC (multimodal Q&A on web screenshots)

commit 9bca441
Author: Hunter Heidenreich <[email protected]>
Date:   Wed May 1 11:07:29 2024 -0400

    Add code to enable compilation of submission for WebSRC test split

commit 7687495
Author: Hunter Heidenreich <[email protected]>
Date:   Wed May 1 10:47:32 2024 -0400

    Draft and validate websrc eval on dev split

commit 4eebd3e
Author: Hunter Heidenreich <[email protected]>
Date:   Wed May 1 10:46:54 2024 -0400

    Update main README with new task names

commit 35fe80b
Author: Hunter Heidenreich <[email protected]>
Date:   Wed May 1 10:46:20 2024 -0400

    Draft README for WebSRC

commit 955bd06
Author: Hunter Heidenreich <[email protected]>
Date:   Tue Apr 30 10:16:21 2024 -0400

    Init webSRC

commit c4e9dd9
Merge: d8a3a99 319afcc
Author: Li Bo <[email protected]>
Date:   Fri Apr 26 14:37:22 2024 +0800

    Merge pull request #63 from hunterheiden/hsh/new_task/screenspot

    New Task: ScreenSpot - Grounding (REC) and instruction generation (REG) on screens

commit 319afcc
Author: Hunter Heidenreich <[email protected]>
Date:   Thu Apr 25 11:44:34 2024 -0400

    slight update

commit 2f3811c
Author: Hunter Heidenreich <[email protected]>
Date:   Thu Apr 25 11:41:04 2024 -0400

    Add README file specific to ScreenSpot

commit 28962cb
Author: Hunter Heidenreich <[email protected]>
Date:   Wed Apr 24 11:52:33 2024 -0400

    Update README to reflect new tasks

commit e457cfb
Author: Hunter Heidenreich <[email protected]>
Date:   Tue Apr 23 18:33:16 2024 -0400

    Create ScreenSpot on clean branch

commit d8a3a99
Merge: 3dcd015 ed17129
Author: Li Bo <[email protected]>
Date:   Tue Apr 23 10:34:03 2024 +0800

    Merge pull request #61 from tupini07/patch-1

    Fix typo in Qwen-VL that was causing "reference before assignment"

commit ed17129
Author: Andrea Tupini <[email protected]>
Date:   Mon Apr 22 14:56:41 2024 -0600

    refactor query construction for clarity

commit cd87420
Author: Andrea Tupini <[email protected]>
Date:   Mon Apr 22 14:54:29 2024 -0600

    convert contexts to list if necessary and remove unnecessary construction of `questions`

commit 8557367
Author: Andrea Tupini <[email protected]>
Date:   Mon Apr 22 14:47:33 2024 -0600

    Fix typo in qwen_vl that was causing "reference before assignment"

commit 3dcd015
Merge: 95df9fe 743673a
Author: Li Bo <[email protected]>
Date:   Sat Apr 20 22:03:16 2024 +0800

    Merge pull request #60 from CaraJ7/main

    Add MathVerse

commit 743673a
Merge: c1a5472 95df9fe
Author: CaraJ7 <[email protected]>
Date:   Sat Apr 20 21:49:02 2024 +0800

    Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval

commit c1a5472
Author: CaraJ7 <[email protected]>
Date:   Sat Apr 20 21:45:34 2024 +0800

    Add MathVerse

commit 373265f
Author: Gagan Bhatia <[email protected]>
Date:   Fri Apr 12 17:21:39 2024 -0700

    Add files via upload

commit d853051
Author: Gagan Bhatia <[email protected]>
Date:   Fri Apr 12 17:19:49 2024 -0700

    Create README.md

commit 22a4958
Author: Bo Li <[email protected]>
Date:   Thu Apr 4 17:12:43 2024 +0000

    [WIP] adding mmbench dev evaluation (#75)

    * WIP

    * Update GPT evaluation model name and sys prompt

    * 🛠️ Scale accuracy to percentage

    The aggregation function now multiplies the accuracy value by 100 to report it as a percentage. In the evaluation process, the `math` module is imported and progress is logged every 100 evaluations instead of every 10, which reduces log verbosity and prevents potential logging overflow. NaN values are now handled by setting 'default_value' when data is missing, which avoids errors in split, category, and l2-category assignments. Finally, a new `calculate_hit_rates` function streamlines the reporting of categorical and l2-categorical accuracies, improving code readability and maintenance.

    Issue refs: #1427, #1533

    * Update GPT evaluation model name and API configuration

    * Refactor MMBench_Evaluator class to handle missing columns

    * Add print statements for detailed results in MMBench-CN(CC), MMBench-CN(Dev), and MMBench-EN(Dev) evaluations

    * Refactor MMBench-CN and MMBench-EN evaluation functions

    * 🔄 Refactor result processing and logging logic

    - Simplified the result processing functions across different utility modules (`cc_utils.py`, `cn_utils.py`, `en_utils.py`) to unify the handling of multiple-choice options. Now, all options ("A" to "E") are dynamically added to the result data, and default to "nan" if not provided in the document.
    - Removed redundant keys directly from the process results dict creation to avoid clutter and align with the new dynamic addition of options.
    - In `mmbench_evals.py`, removed the unnecessary check for all splits being 'dev' and streamlined the evaluation loop by eliminating the progress bar (tqdm) for a cleaner log output.
    - Commented-out code and verbose logging during evaluation, which may have interfered with performance, have been removed for a more efficient and less intrusive logging experience.

    This cleanup reduces redundancy in the codebase and improves evaluation performance.

    Refs #2045

    ---------

    Co-authored-by: Bo Li <[email protected]>
    (cherry picked from commit a19278c)
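
The result handling described in this commit message (options "A" to "E" defaulting to "nan", a fallback category for missing data, and accuracy reported as a percentage) might look roughly like the sketch below. It is not the actual `mmbench_evals.py` code; the field names, the `build_result_row` and `hit_rates_by_category` helpers, and the exact NaN check are assumptions made for illustration.

```python
import math
from collections import defaultdict

OPTION_LETTERS = ["A", "B", "C", "D", "E"]


def _is_missing(value) -> bool:
    """True for None or a float NaN (assumed representation of missing data)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_result_row(doc: dict, prediction: str) -> dict:
    """Build one result record, filling absent options with the string "nan"."""
    category = doc.get("category")
    row = {
        "index": doc["index"],
        "answer": doc["answer"],
        "prediction": prediction,
        # Fall back to a placeholder so later grouping never sees NaN.
        "category": "default_value" if _is_missing(category) else category,
    }
    for letter in OPTION_LETTERS:
        value = doc.get(letter)
        row[letter] = "nan" if _is_missing(value) else value
    return row


def hit_rates_by_category(rows: list) -> dict:
    """Per-category accuracy, scaled by 100 so it reads as a percentage."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row["category"]] += 1
        hits[row["category"]] += int(row["prediction"] == row["answer"])
    return {cat: 100.0 * hits[cat] / totals[cat] for cat in totals}
```

Adding the options in a loop keeps the record shape identical across the CN, EN, and CC variants even when a question has fewer than five choices.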

commit 8d3526c
Author: cocoshe <[email protected]>
Date:   Thu Mar 28 13:38:36 2024 +0800

    fix doc
* init live bench

* update path

* chore: Refactor live_bench package structure and update dependencies

* update

* Merge remote-tracking branch 'origin/internal_main_dev'

* Refactor live_bench package structure and update dependencies

* Refactor live_bench package structure and update dependencies

* Fix execution count in example.ipynb

* extract_infomation

* Refactor extract_infomation.py to improve text extraction from HTML

* fix

* fix

* extract information

* chore: Refactor extract_infomation.py for improved readability and maintainability

* chore: Refactor data_generator prompt.md and check_prompt.md for improved clarity and instructions

* lint

* update

* update prompt

* extract information

* add info

* lint

* update

* filter

* update version of live_bench

* Update model version to gemini-1.5-pro

* update

* livebench_eval

* livebench

* update
* small fix

* lint
* fix doc

* [WIP] adding mmbench dev evaluation (#75)

* WIP

* Update GPT evaluation model name and sys prompt

* 🛠️ Scale accuracy to percentage

The aggregation function now multiplies the accuracy value by 100 to report it as a percentage. In the evaluation process, the `math` module is imported and progress is logged every 100 evaluations instead of every 10, which reduces log verbosity and prevents potential logging overflow. NaN values are now handled by setting 'default_value' when data is missing, which avoids errors in split, category, and l2-category assignments. Finally, a new `calculate_hit_rates` function streamlines the reporting of categorical and l2-categorical accuracies, improving code readability and maintenance.

Issue refs: #1427, #1533

* Update GPT evaluation model name and API configuration

* Refactor MMBench_Evaluator class to handle missing columns

* Add print statements for detailed results in MMBench-CN(CC), MMBench-CN(Dev), and MMBench-EN(Dev) evaluations

* Refactor MMBench-CN and MMBench-EN evaluation functions

* 🔄 Refactor result processing and logging logic

- Simplified the result processing functions across different utility modules (`cc_utils.py`, `cn_utils.py`, `en_utils.py`) to unify the handling of multiple-choice options. Now, all options ("A" to "E") are dynamically added to the result data, and default to "nan" if not provided in the document.
- Removed redundant keys directly from the process results dict creation to avoid clutter and align with the new dynamic addition of options.
- In `mmbench_evals.py`, removed the unnecessary check for all splits being 'dev' and streamlined the evaluation loop by eliminating the progress bar (tqdm) for a cleaner log output.
- Commented-out code and verbose logging during evaluation, which may have interfered with performance, have been removed for a more efficient and less intrusive logging experience.

This cleanup reduces redundancy in the codebase and improves evaluation performance.

Refs #2045

---------

Co-authored-by: Bo Li <[email protected]>
(cherry picked from commit a19278c)

* Create README.md

* Add files via upload

* Add MathVerse

* Fix typo in qwen_vl that was causing "reference before assignment"

* convert contexts to list if necessary and remove unnecessary construction of `questions`

* refactor query construction for clarity

* Create ScreenSpot on clean branch

* Update README to reflect new tasks

* Add README file specific to ScreenSpot

* slight update

* Init webSRC

* Draft README for WebSRC

* Update main README with new task names

* Draft and validate websrc eval on dev split

* Add code to enable compilation of submission for WebSRC test split

* Bugfix: WebSRC should be token-level F1 NOT character-level

* Add qwen vl api

* Fix llava conv template for llama3

* Fix llava_hf generation for 1.6

* Parse result for llava_hf 1.6

* Add model_name parameter to Llava constructor

* Fix endless warning for llava_hf generation

* Fix llava_hf image tokens number issue

* Create LICENSE

* Update LICENSE

* Update LICENSE

* Better task list_with_num

* Fix idefics2 llava in the wild bugs

* Remove redundant code in fuyu

* Fix instructblip qformer size mismatch and multi-images problem

* Comment out parse result in xcomposer

* Comment out Spice in caption task so that the Stanford NLP model does not need to be downloaded

* Update gitignore

* Add separated pope tasks by category

* Fix pope random name in pope full

* Set printing info for llava_hf to debug level

* Adding Phi3v model.

* Adding prompt arguments for Phi3v on MathVista-TestMini

* Adding documentation of Phi3v class.

* [Fix] import issues of multilingual llava and olympiadbench

* fix compatibility issue of older version llava

* add upd

* add upd

* add upd

* add upd

* add upd

* add upd

* Group MMMU images into one image (#83)

* update

* update font

* Add matplotlib.font_manager import in utils.py

* Refactor font handling in add_order_label function in utils.py

* group mmmu

---------

Co-authored-by: Li Bo <[email protected]>

* merge model_specific_prompt_kwargs and dataset_name into each task yaml

* Add MathVerse in README.md

* slightly change query_prompt for the reproduction

* update utils.py for leaderboard submission

* add conbench

* update README

* Update README.md

* init include vcr

* modify the form of VCR

* switch logic

* add crossed_text to vcr_wiki output

* include the try-except logic for spacy

* update vcr_wiki tasks

* update vcr_wiki tasks in README.md

* include std and confidence interval

* update gpt-3.5-turbo version

* update gpt-3.5-turbo version

* chore: Remove unnecessary files and code related to live_bench and sft_eval tasks

* Bump version to 0.2.0.dev0

* chore: Update lmms-eval to support video evaluations for LLaVA models

* Update llava conv_template in lmms_eval/models/llava.py

* Update image alignment in README.md

* chore: Update lmms-eval to support video evaluations for LLaVA models

* chore: Update lmms-eval to support video evaluations for LLaVA models

* Update README.md

* Update README.md

* update aggregation function for vcr_wiki

* update README.md

* Update README.md

* update version

* add II-Bench

* fix dataset_path

* Add qbench, qbench2, abench; fix phi3v as its current implementation does not support multi-image

* add tinyllava

* LongVideoBench support: image LMMs (idefics2, phi3) and video LMMs (LLaVA-Next-Video-34B)

* fix #117, allow auto download with tar format videos

* fix #117, allow auto download with tar format videos

* fix typo

* feat: Add support for auto downloading tar format videos

* Release llava-wilder

* chore: Update dependencies to fix potential risks and improve compatibility

* tutorial

* docs

* update preparation

* small fix

* small fix

* lint

* to sh script

* update readme

* Remove handling non-visual loop in llava

* Add llava_hf back to registry

* Update README.md

* Update README.md

* update ablation for videomme datasets

* chore: Handle ImportError when importing models

Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.

* chore: Remove unused models from lmms_eval package

* feat: Allow loading model configurations from other packages

* feat: Allow including external tasks from plugins

* chore: Add loguru for logging in lmms_eval package

* Remove unnecessary lines since use batched visuals now in llava

* Add longva

* Revise model registry for llava_hf and longva

* Delete unnecessary lines

* Remove unnecessary lines for video llava

* Update pyproject.toml

* Update activitynetqa_generation.yaml

* Fix vid mme post prompt issue

* Add wild vision 0617

* Hardcode to keep image for wild vision

* Fixing scoring logic

* Fixing dataset name

* Fixing handling None filtered score

---------

Co-authored-by: cocoshe <[email protected]>
Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Gagan Bhatia <[email protected]>
Co-authored-by: CaraJ7 <[email protected]>
Co-authored-by: Li Bo <[email protected]>
Co-authored-by: Andrea Tupini <[email protected]>
Co-authored-by: Hunter Heidenreich <[email protected]>
Co-authored-by: Victor Fragoso <[email protected]>
Co-authored-by: AtsuMiyai <[email protected]>
Co-authored-by: Pu Fanyi <[email protected]>
Co-authored-by: Yuan Zhang <[email protected]>
Co-authored-by: Yuan Zhang <[email protected]>
Co-authored-by: tianyu-z <[email protected]>
Co-authored-by: Suyuchen <[email protected]>
Co-authored-by: XinrunDu <[email protected]>
Co-authored-by: teowu <[email protected]>
Co-authored-by: Jingyang <[email protected]>
Co-authored-by: Teo (Timothy) Wu Haoning <[email protected]>
Co-authored-by: choiszt <[email protected]>
Co-authored-by: Lorenzo Mammana <[email protected]>
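
The squashed message above mentions wrapping model imports in a try-except block so that one missing dependency does not break the whole package. A minimal sketch of that pattern follows; the `load_available` helper and the registry mapping are hypothetical and are not taken from `lmms_eval`.

```python
import importlib
from types import ModuleType


def load_available(registry: dict[str, str]) -> dict[str, ModuleType]:
    """Import each registered module, skipping any that fail to import.

    `registry` maps a short model name to a dotted module path
    (a hypothetical structure used only for this illustration).
    """
    loaded: dict[str, ModuleType] = {}
    for name, module_path in registry.items():
        try:
            loaded[name] = importlib.import_module(module_path)
        except ImportError as exc:
            # Report the failure but keep going so other models stay usable.
            print(f"Failed to import {name} from {module_path}: {exc}")
    return loaded
```

Calling it with a mapping that contains one importable and one missing module returns only the importable entry and prints a message for the other.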
* feat: Update LMMS evaluation configuration and models

- Update `activitynetqa_generation.yaml` to remove `dataset_name` field and update `task` field to "activitynetqa"
- Update `utils.py` to add default values for `API_URL` and `API_KEY` when `API_TYPE` is not "openai" or "azure"
- Update `batch_gpt4.py` and `gpt4v.py` to rename `max_frames_for_video` parameter to `max_frames_num`
- Update `reka.py` to rename `max_frames_for_video` parameter to `max_frames_num` and add support for `continual_mode` with a persistent response cache

This commit updates the LMMS evaluation configuration and models to improve compatibility and add new features.

* Update LMMS evaluation configuration and models

* Update LMMS evaluation configuration and models

* feat: Update LMMS evaluation configuration and models

- Update `activitynetqa_generation.yaml` to remove `dataset_name` field and update `task` field to "activitynetqa"
- Update `utils.py` to add default values for `API_URL` and `API_KEY` when `API_TYPE` is not "openai" or "azure"
- Update `batch_gpt4.py` and `gpt4v.py` to rename `max_frames_for_video` parameter to `max_frames_num`
- Update `reka.py` to rename `max_frames_for_video` parameter to `max_frames_num` and add support for `continual_mode` with a persistent response cache

This commit updates the LMMS evaluation configuration and models to improve compatibility and add new features.

* Refactor error handling in GPT4V model evaluation

* Refactor error handling in GPT4V model evaluation

* Refactor video decoding backend to use "decord" instead of "pyav"

* Refactor image aspect ratio handling in Llava_OneVision model

* Refactor GPT4V model to fix bug in visuals encoding

* add exception for azure gpt
* chore: Update lmms-eval to support video evaluations for LLaVA models

* lint
* internvl2

* fix some bugs

* fix

* lint
* feat: Update LMMS evaluation configuration and models

- Update `activitynetqa_generation.yaml` to remove `dataset_name` field and update `task` field to "activitynetqa"
- Update `utils.py` to add default values for `API_URL` and `API_KEY` when `API_TYPE` is not "openai" or "azure"
- Update `batch_gpt4.py` and `gpt4v.py` to rename `max_frames_for_video` parameter to `max_frames_num`
- Update `reka.py` to rename `max_frames_for_video` parameter to `max_frames_num` and add support for `continual_mode` with a persistent response cache

This commit updates the LMMS evaluation configuration and models to improve compatibility and add new features.

* Update LMMS evaluation configuration and models

* Update LMMS evaluation configuration and models

* feat: Update LMMS evaluation configuration and models

- Update `activitynetqa_generation.yaml` to remove `dataset_name` field and update `task` field to "activitynetqa"
- Update `utils.py` to add default values for `API_URL` and `API_KEY` when `API_TYPE` is not "openai" or "azure"
- Update `batch_gpt4.py` and `gpt4v.py` to rename `max_frames_for_video` parameter to `max_frames_num`
- Update `reka.py` to rename `max_frames_for_video` parameter to `max_frames_num` and add support for `continual_mode` with a persistent response cache

This commit updates the LMMS evaluation configuration and models to improve compatibility and add new features.

* Refactor error handling in GPT4V model evaluation

* Refactor error handling in GPT4V model evaluation

* Refactor video decoding backend to use "decord" instead of "pyav"

* Refactor image aspect ratio handling in Llava_OneVision model

* Refactor GPT4V model to fix bug in visuals encoding

* add exception for azure gpt

* feat: fix bugs

* feat: update

* Refactor image aspect ratio handling in Llava_OneVision model

* Refactor image aspect ratio handling in Llava_OneVision model

* update interleave bench
(cherry picked from commit 701d12e570c08f45a91b2700fd10dd349b5f683a)
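
The commit messages above describe adding default values for `API_URL` and `API_KEY` when `API_TYPE` is neither "openai" nor "azure". One way such a fallback could be written is sketched here; the environment variable names, placeholder strings, and the `resolve_api_config` function are assumptions, not the contents of the project's `utils.py`.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_api_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str, str]:
    """Return (api_type, api_url, api_key) with a fallback for unknown types."""
    env = os.environ if env is None else env
    api_type = env.get("API_TYPE", "openai")
    if api_type == "openai":
        # Variable names below are assumed for illustration.
        api_url = env.get("OPENAI_API_URL", "https://api.openai.com/v1/chat/completions")
        api_key = env.get("OPENAI_API_KEY", "YOUR_API_KEY")
    elif api_type == "azure":
        api_url = env.get("AZURE_ENDPOINT", "YOUR_AZURE_ENDPOINT")
        api_key = env.get("AZURE_API_KEY", "YOUR_API_KEY")
    else:
        # Without this branch, api_url and api_key would be undefined
        # for any other API_TYPE and later code would raise NameError.
        api_url = "YOUR_API_URL"
        api_key = "YOUR_API_KEY"
    return api_type, api_url, api_key
```

Passing the environment as a parameter keeps the function easy to exercise without mutating `os.environ`.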
@Luodian Luodian changed the title [Video Features] add vila [Sync Features] add vila, add wildvision, add vibe-eval, add interleave bench Jul 9, 2024
@Luodian Luodian merged commit c65118d into main Jul 13, 2024
2 checks passed
@Luodian Luodian deleted the internal_main_dev branch July 13, 2024 00:51
@Luodian Luodian restored the internal_main_dev branch August 7, 2024 00:33
6 participants