[Sync Features] add vila, add wildvision, add vibe-eval, add interleave bench #138
Merged
…e import for llava and llavavid
Update videomme task [w,w/o subtitle] and modified prompt for ablations
commit 050b2c3 Merge: 74facb4 ef30651 Author: Li Bo <[email protected]> Date: Tue Jun 18 13:13:38 2024 +0800 Merge pull request #114 from zjysteven/add-tinyllava add tinyllava
commit ef30651 Author: Jingyang Zhang <[email protected]> Date: Mon Jun 17 17:57:02 2024 -0400 fix typo
commit 9bab677 Merge: dbfb238 74facb4 Author: Jingyang Zhang <[email protected]> Date: Sun Jun 16 10:56:05 2024 -0400 Merge branch 'EvolvingLMMs-Lab:main' into add-tinyllava
commit 74facb4 Merge: 8ba192f d5df72d Author: Li Bo <[email protected]> Date: Sun Jun 16 17:59:19 2024 +0800 Merge pull request #118 from teowu/main Fix the potential risk by PR #117
commit d5df72d Merge: 5bf59ed 8ba192f Author: Teo (Timothy) Wu Haoning <[email protected]> Date: Sun Jun 16 15:32:13 2024 +0800 Merge branch 'EvolvingLMMs-Lab:main' into main
commit 5bf59ed Author: teowu <[email protected]> Date: Sun Jun 16 07:27:28 2024 +0000 fix #117, allow auto download with tar format videos
commit 98b3955 Merge: a056f11 be9dada Author: teowu <[email protected]> Date: Sun Jun 16 07:25:07 2024 +0000 Merge branch 'main' of https://github.com/teowu/lmms-eval into main
commit a056f11 Author: teowu <[email protected]> Date: Sun Jun 16 07:23:54 2024 +0000 fix #117, allow auto download with tar format videos
commit 8ba192f Merge: 7cc2890 be9dada Author: Li Bo <[email protected]> Date: Sat Jun 15 17:30:59 2024 +0800 Merge pull request #117 from teowu/main LongVideoBench for LMMs-Eval
commit be9dada Merge: 62ea8ce 7cc2890 Author: Teo (Timothy) Wu Haoning <[email protected]> Date: Sat Jun 15 16:39:20 2024 +0800 Merge pull request #1 from EvolvingLMMs-Lab/main Merge pull request #113 from teowu/main
commit 62ea8ce Author: teowu <[email protected]> Date: Sat Jun 15 08:30:11 2024 +0000 LongVideoBench support: image LMMs (idefics2, phi3) and video LMMs (LLaVA-Next-Video-34B)
commit 7cc2890 Merge: 4bc7224 ea14cd4 Author: Li Bo <[email protected]> Date: Sat Jun 15 14:10:22 2024 +0800 Merge pull request #113 from teowu/main Q-Bench, Q-Bench2, A-Bench
commit dbfb238 Author: Jingyang <[email protected]> Date: Fri Jun 14 16:20:42 2024 -0400 add tinyllava
commit ea14cd4 Author: teowu <[email protected]> Date: Fri Jun 14 15:01:52 2024 +0000 Add qbench, qbench2, abench; fix phi3v as its current implementation does not support multi-image
commit 4bc7224 Merge: 2797987 bf14cb8 Author: Li Bo <[email protected]> Date: Fri Jun 14 02:14:43 2024 +0800 Merge pull request #111 from XinrunDu/main add II-Bench
commit bf14cb8 Author: XinrunDu <[email protected]> Date: Thu Jun 13 09:37:02 2024 +0000 fix dataset_path
commit 6248113 Author: XinrunDu <[email protected]> Date: Thu Jun 13 09:32:06 2024 +0000 add II-Bench
commit 2797987 Merge: 63d82f1 66d4bb2 Author: Li Bo <[email protected]> Date: Thu Jun 13 11:14:47 2024 +0800 Merge pull request #109 from EvolvingLMMs-Lab/pufanyi/update_version [Small Update] Update the version of LMMs-Eval
commit 66d4bb2 Author: Fanyi Pu <[email protected]> Date: Thu Jun 13 11:13:00 2024 +0800 update version
commit 63d82f1 Author: Li Bo <[email protected]> Date: Thu Jun 13 11:04:32 2024 +0800 Update README.md
commit 44a3379 Merge: 5ed0035 0ce46d0 Author: Li Bo <[email protected]> Date: Thu Jun 13 04:00:12 2024 +0800 Merge pull request #105 from tianyu-z/main Include VCR
commit 0ce46d0 Author: Suyuchen <[email protected]> Date: Wed Jun 12 15:56:34 2024 -0400 update README.md
commit 46a88d8 Merge: 47b13b9 5ed0035 Author: Suyuchen <[email protected]> Date: Wed Jun 12 15:50:26 2024 -0400 merged readme.md
commit 47b13b9 Author: Suyuchen <[email protected]> Date: Wed Jun 12 15:30:52 2024 -0400 update aggregation function for vcr_wiki
commit 5ed0035 Author: Li Bo <[email protected]> Date: Thu Jun 13 03:21:42 2024 +0800 Update README.md
commit ed88068 Author: Li Bo <[email protected]> Date: Thu Jun 13 03:13:59 2024 +0800 Update README.md
commit fea3806 Merge: d99a24a 05dc8e8 Author: Li Bo <[email protected]> Date: Thu Jun 13 03:11:49 2024 +0800 Merge pull request #108 from EvolvingLMMs-Lab/internal_main_dev [Upgrade to v0.2] Embracing Video Evaluations with LMMs-Eval
commit 05dc8e8 Author: Bo Li <[email protected]> Date: Wed Jun 12 15:56:04 2024 +0000 chore: Update lmms-eval to support video evaluations for LLaVA models
commit cbeee20 Author: Bo Li <[email protected]> Date: Wed Jun 12 15:50:30 2024 +0000 chore: Update lmms-eval to support video evaluations for LLaVA models
commit f00d549 Author: Bo Li <[email protected]> Date: Wed Jun 12 15:46:33 2024 +0000 Update image alignment in README.md
commit 3415633 Author: Bo Li <[email protected]> Date: Wed Jun 12 15:43:16 2024 +0000 Update llava conv_template in lmms_eval/models/llava.py
commit 50575a9 Author: Bo Li <[email protected]> Date: Wed Jun 12 15:39:03 2024 +0000 chore: Update lmms-eval to support video evaluations for LLaVA models
commit c9b2252 Author: Bo Li <[email protected]> Date: Wed Jun 12 15:33:48 2024 +0000 Bump version to 0.2.0.dev0
commit 465bd42 Merge: e43bd84 d99a24a Author: Bo Li <[email protected]> Date: Wed Jun 12 15:04:25 2024 +0000 Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval into internal_main_dev
commit e43bd84 Author: Bo Li <[email protected]> Date: Wed Jun 12 14:54:06 2024 +0000 chore: Remove unnecessary files and code related to live_bench and sft_eval tasks
commit d99a24a Merge: 374590b a66003b Author: Li Bo <[email protected]> Date: Wed Jun 12 19:45:57 2024 +0800 Merge pull request #107 from AtsuMiyai/new_task/upd_update update gpt-3.5-turbo version
commit a66003b Author: AtsuMiyai <[email protected]> Date: Wed Jun 12 17:05:17 2024 +0900 update gpt-3.5-turbo version
commit ee91f27 Author: AtsuMiyai <[email protected]> Date: Wed Jun 12 16:50:53 2024 +0900 update gpt-3.5-turbo version
commit 326b969 Author: tianyu-z <[email protected]> Date: Mon Jun 10 20:07:40 2024 -0400 include std and confidence interval
commit cd050d4 Author: Suyuchen <[email protected]> Date: Mon Jun 10 18:49:47 2024 -0400 update vcr_wiki tasks in README.md
commit 205721e Author: Suyuchen <[email protected]> Date: Mon Jun 10 18:43:15 2024 -0400 update vcr_wiki tasks
commit db8e718 Author: tianyu-z <[email protected]> Date: Mon Jun 10 16:13:58 2024 -0400 include the try-except logic for spacy
commit 427dabb Author: Suyuchen <[email protected]> Date: Mon Jun 10 15:51:05 2024 -0400 add crossed_text to vcr_wiki output
commit 043b483 Author: tianyu-z <[email protected]> Date: Mon Jun 10 15:47:00 2024 -0400 switch logic
commit e1f04db Author: tianyu-z <[email protected]> Date: Mon Jun 10 02:38:21 2024 -0400 modify the form of VCR
commit 96e8d98 Author: tianyu-z <[email protected]> Date: Mon Jun 10 00:10:30 2024 -0400 init include vcr
commit 374590b Merge: 504685e cb3b9ce Author: Kaichen Zhang - NTU <[email protected]> Date: Fri Jun 7 20:25:48 2024 +0800 Merge pull request #101 from Gumpest/main Update conbench in README
commit 504685e Author: Li Bo <[email protected]> Date: Thu Jun 6 15:42:15 2024 +0800 Update README.md
commit cb3b9ce Merge: c9793b3 67b64ea Author: Yuan Zhang <[email protected]> Date: Thu Jun 6 11:22:24 2024 +0800 Merge branch 'EvolvingLMMs-Lab:main' into main
commit c9793b3 Author: Yuan Zhang <[email protected]> Date: Thu Jun 6 11:21:05 2024 +0800 update README
commit 67b64ea Merge: 8ee7848 5fd6845 Author: Li Bo <[email protected]> Date: Wed Jun 5 23:12:58 2024 +0800 Merge pull request #100 from Gumpest/main add Conbench
commit 5fd6845 Author: Yuan Zhang <[email protected]> Date: Wed Jun 5 21:52:31 2024 +0800 add conbench
commit 8ee7848 Merge: 747e197 6fefaf7 Author: Li Bo <[email protected]> Date: Tue Jun 4 17:09:33 2024 +0800 Merge pull request #95 from AtsuMiyai/new_task/upd add MM-UPD
commit 747e197 Merge: 4854a34 0584307 Author: Li Bo <[email protected]> Date: Tue Jun 4 17:09:04 2024 +0800 Merge pull request #97 from CaraJ7/update Add MathVerse in README.md
commit 6fefaf7 Author: AtsuMiyai <[email protected]> Date: Tue Jun 4 17:36:39 2024 +0900 update utils.py for leaderboard submission
commit 5f4fe36 Author: AtsuMiyai <[email protected]> Date: Sun Jun 2 23:28:27 2024 +0900 slightly change query_prompt for the reproduction
commit 0584307 Author: CaraJ7 <[email protected]> Date: Sun Jun 2 17:05:28 2024 +0800 Add MathVerse in README.md
commit 0581ab3 Author: AtsuMiyai <[email protected]> Date: Fri May 31 16:09:45 2024 +0900 merge model_specific_prompt_kwargs and dataset_name into each task yaml
commit 4854a34 Author: Pu Fanyi <[email protected]> Date: Sat May 4 19:23:39 2024 +0800 Group MMMU images into one image (#83) * update * update font * Add matplotlib.font_manager import in utils.py * Refactor font handling in add_order_label function in utils.py * group mmmu --------- Co-authored-by: Li Bo <[email protected]>
commit d224794 Author: AtsuMiyai <[email protected]> Date: Wed May 29 15:15:59 2024 +0900 add upd
commit 453e793 Author: AtsuMiyai <[email protected]> Date: Wed May 29 15:03:30 2024 +0900 add upd
commit 909edd6 Author: AtsuMiyai <[email protected]> Date: Wed May 29 12:52:21 2024 +0900 add upd
commit 7c1ac97 Author: AtsuMiyai <[email protected]> Date: Wed May 29 12:50:32 2024 +0900 add upd
commit 811301c Author: AtsuMiyai <[email protected]> Date: Wed May 29 12:46:58 2024 +0900 add upd
commit 71401ba Author: AtsuMiyai <[email protected]> Date: Wed May 29 12:41:21 2024 +0900 add upd
commit 24dc435 Author: Bo Li <[email protected]> Date: Mon May 27 10:17:32 2024 +0000 fix compatibility issue of older version llava
commit 616edf4 Author: Bo Li <[email protected]> Date: Mon May 27 09:32:26 2024 +0000 [Fix] import issues of multilingual llava and olympiadbench
commit 4c5a99e Merge: 45c05b2 b05c3e2 Author: Li Bo <[email protected]> Date: Mon May 27 14:19:53 2024 +0800 Merge pull request #87 from vfragoso/vifragos/phi3v Adding microsoft/Phi-3-vision-128k-instruct model.
commit b05c3e2 Author: Victor Fragoso <[email protected]> Date: Fri May 24 16:36:37 2024 +0000 Adding documentation of Phi3v class.
commit c200897 Author: Victor Fragoso <[email protected]> Date: Fri May 24 16:25:02 2024 +0000 Adding prompt arguments for Phi3v on MathVista-TestMini
commit 7f9fb6b Author: Victor Fragoso <[email protected]> Date: Fri May 24 13:24:16 2024 +0000 Adding Phi3v model.
commit 45c05b2 Author: kcz358 <[email protected]> Date: Thu May 23 03:47:36 2024 +0000 Set printing info for llava_hf to debug level
commit 53f013e Author: kcz358 <[email protected]> Date: Thu May 23 03:41:39 2024 +0000 Fix pope random name in pope full
commit 22520a9 Author: kcz358 <[email protected]> Date: Thu May 23 03:41:14 2024 +0000 Add separated pope tasks by category
commit d1eefb1 Author: kcz358 <[email protected]> Date: Thu May 9 08:36:02 2024 +0000 Update gitignore
commit b2b4dbd Author: kcz358 <[email protected]> Date: Mon May 20 07:45:11 2024 +0000 Comment out Spice in caption task so that don't need to download stanford nlp model
commit 662f05c Author: kcz358 <[email protected]> Date: Mon May 20 03:13:13 2024 +0000 Comment out parse result in xcomposer
commit 0932932 Author: kcz358 <[email protected]> Date: Thu May 16 03:55:39 2024 +0000 Fix instructblip qformer size mismatch and multi-images problem
commit 557a6a3 Author: kcz358 <[email protected]> Date: Thu May 16 03:11:41 2024 +0000 Remove redundant code in fuyu
commit 6aeb550 Author: kcz358 <[email protected]> Date: Thu May 16 01:45:24 2024 +0000 Fix idefics2 llava in the wild bugs
commit aea80e6 Author: kcz358 <[email protected]> Date: Wed May 15 11:07:35 2024 +0000 Better task list_with_num
commit 3c12a08 Author: Li Bo <[email protected]> Date: Sat May 18 02:35:52 2024 +0800 Update LICENSE
commit 82317a6 Author: Li Bo <[email protected]> Date: Sat May 18 02:29:09 2024 +0800 Update LICENSE
commit a8bba1c Author: Li Bo <[email protected]> Date: Sat May 18 02:28:03 2024 +0800 Create LICENSE
commit caa5893 Merge: c094448 423b006 Author: Li Bo <[email protected]> Date: Mon May 13 11:45:26 2024 +0800 Merge pull request #73 from EvolvingLMMs-Lab/kc/qwen_vl_api [Feat] Add qwen vl api
commit c094448 Author: kcz358 <[email protected]> Date: Sat May 11 06:11:19 2024 +0000 Fix llava_hf image tokens number issue
commit 64f07e4 Author: kcz358 <[email protected]> Date: Thu May 9 02:04:10 2024 +0000 Fix endless warning for llava_hf generation
commit 8aaa828 Author: Bo Li <[email protected]> Date: Thu May 2 06:13:56 2024 +0000 Add model_name parameter to Llava constructor
commit 7847dc4 Author: kcz358 <[email protected]> Date: Tue May 7 03:15:59 2024 +0000 Parse result for llava_hf 1.6
commit 3e56b4f Author: kcz358 <[email protected]> Date: Tue May 7 03:09:56 2024 +0000 Fix llava_hf generation for 1.6
commit fa3ff92 Author: kcz358 <[email protected]> Date: Mon May 6 08:32:57 2024 +0000 Fix llava conv template for llama3
commit 423b006 Author: kcz358 <[email protected]> Date: Sun May 5 07:54:52 2024 +0000 Add qwen vl api
commit b7fd7a9 Merge: 986139a c5a130b Author: Li Bo <[email protected]> Date: Sun May 5 13:19:48 2024 +0800 Merge pull request #59 from EvolvingLMMs-Lab/add_idefics2 add idefics2
commit 986139a Merge: b46239c 8d3526c Author: Li Bo <[email protected]> Date: Fri May 3 01:18:18 2024 +0800 Merge pull request #36 from cocoshe/main [Fix] repr llava doc
commit b46239c Merge: bc69a74 373265f Author: Li Bo <[email protected]> Date: Fri May 3 01:17:34 2024 +0800 Merge pull request #56 from gagan3012/main Multilingual LLava bench
commit bc69a74 Merge: eef3aeb 626e8a9 Author: Li Bo <[email protected]> Date: Fri May 3 01:12:14 2024 +0800 Merge pull request #70 from hunterheiden/hsh/new_task/WebSRC Bugfix: WebSRC should be token-level F1 NOT character-level
commit 626e8a9 Author: Hunter Heidenreich <[email protected]> Date: Thu May 2 09:31:03 2024 -0400 Bugfix: WebSRC should be token-level F1 NOT character-level
commit eef3aeb Merge: c4e9dd9 9bca441 Author: Li Bo <[email protected]> Date: Thu May 2 14:38:17 2024 +0800 Merge pull request #69 from hunterheiden/hsh/new_task/WebSRC [New Task] WebSRC (multimodal Q&A on web screenshots)
commit 9bca441 Author: Hunter Heidenreich <[email protected]> Date: Wed May 1 11:07:29 2024 -0400 Add code to enable compilation of submission for WebSRC test split
commit 7687495 Author: Hunter Heidenreich <[email protected]> Date: Wed May 1 10:47:32 2024 -0400 Draft and validate websrc eval on dev split
commit 4eebd3e Author: Hunter Heidenreich <[email protected]> Date: Wed May 1 10:46:54 2024 -0400 Update main README with new task names
commit 35fe80b Author: Hunter Heidenreich <[email protected]> Date: Wed May 1 10:46:20 2024 -0400 Draft README for WebSRC
commit 955bd06 Author: Hunter Heidenreich <[email protected]> Date: Tue Apr 30 10:16:21 2024 -0400 Init webSRC
commit c4e9dd9 Merge: d8a3a99 319afcc Author: Li Bo <[email protected]> Date: Fri Apr 26 14:37:22 2024 +0800 Merge pull request #63 from hunterheiden/hsh/new_task/screenspot New Task: ScreenSpot - Grounding (REC) and instruction generation (REG) on screens
commit 319afcc Author: Hunter Heidenreich <[email protected]> Date: Thu Apr 25 11:44:34 2024 -0400 slight update
commit 2f3811c Author: Hunter Heidenreich <[email protected]> Date: Thu Apr 25 11:41:04 2024 -0400 Add README file specific to ScreenSpot
commit 28962cb Author: Hunter Heidenreich <[email protected]> Date: Wed Apr 24 11:52:33 2024 -0400 Update README to reflect new tasks
commit e457cfb Author: Hunter Heidenreich <[email protected]> Date: Tue Apr 23 18:33:16 2024 -0400 Create ScreenSpot on clean branch
commit d8a3a99 Merge: 3dcd015 ed17129 Author: Li Bo <[email protected]> Date: Tue Apr 23 10:34:03 2024 +0800 Merge pull request #61 from tupini07/patch-1 Fix typo in Qwen-VL that was causing "reference before assignment"
commit ed17129 Author: Andrea Tupini <[email protected]> Date: Mon Apr 22 14:56:41 2024 -0600 refactor query construction for clarity
commit cd87420 Author: Andrea Tupini <[email protected]> Date: Mon Apr 22 14:54:29 2024 -0600 convert contexts to list if necessary and remove unnecessary construction of `questions`
commit 8557367 Author: Andrea Tupini <[email protected]> Date: Mon Apr 22 14:47:33 2024 -0600 Fix typo in qwen_vl that was causing "reference before assignment"
commit 3dcd015 Merge: 95df9fe 743673a Author: Li Bo <[email protected]> Date: Sat Apr 20 22:03:16 2024 +0800 Merge pull request #60 from CaraJ7/main Add MathVerse
commit 743673a Merge: c1a5472 95df9fe Author: CaraJ7 <[email protected]> Date: Sat Apr 20 21:49:02 2024 +0800 Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval
commit c1a5472 Author: CaraJ7 <[email protected]> Date: Sat Apr 20 21:45:34 2024 +0800 Add MathVerse
commit 373265f Author: Gagan Bhatia <[email protected]> Date: Fri Apr 12 17:21:39 2024 -0700 Add files via upload
commit d853051 Author: Gagan Bhatia <[email protected]> Date: Fri Apr 12 17:19:49 2024 -0700 Create README.md
commit 8d3526c Author: cocoshe <[email protected]> Date: Thu Mar 28 13:38:36 2024 +0800 fix doc
commit 8f9d620 Author: Li Bo <[email protected]> Date: Sun Jun 23 14:02:25 2024 +0800 Update pyproject.toml
commit 6341b7c Merge: fce85f1 903b042 Author: Li Bo <[email protected]> Date: Sun Jun 23 14:02:02 2024 +0800 Merge pull request #125 from EvolvingLMMs-Lab/dev/interleave [Model] aligned llava-interleave model results on video tasks
commit 903b042 Author: kcz358 <[email protected]> Date: Sat Jun 22 12:07:13 2024 +0000 Remove unnecessary lines for video llava
commit d78ec86 Merge: ebe7217 fce85f1 Author: Li Bo <[email protected]> Date: Sat Jun 22 13:57:31 2024 +0800 Merge branch 'main' into dev/interleave
commit ebe7217 Author: kcz358 <[email protected]> Date: Sat Jun 22 02:57:08 2024 +0000 Delete unnecessary lines
commit 120c474 Author: kcz358 <[email protected]> Date: Fri Jun 21 08:38:41 2024 +0000 Revise model registry for llava_hf and longva
commit 7d6201f Author: kcz358 <[email protected]> Date: Fri Jun 21 08:38:24 2024 +0000 Add longva
commit 12f4806 Author: kcz358 <[email protected]> Date: Fri Jun 21 08:35:39 2024 +0000 Remove unnecessary lines since use batched visuals now in llava
commit 12cea76 Author: Bo Li <[email protected]> Date: Thu Jun 20 18:15:32 2024 +0000 chore: Add loguru for logging in lmms_eval package
commit 8ef2474 Author: Bo Li <[email protected]> Date: Thu Jun 20 12:11:03 2024 +0000 chore: Remove unused models from lmms_eval package
commit af38885 Author: Bo Li <[email protected]> Date: Thu Jun 20 12:07:09 2024 +0000 chore: Handle ImportError when importing models Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.
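The guarded-import pattern described in commit af38885 might look roughly like the sketch below. This is an illustration only, not the code from that commit: the `AVAILABLE_MODELS` mapping, the `load_models` helper, and both module names are hypothetical, and standard-library modules stand in for real model backends so the example runs on its own.

```python
import importlib

# Hypothetical registry: model name -> module path (not the lmms_eval registry).
# "json" is a standard-library module standing in for a working backend;
# "not_a_real_module_xyz" is deliberately missing so the failure path runs.
AVAILABLE_MODELS = {
    "json_backend": "json",
    "missing_backend": "not_a_real_module_xyz",
}


def load_models(registry):
    """Import each module, reporting failures instead of aborting."""
    loaded, failed = {}, {}
    for name, module_path in registry.items():
        try:
            loaded[name] = importlib.import_module(module_path)
        except ImportError as exc:
            # One broken optional dependency should not block the other models.
            print(f"Failed to import {name}: {exc}")
            failed[name] = str(exc)
    return loaded, failed


loaded, failed = load_models(AVAILABLE_MODELS)
```

Returning the failures alongside the loaded modules keeps the error messages available to the caller as well as on the console.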
commit fce85f1 Merge: dbe6329 d94f83c Author: Li Bo <[email protected]> Date: Thu Jun 20 20:02:12 2024 +0800 Merge pull request #120 from EvolvingLMMs-Lab/pufanyi/hf_dataset_docs Add docs for datasets upload to HF
commit dbe6329 Author: choiszt <[email protected]> Date: Thu Jun 20 15:14:21 2024 +0800 update ablation for videomme datasets
commit d94f83c Author: Li Bo <[email protected]> Date: Thu Jun 20 13:30:59 2024 +0800 Update README.md
commit cab8159 Author: Li Bo <[email protected]> Date: Thu Jun 20 13:30:29 2024 +0800 Update README.md
commit 4587665 Author: kcz358 <[email protected]> Date: Thu Jun 20 03:55:30 2024 +0000 Add llava_hf back to registry
commit 3463651 Author: kcz358 <[email protected]> Date: Thu Jun 20 03:54:33 2024 +0000 Remove handling non-visual loop in llava
commit cb0d3f4 Author: Fanyi Pu <[email protected]> Date: Thu Jun 20 02:11:18 2024 +0800 update readme
commit 813877b Author: Fanyi Pu <[email protected]> Date: Wed Jun 19 15:37:52 2024 +0800 to sh script
commit a14684b Author: Fanyi Pu <[email protected]> Date: Wed Jun 19 15:37:04 2024 +0800 lint
commit d0f8851 Author: Fanyi Pu <[email protected]> Date: Wed Jun 19 15:36:48 2024 +0800 small fix
commit 63748e9 Author: Fanyi Pu <[email protected]> Date: Wed Jun 19 15:36:43 2024 +0800 small fix
commit 7f1159a Author: Fanyi Pu <[email protected]> Date: Wed Jun 19 15:35:05 2024 +0800 update preparation
commit 19f9bd6 Author: Fanyi Pu <[email protected]> Date: Wed Jun 19 15:23:24 2024 +0800 docs
commit ce6f889 Author: Fanyi Pu <[email protected]> Date: Wed Jun 19 15:04:16 2024 +0800 tutorial
commit f513c52 Author: Bo Li <[email protected]> Date: Wed Jun 19 06:51:19 2024 +0000 chore: Update dependencies to fix potential risks and improve compatibility
commit efb5295 Author: kcz358 <[email protected]> Date: Wed Jun 19 10:25:58 2024 +0800 Release llava-wilder
commit 742651f Author: Fanyi Pu <[email protected]> Date: Wed Jun 19 07:44:26 2024 +0800 feat: Add support for auto downloading tar format
videos
commit 511b625 Merge: 22a4958 050b2c3 Author: Bo Li <[email protected]> Date: Tue Jun 18 17:01:03 2024 +0000 Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval
Tupini <[email protected]> Date: Mon Apr 22 14:54:29 2024 -0600 convert contexts to list if necessary and remove unnecessary construction of `questions` commit 8557367 Author: Andrea Tupini <[email protected]> Date: Mon Apr 22 14:47:33 2024 -0600 Fix typo in qwen_vl that was causing "reference before assignment" commit 3dcd015 Merge: 95df9fe 743673a Author: Li Bo <[email protected]> Date: Sat Apr 20 22:03:16 2024 +0800 Merge pull request #60 from CaraJ7/main Add MathVerse commit 743673a Merge: c1a5472 95df9fe Author: CaraJ7 <[email protected]> Date: Sat Apr 20 21:49:02 2024 +0800 Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval commit c1a5472 Author: CaraJ7 <[email protected]> Date: Sat Apr 20 21:45:34 2024 +0800 Add MathVerse commit 373265f Author: Gagan Bhatia <[email protected]> Date: Fri Apr 12 17:21:39 2024 -0700 Add files via upload commit d853051 Author: Gagan Bhatia <[email protected]> Date: Fri Apr 12 17:19:49 2024 -0700 Create README.md commit 22a4958 Author: Bo Li <[email protected]> Date: Thu Apr 4 17:12:43 2024 +0000 [WIP] adding mmbench dev evaluation (#75) * WIP * Update GPT evaluation model name and sys prompt * 🛠️ Scale accuracy to percentage The accuracy value is now multiplied by 100 in the aggregation function to represent it as a percentage. Regarding the evaluation process, `math` module importation and refactoring reduce progress log verbosity by logging every 100 evaluations instead of 10. It prevents potential logging overflow. Handling of NaN values is added to ensure 'default_value' is set in case of missing data, avoiding errors in split, category, and l2-category assignments. Finally, reporting of categorical and l2-categorical accuracies is streamlined through a new `calculate_hit_rates` function, improving code readability and maintenance. 
Issue refs: #1427, #1533 * Update GPT evaluation model name and API configuration * Refactor MMBench_Evaluator class to handle missing columns * Add print statements for detailed results in MMBench-CN(CC), MMBench-CN(Dev), and MMBench-EN(Dev) evaluations * Refactor MMBench-CN and MMBench-EN evaluation functions * 🔄 Refactor result processing and logging logic - Simplified the result processing functions across different utility modules (`cc_utils.py`, `cn_utils.py`, `en_utils.py`) to unify the handling of multiple-choice options. Now, all options ("A" to "E") are dynamically added to the result data, and default to "nan" if not provided in the document. - Removed redundant keys directly from the process results dict creation to avoid clutter and align with the new dynamic addition of options. - In `mmbench_evals.py`, removed the unnecessary check for all splits being 'dev' and streamlined the evaluation loop by eliminating the progress bar (tqdm) for a cleaner log output. - Commented-out code and verbose logging during evaluation, which may have interfered with performance, has been removed for a more efficient and less intrusive logging experience. This cleanup reduces redundancy in the codebase and improves evaluation performance. Refs #2045 --------- Co-authored-by: Bo Li <[email protected]> (cherry picked from commit a19278c) commit 8d3526c Author: cocoshe <[email protected]> Date: Thu Mar 28 13:38:36 2024 +0800 fix doc
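One of the fixes in the log above states that WebSRC should be scored with token-level F1 rather than character-level F1. The following is a minimal sketch of the difference, assuming lowercase normalization and whitespace tokenization. The function names are hypothetical and are not taken from the lmms-eval code base.

```python
from collections import Counter


def f1_from_units(pred_units, ref_units):
    """Bag-of-units F1 between a prediction and a reference."""
    # Multiset intersection counts each shared unit at most min(count) times.
    common = Counter(pred_units) & Counter(ref_units)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_units)
    recall = overlap / len(ref_units)
    return 2 * precision * recall / (precision + recall)


def token_f1(pred, ref):
    # Units are whitespace-separated tokens (an assumption for this sketch).
    return f1_from_units(pred.lower().split(), ref.lower().split())


def char_f1(pred, ref):
    # Units are individual characters, including spaces.
    return f1_from_units(list(pred.lower()), list(ref.lower()))
```

For the prediction "red car" against the reference "blue car", `token_f1` gives 0.5 while `char_f1` gives roughly 0.67, because character-level scoring also rewards incidental letter overlap between "red" and "blue".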
* init live bench
* update path
* chore: Refactor live_bench package structure and update dependencies
* update
* Merge remote-tracking branch 'origin/internal_main_dev'
* Refactor live_bench package structure and update dependencies
* Refactor live_bench package structure and update dependencies
* Fix execution count in example.ipynb
* extract_infomation
* Refactor extract_infomation.py to improve text extraction from HTML
* fix
* fix
* extract infomation
* chore: Refactor extract_infomation.py for improved readability and maintainability
* chore: Refactor data_generator prompt.md and check_prompt.md for improved clarity and instructions
* lint
* update
* update prompt
* extract infomation
* add info
* lint
* update
* filter
* update version of live_bench
* Update model version to gemini-1.5-pro
* update
* livebench_eval
* livebench
* update
* small fix * lint
* fix doc * [WIP] adding mmbench dev evaluation (#75) * WIP * Update GPT evaluation model name and sys prompt * 🛠️ Scale accuracy to percentage The accuracy value is now multiplied by 100 in the aggregation function to represent it as a percentage. Regarding the evaluation process, `math` module importation and refactoring reduce progress log verbosity by logging every 100 evaluations instead of 10. It prevents potential logging overflow. Handling of NaN values is added to ensure 'default_value' is set in case of missing data, avoiding errors in split, category, and l2-category assignments. Finally, reporting of categorical and l2-categorical accuracies is streamlined through a new `calculate_hit_rates` function, improving code readability and maintenance. Issue refs: #1427, #1533 * Update GPT evaluation model name and API configuration * Refactor MMBench_Evaluator class to handle missing columns * Add print statements for detailed results in MMBench-CN(CC), MMBench-CN(Dev), and MMBench-EN(Dev) evaluations * Refactor MMBench-CN and MMBench-EN evaluation functions * 🔄 Refactor result processing and logging logic - Simplified the result processing functions across different utility modules (`cc_utils.py`, `cn_utils.py`, `en_utils.py`) to unify the handling of multiple-choice options. Now, all options ("A" to "E") are dynamically added to the result data, and default to "nan" if not provided in the document. - Removed redundant keys directly from the process results dict creation to avoid clutter and align with the new dynamic addition of options. - In `mmbench_evals.py`, removed the unnecessary check for all splits being 'dev' and streamlined the evaluation loop by eliminating the progress bar (tqdm) for a cleaner log output. - Commented-out code and verbose logging during evaluation, which may have interfered with performance, has been removed for a more efficient and less intrusive logging experience. 
This cleanup reduces redundancy in the codebase and improves evaluation performance. Refs #2045 --------- Co-authored-by: Bo Li <[email protected]> (cherry picked from commit a19278c) * Create README.md * Add files via upload * Add MathVerse * Fix typo in qwen_vl that was causing "reference before assignment" * convert contexts to list if necessary and remove unnecessary construction of `questions` * refactor query construction for clarity * Create ScreenSpot on clean branch * Update README to reflect new tasks * Add README file specific to ScreenSpot * slight update * Init webSRC * Draft README for WebSRC * Update main README with new task names * Draft and validate websrc eval on dev split * Add code to enable compilation of submission for WebSRC test split * Bugfix: WebSRC should be token-level F1 NOT character-level * Add qwen vl api * Fix llava conv template for llama3 * Fix llava_hf generation for 1.6 * Parse result for llava_hf 1.6 * Add model_name parameter to Llava constructor * Fix endless warning for llava_hf generation * Fix llava_hf image tokens number issue * Create LICENSE * Update LICENSE * Update LICENSE * Better task list_with_num * Fix idefics2 llava in the wild bugs * Remove redundant code in fuyu * Fix instructblip qformer size mismatch and multi-images problem * Comment out parse result in xcomposer * Comment out Spice in caption task so that don't need to download stanford nlp model * Update gitignore * Add separated pope tasks by category * Fix pope random name in pope full * Set printing info for llava_hf to debug level * Adding Phi3v model. * Adding prompt arguments for Phi3v on MathVista-TestMini * Adding documentation of Phi3v class. 
* [Fix] import issues of multilingual llava and olympiadbench * fix compatibility issue of older version llava * add upd * add upd * add upd * add upd * add upd * add upd * Group MMMU images into one image (#83) * update * update font * Add matplotlib.font_manager import in utils.py * Refactor font handling in add_order_label function in utils.py * group mmmu --------- Co-authored-by: Li Bo <[email protected]> * merge model_specific_prompt_kwargs and dataset_name into each task yaml * Add MathVerse in README.md * slightly change query_prompt for the reproduction * update utils.py for leaderboard submission * add conbench * update README * Update README.md * init include vcr * modify the form of VCR * switch logic * add crossed_text to vcr_wiki output * include the try-except logic for spacy * update vcr_wiki tasks * update vcr_wiki tasks in README.md * include std and confidence interval * update gpt-3.5-turbo version * update gpt-3.5-turbo version * chore: Remove unnecessary files and code related to live_bench and sft_eval tasks * Bump version to 0.2.0.dev0 * chore: Update lmms-eval to support video evaluations for LLaVA models * Update llava conv_template in lmms_eval/models/llava.py * Update image alignment in README.md * chore: Update lmms-eval to support video evaluations for LLaVA models * chore: Update lmms-eval to support video evaluations for LLaVA models * Update README.md * Update README.md * update aggregation function for vcr_wiki * update README.md * Update README.md * update version * add II-Bench * fix dataset_path * Add qbench, qbench2, abench; fix phi3v as its current implementation does not support multi-image * add tinyllava * LongVideoBench support: image LMMs (idefics2, phi3) and video LMMs (LLaVA-Next-Video-34B) * fix #117, allow auto download with tar format videos * fix #117, allow auto download with tar format videos * fix typo * feat: Add support for auto downloading tar format videos * Release llava-wilder * chore: Update dependencies 
to fix potential risks and improve compatibility * tutorial * docs * update preparation * small fix * small fix * lint * to sh script * update readme * Remove handling non-visual loop in llava * Add llava_hf back to registry * Update README.md * Update README.md * update ablation for videomme datasets * chore: Handle ImportError when importing models Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports. * chore: Remove unused models from lmms_eval package * feat: Allow loading model configurations from other packages * feat: Allow including external tasks from plugins * chore: Add loguru for logging in lmms_eval package * Remove unnecessary lines since use batched visuals now in llava * Add longva * Revise model registry for llava_hf and longva * Delete unnecessary lines * Remove unnecessary lines for video llava * Update pyproject.toml * Update activitynetqa_generation.yaml * Fix vid mme post prompt issue * Add wild vision 0617 * Hardcode to keep image for wild vision * Fixing scoring logic * Fixing dataset name * Fixing handling None filtered score --------- Co-authored-by: cocoshe <[email protected]> Co-authored-by: Bo Li <[email protected]> Co-authored-by: Gagan Bhatia <[email protected]> Co-authored-by: CaraJ7 <[email protected]> Co-authored-by: Li Bo <[email protected]> Co-authored-by: Andrea Tupini <[email protected]> Co-authored-by: Hunter Heidenreich <[email protected]> Co-authored-by: Victor Fragoso <[email protected]> Co-authored-by: AtsuMiyai <[email protected]> Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: Yuan Zhang <[email protected]> Co-authored-by: Yuan Zhang <[email protected]> Co-authored-by: tianyu-z <[email protected]> Co-authored-by: Suyuchen <[email protected]> Co-authored-by: XinrunDu <[email 
protected]> Co-authored-by: teowu <[email protected]> Co-authored-by: Jingyang <[email protected]> Co-authored-by: Teo (Timothy) Wu Haoning <[email protected]> Co-authored-by: choiszt <[email protected]> Co-authored-by: Lorenzo Mammana <[email protected]>
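The "Handle ImportError when importing models" item above describes wrapping model imports in a try-except block so that a failed import only prints an error message. A minimal sketch of that pattern follows, assuming a hypothetical registry that maps module names to class names. The registry contents are stand-ins for illustration and do not reflect the actual lmms_eval model registry.

```python
import importlib

# Hypothetical registry: module path -> class name to load from it.
AVAILABLE_MODELS = {
    "json": "JSONDecoder",           # stdlib stand-in for an importable model
    "not_installed_backend": "Foo",  # stand-in for a missing optional dependency
}


def load_models(registry):
    """Import each registered class, skipping modules that fail to import."""
    loaded = {}
    for module_name, class_name in registry.items():
        try:
            module = importlib.import_module(module_name)
            loaded[class_name] = getattr(module, class_name)
        except ImportError as exc:
            # Report the failure and continue with the remaining models.
            print(f"Failed to import {class_name} from {module_name}: {exc}")
    return loaded


models = load_models(AVAILABLE_MODELS)
```

With this shape, one missing optional dependency does not prevent the remaining models from being registered, and the printed message identifies which import failed.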
* feat: Update LMMS evaluation configuration and models
  - Update `activitynetqa_generation.yaml` to remove `dataset_name` field and update `task` field to "activitynetqa"
  - Update `utils.py` to add default values for `API_URL` and `API_KEY` when `API_TYPE` is not "openai" or "azure"
  - Update `batch_gpt4.py` and `gpt4v.py` to rename `max_frames_for_video` parameter to `max_frames_num`
  - Update `reka.py` to rename `max_frames_for_video` parameter to `max_frames_num` and add support for `continual_mode` with a persistent response cache

  This commit updates the LMMS evaluation configuration and models to improve compatibility and add new features.
* Update LMMS evaluation configuration and models
* Update LMMS evaluation configuration and models
* feat: Update LMMS evaluation configuration and models
  - Update `activitynetqa_generation.yaml` to remove `dataset_name` field and update `task` field to "activitynetqa"
  - Update `utils.py` to add default values for `API_URL` and `API_KEY` when `API_TYPE` is not "openai" or "azure"
  - Update `batch_gpt4.py` and `gpt4v.py` to rename `max_frames_for_video` parameter to `max_frames_num`
  - Update `reka.py` to rename `max_frames_for_video` parameter to `max_frames_num` and add support for `continual_mode` with a persistent response cache

  This commit updates the LMMS evaluation configuration and models to improve compatibility and add new features.
* Refactor error handling in GPT4V model evaluation
* Refactor error handling in GPT4V model evaluation
* Refactor video decoding backend to use "decord" instead of "pyav"
* Refactor image aspect ratio handling in Llava_OneVision model
* Refactor GPT4V model to fix bug in visuals encoding
* add exception for azure gpt
* chore: Update lmms-eval to support video evaluations for LLaVA models * lint
* internvl2 * fix some bugs * fix * lint
* feat: Update LMMS evaluation configuration and models
  - Update `activitynetqa_generation.yaml` to remove `dataset_name` field and update `task` field to "activitynetqa"
  - Update `utils.py` to add default values for `API_URL` and `API_KEY` when `API_TYPE` is not "openai" or "azure"
  - Update `batch_gpt4.py` and `gpt4v.py` to rename `max_frames_for_video` parameter to `max_frames_num`
  - Update `reka.py` to rename `max_frames_for_video` parameter to `max_frames_num` and add support for `continual_mode` with a persistent response cache

  This commit updates the LMMS evaluation configuration and models to improve compatibility and add new features.
* Update LMMS evaluation configuration and models
* Update LMMS evaluation configuration and models
* feat: Update LMMS evaluation configuration and models
  - Update `activitynetqa_generation.yaml` to remove `dataset_name` field and update `task` field to "activitynetqa"
  - Update `utils.py` to add default values for `API_URL` and `API_KEY` when `API_TYPE` is not "openai" or "azure"
  - Update `batch_gpt4.py` and `gpt4v.py` to rename `max_frames_for_video` parameter to `max_frames_num`
  - Update `reka.py` to rename `max_frames_for_video` parameter to `max_frames_num` and add support for `continual_mode` with a persistent response cache

  This commit updates the LMMS evaluation configuration and models to improve compatibility and add new features.
* Refactor error handling in GPT4V model evaluation
* Refactor error handling in GPT4V model evaluation
* Refactor video decoding backend to use "decord" instead of "pyav"
* Refactor image aspect ratio handling in Llava_OneVision model
* Refactor GPT4V model to fix bug in visuals encoding
* add exception for azure gpt
* feat: fix bugs
* feat: update
* Refactor image aspect ratio handling in Llava_OneVision model
* Refactor image aspect ratio handling in Llava_OneVision model
* update interleave bench
…ab/lmms-eval-internal into internal_main_dev
Luodian changed the title from "[Video Features] add vila" to "[Sync Features] add vila, add wildvision, add vibe-eval, add interleave bench" on Jul 9, 2024