[Update] InternLM2.5 (InternLM#752)
Co-authored-by: zhangwenwei <[email protected]>
Co-authored-by: ZwwWayne <[email protected]>
Co-authored-by: 张硕 <[email protected]>
Co-authored-by: zhangsongyang <[email protected]>
Co-authored-by: 王子奕 <[email protected]>
Co-authored-by: 曹巍瀚 <[email protected]>
Co-authored-by: tonysy <[email protected]>
Co-authored-by: 李博文 <[email protected]>
9 people authored Jul 3, 2024
1 parent 9943444 commit 3b086d7
Showing 27 changed files with 810 additions and 430 deletions.
145 changes: 77 additions & 68 deletions README.md

Large diffs are not rendered by default.

146 changes: 80 additions & 66 deletions README_zh-CN.md


55 changes: 29 additions & 26 deletions agent/README.md
@@ -4,77 +4,80 @@ English | [简体中文](README_zh-CN.md)

## Introduction

-InternLM-Chat-7B v1.1 has been released as the first open-source model with code interpreter capabilities, supporting external tools such as Python code interpreter and search engine.
+InternLM2.5-Chat, open-sourced on June 30, 2024, further enhances its capabilities in code interpretation and general tool utilization. With improved and more generalized instruction understanding, tool selection, and reflection abilities, InternLM2.5-Chat can more reliably support complex agents and multi-step tool calling for more intricate tasks. When combined with a code interpreter, InternLM2.5-Chat obtains results comparable to GPT-4 on MATH. Leveraging strong foundational capabilities in mathematics and tools, InternLM2.5-Chat provides practical data analysis capabilities.

-InternLM2-Chat, open sourced on January 17, 2024, further enhances its capabilities in code interpreter and general tool utilization. With improved and more generalized instruction understanding, tool selection, and reflection abilities, InternLM2-Chat can more reliably support complex agents and multi-step tool calling for more intricate tasks. InternLM2-Chat exhibits decent computational and reasoning abilities even without external tools, surpassing ChatGPT in mathematical performance. When combined with a code interpreter, InternLM2-Chat-20B obtains comparable results to GPT-4 on GSM8K and MATH. Leveraging strong foundational capabilities in mathematics and tools, InternLM2-Chat provides practical data analysis capabilities.
+The results of InternLM2.5-Chat with the math code interpreter are as follows:

-The results of InternLM2-Chat-20B on math code interpreter is as below:

-| | GSM8K | MATH |
-| :--------------------------------------: | :---: | :---: |
-| InternLM2-Chat-20B | 79.6 | 32.5 |
-| InternLM2-Chat-20B with Code Interpreter | 84.5 | 51.2 |
-| ChatGPT (GPT-3.5) | 78.2 | 28.0 |
-| GPT-4 | 91.4 | 45.8 |
+| Models | Tool-Integrated | MATH |
+| :-----------------: | :-------------: | :--: |
+| InternLM2-Chat-7B | w/ | 45.1 |
+| InternLM2-Chat-20B | w/ | 51.2 |
+| InternLM2.5-7B-Chat | w/ | 63.0 |
+| gpt-4-0125-preview | w/o | 64.2 |

## Usages

-We offer an example using [Lagent](lagent.md) to build agents based on InternLM2-Chat to call the code interpreter. Firstly install the extra dependencies:
+We offer an example using [Lagent](lagent.md) to build agents based on InternLM2.5-Chat to call the code interpreter. First, install the extra dependencies:

```bash
pip install -r requirements.txt
```

-Run the following script to perform inference and evaluation on the GSM8K and MATH test sets.
+Run the following script to perform inference and evaluation on the MATH test set.

```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
--backend=lmdeploy \
---model_path=internlm/internlm2-chat-20b \
---tp=2 \
+--model_path=internlm/internlm2_5-7b-chat \
+--tp=1 \
--temperature=1.0 \
--top_k=1 \
--dataset=math \
--output_path=math_lmdeploy.jsonl \
--do_eval
```

`output_path` is a jsonl-format file that stores the inference results. Each line looks like this:

-```json
+````json
{
"idx": 41,
"query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
"gt": "0",
"pred": ["0"],
"problem": "The midpoint of the line segment between $(x,y)$ and $(2,4)$ is $(-7,0)$. Find $(x,y)$.",
"level": "Level 3",
"type": "Algebra",
"solution": "Applying the midpoint formula gives $$\\left(\\frac{2+x}{2},\\frac{4+y}{2}\\right)=(-7,0).$$Solving $\\frac{2+x}{2}=-7$ for $x$ and $\\frac{4+y}{2} = 0$ for $y$ we find $(x,y)$ to be $\\boxed{(-16,-4)}$.",
"idx": 306,
"query": "The midpoint of the line segment between $(x,y)$ and $(2,4)$ is $(-7,0)$. Find $(x,y)$.",
"gt": "(-16,-4)",
"pred": [
"(-16,-4)"
],
"steps": [
{
"role": "language",
"content": ""
"content": "To find the coordinates of the point $(x,y)$, we can use the midpoint formula. The midpoint $M$ of a line segment with endpoints $(x_1, y_1)$ and $(x_2, y_2)$ is given by:\n\n\\[M = \\left(\\frac{x_1 + x_2}{2}, \\frac{y_1 + y_2}{2}\\right)\\]\n\nGiven that the midpoint is $(-7,0)$ and one endpoint is $(2,4)$, we can set up equations to solve for $x$ and $y$.\n\nLet's start by setting up the equations based on the midpoint formula:\n\n\\[\\frac{x + 2}{2} = -7\\]\n\\[\\frac{y + 4}{2} = 0\\]\n\nNow, let's solve these equations using Python to find the values of $x$ and $y$."
},
{
"role": "tool",
"content": {
"name": "IPythonInteractive",
"name": "AsyncIPythonInteractiveManager",
"parameters": {
"command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n x, y = symbols('x y')\n equation = 3*x + 2*y - 12\n b = solve(equation.subs(x, 4), y)[0]\n\n return b\n\nresult = find_b()\nprint(result)\n```"
"command": "```python\nfrom sympy import symbols, Eq, solve\n\n# Define symbols\nx, y = symbols('x y')\n\n# Define equations based on the midpoint formula\neq1 = Eq((x + 2)/2, -7)\neq2 = Eq((y + 4)/2, 0)\n\n# Solve equations\nsolution = solve((eq1, eq2), (x, y))\n\nsolution\n```"
}
},
"name": "interpreter"
},
{
"role": "environment",
"content": "0",
"content": "{x: -16, y: -4}",
"name": "interpreter"
},
{
"role": "language",
"content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
"content": "After solving the equations, we find that the coordinates of the point $(x,y)$ are $(-16, -4)$. Therefore, the solution to the problem is:\n\n\\[\\boxed{(-16, -4)}\\]"
}
],
"error": null
}
-```
+````

Once the results file is prepared, you can skip the inference stage and run the evaluation directly as follows.
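As a quick sanity check, the accuracy over the jsonl results can be reproduced with a short standalone script. This is a sketch using exact string matching between `gt` and the entries of `pred`; the repository's actual evaluator may apply a more tolerant, math-aware comparison, and the `sample` record below is abbreviated from the example above.

```python
import json

# Abbreviated record in the format written to --output_path (see example above)
sample = '{"idx": 306, "gt": "(-16,-4)", "pred": ["(-16,-4)"], "error": null}'

def is_correct(record):
    # Count a record as solved if any prediction exactly matches the ground truth.
    # This is a simplification; symbolic-equivalence checks would be stricter.
    return record.get("gt") in record.get("pred", [])

def accuracy(lines):
    records = [json.loads(line) for line in lines]
    return sum(is_correct(r) for r in records) / len(records) if records else 0.0

print(accuracy([sample]))  # -> 1.0
```

Reading the real file is then just `accuracy(open("math_lmdeploy.jsonl"))`.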

55 changes: 29 additions & 26 deletions agent/README_zh-CN.md
@@ -4,77 +4,80 @@

## 简介

-InternLM-Chat-7B v1.1 是首个具有代码解释能力的开源对话模型,支持 Python 解释器和搜索引擎等外部工具
+InternLM2.5-Chat 在代码解释和通用工具调用方面的能力得到进一步提升。基于更强和更具有泛化性的指令理解、工具筛选与结果反思等能力,新版模型可以更可靠地支持复杂智能体的搭建,支持对工具进行有效的多轮调用,完成较复杂的任务。在配合代码解释器(code-interpreter)的条件下,InternLM2.5-Chat 在 MATH 上可以达到和 GPT-4 相仿的水平。基于在数理和工具方面强大的基础能力,InternLM2.5-Chat 提供了实用的数据分析能力

-InternLM2-Chat 进一步提高了它在代码解释和通用工具调用方面的能力。基于更强和更具有泛化性的指令理解、工具筛选与结果反思等能力,新版模型可以更可靠地支持复杂智能体的搭建,支持对工具进行有效的多轮调用,完成较复杂的任务。模型在不使用外部工具的条件下已具备不错的计算能力和推理能力,数理表现超过 ChatGPT;在配合代码解释器(code-interpreter)的条件下,InternLM2-Chat-20B 在 GSM8K 和 MATH 上可以达到和 GPT-4 相仿的水平。基于在数理和工具方面强大的基础能力,InternLM2-Chat 提供了实用的数据分析能力
+以下是 InternLM2.5-Chat 在数学代码解释器上的结果

-以下是 InternLM2-Chat-20B 在数学代码解释器上的结果。

-| | GSM8K | MATH |
-| :---------------------------------: | :---: | :---: |
-| InternLM2-Chat-20B 单纯依靠内在能力 | 79.6 | 32.5 |
-| InternLM2-Chat-20B 配合代码解释器 | 84.5 | 51.2 |
-| ChatGPT (GPT-3.5) | 78.2 | 28.0 |
-| GPT-4 | 91.4 | 45.8 |
+| 模型 | 是否集成工具 | MATH |
+| :-----------------: | :----------: | :--: |
+| InternLM2-Chat-7B | w/ | 45.1 |
+| InternLM2-Chat-20B | w/ | 51.2 |
+| InternLM2.5-7B-Chat | w/ | 63.0 |
+| gpt-4-0125-preview | w/o | 64.2 |

## 体验

-我们提供了使用 [Lagent](lagent_zh-CN.md) 来基于 InternLM2-Chat 构建智能体调用代码解释器的例子。首先安装额外依赖:
+我们提供了使用 [Lagent](lagent_zh-CN.md) 来基于 InternLM2.5-Chat 构建智能体调用代码解释器的例子。首先安装额外依赖:

```bash
pip install -r requirements.txt
```

-运行以下脚本在 GSM8K 和 MATH 测试集上进行推理和评估:
+运行以下脚本在 MATH 测试集上进行推理和评估:

```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
--backend=lmdeploy \
---model_path=internlm/internlm2-chat-20b \
---tp=2 \
+--model_path=internlm/internlm2_5-7b-chat \
+--tp=1 \
--temperature=1.0 \
--top_k=1 \
--dataset=math \
--output_path=math_lmdeploy.jsonl \
--do_eval
```

`output_path` 是一个存储推理结果的 jsonl 格式文件,每行形如:

-```json
+````json
{
"idx": 41,
"query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
"gt": "0",
"pred": ["0"],
"problem": "The midpoint of the line segment between $(x,y)$ and $(2,4)$ is $(-7,0)$. Find $(x,y)$.",
"level": "Level 3",
"type": "Algebra",
"solution": "Applying the midpoint formula gives $$\\left(\\frac{2+x}{2},\\frac{4+y}{2}\\right)=(-7,0).$$Solving $\\frac{2+x}{2}=-7$ for $x$ and $\\frac{4+y}{2} = 0$ for $y$ we find $(x,y)$ to be $\\boxed{(-16,-4)}$.",
"idx": 306,
"query": "The midpoint of the line segment between $(x,y)$ and $(2,4)$ is $(-7,0)$. Find $(x,y)$.",
"gt": "(-16,-4)",
"pred": [
"(-16,-4)"
],
"steps": [
{
"role": "language",
"content": ""
"content": "To find the coordinates of the point $(x,y)$, we can use the midpoint formula. The midpoint $M$ of a line segment with endpoints $(x_1, y_1)$ and $(x_2, y_2)$ is given by:\n\n\\[M = \\left(\\frac{x_1 + x_2}{2}, \\frac{y_1 + y_2}{2}\\right)\\]\n\nGiven that the midpoint is $(-7,0)$ and one endpoint is $(2,4)$, we can set up equations to solve for $x$ and $y$.\n\nLet's start by setting up the equations based on the midpoint formula:\n\n\\[\\frac{x + 2}{2} = -7\\]\n\\[\\frac{y + 4}{2} = 0\\]\n\nNow, let's solve these equations using Python to find the values of $x$ and $y$."
},
{
"role": "tool",
"content": {
"name": "IPythonInteractive",
"name": "AsyncIPythonInteractiveManager",
"parameters": {
"command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n x, y = symbols('x y')\n equation = 3*x + 2*y - 12\n b = solve(equation.subs(x, 4), y)[0]\n\n return b\n\nresult = find_b()\nprint(result)\n```"
"command": "```python\nfrom sympy import symbols, Eq, solve\n\n# Define symbols\nx, y = symbols('x y')\n\n# Define equations based on the midpoint formula\neq1 = Eq((x + 2)/2, -7)\neq2 = Eq((y + 4)/2, 0)\n\n# Solve equations\nsolution = solve((eq1, eq2), (x, y))\n\nsolution\n```"
}
},
"name": "interpreter"
},
{
"role": "environment",
"content": "0",
"content": "{x: -16, y: -4}",
"name": "interpreter"
},
{
"role": "language",
"content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
"content": "After solving the equations, we find that the coordinates of the point $(x,y)$ are $(-16, -4)$. Therefore, the solution to the problem is:\n\n\\[\\boxed{(-16, -4)}\\]"
}
],
"error": null
}
-```
+````

如果已经准备好了该文件,可直接跳过推理阶段进行评估:

4 changes: 2 additions & 2 deletions agent/lagent.md
@@ -38,7 +38,7 @@ Then you can chat through the UI shown as below

![image](https://github.com/InternLM/lagent/assets/24622904/3aebb8b4-07d1-42a2-9da3-46080c556f68)

-## Run a ReAct agent with InternLM2-Chat
+## Run a ReAct agent with InternLM2.5-Chat

**NOTE:** If you want to run a HuggingFace model, please run `pip install -e .[all]` first.

@@ -49,7 +49,7 @@ from lagent.actions import ActionExecutor, GoogleSearch, PythonInterpreter
from lagent.llms import HFTransformer

# Initialize the HFTransformer-based Language Model (llm) and provide the model name.
-llm = HFTransformer('internlm/internlm2-chat-7b')
+llm = HFTransformer('internlm/internlm2_5-7b-chat')

# Initialize the Google Search tool and provide your API key.
search_tool = GoogleSearch(api_key='Your SERPER_API_KEY')
4 changes: 2 additions & 2 deletions agent/lagent_zh-CN.md
@@ -38,7 +38,7 @@ streamlit run examples/react_web_demo.py

![image](https://github.com/InternLM/lagent/assets/24622904/3aebb8b4-07d1-42a2-9da3-46080c556f68)

-## InternLM-Chat 构建一个 ReAct 智能体
+## InternLM2.5-Chat 构建一个 ReAct 智能体

**注意:** 如果你想要启动一个 HuggingFace 的模型,请先运行 `pip install -e .[all]`

@@ -49,7 +49,7 @@ from lagent.actions import ActionExecutor, GoogleSearch, PythonInterpreter
from lagent.llms import HFTransformer

# Initialize the HFTransformer-based Language Model (llm) and provide the model name.
-llm = HFTransformer('internlm/internlm-chat-7b-v1_1')
+llm = HFTransformer('internlm/internlm2_5-7b-chat')

# Initialize the Google Search tool and provide your API key.
search_tool = GoogleSearch(api_key='Your SERPER_API_KEY')
10 changes: 5 additions & 5 deletions agent/pal_inference.py
@@ -189,8 +189,8 @@ def generate_interactive(
generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length
if not has_default_max_length:
logger.warn( # pylint: disable=W4902
f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(="
f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. "
f'Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(='
f'{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. '
'Please refer to the documentation for more information. '
'(https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)',
UserWarning,
@@ -199,8 +199,8 @@
if input_ids_seq_length >= generation_config.max_length:
input_ids_string = 'input_ids'
logger.warning(
f"Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to"
f" {generation_config.max_length}. This can lead to unexpected behavior. You should consider"
f'Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to'
f' {generation_config.max_length}. This can lead to unexpected behavior. You should consider'
' increasing `max_new_tokens`.')

# 2. Set generation parameters if not already defined
@@ -510,7 +510,7 @@ def main():
interface.clear_history()
f.flush()

print(f"{args.model}: Accuracy - {sum(scores) / len(scores)}")
print(f'{args.model}: Accuracy - {sum(scores) / len(scores)}')
torch.cuda.empty_cache()


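The `max_new_tokens`/`max_length` precedence that the warnings in `generate_interactive` describe can be sketched with a small hypothetical helper (not repo code): when both limits are set, `max_new_tokens` wins and the effective cap becomes `max_new_tokens` plus the prompt length.

```python
def effective_max_length(input_len, max_new_tokens=None, max_length=None):
    # Mirrors the precedence logic in generate_interactive (a sketch):
    # `max_new_tokens`, when given, overrides `max_length`.
    if max_new_tokens is not None:
        if max_length is not None:
            print("warning: both limits set; `max_new_tokens` takes precedence")
        return max_new_tokens + input_len
    if max_length is not None and input_len >= max_length:
        print("warning: prompt length already reaches `max_length`;"
              " consider increasing `max_new_tokens`")
    return max_length

# A 10-token prompt with max_new_tokens=32 yields an effective cap of 42,
# regardless of the conflicting max_length=20.
print(effective_max_length(10, max_new_tokens=32, max_length=20))
```

This also illustrates why the second warning fires: with `max_new_tokens` unset, a prompt at least as long as `max_length` leaves no room for generation.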
12 changes: 6 additions & 6 deletions agent/requirements.txt
@@ -1,10 +1,10 @@
-lmdeploy>=0.2.2
+antlr4-python3-runtime==4.11.0
datasets
-tqdm
+einops
+jsonlines
+lagent @ git+https://github.com/InternLM/lagent@main
+lmdeploy>=0.2.2
numpy
pebble
-jsonlines
sympy==1.12
-antlr4-python3-runtime==4.11.0
-lagent
-einops
+tqdm
