[Update] InternLM2.5 (InternLM#752)
Co-authored-by: zhangwenwei <[email protected]>
Co-authored-by: ZwwWayne <[email protected]>
Co-authored-by: 张硕 <[email protected]>
Co-authored-by: zhangsongyang <[email protected]>
Co-authored-by: 王子奕 <[email protected]>
Co-authored-by: 曹巍瀚 <[email protected]>
Co-authored-by: tonysy <[email protected]>
Co-authored-by: 李博文 <[email protected]>
9 people authored Jul 3, 2024
1 parent 9943444 commit 3b086d7
Showing 27 changed files with 810 additions and 430 deletions.
145 changes: 77 additions & 68 deletions README.md

Large diffs are not rendered by default.

146 changes: 80 additions & 66 deletions README_zh-CN.md


55 changes: 29 additions & 26 deletions agent/README.md
@@ -4,77 +4,80 @@ English | [简体中文](README_zh-CN.md)

## Introduction

-InternLM-Chat-7B v1.1 has been released as the first open-source model with code interpreter capabilities, supporting external tools such as Python code interpreter and search engine.
+InternLM2.5-Chat, open-sourced on June 30, 2024, further enhances its capabilities in code interpretation and general tool utilization. With improved and more generalized instruction understanding, tool selection, and reflection abilities, InternLM2.5-Chat can more reliably support complex agents and multi-step tool calling for more intricate tasks. When combined with a code interpreter, InternLM2.5-Chat obtains results comparable to GPT-4 on MATH. Leveraging strong foundational capabilities in mathematics and tools, InternLM2.5-Chat provides practical data analysis capabilities.

-InternLM2-Chat, open sourced on January 17, 2024, further enhances its capabilities in code interpreter and general tool utilization. With improved and more generalized instruction understanding, tool selection, and reflection abilities, InternLM2-Chat can more reliably support complex agents and multi-step tool calling for more intricate tasks. InternLM2-Chat exhibits decent computational and reasoning abilities even without external tools, surpassing ChatGPT in mathematical performance. When combined with a code interpreter, InternLM2-Chat-20B obtains comparable results to GPT-4 on GSM8K and MATH. Leveraging strong foundational capabilities in mathematics and tools, InternLM2-Chat provides practical data analysis capabilities.
+The results of InternLM2.5-Chat with the math code interpreter are as follows:

-The results of InternLM2-Chat-20B on math code interpreter is as below:

-| | GSM8K | MATH |
-| :--------------------------------------: | :---: | :---: |
-| InternLM2-Chat-20B | 79.6 | 32.5 |
-| InternLM2-Chat-20B with Code Interpreter | 84.5 | 51.2 |
-| ChatGPT (GPT-3.5) | 78.2 | 28.0 |
-| GPT-4 | 91.4 | 45.8 |
+| Models | Tool-Integrated | MATH |
+| :-----------------: | :-------------: | :--: |
+| InternLM2-Chat-7B | w/ | 45.1 |
+| InternLM2-Chat-20B | w/ | 51.2 |
+| InternLM2.5-7B-Chat | w/ | 63.0 |
+| gpt-4-0125-preview | w/o | 64.2 |

## Usages

-We offer an example using [Lagent](lagent.md) to build agents based on InternLM2-Chat to call the code interpreter. Firstly install the extra dependencies:
+We offer an example using [Lagent](lagent.md) to build agents based on InternLM2.5-Chat to call the code interpreter. First, install the extra dependencies:

```bash
pip install -r requirements.txt
```

-Run the following script to perform inference and evaluation on the GSM8K and MATH test sets.
+Run the following script to perform inference and evaluation on the MATH test set.

```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
--backend=lmdeploy \
---model_path=internlm/internlm2-chat-20b \
---tp=2 \
+--model_path=internlm/internlm2_5-7b-chat \
+--tp=1 \
--temperature=1.0 \
--top_k=1 \
--dataset=math \
--output_path=math_lmdeploy.jsonl \
--do_eval
```

`output_path` is a jsonl-format file that stores the inference results. Each line looks like this:

-```json
+````json
{
"idx": 41,
"query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
"gt": "0",
"pred": ["0"],
"problem": "The midpoint of the line segment between $(x,y)$ and $(2,4)$ is $(-7,0)$. Find $(x,y)$.",
"level": "Level 3",
"type": "Algebra",
"solution": "Applying the midpoint formula gives $$\\left(\\frac{2+x}{2},\\frac{4+y}{2}\\right)=(-7,0).$$Solving $\\frac{2+x}{2}=-7$ for $x$ and $\\frac{4+y}{2} = 0$ for $y$ we find $(x,y)$ to be $\\boxed{(-16,-4)}$.",
"idx": 306,
"query": "The midpoint of the line segment between $(x,y)$ and $(2,4)$ is $(-7,0)$. Find $(x,y)$.",
"gt": "(-16,-4)",
"pred": [
"(-16,-4)"
],
"steps": [
{
"role": "language",
"content": ""
"content": "To find the coordinates of the point $(x,y)$, we can use the midpoint formula. The midpoint $M$ of a line segment with endpoints $(x_1, y_1)$ and $(x_2, y_2)$ is given by:\n\n\\[M = \\left(\\frac{x_1 + x_2}{2}, \\frac{y_1 + y_2}{2}\\right)\\]\n\nGiven that the midpoint is $(-7,0)$ and one endpoint is $(2,4)$, we can set up equations to solve for $x$ and $y$.\n\nLet's start by setting up the equations based on the midpoint formula:\n\n\\[\\frac{x + 2}{2} = -7\\]\n\\[\\frac{y + 4}{2} = 0\\]\n\nNow, let's solve these equations using Python to find the values of $x$ and $y$."
},
{
"role": "tool",
"content": {
"name": "IPythonInteractive",
"name": "AsyncIPythonInteractiveManager",
"parameters": {
"command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n x, y = symbols('x y')\n equation = 3*x + 2*y - 12\n b = solve(equation.subs(x, 4), y)[0]\n\n return b\n\nresult = find_b()\nprint(result)\n```"
"command": "```python\nfrom sympy import symbols, Eq, solve\n\n# Define symbols\nx, y = symbols('x y')\n\n# Define equations based on the midpoint formula\neq1 = Eq((x + 2)/2, -7)\neq2 = Eq((y + 4)/2, 0)\n\n# Solve equations\nsolution = solve((eq1, eq2), (x, y))\n\nsolution\n```"
}
},
"name": "interpreter"
},
{
"role": "environment",
"content": "0",
"content": "{x: -16, y: -4}",
"name": "interpreter"
},
{
"role": "language",
"content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
"content": "After solving the equations, we find that the coordinates of the point $(x,y)$ are $(-16, -4)$. Therefore, the solution to the problem is:\n\n\\[\\boxed{(-16, -4)}\\]"
}
],
"error": null
}
-```
+````

Once the results file is prepared, you can skip the inference stage and run the evaluation directly as follows.
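As a quick sanity check, the accuracy over the jsonl results can be reproduced with a short standalone script. This is a sketch using exact string matching between `gt` and the entries of `pred`; the repository's actual evaluator may apply a more tolerant, math-aware comparison, and the `sample` record below is abbreviated from the example above.

```python
import json

# Abbreviated record in the format written to --output_path (see example above)
sample = '{"idx": 306, "gt": "(-16,-4)", "pred": ["(-16,-4)"], "error": null}'

def is_correct(record):
    # Count a record as solved if any prediction exactly matches the ground truth.
    # This is a simplification; symbolic-equivalence checks would be stricter.
    return record.get("gt") in record.get("pred", [])

def accuracy(lines):
    records = [json.loads(line) for line in lines]
    return sum(is_correct(r) for r in records) / len(records) if records else 0.0

print(accuracy([sample]))  # -> 1.0
```

Reading the real file is then just `accuracy(open("math_lmdeploy.jsonl"))`.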

55 changes: 29 additions & 26 deletions agent/README_zh-CN.md
@@ -4,77 +4,80 @@

## 简介

-InternLM-Chat-7B v1.1 是首个具有代码解释能力的开源对话模型,支持 Python 解释器和搜索引擎等外部工具
+InternLM2.5-Chat 在代码解释和通用工具调用方面的能力得到进一步提升。基于更强和更具有泛化性的指令理解、工具筛选与结果反思等能力,新版模型可以更可靠地支持复杂智能体的搭建,支持对工具进行有效的多轮调用,完成较复杂的任务。在配合代码解释器(code-interpreter)的条件下,InternLM2.5-Chat 在 MATH 上可以达到和 GPT-4 相仿的水平。基于在数理和工具方面强大的基础能力,InternLM2.5-Chat 提供了实用的数据分析能力

-InternLM2-Chat 进一步提高了它在代码解释和通用工具调用方面的能力。基于更强和更具有泛化性的指令理解、工具筛选与结果反思等能力,新版模型可以更可靠地支持复杂智能体的搭建,支持对工具进行有效的多轮调用,完成较复杂的任务。模型在不使用外部工具的条件下已具备不错的计算能力和推理能力,数理表现超过 ChatGPT;在配合代码解释器(code-interpreter)的条件下,InternLM2-Chat-20B 在 GSM8K 和 MATH 上可以达到和 GPT-4 相仿的水平。基于在数理和工具方面强大的基础能力,InternLM2-Chat 提供了实用的数据分析能力
+以下是 InternLM2.5-Chat 在数学代码解释器上的结果

-以下是 InternLM2-Chat-20B 在数学代码解释器上的结果。

-| | GSM8K | MATH |
-| :---------------------------------: | :---: | :---: |
-| InternLM2-Chat-20B 单纯依靠内在能力 | 79.6 | 32.5 |
-| InternLM2-Chat-20B 配合代码解释器 | 84.5 | 51.2 |
-| ChatGPT (GPT-3.5) | 78.2 | 28.0 |
-| GPT-4 | 91.4 | 45.8 |
+| 模型 | 是否集成工具 | MATH |
+| :-----------------: | :----------: | :--: |
+| InternLM2-Chat-7B | w/ | 45.1 |
+| InternLM2-Chat-20B | w/ | 51.2 |
+| InternLM2.5-7B-Chat | w/ | 63.0 |
+| gpt-4-0125-preview | w/o | 64.2 |

## 体验

-我们提供了使用 [Lagent](lagent_zh-CN.md) 来基于 InternLM2-Chat 构建智能体调用代码解释器的例子。首先安装额外依赖:
+我们提供了使用 [Lagent](lagent_zh-CN.md) 来基于 InternLM2.5-Chat 构建智能体调用代码解释器的例子。首先安装额外依赖:

```bash
pip install -r requirements.txt
```

-运行以下脚本在 GSM8K 和 MATH 测试集上进行推理和评估:
+运行以下脚本在 MATH 测试集上进行推理和评估:

```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
--backend=lmdeploy \
---model_path=internlm/internlm2-chat-20b \
---tp=2 \
+--model_path=internlm/internlm2_5-7b-chat \
+--tp=1 \
--temperature=1.0 \
--top_k=1 \
--dataset=math \
--output_path=math_lmdeploy.jsonl \
--do_eval
```

`output_path` 是一个存储推理结果的 jsonl 格式文件,每行形如:

-```json
+````json
{
"idx": 41,
"query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
"gt": "0",
"pred": ["0"],
"problem": "The midpoint of the line segment between $(x,y)$ and $(2,4)$ is $(-7,0)$. Find $(x,y)$.",
"level": "Level 3",
"type": "Algebra",
"solution": "Applying the midpoint formula gives $$\\left(\\frac{2+x}{2},\\frac{4+y}{2}\\right)=(-7,0).$$Solving $\\frac{2+x}{2}=-7$ for $x$ and $\\frac{4+y}{2} = 0$ for $y$ we find $(x,y)$ to be $\\boxed{(-16,-4)}$.",
"idx": 306,
"query": "The midpoint of the line segment between $(x,y)$ and $(2,4)$ is $(-7,0)$. Find $(x,y)$.",
"gt": "(-16,-4)",
"pred": [
"(-16,-4)"
],
"steps": [
{
"role": "language",
"content": ""
"content": "To find the coordinates of the point $(x,y)$, we can use the midpoint formula. The midpoint $M$ of a line segment with endpoints $(x_1, y_1)$ and $(x_2, y_2)$ is given by:\n\n\\[M = \\left(\\frac{x_1 + x_2}{2}, \\frac{y_1 + y_2}{2}\\right)\\]\n\nGiven that the midpoint is $(-7,0)$ and one endpoint is $(2,4)$, we can set up equations to solve for $x$ and $y$.\n\nLet's start by setting up the equations based on the midpoint formula:\n\n\\[\\frac{x + 2}{2} = -7\\]\n\\[\\frac{y + 4}{2} = 0\\]\n\nNow, let's solve these equations using Python to find the values of $x$ and $y$."
},
{
"role": "tool",
"content": {
"name": "IPythonInteractive",
"name": "AsyncIPythonInteractiveManager",
"parameters": {
"command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n x, y = symbols('x y')\n equation = 3*x + 2*y - 12\n b = solve(equation.subs(x, 4), y)[0]\n\n return b\n\nresult = find_b()\nprint(result)\n```"
"command": "```python\nfrom sympy import symbols, Eq, solve\n\n# Define symbols\nx, y = symbols('x y')\n\n# Define equations based on the midpoint formula\neq1 = Eq((x + 2)/2, -7)\neq2 = Eq((y + 4)/2, 0)\n\n# Solve equations\nsolution = solve((eq1, eq2), (x, y))\n\nsolution\n```"
}
},
"name": "interpreter"
},
{
"role": "environment",
"content": "0",
"content": "{x: -16, y: -4}",
"name": "interpreter"
},
{
"role": "language",
"content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
"content": "After solving the equations, we find that the coordinates of the point $(x,y)$ are $(-16, -4)$. Therefore, the solution to the problem is:\n\n\\[\\boxed{(-16, -4)}\\]"
}
],
"error": null
}
-```
+````

如果已经准备好了该文件,可直接跳过推理阶段进行评估:

4 changes: 2 additions & 2 deletions agent/lagent.md
@@ -38,7 +38,7 @@ Then you can chat through the UI shown as below

![image](https://github.com/InternLM/lagent/assets/24622904/3aebb8b4-07d1-42a2-9da3-46080c556f68)

-## Run a ReAct agent with InternLM2-Chat
+## Run a ReAct agent with InternLM2.5-Chat

**NOTE:** If you want to run a HuggingFace model, please run `pip install -e .[all]` first.

@@ -49,7 +49,7 @@ from lagent.actions import ActionExecutor, GoogleSearch, PythonInterpreter
from lagent.llms import HFTransformer

# Initialize the HFTransformer-based Language Model (llm) and provide the model name.
-llm = HFTransformer('internlm/internlm2-chat-7b')
+llm = HFTransformer('internlm/internlm2_5-7b-chat')

# Initialize the Google Search tool and provide your API key.
search_tool = GoogleSearch(api_key='Your SERPER_API_KEY')
4 changes: 2 additions & 2 deletions agent/lagent_zh-CN.md
@@ -38,7 +38,7 @@ streamlit run examples/react_web_demo.py

![image](https://github.com/InternLM/lagent/assets/24622904/3aebb8b4-07d1-42a2-9da3-46080c556f68)

-## InternLM-Chat 构建一个 ReAct 智能体
+## InternLM2.5-Chat 构建一个 ReAct 智能体

**注意:** 如果你想要启动一个 HuggingFace 的模型,请先运行 `pip install -e .[all]`

@@ -49,7 +49,7 @@ from lagent.actions import ActionExecutor, GoogleSearch, PythonInterpreter
from lagent.llms import HFTransformer

# Initialize the HFTransformer-based Language Model (llm) and provide the model name.
-llm = HFTransformer('internlm/internlm-chat-7b-v1_1')
+llm = HFTransformer('internlm/internlm2_5-7b-chat')

# Initialize the Google Search tool and provide your API key.
search_tool = GoogleSearch(api_key='Your SERPER_API_KEY')
10 changes: 5 additions & 5 deletions agent/pal_inference.py
@@ -189,8 +189,8 @@ def generate_interactive(
generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length
if not has_default_max_length:
logger.warn( # pylint: disable=W4902
f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(="
f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. "
f'Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(='
f'{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. '
'Please refer to the documentation for more information. '
'(https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)',
UserWarning,
@@ -199,8 +199,8 @@
if input_ids_seq_length >= generation_config.max_length:
input_ids_string = 'input_ids'
logger.warning(
f"Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to"
f" {generation_config.max_length}. This can lead to unexpected behavior. You should consider"
f'Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to'
f' {generation_config.max_length}. This can lead to unexpected behavior. You should consider'
' increasing `max_new_tokens`.')

# 2. Set generation parameters if not already defined
@@ -510,7 +510,7 @@ def main():
interface.clear_history()
f.flush()

print(f"{args.model}: Accuracy - {sum(scores) / len(scores)}")
print(f'{args.model}: Accuracy - {sum(scores) / len(scores)}')
torch.cuda.empty_cache()


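The `max_new_tokens`/`max_length` precedence that the warnings in `generate_interactive` describe can be sketched with a small hypothetical helper (not repo code): when both limits are set, `max_new_tokens` wins and the effective cap becomes `max_new_tokens` plus the prompt length.

```python
def effective_max_length(input_len, max_new_tokens=None, max_length=None):
    # Mirrors the precedence logic in generate_interactive (a sketch):
    # `max_new_tokens`, when given, overrides `max_length`.
    if max_new_tokens is not None:
        if max_length is not None:
            print("warning: both limits set; `max_new_tokens` takes precedence")
        return max_new_tokens + input_len
    if max_length is not None and input_len >= max_length:
        print("warning: prompt length already reaches `max_length`;"
              " consider increasing `max_new_tokens`")
    return max_length

# A 10-token prompt with max_new_tokens=32 yields an effective cap of 42,
# regardless of the conflicting max_length=20.
print(effective_max_length(10, max_new_tokens=32, max_length=20))
```

This also illustrates why the second warning fires: with `max_new_tokens` unset, a prompt at least as long as `max_length` leaves no room for generation.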
12 changes: 6 additions & 6 deletions agent/requirements.txt
@@ -1,10 +1,10 @@
-lmdeploy>=0.2.2
+antlr4-python3-runtime==4.11.0
datasets
-tqdm
+einops
+jsonlines
+lagent @ git+https://github.com/InternLM/lagent@main
+lmdeploy>=0.2.2
numpy
pebble
-jsonlines
sympy==1.12
-antlr4-python3-runtime==4.11.0
-lagent
-einops
+tqdm
