Merge pull request #285 from OpenRL-Lab/main

v0.2.0

Showing 132 changed files with 3,999 additions and 6,566 deletions.
```diff
@@ -1,5 +1,5 @@
 <div align="center">
-<a href="https://openrl-docs.readthedocs.io/zh/latest/index.html"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
+<a href="https://openrl-docs.readthedocs.io/"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
 </div>
 
 ---
```
```diff
@@ -25,10 +25,10 @@
 [![Contributors](https://img.shields.io/github/contributors/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/graphs/contributors)
 [![GitHub license](https://img.shields.io/github/license/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/blob/master/LICENSE)
 
-[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/guvAS2up)
+[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
 [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)
 
-OpenRL-v0.1.7 is updated on Sep 21, 2023
+OpenRL-v0.2.0 is updated on Dec 20, 2023
 
 The main branch is the latest version of OpenRL, which is under active development. If you just want to have a try with
 OpenRL, you can switch to the stable branch.
```
```diff
@@ -58,6 +58,8 @@ Currently, the features supported by OpenRL include:
 
 - Reinforcement learning training support for natural language tasks (such as dialogue)
 
+- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)
+
 - Support [Arena](https://openrl-docs.readthedocs.io/en/latest/arena/index.html) , which allows convenient evaluation of
 various agents (even submissions for [JiDi](https://openrl-docs.readthedocs.io/en/latest/arena/index.html#performing-local-evaluation-of-agents-submitted-to-the-jidi-platform-using-openrl)) in a competitive environment.
 
```
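The Arena feature listed above pits agents against each other and tallies the outcome. As a framework-free illustration of that idea, the sketch below plays two agents against each other in single-round rock-paper-scissors and counts wins. It is a minimal sketch under stated assumptions, not OpenRL's Arena: the zero-argument agent interface and the helpers `rps_winner`, `batch_sizes` and `evaluate` are hypothetical, and the parameter names `total_games` and `max_game_onetime` only mirror the arguments that `arena.reset()` takes in the example script in this commit. Reading `max_game_onetime` as a cap on how many games are grouped together is an assumption.

```python
import random
from collections import Counter
from typing import Callable, List

# Hypothetical agent interface for this sketch only: a zero-argument callable
# returning an action (0 = rock, 1 = paper, 2 = scissors). OpenRL agents differ.
Agent = Callable[[], int]


def rps_winner(a: int, b: int) -> str:
    """Return "agent1", "agent2" or "draw" for one rock-paper-scissors round."""
    if a == b:
        return "draw"
    # (a - b) % 3 == 1 exactly when a beats b:
    # paper beats rock, scissors beats paper, rock beats scissors.
    return "agent1" if (a - b) % 3 == 1 else "agent2"


def batch_sizes(total_games: int, max_game_onetime: int) -> List[int]:
    """Split total_games into batches of at most max_game_onetime (assumed semantics)."""
    full, rest = divmod(total_games, max_game_onetime)
    return [max_game_onetime] * full + ([rest] if rest else [])


def evaluate(agent1: Agent, agent2: Agent, total_games: int = 10,
             max_game_onetime: int = 5) -> Counter:
    """Play total_games single-round games and count wins and draws."""
    results: Counter = Counter()
    for size in batch_sizes(total_games, max_game_onetime):
        # A real arena may run each batch concurrently; this sketch runs it sequentially.
        for _ in range(size):
            results[rps_winner(agent1(), agent2())] += 1
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    random_agent: Agent = lambda: rng.randrange(3)
    rock_agent: Agent = lambda: 0
    print(evaluate(random_agent, rock_agent, total_games=10, max_game_onetime=5))
```

Batching only changes how games are grouped, not the tally, so the win counts are the same for any `max_game_onetime`.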
```diff
@@ -160,19 +162,19 @@ Here we provide a table for the comparison of OpenRL and existing popular RL lib
 OpenRL employs a modular design and high-level abstraction, allowing users to accomplish training for various tasks
 through a unified and user-friendly interface.
 
-| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | Bilingual Document |
-|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
-| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
-| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
-| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fully supported | :heavy_check_mark: | :heavy_check_mark: |
-| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fully supported | not fully supported | :heavy_check_mark: | :heavy_check_mark: |
-| [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fully supported | :x: | :x: |
-| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
-| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
-| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
-| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
-| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |
+| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
+|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:--------------------:|
+| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
+| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
+| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fully supported | :heavy_check_mark: | :x: |
+| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fully supported | not fully supported | :heavy_check_mark: | :x: |
+| [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fully supported | :x: | :x: |
+| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
+| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
+| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
+| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
+| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |
 
 ## Installation
 
```
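The comparison table now has a DeepSpeed column, but this diff does not show how OpenRL reads a DeepSpeed configuration. As a generic, hedged illustration only, the sketch below builds a standard DeepSpeed config dictionary and enforces DeepSpeed's batch-size rule (`train_batch_size` must equal `train_micro_batch_size_per_gpu` times `gradient_accumulation_steps` times the number of GPUs). The key names are DeepSpeed's own config keys; the helper `make_deepspeed_config` and its defaults are hypothetical and are not part of OpenRL.

```python
import json


def make_deepspeed_config(micro_batch_per_gpu: int, grad_accum_steps: int,
                          world_size: int, zero_stage: int = 2,
                          fp16: bool = True) -> dict:
    """Hypothetical helper: build a DeepSpeed config with consistent batch sizes."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2 or 3")
    return {
        # DeepSpeed requires: train_batch_size ==
        # train_micro_batch_size_per_gpu * gradient_accumulation_steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": fp16},
        "zero_optimization": {"stage": zero_stage},
    }


if __name__ == "__main__":
    config = make_deepspeed_config(micro_batch_per_gpu=8, grad_accum_steps=4, world_size=2)
    # Print the JSON; where OpenRL expects such a file is not shown in this diff.
    print(json.dumps(config, indent=2))
```

Deriving `train_batch_size` from the other two values, rather than setting all three by hand, avoids the mismatch error DeepSpeed raises when they disagree.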
```diff
@@ -333,7 +335,7 @@ If you are using OpenRL in your research project, you are also welcome to join t
 
 - Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss
 OpenRL usage and development with us.
-- Join the [Discord](https://discord.gg/guvAS2up) group to discuss OpenRL usage and development with us.
+- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
 - Send an E-mail to: [[email protected]]([email protected])
 - Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).
 
```
```diff
@@ -1,5 +1,5 @@
 <div align="center">
-<a href="https://openrl-docs.readthedocs.io/zh/latest/index.html"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
+<a href="https://openrl-docs.readthedocs.io/"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
 </div>
 
 
```
```diff
@@ -26,10 +26,10 @@
 [![Contributors](https://img.shields.io/github/contributors/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/graphs/contributors)
 [![GitHub license](https://img.shields.io/github/license/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/blob/master/LICENSE)
 
-[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/guvAS2up)
+[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
 [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)
 
-OpenRL-v0.1.7 is updated on Sep 21, 2023
+OpenRL-v0.1.10 is updated on Oct 27, 2023
 
 The main branch is the latest version of OpenRL, which is under active development. If you just want to have a try with
 OpenRL, you can switch to the stable branch.
```
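```diff
@@ -51,6 +51,7 @@ OpenRL is developed based on PyTorch, with the goal of providing the reinforcement learning research community with a
 - Support offline reinforcement learning training with expert data
 - Support self-play training
 - Support reinforcement learning training for natural language tasks (such as dialogue tasks)
+- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)
 - Support the [Arena](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html) feature, which allows convenient evaluation of various agents (even agents submitted to the [JiDi platform](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html#openrl)) in multi-agent adversarial environments.
 - Support importing models and data from [Hugging Face](https://huggingface.co/). Support loading [Stable-baselines3 models](https://openrl-docs.readthedocs.io/zh/latest/sb3/index.html) from Hugging Face for testing and training.
 - Provide a [detailed tutorial](https://openrl-docs.readthedocs.io/zh/latest/custom_env/index.html) on connecting users' own environments to OpenRL.
```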
```diff
@@ -128,18 +129,18 @@ OpenRL-Lab will continue to maintain and update OpenRL, and everyone is welcome to join our [open-source commu
 
 Here we provide a table comparing OpenRL with other commonly used reinforcement learning libraries. OpenRL adopts a modular design and high-level abstraction, enabling users to train on a variety of tasks through a unified, easy-to-use interface.
 
-| RL Library | NLP Tasks/RLHF | Multi-agent Training | Self-Play Training | Offline RL | Bilingual Documentation |
+| RL Library | NLP Tasks/RLHF | Multi-agent Training | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
 |:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
 | **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
 | [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
 | [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
-| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fully supported | :heavy_check_mark: | :heavy_check_mark: |
-| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fully supported | not fully supported | :heavy_check_mark: | :heavy_check_mark: |
+| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fully supported | :heavy_check_mark: | :x: |
+| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fully supported | not fully supported | :heavy_check_mark: | :x: |
 | [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fully supported | :x: | :x: |
 | [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
 | [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
-| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
-| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
+| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
+| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
 | [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |
 
 ## Installation
```
```diff
@@ -293,7 +294,7 @@ openrl --mode train --env CartPole-v1
 
 - Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)
 group to discuss OpenRL usage and development with us.
-- Join the [Discord](https://discord.gg/guvAS2up) group to discuss OpenRL usage and development with us.
+- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
 - Send an e-mail to: [[email protected]]([email protected])
 - Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions)
 
```
@@ -0,0 +1,104 @@

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright 2023 The OpenRL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Register extra PettingZoo games with OpenRL and run a short arena match on each."""

from pettingzoo.butterfly import cooperative_pong_v5
from pettingzoo.classic import connect_four_v3, go_v5, rps_v2, texas_holdem_no_limit_v6
from pettingzoo.mpe import simple_push_v3

from openrl.arena import make_arena
from openrl.arena.agents.local_agent import LocalAgent
from openrl.arena.agents.random_agent import RandomAgent
from openrl.envs.PettingZoo.registration import register
from openrl.envs.wrappers.pettingzoo_wrappers import RecordWinner


# Each factory takes render_mode (extra keyword arguments are accepted and ignored)
# and returns a PettingZoo AEC environment.


# Classic games
def ConnectFourEnv(render_mode, **kwargs):
    # Pass render_mode by keyword so it reaches the environment constructor.
    return connect_four_v3.env(render_mode=render_mode)


def RockPaperScissorsEnv(render_mode, **kwargs):
    # render_mode is not forwarded: 3 actions (rock, paper, scissors), at most 15 cycles.
    return rps_v2.env(num_actions=3, max_cycles=15)


def GoEnv(render_mode, **kwargs):
    # A 5x5 board keeps test games short.
    return go_v5.env(render_mode=render_mode, board_size=5, komi=7.5)


def TexasHoldemEnv(render_mode, **kwargs):
    return texas_holdem_no_limit_v6.env(render_mode=render_mode)


# MPE
def SimplePushEnv(render_mode, **kwargs):
    return simple_push_v3.env(render_mode=render_mode)


# Butterfly
def CooperativePongEnv(render_mode, **kwargs):
    return cooperative_pong_v5.env(render_mode=render_mode)


def register_new_envs():
    """Register every factory above under its env id and return the ids."""
    new_env_dict = {
        "connect_four_v3": ConnectFourEnv,
        "RockPaperScissors": RockPaperScissorsEnv,
        "go_v5": GoEnv,
        "texas_holdem_no_limit_v6": TexasHoldemEnv,
        "simple_push_v3": SimplePushEnv,
        "cooperative_pong_v5": CooperativePongEnv,
    }

    for env_id, env in new_env_dict.items():
        register(env_id, env)
    return new_env_dict.keys()


def run_arena(
    env_id: str,
    parallel: bool = True,
    seed: int = 0,
    total_games: int = 10,
    max_game_onetime: int = 5,
):
    """Play a LocalAgent (agent1) against a RandomAgent (agent2) on env_id."""
    env_wrappers = [RecordWinner]

    arena = make_arena(env_id, env_wrappers=env_wrappers, use_tqdm=False)

    # Relative path to an opponent template; adjust it to match your working directory.
    agent1 = LocalAgent("../selfplay/opponent_templates/random_opponent")
    agent2 = RandomAgent()

    arena.reset(
        agents={"agent1": agent1, "agent2": agent2},
        total_games=total_games,
        max_game_onetime=max_game_onetime,
        seed=seed,
    )
    result = arena.run(parallel=parallel)
    arena.close()
    print(result)
    return result


def test_new_envs():
    env_ids = register_new_envs()
    seed = 0
    # Smoke test: one sequential game per newly registered environment.
    for env_id in env_ids:
        run_arena(env_id=env_id, seed=seed, parallel=False, total_games=1)


if __name__ == "__main__":
    test_new_envs()
```
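`register_new_envs()` maps each new environment id to a factory that takes `render_mode` and returns a PettingZoo environment, then hands each pair to OpenRL's `register`. The sketch below is a self-contained, hypothetical re-creation of that factory-registry pattern so it can run without PettingZoo or OpenRL installed. The names `register_env`, `make_env`, `register_all` and the module-level `_REGISTRY` dict are illustrative assumptions; OpenRL's actual registration internals are not shown in this diff.

```python
from typing import Any, Callable, Dict, Iterable

# Hypothetical registry for illustration only; not OpenRL's implementation.
EnvFactory = Callable[..., Any]
_REGISTRY: Dict[str, EnvFactory] = {}


def register_env(env_id: str, factory: EnvFactory) -> None:
    """Map an env id to a factory called as factory(render_mode, **kwargs)."""
    if env_id in _REGISTRY:
        raise ValueError(f"{env_id!r} is already registered")
    _REGISTRY[env_id] = factory


def make_env(env_id: str, render_mode=None, **kwargs) -> Any:
    """Look up a registered factory by id and build the environment."""
    try:
        factory = _REGISTRY[env_id]
    except KeyError:
        raise KeyError(f"unknown env id {env_id!r}") from None
    return factory(render_mode, **kwargs)


def register_all(env_dict: Dict[str, EnvFactory]) -> Iterable[str]:
    """Same shape as register_new_envs(): register every entry, return the ids."""
    for env_id, factory in env_dict.items():
        register_env(env_id, factory)
    return env_dict.keys()


if __name__ == "__main__":
    # Stand-in factory with the same signature as ConnectFourEnv and the others.
    def DummyEnv(render_mode, **kwargs):
        return {"render_mode": render_mode, **kwargs}

    ids = register_all({"dummy_v0": DummyEnv})
    print(list(ids), make_env("dummy_v0", render_mode="human", board_size=5))
```

Keeping every factory on the same `(render_mode, **kwargs)` signature is what lets a single lookup function build any registered game without per-game special cases.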