
Commit

Merge pull request #133 from huawei-noah/zjj_release_1.6.0
release 1.6.0
zhangjiajin authored Aug 11, 2021
2 parents 1bba610 + de0c4ac commit 398db34
Showing 170 changed files with 1,747 additions and 1,282 deletions.
11 changes: 6 additions & 5 deletions README.cn.md
@@ -9,13 +9,14 @@

---

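**Vega ver1.5.0 released**
**Vega ver1.6.0 released**

- Feature enhancements

- Fixed some bugs in distributed training.
- Some networks support PyTorch + Ascend 910.
- The Vega-process, Vega-progress, and vega-verify-cluster commands provide information in JSON format.
- Supports concise quota settings, for example: `quota: flops < 11.2 and params in [34.0, 56.0]`
- Supports running Vega in a Python virtual environment.
- Supported runtime environments: Python 3.8 and PyTorch 1.9.
- Fixed some bugs in parallel training and distributed search.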

---

@@ -90,7 +91,7 @@ Vega provides 40+ examples for reference: [Examples](https://github.com/huawei-noah/vega/t

| Audience | Reference |
| :--: | :-- |
| [**Users**<br>(User Guide)](./docs/cn/user/README.md) | [Installation Guide](./docs/cn/user/install.md), [Deployment Guide](./docs/cn/user/deployment.md), [Configuration Guide](./docs/cn/user/config_reference.md), [Examples](./docs/cn/user/examples.md), [Evaluation Service](./docs/cn/user/evaluate_service.md) |
| [**Users**<br>(User Guide)](./docs/cn/user/README.md) | [Installation Guide](./docs/cn/user/install.md), [Deployment Guide](./docs/cn/user/deployment.md), [Configuration Guide](./docs/cn/user/config_reference.md), [Examples](./docs/cn/user/examples.md), [Evaluation Service](./docs/cn/user/evaluate_service.md), Task References ([Classification](./docs/cn/tasks/classification.md), [Detection](./docs/cn/tasks/detection.md), [Segmentation](./docs/cn/tasks/segmentation.md), [Super-Resolution](./docs/cn/tasks/segmentation.md)) |
| [**Developers**<br>(Developer Guide)](./docs/cn/developer/README.md) | [Developer Guide](./docs/cn/developer/developer_guide.md), [Quick Start Guide](./docs/cn/developer/quick_start.md), [Dataset Guide](./docs/cn/developer/datasets.md), [Algorithm Development Guide](./docs/cn/developer/new_algorithm.md), [Fine-Grained Search Space Guide](./docs/cn/developer/fine_grained_space.md) |

## FAQ
9 changes: 5 additions & 4 deletions README.md
@@ -8,13 +8,14 @@

---

**Vega ver1.5.0 released**
**Vega ver1.6.0 released**

- Feature enhancement:

- Fixed some bugs in distributed training.
- Some networks support PyTorch + Ascend 910.
- The Vega-process, Vega-progress, and vega-verify-cluster commands provide JSON format information.
- Supports simple quota settings, for example, `quota: flops < 11.2 and params in [34.0, 56.0]`.
- Supports running Vega in a Python virtual environment.
- Supported running environments: Python 3.8 and PyTorch 1.9.
- Fixed some bugs with parallel training and distributed search.

---

2 changes: 1 addition & 1 deletion RELEASE.md
@@ -1,4 +1,4 @@
**Vega ver1.5.0 released:**
**Vega ver1.6.0 released:**

**Introduction**

130 changes: 0 additions & 130 deletions docs/cn/developer/developer_guide.md
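@@ -238,136 +238,6 @@ class Generator(object):
The sample interface in the code performs one round of sampling in NAS: it first calls the search algorithm to search for a network description, and then builds the network model from that description.
In addition, the Generator can determine whether the iterative search should stop and can update the search algorithm.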

### 4.2 Quota

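Quota is an optional plugin that lets users define rules to control the NAS search process and implement custom behavior.
Quota currently provides the following capabilities:

- Search control: stop the NAS search process once a user-defined limit is reached
- Sample filtering: discard any sample that does not satisfy the user-defined constraints

Quota is implemented as follows: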

```python
class Quota(object):

    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)

    def __init__(self):
        self.strategies = []
        self.filters = []
        # check whether quota is configured
        pipe_config = UserConfig().data.get(General.step_name)
        if not pipe_config.get('quota'):
            return
        else:
            # get quota configuration
            quota_config = pipe_config.get('quota')
            # get all defined strategies if any
            if quota_config.get('strategy'):
                strategy_types = quota_config['strategy']
                for type_name in strategy_types:
                    t_cls = ClassFactory.get_cls(ClassType.QUOTA, type_name)
                    self.strategies.append(t_cls())
            # get all defined limitations if any
            if quota_config.get('filter'):
                filter_types = quota_config['filter']
                for type_name in filter_types:
                    t_cls = ClassFactory.get_cls(ClassType.QUOTA, type_name)
                    self.filters.append(t_cls())

    def halt(self):
        raise NotImplementedError

    def filter(self, res):
        raise NotImplementedError

    def is_halted(self):
        for strategy in self.strategies:
            if strategy.halt():
                return True
        return False

    # Check the sample against every configured filter.
    # Return True as soon as any filter rejects it; otherwise return False.
    def is_filtered(self, res=None):
        for flt in self.filters:
            if flt.filter(res):
                logging.info("Sample was thrown away by strategy = %s", flt.__class__.__name__)
                return True
        return False
```

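On initialization, Quota first checks whether the user has enabled a Quota configuration. If so, Quota loads all user-defined halting strategies and filters. Users can define several rules at once by listing the rule names in a single list. Multiple halting strategies are parallel and all take effect at the same time. If Quota is not configured, it has no effect on how Vega runs.

Using Quota takes four steps. First, inherit from the Quota base class and build a strategy or filter class for your needs. Second, override the abstract functions "halt()" and "filter()" of the Quota base class and implement your custom policy in them. Third, register the custom strategy and filter in the class factory. Finally, add the Quota item and its parameters to the user configuration file. Vega then applies the configured Quota policies automatically.

The following is a sample Quota configuration: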

```yml
general:
pipeline: [nas]
nas:
    pipe_step:
        type: SearchPipeStep
    quota:
        strategy: [MaxDurationStrategy, MaxTrialNumberStrategy]
        filter: [FlopParamFliter]
        policy:
            max_duration_time: 3000
            max_trial_num: 300
            flop_range: [!!float 0, !!float 0.6]
            param_range: [!!float 0, !!float 1e10]
```

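The Quota configuration sits under each specific pipeline step, such as the step named "nas" in the configuration above, so a Quota configuration applies only to the pipeline step that contains it. To make Quota take effect in several pipeline steps, add a Quota configuration to each of them. Within the Quota configuration, users can add any custom strategy and filter; registering the concrete Quota class in the class factory under the "QUOTA" keyword is enough for Vega to find it. Quota-related parameters can be defined in the "policy" field, and a concrete Quota class can read them from the user configuration through UserConfig().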

```python
class Generator(object):
    """Convert search space and search algorithm, sample a new model."""

    def __init__(self):
        ...
        self.quota = Quota()
        ...

    @property
    def is_completed(self):
        return self.search_alg.is_completed or self.quota.is_halted()

    def sample(self):
        """Sample a work id and model from search algorithm."""
        res = self.search_alg.search()
        if not res:
            return None
        if not isinstance(res, list):
            res = [res]
        if self.quota.is_filtered(res):
            return None
        if len(res) == 0:
            return None
        out = []
        for sample in res:
            if isinstance(sample, tuple):
                sample = dict(worker_id=sample[0], desc=sample[1])
            record = self.record.load_dict(sample)
            logging.debug("update record=%s", str(record))
            ReportClient().update(**record.to_dict())
            desc = self._decode_hps(record.desc)
            out.append((record.worker_id, desc))
        return out
```

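The following describes how Quota works inside Vega. In each pipeline step, Vega first checks whether the NAS search has completed (is_completed) or whether a user-defined halting condition has been reached (is_halted()). If either holds, the current pipeline step stops immediately.

When preparing samples, the generator first obtains the samples to evaluate from the search algorithm and hands them to Quota for filtering (is_filtered()). The filtering rules are defined by the user in the "filter()" function of the concrete Quota class. Any sample that does not satisfy these rules is discarded and never trained. After filtering, Quota passes all remaining samples back to the generator, which then proceeds to training.

Vega currently provides two pipeline halting strategies and one sample filter. The two halting strategies use the maximum number of samples and the maximum running time as termination conditions, and both work out of the box. The built-in sample filter evaluates the flops and parameters of a sample right after the search algorithm produces it, to decide whether to keep the sample. Because computing flops and parameters requires information about the dataset, users must provide a "data_case()" interface in the relevant dataset class, or supply their own method for computing flops and parameters.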
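
As a rough illustration of a time-based halting rule combined with other rules, the sketch below may help. It is not Vega's `MaxDurationStrategy`; the class name `MaxDuration`, the injectable `clock` argument, and the helper `is_halted` are assumptions made for this example only.

```python
import time


class MaxDuration:
    """Illustrative halting rule (hypothetical, not Vega's class)."""

    def __init__(self, max_seconds, clock=time.monotonic):
        self.max_seconds = max_seconds
        self.clock = clock           # injectable so the rule can be tested
        self.start = clock()         # budget starts when the rule is created

    def halt(self):
        # True once the elapsed time reaches the budget.
        return self.clock() - self.start >= self.max_seconds


def is_halted(strategies):
    """Rules are combined as a union: any single rule firing halts the search."""
    return any(rule.halt() for rule in strategies)
```

Passing the clock in as an argument is a design choice for the sketch: it allows the rule to be exercised with a fake clock instead of real waiting.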

## 5 Trainer

The Trainer is used to train models. In the NAS, HPO, and fully train phases, the trainer can be configured in the pipe steps of these phases to complete model training.
19 changes: 3 additions & 16 deletions docs/cn/user/config_reference.md
@@ -61,12 +61,7 @@ my_fully_train:
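| logger / level | debug \| info \| warn \| error \| critical | info | Log level. |
| cluster / master_ip | - | ~ | Required in cluster scenarios. Set to the IP address of the master node. |
| cluster / slaves | - | [] | Required in cluster scenarios. Set to the IP addresses of all nodes other than the master node. |
| quota / restrict / flops | - | ~ | Model filter. Maximum value or range of the floating-point operations of a sampled model, in M. |
| quota / restrict / params | - | ~ | Model filter. Maximum value or range of the number of parameters of a sampled model, in K. |
| quota / restrict / latency | - | ~ | Model filter. Maximum value or range of the latency of a sampled model, in ms. |
| quota / target / type | accuracy \| IoUMetric \| PSNR | ~ | Model filter. Type of the training metric target. |
| quota / target / value | - | ~ | Model filter. Value of the training metric target. |
| quota / runtime | - | ~ | User-specified estimate of the maximum pipeline running time, in h. |
| quota | - | ~ | Model filter. Can set the maximum value or range of a sampled model's floating-point operations (in M), of its number of parameters (in K), and of its latency (in ms), as well as the maximum pipeline running time (in h). Supports four operators: "<", ">", "in", and "and".<br>e.g. "flops < 10 and params in [100, 1000]" |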

```yaml
general:
@@ -81,15 +76,7 @@ general:
    cluster:
        master_ip: ~
        slaves: []
    quota:
        restrict:
            flops: 10
            params: [100, 1000]
            latency: 100
        target:
            type: accuracy
            value: 0.98
        runtime: 10
    quota: "flops < 10 and params in [100, 1000]"
```
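
To make the one-line quota syntax more concrete, here is a minimal sketch of how such an expression could be checked against a model's measured metrics. It is not Vega's parser: the function name `check_quota`, the `metrics` dictionary, and the choice to treat `in [low, high]` as an inclusive range are all assumptions for illustration. The metric values are assumed to already be in the units the table lists (M for flops, K for params).

```python
import ast
import re


def check_quota(expr, metrics):
    """Return True if `metrics` satisfies every clause of a quota expression.

    Hypothetical helper. Handles only the four operators named in the
    table: "<", ">", "in", and "and".
    """
    for clause in expr.split(" and "):
        clause = clause.strip()
        match = re.fullmatch(r"(\w+)\s*(<|>|in)\s*(.+)", clause)
        if match is None:
            raise ValueError(f"unsupported quota clause: {clause!r}")
        name, op, raw = match.groups()
        value = metrics[name]
        bound = ast.literal_eval(raw)   # a number, or a [low, high] list
        if op == "<":
            ok = value < bound
        elif op == ">":
            ok = value > bound
        else:
            # "in": assumed to mean an inclusive [low, high] range
            low, high = bound
            ok = low <= value <= high
        if not ok:
            return False
    return True


print(check_quota("flops < 10 and params in [100, 1000]",
                  {"flops": 8.5, "params": 250}))
```

`ast.literal_eval` is used rather than `eval` so that only plain literals from the configuration string are ever evaluated.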
## 2.1 Parallel and Distributed
@@ -244,7 +231,7 @@ search_algorithm:
| type | Search algorithm name: RandomSearch, AshaHpo, BohbHpo, BossHpo, or PBTHpo | `type: RandomSearch` |
| objective_keys | Optimization objective | `objective_keys: 'accuracy'` |
| policy.total_epochs | Epoch quota for the search. Vega simplifies the configuration policy so that only this parameter needs to be set. For other parameters, see the HPO and NAGO algorithm examples. | `total_epochs: 2430` |
| tuner | Tuner type, used by the BOHB algorithm: gp (default), rf, or hebo | tuner: "gp" |
| tuner | Tuner type, used by the BOHB algorithm: gp, rf (default), or hebo | tuner: "rf" |

Note: If the tuner parameter is set to hebo, "[HEBO](https://github.com/huawei-noah/noah-research/tree/master/HEBO)" must be installed, with gpytorch 1.1.1, torch 1.5.0, and torchvision 0.5.0.

4 changes: 2 additions & 2 deletions docs/cn/user/evaluate_service.md
@@ -269,9 +269,9 @@ from .my_hardware import MyHardware

## 6. FAQ

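### 6.1 Pytorch Model Evaluation
### 6.1 Converting a Pytorch Model to a Caffe Model

The evaluation service client needs to convert `Pytorch` models. Download [PytorchToCaffe](https://github.com/xxradon/PytorchToCaffe) and place it in the `./third_party` directory (the third_party directory is at the same level as the vega directory).
To convert a pytorch model to a caffe model, download [PytorchToCaffe](https://github.com/xxradon/PytorchToCaffe) and place it in the `./third_party` directory (the third_party directory is at the same level as the vega directory).

Note: This third-party open-source software does not support pytorch 1.1. In addition, if you use the native models in torchvision and the torchvision version is higher than 0.2.0, you need to make the following additional changes:
Modify the `pytorch_to_caffe.py` file and add the following content: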
130 changes: 0 additions & 130 deletions docs/en/developer/developer_guide.md
@@ -242,136 +242,6 @@ The sample interface in the code is used for each sampling in the NAS. The sampl

In addition, the generator can determine whether the iterative search stops and update the search algorithm.

### 4.2 Quota

Quota is an optional plugin that enables users to define specific rules to control the NAS search process for special purposes.
Currently, Quota provides the following capabilities:

- Search control: halt the NAS search process when user-defined constraints are reached
- Sample filtering: discard a sample from the search algorithm if it does not satisfy user-defined limitations

The implementation of Quota is shown below.

```python
class Quota(object):

    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)

    def __init__(self):
        self.strategies = []
        self.filters = []
        # check whether quota is configured
        pipe_config = UserConfig().data.get(General.step_name)
        if not pipe_config.get('quota'):
            return
        else:
            # get quota configuration
            quota_config = pipe_config.get('quota')
            # get all defined strategies if any
            if quota_config.get('strategy'):
                strategy_types = quota_config['strategy']
                for type_name in strategy_types:
                    t_cls = ClassFactory.get_cls(ClassType.QUOTA, type_name)
                    self.strategies.append(t_cls())
            # get all defined limitations if any
            if quota_config.get('filter'):
                filter_types = quota_config['filter']
                for type_name in filter_types:
                    t_cls = ClassFactory.get_cls(ClassType.QUOTA, type_name)
                    self.filters.append(t_cls())

    def halt(self):
        raise NotImplementedError

    def filter(self, res):
        raise NotImplementedError

    def is_halted(self):
        for strategy in self.strategies:
            if strategy.halt():
                return True
        return False

    # Check the sample against every configured filter.
    # Return True as soon as any filter rejects it; otherwise return False.
    def is_filtered(self, res=None):
        for flt in self.filters:
            if flt.filter(res):
                logging.info("Sample was thrown away by strategy = %s", flt.__class__.__name__)
                return True
        return False
```

During initialization, Quota checks whether the user has configured it. If so, Quota loads all user-defined strategies and filters. Quota allows multiple rules, and users can give several rule names in one list. All strategies are combined as a union, which means they take effect at the same time: the search halts as soon as any one of them fires. If Quota is not configured, it has no influence on how Vega runs.

To use Quota, follow four steps. First, construct a strategy or filter class that inherits from the Quota base class. Second, override the abstract functions "halt()" and "filter()" to implement the custom approach. Third, register the finished concrete class in the class factory. Finally, add the Quota configuration item to the user configuration file; Vega then handles the rest automatically.
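
The first three steps can be sketched in a self-contained way as follows. This is only an illustration of the pattern, not Vega's API: `QUOTA_REGISTRY`, `register_quota`, `QuotaBase`, `MaxTrials`, and `build_strategies` are hypothetical stand-ins for the class factory and the Quota base class, introduced so the example runs without Vega installed.

```python
# Hypothetical stand-in for the class factory: maps class names to classes.
QUOTA_REGISTRY = {}


def register_quota(cls):
    """Step 3: register a concrete class under its class name."""
    QUOTA_REGISTRY[cls.__name__] = cls
    return cls


class QuotaBase:
    """Stand-in base class exposing the two hooks named in the guide."""

    def halt(self):
        raise NotImplementedError

    def filter(self, res):
        raise NotImplementedError


@register_quota
class MaxTrials(QuotaBase):
    """Steps 1 and 2: inherit from the base class and override halt()."""

    def __init__(self, max_trials=3):
        self.max_trials = max_trials
        self.checks = 0

    def halt(self):
        # Each call counts as one trial; halt once the budget is exceeded.
        self.checks += 1
        return self.checks > self.max_trials


def build_strategies(names):
    """Look up each configured name, as a config-driven loader might."""
    return [QUOTA_REGISTRY[name]() for name in names]


strategies = build_strategies(["MaxTrials"])
print([strategies[0].halt() for _ in range(4)])
```

Registering by class name keeps the configuration file free of import paths, which is why a list of plain names is enough to select the rules.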

Here is an example Quota configuration:

```yml
general:

pipeline: [nas]

nas:
    pipe_step:
        type: SearchPipeStep

    quota:
        strategy: [MaxDurationStrategy, MaxTrialNumberStrategy]
        filter: [FlopParamFliter]
        policy:
            max_duration_time: 3000
            max_trial_num: 300
            flop_range: [!!float 0, !!float 0.6]
            param_range: [!!float 0, !!float 1e10]

```

The Quota configuration item is placed under each pipeline step and applies only to that step, so users need to add a Quota setting to every pipeline step in which they want Quota to take effect. In the Quota settings, users can list their own strategies and filters by class name; these classes can be implemented freely and are registered in the class factory under the type "QUOTA". Relevant parameters can be defined in "policy" and referenced in the strategy class through UserConfig().

```python
class Generator(object):
    """Convert search space and search algorithm, sample a new model."""

    def __init__(self):
        ...
        self.quota = Quota()
        ...

    @property
    def is_completed(self):
        return self.search_alg.is_completed or self.quota.is_halted()

    def sample(self):
        """Sample a work id and model from search algorithm."""
        res = self.search_alg.search()
        if not res:
            return None
        if not isinstance(res, list):
            res = [res]
        if self.quota.is_filtered(res):
            return None
        if len(res) == 0:
            return None
        out = []
        for sample in res:
            if isinstance(sample, tuple):
                sample = dict(worker_id=sample[0], desc=sample[1])
            record = self.record.load_dict(sample)
            logging.debug("update record=%s", str(record))
            ReportClient().update(**record.to_dict())
            desc = self._decode_hps(record.desc)
            out.append((record.worker_id, desc))
        return out
```

This is how Quota works in a round of the NAS search process. In each NAS pipeline step, Vega first checks whether the search procedure has completed or has reached a user-defined halting condition. If a stop condition is met, the current NAS pipeline step halts at once.

When proposing samples, the generator receives a sample `res` from the search algorithm and hands it to Quota for filtering. The filtering rules are defined by users in the "filter()" function of the concrete class. Users can discard any sample that doesn't meet their expectations. Afterwards, the generator takes all satisfactory samples for further processing.

Vega now provides two kinds of halting strategies and one kind of sample filter. The two existing halting strategies stop the pipe step by the number of sample trials and by the pipe step running time, respectively. Both strategies work out of the box. The example filter lets users remove unwanted samples by evaluating the flops and parameters of the sample network before training it. Calculating flops and parameters requires information about the dataset, so users have to write a "data_case()" interface in the related dataset class or supply their own method to compute flops and parameters.
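
A range-based filter of this kind can be sketched as below. This is not Vega's filter and does not use `data_case()`: to stay self-contained it assumes the sample is a stack of fully connected layers described by a list of layer widths, counts one multiply-accumulate as one FLOP, and uses the hypothetical helper names `dense_cost` and `is_filtered`.

```python
def dense_cost(layer_sizes):
    """Return (flops, params) for a stack of fully connected layers.

    Assumptions for this sketch: one multiply-accumulate counts as one
    FLOP, and each layer has a bias vector.
    """
    flops = 0
    params = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        flops += n_in * n_out
        params += n_in * n_out + n_out
    return flops, params


def is_filtered(layer_sizes, flop_range, param_range):
    """Return True when the sample falls outside either range (drop it)."""
    flops, params = dense_cost(layer_sizes)
    in_flops = flop_range[0] <= flops <= flop_range[1]
    in_params = param_range[0] <= params <= param_range[1]
    return not (in_flops and in_params)


print(dense_cost([4, 8, 2]))
print(is_filtered([4, 8, 2], flop_range=(0, 100), param_range=(0, 50)))
```

Because the cost is computed from the description alone, a sample can be rejected before any training time is spent on it, which is the point of filtering at sampling time.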

### 5 Trainer

The trainer is used to train models. In the NAS, HPO, and fully train phases, the trainer can be configured in the pipe steps of these phases to complete model training.