diff --git a/README.md b/README.md index 70285409a..79b707bd3 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ Running Platform: - [DSSM](docs/source/models/dssm.md) / [MIND](docs/source/models/mind.md) / [DropoutNet](docs/source/models/dropoutnet.md) / [CoMetricLearningI2I](docs/source/models/co_metric_learning_i2i.md) / [PDN](docs/source/models/pdn.md) - [W&D](docs/source/models/wide_and_deep.md) / [DeepFM](docs/source/models/deepfm.md) / [MultiTower](docs/source/models/multi_tower.md) / [DCN](docs/source/models/dcn.md) / [FiBiNet](docs/source/models/fibinet.md) / [MaskNet](docs/source/models/masknet.md) / [PPNet](docs/source/models/ppnet.md) / [CDN](docs/source/models/cdn.md) - [DIN](docs/source/models/din.md) / [BST](docs/source/models/bst.md) / [CL4SRec](docs/source/models/cl4srec.md) -- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [PLE](docs/source/models/ple.md) +- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [AITM](docs/source/models/aitm.md) / [PLE](docs/source/models/ple.md) - [HighwayNetwork](docs/source/models/highway.md) / [CMBF](docs/source/models/cmbf.md) / [UNITER](docs/source/models/uniter.md) - More models in development diff --git a/docs/images/models/aitm.jpg b/docs/images/models/aitm.jpg new file mode 100644 index 000000000..4eab9af17 Binary files /dev/null and b/docs/images/models/aitm.jpg differ diff --git a/docs/source/benchmark.md b/docs/source/benchmark.md index 8e2d20c6f..8a2c5348e 100644 --- a/docs/source/benchmark.md +++ b/docs/source/benchmark.md @@ -9,6 +9,7 @@ - 该数据集是淘宝展示广告点击率预估数据集,包含用户、广告特征和行为日志。[天池比赛链接](https://tianchi.aliyun.com/dataset/dataDetail?dataId=56) - 训练数据表:pai_online_project.easyrec_demo_taobao_train_data - 测试数据表:pai_online_project.easyrec_demo_taobao_test_data +- 其中pai_online_project是一个公共读的MaxCompute project,里面写入了一些数据表做测试,不需要申请权限。 - 在PAI上面测试使用的资源包括2个parameter 
server,9个worker,其中一个worker做评估: ```json {"ps":{"count":2, diff --git a/docs/source/component/backbone.md b/docs/source/component/backbone.md index 2a0ec03a5..d9edd2ef7 100644 --- a/docs/source/component/backbone.md +++ b/docs/source/component/backbone.md @@ -1111,13 +1111,14 @@ MovieLens-1M数据集效果: ## 2.特征交叉组件 -| 类名 | 功能 | 说明 | 示例 | -| -------------- | ---------------- | ------------ | -------------------------------------------------------------------------------------------------------------------------- | -| FM | 二阶交叉 | DeepFM模型的组件 | [案例2](#deepfm) | -| DotInteraction | 二阶内积交叉 | DLRM模型的组件 | [案例4](#dlrm) | -| Cross | bit-wise交叉 | DCN v2模型的组件 | [案例3](#dcn) | -| BiLinear | 双线性 | FiBiNet模型的组件 | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) | -| FiBiNet | SENet & BiLinear | FiBiNet模型 | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) | +| 类名 | 功能 | 说明 | 示例 | +| -------------- | --------------------- | ---------------- | -------------------------------------------------------------------------------------------------------------------------- | +| FM | 二阶交叉 | DeepFM模型的组件 | [案例2](#deepfm) | +| DotInteraction | 二阶内积交叉 | DLRM模型的组件 | [案例4](#dlrm) | +| Cross | bit-wise交叉 | DCN v2模型的组件 | [案例3](#dcn) | +| BiLinear | 双线性 | FiBiNet模型的组件 | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) | +| FiBiNet | SENet & BiLinear | FiBiNet模型 | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) | +| Attention | Dot-product attention | Transformer模型的组件 | | ## 3.特征重要度学习组件 diff --git a/docs/source/component/component.md b/docs/source/component/component.md index 897e53162..731e95759 100644 --- a/docs/source/component/component.md +++ b/docs/source/component/component.md @@ -79,6 +79,33 @@ | senet | SENet 
| | protobuf message | | mlp | MLP | | protobuf message | +- Attention + +Dot-product attention layer, a.k.a. Luong-style attention. + +The calculation follows the steps: + +1. Calculate attention scores using query and key with shape (batch_size, Tq, Tv). +1. Use scores to calculate a softmax distribution with shape (batch_size, Tq, Tv). +1. Use the softmax distribution to create a linear combination of value with shape (batch_size, Tq, dim). + +| 参数 | 类型 | 默认值 | 说明 | +| ----------------------- | ------ | ----- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| use_scale | bool | False | If True, will create a scalar variable to scale the attention scores. | +| score_mode | string | dot | Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors. | +| dropout | float | 0.0 | Float between 0 and 1. Fraction of the units to drop for the attention scores. | +| seed | int | None | A Python integer to use as random seed incase of dropout. | +| return_attention_scores | bool | False | if True, returns the attention scores (after masking and softmax) as an additional output argument. | +| use_causal_mask | bool | False | Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past. | + +- inputs: List of the following tensors: + - query: Query tensor of shape (batch_size, Tq, dim). + - value: Value tensor of shape (batch_size, Tv, dim). + - key: Optional key tensor of shape (batch_size, Tv, dim). If not given, will use value for both key and value, which is the most common case. 
+- output: + - Attention outputs of shape (batch_size, Tq, dim). + - (Optional) Attention scores after masking and softmax with shape (batch_size, Tq, Tv). + ## 3.特征重要度学习组件 - SENet diff --git a/docs/source/feature/data.md b/docs/source/feature/data.md index 827791ffb..169902b78 100644 --- a/docs/source/feature/data.md +++ b/docs/source/feature/data.md @@ -2,7 +2,7 @@ EasyRec作为阿里云PAI的推荐算法包,可以无缝对接MaxCompute的数据表,也可以读取OSS中的大文件,还支持E-MapReduce环境中的HDFS文件,也支持local环境中的csv文件。 -为了识别这些输入数据中的字段信息,需要设置相应的字段名称和字段类型、设置默认值,帮助EasyRec去读取相应的数据。设置label字段,作为训练的目标。为了适应多目标模型,label字段可以设置多个。 +为了识别这些输入数据中的字段信息,需要设置相应的字段名称和字段类型、设置默认值,帮助EasyRec去读取相应的数据。设置label字段,作为训练的目标。为了适配多目标模型,label字段可设置多个。 另外还有一些参数如prefetch_size,是tensorflow中读取数据需要设置的参数。 @@ -10,7 +10,7 @@ EasyRec作为阿里云PAI的推荐算法包,可以无缝对接MaxCompute的数 这个配置里面,只有三个字段,用户ID(uid)、物品ID(item_id)、label字段(click)。 -OdpsInputV2表示读取MaxCompute的表作为输入数据。 +OdpsInputV2表示读取MaxCompute的表作为输入数据。如果是本地机器上训练,注意使用CSVInput类型。 ```protobuf data_config { @@ -160,7 +160,7 @@ def remap_lbl(labels): ### prefetch_size - data prefetch,以batch为单位,默认是32 -- 设置prefetch size可以提高数据加载的速度,防止数据瓶颈 +- 设置prefetch size可以提高数据加载的速度,防止数据瓶颈。但是当batchsize较小的时候,该值可适当调小。 ### shard && file_shard diff --git a/docs/source/feature/feature.rst b/docs/source/feature/feature.rst index a41b42a53..901fe6673 100644 --- a/docs/source/feature/feature.rst +++ b/docs/source/feature/feature.rst @@ -3,7 +3,7 @@ 在上一节介绍了输入数据包括MaxCompute表、csv文件、hdfs文件、OSS文件等,表或文件的一列对应一个特征。 -在数据中可以有一个或者多个label字段,而特征比较丰富,支持的类型包括IdFeature,RawFeature,TagFeature,SequenceFeature, ComboFeature. +在数据中可以有一个或者多个label字段,在多目标模型中,需要多个label字段。而特征比较丰富,支持的类型包括IdFeature,RawFeature,TagFeature,SequenceFeature, ComboFeature。 各种特征共用字段 ---------------------------------------------------------------- @@ -71,12 +71,12 @@ IdFeature: 离散值特征/ID类特征 .. 
math:: - embedding\_dim=8+x^{0.25} - - 其中,x 为不同特征取值的个数 + embedding\_dim=8+n^{0.25} + - 其中,n 是特征的唯一值的个数(如gender特征的取值是男、女,则n=2) - hash\_bucket\_size: hash bucket的大小。适用于category_id, user_id等 -- 对于user\_id等规模比较大的,hash冲突影响比较小的特征, +- 对于user\_id等规模比较大的,hash冲突影响比较小的特征,用户行为日志不够丰富可通过hash压缩id数量, .. math:: @@ -91,7 +91,8 @@ IdFeature: 离散值特征/ID类特征 - num\_buckets: buckets number, - 仅仅当输入是integer类型时,可以使用num\_buckets + 仅仅当输入是integer类型时,可以使用num\_buckets。 + 但是当使用fg特征的时候,不要用integer特征用num\_buckets的方式来变换,注意要用hash\_bucket\_size的方式。 - vocab\_list: 指定词表,适合取值比较少可以枚举的特征,如星期,月份,星座等 diff --git a/docs/source/feature/pai_rec_callback_conf.md b/docs/source/feature/pai_rec_callback_conf.md index 151c07b1d..5679222d7 100644 --- a/docs/source/feature/pai_rec_callback_conf.md +++ b/docs/source/feature/pai_rec_callback_conf.md @@ -1,5 +1,9 @@ # PAI-REC 全埋点配置 +## PAI-Rec引擎的callback服务文档 + +- [文档](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/pairec/docs/pairec/html/intro/callback_api.html) + ## 模板 ```json diff --git a/docs/source/feature/rtp_fg.md b/docs/source/feature/rtp_fg.md index baeaf078b..40da4852e 100644 --- a/docs/source/feature/rtp_fg.md +++ b/docs/source/feature/rtp_fg.md @@ -2,7 +2,7 @@ - RTP FG: RealTime Predict Feature Generation, 解决实时预测需要的特征工程需求. 特征工程在推荐链路里面也占用了比较长的时间. -- RTP FG能够以比较高的效率生成一些复杂的交叉特征,如match feature和lookup feature, 通过使用同一套c++代码保证离线在线的一致性. +- RTP FG能够以比较高的效率生成一些复杂的交叉特征,如match feature和lookup feature.离线训练和在线预测的时候通过使用同一套c++代码保证离线在线的一致性. - 其生成的特征可以接入EasyRec进行训练,从RTP FG的配置(fg.json)可以生成EasyRec的配置文件(pipeline.config). diff --git a/docs/source/feature/rtp_native.md b/docs/source/feature/rtp_native.md index d2524079a..8774041c7 100644 --- a/docs/source/feature/rtp_native.md +++ b/docs/source/feature/rtp_native.md @@ -1,6 +1,6 @@ # RTP部署 -本文档介绍将EasyRec模型部署到RTP上的流程. +本文档介绍将EasyRec模型部署到RTP(Real Time Prediction,实时打分服务)上的流程. 
- RTP目前仅支持checkpoint形式的模型部署,因此需要将EasyRec模型导出为checkpoint形式 diff --git a/docs/source/intro.md b/docs/source/intro.md index f4dabcb76..b91c0e7cf 100644 --- a/docs/source/intro.md +++ b/docs/source/intro.md @@ -63,4 +63,5 @@ EasyRec implements state of the art machine learning models used in common recom ### Contact +- DingDing Group: 32260796. (EasyRec usage general discussion.) - DingDing Group: 37930014162, click [this url](https://qr.dingtalk.com/action/joingroup?code=v1,k1,oHNqtNObbu+xUClHh77gCuKdGGH8AYoQ8AjKU23zTg4=&_dt_no_comment=1&origin=11) or scan QrCode to join![new_group.jpg](../images/qrcode/new_group.jpg) diff --git a/docs/source/models/aitm.md b/docs/source/models/aitm.md new file mode 100644 index 000000000..a15ea0489 --- /dev/null +++ b/docs/source/models/aitm.md @@ -0,0 +1,118 @@ +# AITM + +### 简介 + +在推荐场景里,用户的转化链路往往有多个中间步骤(曝光->点击->转化),AITM是一种多任务模型框架,充分利用了链路上各个节点的样本,提升模型对后端节点转化率的预估。 + +![AITM](../../images/models/aitm.jpg) + +1. (a) Expert-Bottom pattern。如 [MMoE](mmoe.md) +1. (b) Probability-Transfer pattern。如 [ESMM](esmm.md) +1. (c) Adaptive Information Transfer Multi-task (AITM) framework. + +两个特点: + +1. 使用Attention机制来融合多个目标对应的特征表征; +1. 引入了行为校正的辅助损失函数。 + +### 配置说明 + +```protobuf +model_config { + model_name: "AITM" + model_class: "MultiTaskModel" + feature_groups { + group_name: "all" + feature_names: "user_id" + feature_names: "cms_segid" + ... 
+ feature_names: "tag_brand_list" + wide_deep: DEEP + } + backbone { + blocks { + name: "mlp" + inputs { + feature_group_name: "all" + } + keras_layer { + class_name: 'MLP' + mlp { + hidden_units: [512, 256] + } + } + } + } + model_params { + task_towers { + tower_name: "ctr" + label_name: "clk" + loss_type: CLASSIFICATION + metrics_set: { + auc {} + } + dnn { + hidden_units: [256, 128] + } + use_ait_module: true + weight: 1.0 + } + task_towers { + tower_name: "cvr" + label_name: "buy" + losses { + loss_type: CLASSIFICATION + } + losses { + loss_type: ORDER_CALIBRATE_LOSS + } + metrics_set: { + auc {} + } + dnn { + hidden_units: [256, 128] + } + relation_tower_names: ["ctr"] + use_ait_module: true + ait_project_dim: 128 + weight: 1.0 + } + l2_regularization: 1e-6 + } + embedding_regularization: 5e-6 +} +``` + +- model_name: 任意自定义字符串,仅有注释作用 + +- model_class: 'MultiTaskModel', 不需要修改, 通过组件化方式搭建的多目标排序模型都叫这个名字 + +- feature_groups: 配置一组特征。 + +- backbone: 通过组件化的方式搭建的主干网络,[参考文档](../component/backbone.md) + + - blocks: 由多个`组件块`组成的一个有向无环图(DAG),框架负责按照DAG的拓扑排序执行个`组件块`关联的代码逻辑,构建TF Graph的一个子图 + - name/inputs: 每个`block`有一个唯一的名字(name),并且有一个或多个输入(inputs)和输出 + - keras_layer: 加载由`class_name`指定的自定义或系统内置的keras layer,执行一段代码逻辑;[参考文档](../component/backbone.md#keraslayer) + - mlp: MLP模型的参数,详见[参考文档](../component/component.md#id1) + +- model_params: AITM相关的参数 + + - task_towers 根据任务数配置task_towers + - tower_name + - dnn deep part的参数配置 + - hidden_units: dnn每一层的channel数目,即神经元的数目 + - use_ait_module: if true 使用`AITM`模型;否则,使用[DBMTL](dbmtl.md)模型 + - ait_project_dim: 每个tower对应的表征向量的维度,一般设为最后一个隐藏的维度即可 + - 默认为二分类任务,即num_class默认为1,weight默认为1.0,loss_type默认为CLASSIFICATION,metrics_set为auc + - loss_type: ORDER_CALIBRATE_LOSS 使用目标依赖关系校正预测结果的辅助损失函数,详见原始论文 + - 注:label_fields需与task_towers一一对齐。 + - embedding_regularization: 对embedding部分加regularization,防止overfit + +### 示例Config + +- [AITM_demo.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/aitm_on_taobao.config) + +### 参考论文 + +[AITM: 
Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf) diff --git a/docs/source/models/loss.md b/docs/source/models/loss.md index f1246299f..881794e6a 100644 --- a/docs/source/models/loss.md +++ b/docs/source/models/loss.md @@ -19,6 +19,7 @@ EasyRec支持两种损失函数配置方式:1)使用单个损失函数;2 | PAIRWISE_LOGISTIC_LOSS | pair粒度的logistic loss, 支持自定义pair分组 | | JRC_LOSS | 二分类 + listwise ranking loss | | F1_REWEIGHTED_LOSS | 可以调整二分类召回率和准确率相对权重的损失函数,可有效对抗正负样本不平衡问题 | +| ORDER_CALIBRATE_LOSS | 使用目标依赖关系校正预测结果的辅助损失函数,详见[AITM](aitm.md)模型 | - 说明:SOFTMAX_CROSS_ENTROPY_WITH_NEGATIVE_MINING - 支持参数配置,升级为 [support vector guided softmax loss](https://128.84.21.199/abs/1812.11317) , @@ -71,9 +72,9 @@ EasyRec支持两种损失函数配置方式:1)使用单个损失函数;2 - f1_beta_square: 大于1的值会导致模型更关注recall,小于1的值会导致模型更关注precision - F1 分数,又称平衡F分数(balanced F Score),它被定义为精确率和召回率的调和平均数。 - - ![f1 score](../images/other/f1_score.svg) + - ![f1 score](../../images/other/f1_score.svg) - 更一般的,我们定义 F_beta 分数为: - - ![f_beta score](../images/other/f_beta_score.svg) + - ![f_beta score](../../images/other/f_beta_score.svg) - f1_beta_square 即为 上述公式中的 beta 系数的平方。 - PAIRWISE_FOCAL_LOSS 的参数配置 @@ -211,3 +212,4 @@ task_towers { - 《 Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics 》 - 《 [Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/abs/2111.10603) 》 +- [AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf) diff --git a/docs/source/models/multi_target.rst b/docs/source/models/multi_target.rst index 9012aca9b..2263a27c0 100644 --- a/docs/source/models/multi_target.rst +++ b/docs/source/models/multi_target.rst @@ -7,5 +7,6 @@ esmm mmoe dbmtl + aitm ple simple_multi_task diff --git "a/docs/source/predict/MaxCompute 
\347\246\273\347\272\277\351\242\204\346\265\213.md" "b/docs/source/predict/MaxCompute \347\246\273\347\272\277\351\242\204\346\265\213.md" index 7f0b9e675..dd867a165 100644 --- "a/docs/source/predict/MaxCompute \347\246\273\347\272\277\351\242\204\346\265\213.md" +++ "b/docs/source/predict/MaxCompute \347\246\273\347\272\277\351\242\204\346\265\213.md" @@ -11,7 +11,7 @@ drop table if exists ctr_test_output; pai -name easy_rec_ext -Dcmd=predict --Dcluster='{"worker" : {"count":5, "cpu":1600, "memory":40000, "gpu":100}}' +-Dcluster='{"worker" : {"count":5, "cpu":1000, "memory":40000, "gpu":0}}' -Darn=acs:ram::xxx:role/aliyunodpspaidefaultrole -Dbuckets=oss://easyrec/ -Dsaved_model_dir=oss://easyrec/easy_rec_test/experiment/export/1597299619 @@ -23,6 +23,7 @@ pai -name easy_rec_ext -DossHost=oss-cn-beijing-internal.aliyuncs.com; ``` +- cluster: 这里cpu:1000表示是10个cpu核;核与内存的关系设置1:4000,一般不超过40000;gpu设置为0,表示不用GPU推理。 - saved_model_dir: 导出的模型目录 - output_table: 输出表,不需要提前创建,会自动创建 - excluded_cols: 预测模型不需要的columns,比如labels @@ -55,6 +56,8 @@ pai -name easy_rec_ext - 多分类模型(num_class > 1),导出字段: - logits: string(json), softmax之前的vector, shape\[num_class\] - probs: string(json), softmax之后的vector, shape\[num_class\] + - 如果一个分类目标是is_click, 输出概率的变量名称是probs_is_click + - 多目标模型中有一个回归目标是paytime,那么输出回归预测分的变量名称是:y_paytime - logits_y: logits\[y\], float, 类别y对应的softmax之前的概率 - probs_y: probs\[y\], float, 类别y对应的概率 - y: 类别id, = argmax(probs_y), int, 概率最大的类别 diff --git a/docs/source/predict/processor.md b/docs/source/predict/processor.md index 0ce0b4bd8..dabdb7aa1 100644 --- a/docs/source/predict/processor.md +++ b/docs/source/predict/processor.md @@ -1,17 +1,17 @@ # EasyRec Processor -EasyRec Processor, 是EasyRec对应的高性能在线打分引擎, 包含特征处理和模型推理功能. EasyRecProcessor运行在PAI-EAS之上, 可以充分利用PAI-EAS多种优化特性. +EasyRec Processor([阿里云上的EasyRec Processor详细文档,包括版本、使用方式](https://help.aliyun.com/zh/pai/user-guide/easyrec)), 是EasyRec对应的高性能在线打分引擎, 包含特征处理和模型推理功能. EasyRecProcessor运行在PAI-EAS之上, 可以充分利用PAI-EAS多种优化特性. 
## 架构设计 -EasyRec Processor包含三个部分: Item特征缓存, 特征处理(Feature Generator), TFModel(tensorflow model). +EasyRec Processor包含三个部分: Item特征缓存(支持通过[FeatureStore](https://help.aliyun.com/zh/pai/user-guide/featurestore-overview)加载MaxCompute表做初始化), 特征生成(Feature Generator), TFModel(tensorflow model). ![image.png](../../images/processor/easy_rec_processor_1.png) ## 性能优化 ### 基础实现 -将FeatureGenerator和TFModel分开, 先做特征生成,然后再Run TFModel. +将FeatureGenerator和TFModel分开, 先做特征生成(即fg),然后再Run TFModel得到预测结果. ### 优化实现 diff --git "a/docs/source/predict/\345\234\250\347\272\277\351\242\204\346\265\213.md" "b/docs/source/predict/\345\234\250\347\272\277\351\242\204\346\265\213.md" index 56f496945..8cb7db1ca 100644 --- "a/docs/source/predict/\345\234\250\347\272\277\351\242\204\346\265\213.md" +++ "b/docs/source/predict/\345\234\250\347\272\277\351\242\204\346\265\213.md" @@ -1,6 +1,6 @@ # Model Serving -推荐使用阿里云上的[模型在线服务(PAI-EAS)](https://help.aliyun.com/document_detail/113696.html)预置的EasyRecProcessor 来部署在线推理服务。EasyRecProcessor针对推荐模型做了多种优化, 相比tensorflow serving和TensorRT方式部署具有显著的[性能优势](./processor.md)。 +推荐使用阿里云上的[模型在线服务(PAI-EAS)](https://help.aliyun.com/document_detail/113696.html)预置的EasyRecProcessor 来部署在线推理服务。EasyRec Processor([阿里云文档](https://help.aliyun.com/zh/pai/user-guide/easyrec))针对推荐模型做了多种优化, 相比tensorflow serving和TensorRT方式部署具有显著的[性能优势](./processor.md)。 ## 命令行部署 diff --git a/docs/source/quick_start/designer_tutorial.md b/docs/source/quick_start/designer_tutorial.md index 95d9899b2..66a22d15c 100644 --- a/docs/source/quick_start/designer_tutorial.md +++ b/docs/source/quick_start/designer_tutorial.md @@ -94,3 +94,7 @@ PAI-Designer(Studio 2.0)是基于云原生架构Pipeline Service(PAIFlow `pai -name easy_rec_ext -project algo_public -Dcmd=predict` - 具体命令及详细[参数说明](../train.md#on-pai) + +### 推荐算法定制的方案 + +- 在Designer中做推荐算法特征工程、排序模型训练、向量召回等案例的阿里云官网[文档链接](https://help.aliyun.com/zh/pai/use-cases/overview-18) diff --git a/docs/source/quick_start/dlc_tutorial.md b/docs/source/quick_start/dlc_tutorial.md index 
22e067daa..f766a5f93 100644 --- a/docs/source/quick_start/dlc_tutorial.md +++ b/docs/source/quick_start/dlc_tutorial.md @@ -88,16 +88,16 @@ dlc submit tfjob \ --workspace_id=67849 \ --priority=1 \ --workers=1 \ - --worker_image=mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.4.9 \ + --worker_image=mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.7.4 \ --worker_spec=ecs.g6.2xlarge \ --ps=1 \ - --ps_image=mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.4.9 \ + --ps_image=mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.7.4 \ --ps_spec=ecs.g6.2xlarge \ --chief=true \ - --chief_image=mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.4.9 \ + --chief_image=mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.7.4 \ --chief_spec=ecs.g6.2xlarge \ --evaluators=1 \ - --evaluator_image=mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.4.9 \ + --evaluator_image=mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.7.4 \ --evaluator_spec=ecs.g6.2xlarge ``` diff --git a/docs/source/quick_start/local_tutorial.md b/docs/source/quick_start/local_tutorial.md index 8074c1218..443312ce9 100644 --- a/docs/source/quick_start/local_tutorial.md +++ b/docs/source/quick_start/local_tutorial.md @@ -4,6 +4,8 @@ 我们提供了`本地Anaconda安装`和`Docker镜像启动`两种方式。 +有技术问题可加钉钉群:37930014162 + #### 本地Anaconda安装 Demo实验中使用的环境为 `python=3.6.8` + `tenserflow=1.12.0` @@ -31,8 +33,8 @@ Docker的环境为`python=3.6.9` + `tenserflow=1.15.5` ```bash git clone https://github.com/alibaba/EasyRec.git cd EasyRec -docker pull mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.6.3 -docker run -td --network host -v /local_path/EasyRec:/docker_path/EasyRec mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.6.3 +docker pull 
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.7.4 +docker run -td --network host -v /local_path/EasyRec:/docker_path/EasyRec mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.7.4 docker exec -it bash ``` @@ -42,7 +44,7 @@ docker exec -it bash git clone https://github.com/alibaba/EasyRec.git cd EasyRec bash scripts/build_docker.sh -sudo docker run -td --network host -v /local_path:/docker_path mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15- +sudo docker run -td --network host -v /local_path:/docker_path mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15- sudo docker exec -it bash ``` @@ -52,7 +54,7 @@ sudo docker exec -it bash 输入一般是csv格式的文件。 -#### 示例数据 +#### 示例数据(点击下载) - train: [dwd_avazu_ctr_deepmodel_train.csv](http://easyrec.oss-cn-beijing.aliyuncs.com/data/dwd_avazu_ctr_deepmodel_train.csv) - test: [dwd_avazu_ctr_deepmodel_test.csv](http://easyrec.oss-cn-beijing.aliyuncs.com/data/dwd_avazu_ctr_deepmodel_test.csv) diff --git a/docs/source/quick_start/mc_tutorial_inner.md b/docs/source/quick_start/mc_tutorial_inner.md index 04940ad8c..d04dc2e8d 100644 --- a/docs/source/quick_start/mc_tutorial_inner.md +++ b/docs/source/quick_start/mc_tutorial_inner.md @@ -34,7 +34,7 @@ pai -name easy_rec_ext -project algo_public -Dconfig=oss://easyrec/config/MultiTower/dwd_avazu_ctr_deepmodel_ext.config -Dtrain_tables='odps://pai_online_project/tables/dwd_avazu_ctr_deepmodel_train' -Deval_tables='odps://pai_online_project/tables/dwd_avazu_ctr_deepmodel_test' --Dcluster='{"ps":{"count":1, "cpu":1000}, "worker" : {"count":3, "cpu":1000, "gpu":100, "memory":40000}}' +-Dcluster='{"ps":{"count":1, "cpu":1000}, "worker" : {"count":3, "cpu":1000, "gpu":0, "memory":40000}}' -Deval_method=separate -Dmodel_dir=oss://easyrec/ckpt/MultiTower -Dbuckets=oss://easyrec/?role_arn=acs:ram::xxx:role/xxx&host=oss-cn-beijing-internal.aliyuncs.com; diff --git a/docs/source/train.md 
b/docs/source/train.md index 843955e81..85dd4af0b 100644 --- a/docs/source/train.md +++ b/docs/source/train.md @@ -194,9 +194,9 @@ pai -name easy_rec_ext -project algo_public ### 依赖 - 混合并行使用Horovod做底层的通信, 因此需要安装Horovod, 可以直接使用下面的镜像 -- mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:sok-tf212-gpus-v5 +- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:sok-tf212-gpus-v5 ``` - sudo docker run --gpus=all --privileged -v /home/easyrec/:/home/easyrec/ -ti mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:sok-tf212-gpus-v5 bash + sudo docker run --gpus=all --privileged -v /home/easyrec/:/home/easyrec/ -ti mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:sok-tf212-gpus-v5 bash ``` ### 配置 diff --git a/easy_rec/python/core/sampler.py b/easy_rec/python/core/sampler.py index 6baee406f..779b30b48 100644 --- a/easy_rec/python/core/sampler.py +++ b/easy_rec/python/core/sampler.py @@ -268,6 +268,7 @@ def __init__(self, def _get_impl(self, ids): ids = np.array(ids, dtype=np.int64) + ids = np.pad(ids, (0, self._batch_size - len(ids)), 'edge') nodes = self._sampler.get(ids) features = self._parse_nodes(nodes) return features @@ -491,7 +492,9 @@ def __init__(self, def _get_impl(self, src_ids, dst_ids): src_ids = np.array(src_ids, dtype=np.int64) + src_ids = np.pad(src_ids, (0, self._batch_size - len(src_ids)), 'edge') dst_ids = np.array(dst_ids, dtype=np.int64) + dst_ids = np.pad(dst_ids, (0, self._batch_size - len(dst_ids)), 'edge') nodes = self._sampler.get(src_ids, dst_ids) features = self._parse_nodes(nodes) return features @@ -571,6 +574,7 @@ def __init__(self, def _get_impl(self, src_ids, dst_ids): src_ids = np.array(src_ids, dtype=np.int64) dst_ids = np.array(dst_ids, dtype=np.int64) + dst_ids = np.pad(dst_ids, (0, self._batch_size - len(dst_ids)), 'edge') nodes = self._neg_sampler.get(dst_ids) neg_features = self._parse_nodes(nodes) sparse_nodes = self._hard_neg_sampler.get(src_ids).layer_nodes(1) @@ -669,8 +673,11 
@@ def __init__(self, def _get_impl(self, src_ids, dst_ids): src_ids = np.array(src_ids, dtype=np.int64) + src_ids_padded = np.pad(src_ids, (0, self._batch_size - len(src_ids)), + 'edge') dst_ids = np.array(dst_ids, dtype=np.int64) - nodes = self._neg_sampler.get(src_ids, dst_ids) + dst_ids = np.pad(dst_ids, (0, self._batch_size - len(dst_ids)), 'edge') + nodes = self._neg_sampler.get(src_ids_padded, dst_ids) neg_features = self._parse_nodes(nodes) sparse_nodes = self._hard_neg_sampler.get(src_ids).layer_nodes(1) hard_neg_features, hard_neg_indices = self._parse_sparse_nodes(sparse_nodes) diff --git a/easy_rec/python/layers/keras/__init__.py b/easy_rec/python/layers/keras/__init__.py index 0e59090ce..f029b9c66 100644 --- a/easy_rec/python/layers/keras/__init__.py +++ b/easy_rec/python/layers/keras/__init__.py @@ -1,3 +1,4 @@ +from .attention import Attention from .auxiliary_loss import AuxiliaryLoss from .blocks import MLP from .blocks import Gate diff --git a/easy_rec/python/layers/keras/attention.py b/easy_rec/python/layers/keras/attention.py new file mode 100644 index 000000000..d7f717cb5 --- /dev/null +++ b/easy_rec/python/layers/keras/attention.py @@ -0,0 +1,268 @@ +# -*- encoding:utf-8 -*- +# Copyright (c) Alibaba, Inc. and its affiliates. +"""Attention layers that can be used in sequence DNN/CNN models. + +This file follows the terminology of https://arxiv.org/abs/1706.03762 Figure 2. +Attention is formed by three tensors: Query, Key and Value. +""" +import tensorflow as tf +from tensorflow.python.keras.layers import Layer + + +class Attention(Layer): + """Dot-product attention layer, a.k.a. Luong-style attention. + + Inputs are a list with 2 or 3 elements: + 1. A `query` tensor of shape `(batch_size, Tq, dim)`. + 2. A `value` tensor of shape `(batch_size, Tv, dim)`. + 3. A optional `key` tensor of shape `(batch_size, Tv, dim)`. If none + supplied, `value` will be used as a `key`. + + The calculation follows the steps: + 1. 
Calculate attention scores using `query` and `key` with shape + `(batch_size, Tq, Tv)`. + 2. Use scores to calculate a softmax distribution with shape + `(batch_size, Tq, Tv)`. + 3. Use the softmax distribution to create a linear combination of `value` + with shape `(batch_size, Tq, dim)`. + + Args: + use_scale: If `True`, will create a scalar variable to scale the + attention scores. + dropout: Float between 0 and 1. Fraction of the units to drop for the + attention scores. Defaults to `0.0`. + seed: A Python integer to use as random seed in case of `dropout`. + score_mode: Function to use to compute attention scores, one of + `{"dot", "concat"}`. `"dot"` refers to the dot product between the + query and key vectors. `"concat"` refers to the hyperbolic tangent + of the concatenation of the `query` and `key` vectors. + + Call Args: + inputs: List of the following tensors: + - `query`: Query tensor of shape `(batch_size, Tq, dim)`. + - `value`: Value tensor of shape `(batch_size, Tv, dim)`. + - `key`: Optional key tensor of shape `(batch_size, Tv, dim)`. If + not given, will use `value` for both `key` and `value`, which is + the most common case. + mask: List of the following tensors: + - `query_mask`: A boolean mask tensor of shape `(batch_size, Tq)`. + If given, the output will be zero at the positions where + `mask==False`. + - `value_mask`: A boolean mask tensor of shape `(batch_size, Tv)`. + If given, will apply the mask such that values at positions + where `mask==False` do not contribute to the result. + return_attention_scores: bool, if `True`, returns the attention scores + (after masking and softmax) as an additional output argument. + training: Python boolean indicating whether the layer should behave in + training mode (adding dropout) or in inference mode (no dropout). + use_causal_mask: Boolean. Set to `True` for decoder self-attention. Adds + a mask such that position `i` cannot attend to positions `j > i`. 
+      This prevents the flow of information from the future towards the
+      past. Defaults to `False`.
+
+  Output:
+    Attention outputs of shape `(batch_size, Tq, dim)`.
+    (Optional) Attention scores after masking and softmax with shape
+      `(batch_size, Tq, Tv)`.
+  """
+
+  def __init__(self, params, name='attention', reuse=None, **kwargs):
+    super(Attention, self).__init__(name=name, **kwargs)
+    self.use_scale = params.get_or_default('use_scale', False)
+    self.scale_by_dim = params.get_or_default('scale_by_dim', False)
+    self.score_mode = params.get_or_default('score_mode', 'dot')
+    if self.score_mode not in ['dot', 'concat']:
+      raise ValueError('Invalid value for argument score_mode. '
+                       "Expected one of {'dot', 'concat'}. "
+                       'Received: score_mode=%s' % self.score_mode)
+    self.dropout = params.get_or_default('dropout', 0.0)
+    self.seed = params.get_or_default('seed', None)
+    self.scale = None
+    self.concat_score_weight = None
+    self.return_attention_scores = params.get_or_default(
+        'return_attention_scores', False)
+    self.use_causal_mask = params.get_or_default('use_causal_mask', False)
+
+  def build(self, input_shape):
+    self._validate_inputs(input_shape)
+    if self.use_scale:
+      self.scale = self.add_weight(
+          name='scale',
+          shape=(),
+          initializer='ones',
+          dtype=self.dtype,
+          trainable=True,
+      )
+    if self.score_mode == 'concat':
+      self.concat_score_weight = self.add_weight(
+          name='concat_score_weight',
+          shape=(),
+          initializer='ones',
+          dtype=self.dtype,
+          trainable=True,
+      )
+    self.built = True
+
+  def _calculate_scores(self, query, key):
+    """Calculates attention scores as a query-key dot product.
+
+    Args:
+      query: Query tensor of shape `(batch_size, Tq, dim)`.
+      key: Key tensor of shape `(batch_size, Tv, dim)`.
+
+    Returns:
+      Tensor of shape `(batch_size, Tq, Tv)`.
+    """
+    if self.score_mode == 'dot':
+      scores = tf.matmul(query, tf.transpose(key, [0, 2, 1]))
+      if self.scale is not None:
+        scores *= self.scale
+      elif self.scale_by_dim:
+        dk = tf.cast(tf.shape(key)[-1], tf.float32)
+        scores /= tf.math.sqrt(dk)
+    elif self.score_mode == 'concat':
+      # Reshape tensors to enable broadcasting.
+      # Reshape into [batch_size, Tq, 1, dim].
+      q_reshaped = tf.expand_dims(query, axis=-2)
+      # Reshape into [batch_size, 1, Tv, dim].
+      k_reshaped = tf.expand_dims(key, axis=-3)
+      if self.scale is not None:
+        scores = self.concat_score_weight * tf.reduce_sum(
+            tf.tanh(self.scale * (q_reshaped + k_reshaped)), axis=-1)
+      else:
+        scores = self.concat_score_weight * tf.reduce_sum(
+            tf.tanh(q_reshaped + k_reshaped), axis=-1)
+    return scores
+
+  def _apply_scores(self, scores, value, scores_mask=None, training=False):
+    """Applies attention scores to the given value tensor.
+
+    To use this method in your attention layer, follow the steps:
+
+    * Use `query` tensor of shape `(batch_size, Tq)` and `key` tensor of
+      shape `(batch_size, Tv)` to calculate the attention `scores`.
+    * Pass `scores` and `value` tensors to this method. The method applies
+      `scores_mask`, calculates
+      `attention_distribution = softmax(scores)`, then returns
+      `matmul(attention_distribution, value)`.
+    * Apply `query_mask` and return the result.
+
+    Args:
+      scores: Scores float tensor of shape `(batch_size, Tq, Tv)`.
+      value: Value tensor of shape `(batch_size, Tv, dim)`.
+      scores_mask: A boolean mask tensor of shape `(batch_size, 1, Tv)`
+        or `(batch_size, Tq, Tv)`. If given, scores at positions where
+        `scores_mask==False` do not contribute to the result. It must
+        contain at least one `True` value in each line along the last
+        dimension.
+      training: Python boolean indicating whether the layer should behave
+        in training mode (adding dropout) or in inference mode
+        (no dropout).
+
+    Returns:
+      Tensor of shape `(batch_size, Tq, dim)`.
+      Attention scores after masking and softmax with shape
+        `(batch_size, Tq, Tv)`.
+    """
+    if scores_mask is not None:
+      padding_mask = tf.logical_not(scores_mask)
+      # Bias so padding positions do not contribute to attention
+      # distribution. Note 65504. is the max float16 value.
+      max_value = 65504.0 if scores.dtype == 'float16' else 1.0e9
+      scores -= max_value * tf.cast(padding_mask, dtype=scores.dtype)
+
+    weights = tf.nn.softmax(scores, axis=-1)
+    if training and self.dropout > 0:
+      weights = tf.nn.dropout(weights, 1.0 - self.dropout, seed=self.seed)
+    return tf.matmul(weights, value), weights
+
+  def _calculate_score_mask(self, scores, v_mask, use_causal_mask):
+    if use_causal_mask:
+      # Creates a lower triangular mask, so position i cannot attend to
+      # positions j > i. This prevents the flow of information from the
+      # future into the past.
+      score_shape = tf.shape(scores)
+      # causal_mask_shape = [1, Tq, Tv].
+      mask_shape = (1, score_shape[-2], score_shape[-1])
+      ones_mask = tf.ones(shape=mask_shape, dtype='int32')
+      row_index = tf.cumsum(ones_mask, axis=-2)
+      col_index = tf.cumsum(ones_mask, axis=-1)
+      causal_mask = tf.greater_equal(row_index, col_index)
+
+      if v_mask is not None:
+        # Mask of shape [batch_size, 1, Tv].
+        v_mask = tf.expand_dims(v_mask, axis=-2)
+        return tf.logical_and(v_mask, causal_mask)
+      return causal_mask
+    else:
+      # If not using causal mask, return the value mask as is,
+      # or None if the value mask is not provided.
+      return v_mask
+
+  def call(
+      self,
+      inputs,
+      mask=None,
+      training=False,
+  ):
+    self._validate_inputs(inputs=inputs, mask=mask)
+    q = inputs[0]
+    v = inputs[1]
+    k = inputs[2] if len(inputs) > 2 else v
+    q_mask = mask[0] if mask else None
+    v_mask = mask[1] if mask else None
+    scores = self._calculate_scores(query=q, key=k)
+    scores_mask = self._calculate_score_mask(scores, v_mask,
+                                             self.use_causal_mask)
+    result, attention_scores = self._apply_scores(
+        scores=scores, value=v, scores_mask=scores_mask, training=training)
+    if q_mask is not None:
+      # Mask of shape [batch_size, Tq, 1].
+      q_mask = tf.expand_dims(q_mask, axis=-1)
+      result *= tf.cast(q_mask, dtype=result.dtype)
+    if self.return_attention_scores:
+      return result, attention_scores
+    return result
+
+  def compute_mask(self, inputs, mask=None):
+    self._validate_inputs(inputs=inputs, mask=mask)
+    if mask is None or mask[0] is None:
+      return None
+    return tf.convert_to_tensor(mask[0])
+
+  def compute_output_shape(self, input_shape):
+    """Returns shape of value tensor dim, but for query tensor length."""
+    return list(input_shape[0][:-1]), input_shape[1][-1]
+
+  def _validate_inputs(self, inputs, mask=None):
+    """Validates arguments of the call method."""
+    class_name = self.__class__.__name__
+    if not isinstance(inputs, list):
+      raise ValueError('{class_name} layer must be called on a list of inputs, '
+                       'namely [query, value] or [query, value, key]. '
+                       'Received: inputs={inputs}.'.format(
+                           class_name=class_name, inputs=inputs))
+    if len(inputs) < 2 or len(inputs) > 3:
+      raise ValueError('%s layer accepts inputs list of length 2 or 3, '
+                       'namely [query, value] or [query, value, key]. '
+                       'Received length: %d.' % (class_name, len(inputs)))
+    if mask is not None:
+      if not isinstance(mask, list):
+        raise ValueError(
+            '{class_name} layer mask must be a list, '
+            'namely [query_mask, value_mask]. Received: mask={mask}.'.format(
+                class_name=class_name, mask=mask))
+      if len(mask) < 2 or len(mask) > 3:
+        raise ValueError(
+            '{class_name} layer accepts mask list of length 2 or 3. '
+            'Received: inputs={inputs}, mask={mask}.'.format(
+                class_name=class_name, inputs=inputs, mask=mask))
+
+  def get_config(self):
+    base_config = super(Attention, self).get_config()
+    config = {
+        'use_scale': self.use_scale,
+        'score_mode': self.score_mode,
+        'dropout': self.dropout,
+    }
+    return dict(list(base_config.items()) + list(config.items()))
diff --git a/easy_rec/python/layers/keras/blocks.py b/easy_rec/python/layers/keras/blocks.py
index 06ce11cbf..13cd14612 100644
--- a/easy_rec/python/layers/keras/blocks.py
+++ b/easy_rec/python/layers/keras/blocks.py
@@ -4,6 +4,11 @@
 import logging
 import tensorflow as tf
+from tensorflow.python.keras.initializers import Constant
+from tensorflow.python.keras.layers import Dense
+from tensorflow.python.keras.layers import Dropout
+from tensorflow.python.keras.layers import Lambda
+from tensorflow.python.keras.layers import Layer
 from easy_rec.python.layers.keras.activation import activation_layer
 from easy_rec.python.layers.utils import Parameter
@@ -14,7 +19,7 @@
 tf = tf.compat.v1
-class MLP(tf.keras.layers.Layer):
+class MLP(Layer):
   """Sequential multi-layer perceptron (MLP) block.
   Attributes:
@@ -74,7 +79,7 @@ def add_rich_layer(self,
                      l2_reg=None):
     act_layer = activation_layer(activation)
     if use_bn and not use_bn_after_activation:
-      dense = tf.keras.layers.Dense(
+      dense = Dense(
           units=num_units,
           use_bias=use_bias,
           kernel_initializer=initializer,
@@ -86,7 +91,7 @@ def add_rich_layer(self,
       self._sub_layers.append(bn)
       self._sub_layers.append(act_layer)
     else:
-      dense = tf.keras.layers.Dense(
+      dense = Dense(
           num_units,
           use_bias=use_bias,
           kernel_initializer=initializer,
@@ -99,7 +104,7 @@
       self._sub_layers.append(bn)
     if 0.0 < dropout_rate < 1.0:
-      dropout = tf.keras.layers.Dropout(dropout_rate, name='%s/dropout' % name)
+      dropout = Dropout(dropout_rate, name='%s/dropout' % name)
       self._sub_layers.append(dropout)
     elif dropout_rate >= 1.0:
       raise ValueError('invalid dropout_ratio: %.3f' % dropout_rate)
@@ -117,31 +122,56 @@ def call(self, x, training=None, **kwargs):
     return x
-class Highway(tf.keras.layers.Layer):
+class Highway(Layer):
   def __init__(self, params, name='highway', reuse=None, **kwargs):
     super(Highway, self).__init__(name, **kwargs)
     self.emb_size = params.get_or_default('emb_size', None)
     self.num_layers = params.get_or_default('num_layers', 1)
-    self.activation = params.get_or_default('activation', 'gelu')
+    self.activation = params.get_or_default('activation', 'relu')
     self.dropout_rate = params.get_or_default('dropout_rate', 0.0)
     self.init_gate_bias = params.get_or_default('init_gate_bias', -3.0)
-    self.reuse = reuse
+    self.act_layer = activation_layer(self.activation)
+    self.dropout_layer = Dropout(
+        self.dropout_rate) if self.dropout_rate > 0.0 else None
+    self.project_layer = None
+    self.gate_bias_initializer = Constant(self.init_gate_bias)
+    self.gates = []  # T
+    self.transforms = []  # H
+    self.multiply_layer = tf.keras.layers.Multiply()
+    self.add_layer = tf.keras.layers.Add()
+
+  def build(self, input_shape):
+    dim = input_shape[-1]
+    if self.emb_size is not None and dim != self.emb_size:
+      self.project_layer = Dense(self.emb_size, name='input_projection')
+      dim = self.emb_size
+    self.carry_gate = Lambda(lambda x: 1.0 - x, output_shape=(dim,))
+    for i in range(self.num_layers):
+      gate = Dense(
+          units=dim,
+          bias_initializer=self.gate_bias_initializer,
+          activation='sigmoid',
+          name='gate_%d' % i)
+      self.gates.append(gate)
+      self.transforms.append(Dense(units=dim))
   def call(self, inputs, training=None, **kwargs):
-    from easy_rec.python.layers.common_layers import highway
-    return highway(
-        inputs,
-        self.emb_size,
-        activation=self.activation,
-        num_layers=self.num_layers,
-        dropout=self.dropout_rate if training else 0.0,
-        init_gate_bias=self.init_gate_bias,
-        scope=self.name,
-        reuse=self.reuse)
-
-
-class Gate(tf.keras.layers.Layer):
+    value = inputs
+    if self.project_layer is not None:
+      value = self.project_layer(inputs)
+    for i in range(self.num_layers):
+      gate = self.gates[i](value)
+      transformed = self.act_layer(self.transforms[i](value))
+      if self.dropout_layer is not None:
+        transformed = self.dropout_layer(transformed, training=training)
+      transformed_gated = self.multiply_layer([gate, transformed])
+      identity_gated = self.multiply_layer([self.carry_gate(gate), value])
+      value = self.add_layer([transformed_gated, identity_gated])
+    return value
+
+
+class Gate(Layer):
   """Weighted sum gate."""
   def __init__(self, params, name='gate', reuse=None, **kwargs):
@@ -165,7 +195,7 @@ def call(self, inputs, **kwargs):
     return output
-class TextCNN(tf.keras.layers.Layer):
+class TextCNN(Layer):
   """Text CNN Model.
   References
diff --git a/easy_rec/python/model/easy_rec_model.py b/easy_rec/python/model/easy_rec_model.py
index e45010553..f2408ba47 100644
--- a/easy_rec/python/model/easy_rec_model.py
+++ b/easy_rec/python/model/easy_rec_model.py
@@ -120,6 +120,8 @@ def backbone(self):
     kwargs = {
         'loss_dict': self._loss_dict,
         'metric_dict': self._metric_dict,
+        'prediction_dict': self._prediction_dict,
+        'labels': self._labels,
         constant.SAMPLE_WEIGHT: self._sample_weight
     }
     return self._backbone_net(self._is_training, **kwargs)
diff --git a/easy_rec/python/model/multi_task_model.py b/easy_rec/python/model/multi_task_model.py
index f35148a65..f38d825a1 100644
--- a/easy_rec/python/model/multi_task_model.py
+++ b/easy_rec/python/model/multi_task_model.py
@@ -4,9 +4,13 @@
 from collections import OrderedDict
 import tensorflow as tf
+from google.protobuf import struct_pb2
+from tensorflow.python.keras.layers import Dense
 from easy_rec.python.builders import loss_builder
 from easy_rec.python.layers.dnn import DNN
+from easy_rec.python.layers.keras.attention import Attention
+from easy_rec.python.layers.utils import Parameter
 from easy_rec.python.model.rank_model import RankModel
 from easy_rec.python.protos import tower_pb2
 from easy_rec.python.protos.easy_rec_model_pb2 import EasyRecModel
@@ -82,6 +86,28 @@ def build_predict_graph(self):
             tower_inputs, axis=-1, name=tower_name + '/relation_input')
         relation_fea = relation_dnn(relation_input)
         relation_features[tower_name] = relation_fea
+      elif task_tower_cfg.use_ait_module:
+        tower_inputs = [tower_features[tower_name]]
+        for relation_tower_name in task_tower_cfg.relation_tower_names:
+          tower_inputs.append(relation_features[relation_tower_name])
+        if len(tower_inputs) == 1:
+          relation_fea = tower_inputs[0]
+          relation_features[tower_name] = relation_fea
+        else:
+          if task_tower_cfg.HasField('ait_project_dim'):
+            dim = task_tower_cfg.ait_project_dim
+          else:
+            dim = int(tower_inputs[0].shape[-1])
+          queries = tf.stack([Dense(dim)(x) for x in tower_inputs], axis=1)
+          keys = tf.stack([Dense(dim)(x) for x in tower_inputs], axis=1)
+          values = tf.stack([Dense(dim)(x) for x in tower_inputs], axis=1)
+          st_params = struct_pb2.Struct()
+          st_params.update({'scale_by_dim': True})
+          params = Parameter(st_params, True)
+          attention_layer = Attention(params, name='AITM_%s' % tower_name)
+          result = attention_layer([queries, values, keys])
+          relation_fea = result[:, 0, :]
+          relation_features[tower_name] = relation_fea
       else:
         relation_fea = tower_features[tower_name]
@@ -224,7 +250,17 @@ def build_loss_graph(self):
         for loss_name in loss_dict.keys():
           loss_dict[loss_name] = loss_dict[loss_name] * task_loss_weight[0]
       else:
+        calibrate_loss = []
         for loss in losses:
+          if loss.loss_type == LossType.ORDER_CALIBRATE_LOSS:
+            y_t = self._prediction_dict['probs_%s' % tower_name]
+            for relation_tower_name in task_tower_cfg.relation_tower_names:
+              y_rt = self._prediction_dict['probs_%s' % relation_tower_name]
+              cali_loss = tf.reduce_mean(tf.nn.relu(y_t - y_rt))
+              calibrate_loss.append(cali_loss * loss.weight)
+              logging.info('calibrate loss: %s -> %s' %
+                           (relation_tower_name, tower_name))
+            continue
           loss_param = loss.WhichOneof('loss_param')
           if loss_param is not None:
             loss_param = getattr(loss, loss_param)
@@ -243,6 +279,10 @@
                 loss.loss_type, loss_name, loss_value)
           else:
             loss_dict[loss_name] = loss_value * task_loss_weight[i]
+        if calibrate_loss:
+          cali_loss = tf.add_n(calibrate_loss)
+          loss_dict['order_calibrate_loss'] = cali_loss
+          tf.summary.scalar('loss/order_calibrate_loss', cali_loss)
     self._loss_dict.update(loss_dict)
     kd_loss_dict = loss_builder.build_kd_loss(self.kd, self._prediction_dict,
@@ -263,6 +303,8 @@ def get_outputs(self):
                 suffix='_%s' % tower_name))
       else:
         for loss in task_tower_cfg.losses:
+          if loss.loss_type == LossType.ORDER_CALIBRATE_LOSS:
+            continue
           outputs.extend(
               self._get_outputs_impl(
                   loss.loss_type,
diff --git a/easy_rec/python/protos/keras_layer.proto
b/easy_rec/python/protos/keras_layer.proto
index 3b7c0d34d..a8b92d1a7 100644
--- a/easy_rec/python/protos/keras_layer.proto
+++ b/easy_rec/python/protos/keras_layer.proto
@@ -26,5 +26,6 @@ message KerasLayer {
     SequenceAugment seq_aug = 15;
     PPNet ppnet = 16;
     TextCNN text_cnn = 17;
+    HighWayTower highway = 18;
   }
 }
diff --git a/easy_rec/python/protos/layer.proto b/easy_rec/python/protos/layer.proto
index df51009bc..c0a01686a 100644
--- a/easy_rec/python/protos/layer.proto
+++ b/easy_rec/python/protos/layer.proto
@@ -6,8 +6,10 @@ import "easy_rec/python/protos/dnn.proto";
 message HighWayTower {
   optional string input = 1;
   required uint32 emb_size = 2;
-  required string activation = 3 [default = 'gelu'];
+  required string activation = 3 [default = 'relu'];
   optional float dropout_rate = 4;
+  optional float init_gate_bias = 5 [default = -3.0];
+  optional uint32 num_layers = 6 [default = 1];
 }
 message PeriodicEmbedding {
diff --git a/easy_rec/python/protos/loss.proto b/easy_rec/python/protos/loss.proto
index 5c913bf6e..5098518b3 100644
--- a/easy_rec/python/protos/loss.proto
+++ b/easy_rec/python/protos/loss.proto
@@ -17,6 +17,7 @@ enum LossType {
   PAIRWISE_FOCAL_LOSS = 11;
   PAIRWISE_LOGISTIC_LOSS = 12;
   JRC_LOSS = 13;
+  ORDER_CALIBRATE_LOSS = 14;
 }
 message Loss {
diff --git a/easy_rec/python/protos/tower.proto b/easy_rec/python/protos/tower.proto
index 3cd6f6253..73df3e6f7 100644
--- a/easy_rec/python/protos/tower.proto
+++ b/easy_rec/python/protos/tower.proto
@@ -60,7 +60,7 @@ message BayesTaskTower {
   optional DNN relation_dnn = 8;
   // training loss weights
   optional float weight = 9 [default = 1.0];
-  // label name for indcating the sample space for the task tower
+  // label name for indicating the sample space for the task tower
   optional string task_space_indicator_label = 10;
   // the loss weight for sample in the task space
   optional float in_task_space_weight = 11 [default = 1.0];
@@ -74,6 +74,10 @@ message BayesTaskTower {
   repeated Loss losses = 15;
   // whether to use sample weight in this tower
   required bool use_sample_weight = 16 [default = true];
+  // whether to use AIT module
+  optional bool use_ait_module = 17 [default = false];
+  // set this when the dimensions of last layer of towers are not equal
+  optional uint32 ait_project_dim = 18;
   // training loss label dynamic weights
-  optional string dynamic_weight = 17;
+  optional string dynamic_weight = 19;
 };
diff --git a/easy_rec/python/test/train_eval_test.py b/easy_rec/python/test/train_eval_test.py
index f689dcd01..a682e91bc 100644
--- a/easy_rec/python/test/train_eval_test.py
+++ b/easy_rec/python/test/train_eval_test.py
@@ -650,6 +650,11 @@ def test_tag_kv_input(self):
         'samples/model_config/kv_tag.config', self._test_dir)
     self.assertTrue(self._success)
+  def test_aitm(self):
+    self._success = test_utils.test_single_train_eval(
+        'samples/model_config/aitm_on_taobao.config', self._test_dir)
+    self.assertTrue(self._success)
+
   def test_dbmtl(self):
     self._success = test_utils.test_single_train_eval(
         'samples/model_config/dbmtl_on_taobao.config', self._test_dir)
diff --git a/easy_rec/version.py b/easy_rec/version.py
index 68e35a53c..2ae7769a6 100644
--- a/easy_rec/version.py
+++ b/easy_rec/version.py
@@ -1,4 +1,4 @@
 # -*- encoding:utf-8 -*-
 # Copyright (c) Alibaba, Inc. and its affiliates.
-__version__ = '0.8.0' +__version__ = '0.8.1' diff --git a/examples/readme.md b/examples/readme.md index fd02b6825..bf936cf21 100644 --- a/examples/readme.md +++ b/examples/readme.md @@ -36,12 +36,14 @@ cd EasyRec -- Docker环境可选 (1) `python=3.6.9` + `tenserflow=1.15.5` -docker pull mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.6.3 -docker run -td --network host -v /local_path/EasyRec:/docker_path/EasyRec mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.6.3 +docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.7.4 +docker run -td --network host -v /local_path/EasyRec:/docker_path/EasyRec mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.7.4 +docker exec -it bash + (2) `python=3.8.10` + `tenserflow=2.10.0` -docker pull mybigpai-registry-vpc.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.10-0.6.4 -docker run -td --network host -v /local_path/EasyRec:/docker_path/EasyRec mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.10-0.6.4 +docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.10-0.7.4 +docker run -td --network host -v /local_path/EasyRec:/docker_path/EasyRec mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.10-0.7.4 docker exec -it bash ``` @@ -55,11 +57,11 @@ cd EasyRec -- Docker环境可选 (1) `python=3.6.9` + `tenserflow=1.15.5` bash scripts/build_docker.sh -sudo docker run -td --network host -v /local_path:/docker_path mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15- +sudo docker run -td --network host -v /local_path:/docker_path mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15- (2) `python=3.8.10` + `tenserflow=2.10.0` bash scripts/build_docker_tf210.sh -sudo docker run -td --network host -v /local_path:/docker_path mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.10- +sudo 
docker run -td --network host -v /local_path:/docker_path mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.10- sudo docker exec -it bash ``` diff --git a/samples/model_config/aitm_on_taobao.config b/samples/model_config/aitm_on_taobao.config new file mode 100644 index 000000000..c67f1d677 --- /dev/null +++ b/samples/model_config/aitm_on_taobao.config @@ -0,0 +1,295 @@ +train_input_path: "data/test/tb_data/taobao_train_data" +eval_input_path: "data/test/tb_data/taobao_test_data" +model_dir: "experiments/aitm_taobao_ckpt" + +train_config { + optimizer_config { + adam_optimizer { + learning_rate { + constant_learning_rate { + learning_rate: 0.0001 + } + } + } + use_moving_average: false + } + num_steps: 500 + sync_replicas: true + save_checkpoints_steps: 100 + log_step_count_steps: 100 +} +data_config { + batch_size: 4096 + label_fields: "clk" + label_fields: "buy" + prefetch_size: 32 + input_type: CSVInput + input_fields { + input_name: "clk" + input_type: INT32 + } + input_fields { + input_name: "buy" + input_type: INT32 + } + input_fields { + input_name: "pid" + input_type: STRING + } + input_fields { + input_name: "adgroup_id" + input_type: STRING + } + input_fields { + input_name: "cate_id" + input_type: STRING + } + input_fields { + input_name: "campaign_id" + input_type: STRING + } + input_fields { + input_name: "customer" + input_type: STRING + } + input_fields { + input_name: "brand" + input_type: STRING + } + input_fields { + input_name: "user_id" + input_type: STRING + } + input_fields { + input_name: "cms_segid" + input_type: STRING + } + input_fields { + input_name: "cms_group_id" + input_type: STRING + } + input_fields { + input_name: "final_gender_code" + input_type: STRING + } + input_fields { + input_name: "age_level" + input_type: STRING + } + input_fields { + input_name: "pvalue_level" + input_type: STRING + } + input_fields { + input_name: "shopping_level" + input_type: STRING + } + input_fields { + input_name: 
"occupation" + input_type: STRING + } + input_fields { + input_name: "new_user_class_level" + input_type: STRING + } + input_fields { + input_name: "tag_category_list" + input_type: STRING + } + input_fields { + input_name: "tag_brand_list" + input_type: STRING + } + input_fields { + input_name: "price" + input_type: INT32 + } +} +feature_config: { + features { + input_names: "pid" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 10 + } + features { + input_names: "adgroup_id" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 100000 + } + features { + input_names: "cate_id" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 10000 + } + features { + input_names: "campaign_id" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 100000 + } + features { + input_names: "customer" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 100000 + } + features { + input_names: "brand" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 100000 + } + features { + input_names: "user_id" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 100000 + } + features { + input_names: "cms_segid" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 100 + } + features { + input_names: "cms_group_id" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 100 + } + features { + input_names: "final_gender_code" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 10 + } + features { + input_names: "age_level" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 10 + } + features { + input_names: "pvalue_level" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 10 + } + features { + input_names: "shopping_level" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 10 + } + features { + input_names: "occupation" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 10 + } + features { + input_names: 
"new_user_class_level" + feature_type: IdFeature + embedding_dim: 16 + hash_bucket_size: 10 + } + features { + input_names: "tag_category_list" + feature_type: TagFeature + embedding_dim: 16 + hash_bucket_size: 100000 + separator: "|" + } + features { + input_names: "tag_brand_list" + feature_type: TagFeature + embedding_dim: 16 + hash_bucket_size: 100000 + separator: "|" + } + features { + input_names: "price" + feature_type: IdFeature + embedding_dim: 16 + num_buckets: 50 + } +} +model_config { + model_name: "AITM" + model_class: "MultiTaskModel" + feature_groups { + group_name: "all" + feature_names: "user_id" + feature_names: "cms_segid" + feature_names: "cms_group_id" + feature_names: "age_level" + feature_names: "pvalue_level" + feature_names: "shopping_level" + feature_names: "occupation" + feature_names: "new_user_class_level" + feature_names: "adgroup_id" + feature_names: "cate_id" + feature_names: "campaign_id" + feature_names: "customer" + feature_names: "brand" + feature_names: "price" + feature_names: "pid" + feature_names: "tag_category_list" + feature_names: "tag_brand_list" + wide_deep: DEEP + } + backbone { + blocks { + name: "mlp" + inputs { + feature_group_name: "all" + } + keras_layer { + class_name: 'MLP' + mlp { + hidden_units: [512, 256] + } + } + } + } + model_params { + task_towers { + tower_name: "ctr" + label_name: "clk" + loss_type: CLASSIFICATION + metrics_set: { + auc {} + } + dnn { + hidden_units: [256, 128] + } + use_ait_module: true + weight: 1.0 + } + task_towers { + tower_name: "cvr" + label_name: "buy" + losses { + loss_type: CLASSIFICATION + } + losses { + loss_type: ORDER_CALIBRATE_LOSS + } + metrics_set: { + auc {} + } + dnn { + hidden_units: [256, 128] + } + relation_tower_names: ["ctr"] + use_ait_module: true + ait_project_dim: 128 + weight: 1.0 + } + l2_regularization: 1e-6 + } + embedding_regularization: 5e-6 +} diff --git a/scripts/build_docker.sh b/scripts/build_docker.sh index be4113257..16a80775a 100644 --- 
a/scripts/build_docker.sh +++ b/scripts/build_docker.sh @@ -18,4 +18,4 @@ then exit 1 fi -sudo docker build --network=host . -f docker/Dockerfile -t mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-${version} +sudo docker build --network=host . -f docker/Dockerfile -t mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-${version} diff --git a/scripts/build_docker_tf210.sh b/scripts/build_docker_tf210.sh index 876d6dd06..33bc1a11d 100644 --- a/scripts/build_docker_tf210.sh +++ b/scripts/build_docker_tf210.sh @@ -18,4 +18,4 @@ then exit 1 fi -sudo docker build --progress=plain --network=host . -f docker/Dockerfile_tf210 -t mybigpai-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.10-${version} +sudo docker build --progress=plain --network=host . -f docker/Dockerfile_tf210 -t mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.10-${version}
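
For intuition, the dot-product path of the new `Attention` layer above (scores `Q·Kᵀ`, optionally scaled by `1/sqrt(dk)` when `scale_by_dim` is set, a large negative bias on masked positions, softmax, then a weighted sum over values, with the causal mask used by AITM-style sequence transfer) can be sketched in plain NumPy. This is an illustrative sketch under those assumptions, not EasyRec code; `masked_dot_attention` and its arguments are hypothetical names:

```python
import numpy as np

def masked_dot_attention(query, key, value, use_causal_mask=False,
                         scale_by_dim=True):
    """Minimal NumPy sketch of scaled dot-product attention with masking."""
    # Scores of shape (B, Tq, Tv) = query @ key^T, scaled by sqrt(dk).
    scores = query @ np.swapaxes(key, -1, -2)
    if scale_by_dim:
        scores = scores / np.sqrt(key.shape[-1])
    if use_causal_mask:
        tq, tv = scores.shape[-2:]
        # Lower-triangular mask: position i may only attend to j <= i.
        causal = np.tril(np.ones((tq, tv), dtype=bool))
        # Large negative bias so masked positions vanish after softmax.
        scores = scores - 1.0e9 * (~causal)
    # Softmax over the value axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights

q = np.random.RandomState(0).randn(1, 3, 4)
out, w = masked_dot_attention(q, q, q, use_causal_mask=True)
# First query position can only attend to the first value position.
print(w[0, 0])  # -> [1. 0. 0.]
```

The `- 1.0e9 * mask` trick mirrors `_apply_scores` in the layer (which also lowers the constant to 65504 for float16), and the row/column-index causal mask in `_calculate_score_mask` produces the same lower-triangular pattern as `np.tril` here.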