Add aitm model #756

Merged
merged 11 commits into from
May 11, 2022

Conversation

@renmada (Contributor) commented May 6, 2022

No description provided.

runner:
  train_data_dir: "./data/sample_data/train"
  train_reader_path: "reader" # importlib format
  use_gpu: True

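Using the GPU is not recommended with the demo data. Instead, shorten the demo training time to under one minute by shrinking the dataset, reducing the number of epochs, or similar.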
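
One way to shrink the demo data, sketched under assumptions: the helper below copies only the first few lines of a text data file. `take_first_lines` and any paths passed to it are hypothetical and not part of this PR, and the sketch assumes one sample per line with no header row.

```python
import itertools


def take_first_lines(src_path, dst_path, max_lines):
    """Copy at most max_lines lines from src_path to dst_path.

    Hypothetical helper for illustration only. Returns the number of
    lines actually written, which is smaller than max_lines when the
    source file is shorter.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after max_lines without reading the rest of the file.
        for line in itertools.islice(src, max_lines):
            dst.write(line)
            written += 1
    return written
```

Whether a truncated file is representative enough for a smoke test depends on how the samples are ordered, so the line count would need to be chosen by trial.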

  epochs: 6
  print_interval: 500
  #model_init_path: "output_model/0" # init model
  model_save_path: "output_model_aitm/"

For the save path name, follow the other models and change it to output_model_aitm_all.

@@ -0,0 +1,6 @@
tar zxvf data/sample_train.tar.gz -C data

The data processing part can go under dataset: create a new subdirectory in the dataset directory and put the data download script there.

import numpy as np
import paddle
from paddle.io import Dataset

reader.py can be renamed to aitm_reader.py.

| Model | click auc | purchase auc | batch_size | epoch_num | Time of each epoch |
| :------ | :------ | :------ | :------ | :------ | :------ |
| aitm | 0.6186 | 0.6525 | 2000 | 6 | about 3 hours |
See the log folder for detailed logs.

Log files should be removed when submitting a PR.

cp -r ./models/rank/aitm/data/sample_data/train/* ./test_tipc/data/train
cp -r ./models/rank/aitm/data/sample_data/test/* ./test_tipc/data/infer
echo "demo data ready"
fi

Data preparation for the other modes can be written in now, following the pattern above.

auto_cast:False
runner.epochs:lite_train_lite_infer=1|whole_train_whole_infer=102|whole_infer=101|lite_train_whole_infer=1
runner.model_save_path
runner.train_batch_size:lite_train_lite_infer=2|whole_train_whole_infer=128|whole_infer=1|lite_train_whole_infer=2

The epoch count and batch size here should match the values used in demo-data mode.
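
To make such mismatches easier to spot, the per-mode syntax on the lines above (`key:mode=value|mode=value`) can be unpacked with a small helper. This is only an illustrative sketch: `parse_mode_values` is a hypothetical name, not part of the test_tipc tooling, and it assumes every entry has the `mode=value` shape shown in the diff. Lines such as `auto_cast:False` are deliberately rejected.

```python
def parse_mode_values(line):
    """Split a 'key:mode=value|mode=value' line into (key, {mode: value}).

    Hypothetical helper for illustration; not part of test_tipc.
    Values are kept as strings exactly as written in the config.
    """
    key, _, rest = line.partition(":")
    values = {}
    for entry in rest.split("|"):
        mode, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"expected mode=value, got {entry!r}")
        values[mode.strip()] = value.strip()
    return key.strip(), values


# The runner.epochs line from the diff above, split for readability.
epochs_line = (
    "runner.epochs:lite_train_lite_infer=1|whole_train_whole_infer=102"
    "|whole_infer=101|lite_train_whole_infer=1"
)
key, per_mode = parse_mode_values(epochs_line)
print(key, per_mode["lite_train_lite_infer"])
```

With the values in a dictionary, the `lite_train_lite_infer` entry can be compared directly against the epoch count and batch size used for the demo data.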

@@ -0,0 +1,51 @@
===========================train_params===========================
model_name:aitm
python:python3

Specify python3.7 here.

--enable_mkldnn:True|False
--cpu_threads:1|6
--batchsize:1
--enable_tensorRT:False

The TRT option must cover both True and False.
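
A quick way to check this, as a sketch only: the helper below splits an option line of the `--name:value|value` form seen above and reports whether both boolean values are listed. `covers_true_and_false` is a hypothetical name, not an existing test_tipc function.

```python
def covers_true_and_false(option_line):
    """Return True if a '--name:v1|v2' line lists both 'True' and 'False'.

    Hypothetical check for illustration; not part of test_tipc.
    """
    _, _, rest = option_line.partition(":")
    values = {value.strip() for value in rest.split("|")}
    return {"True", "False"} <= values


print(covers_true_and_false("--enable_mkldnn:True|False"))  # True
print(covers_true_and_false("--enable_tensorRT:False"))     # False
```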

@yinhaofeng (Contributor) left a comment

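The following also need to be updated: the readme under the rank directory, the Chinese and English top-level readmes, the contribute document, the directory structure under doc/source, and the documents under doc/source/models.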

@renmada (Contributor, Author) commented May 10, 2022

Revised as requested.

README_EN.md Outdated
@@ -159,7 +159,8 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml # Training wit
| Rank | [FLEN](models/rank/flen/) | - | ✓ | ✓ | >=2.1.0 | [2019][FLEN: Leveraging Field for Scalable CTR Prediction]( https://arxiv.org/pdf/1911.04690.pdf) |
| Rank | [DeepRec](models/rank/deeprec/) | - | ✓ | ✓ | >=2.1.0 | [2017][Training Deep AutoEncoders for Collaborative Filtering](https://arxiv.org/pdf/1708.01715v3.pdf) |
| Rank | [AutoFIS](models/rank/autofis/) | - | ✓ | ✓ | >=2.1.0 | [KDD 2020][AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction](https://arxiv.org/pdf/2003.11235v3.pdf) |
| Rank | [DCN_V2](models/rank/dcn_v2/) | - | ✓ | ✓ | >=2.1.0 | [WWW 2021][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/pdf/2008.13535v2.pdf)
| Rank | [DCN_V2](models/rank/dcn_v2/) | - | ✓ | ✓ | >=2.1.0 | [WWW 2021][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/pdf/2008.13535v2.pdf)
| Rank | [AITM](models/rank/aitm/) | - | ✓ | ✓ | >=2.1.0 | [KDD 2021][Modeling the Sequential Dependence among Audience Multi-step Conversions withMulti-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489v2.pdf)

The trailing pipe (|) is missing.
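
A check along these lines could catch this before review. It is a rough sketch, not an existing tool: `rows_missing_trailing_pipe` is a hypothetical name, and it treats every line that starts with `|` as a table row, which is a simplification of how Markdown tables are actually parsed.

```python
def rows_missing_trailing_pipe(markdown_text):
    """Return 1-based line numbers of table rows that do not end with '|'.

    Heuristic for illustration: any line starting with '|' counts as a row.
    """
    missing = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("|") and not stripped.endswith("|"):
            missing.append(number)
    return missing


# Two shortened rows: the second lacks the closing pipe.
sample = "\n".join([
    "| Rank | [FLEN](models/rank/flen/) | - |",
    "| Rank | [AITM](models/rank/aitm/) | -",
])
print(rows_missing_trailing_pipe(sample))  # [2]
```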

| aitm | 0.6186 | 0.6525 | 2000 | 6 | about 3 hours |

1. Make sure your current directory is PaddleRec/models/rank/aitm
2. Download the data: [link](https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408) and put it in the data folder

The full dataset must live under the dataset directory; it cannot be placed directly in the data folder of the model directory.

@@ -159,7 +159,8 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml # Training wit
| Rank | [FLEN](models/rank/flen/) | - | ✓ | ✓ | >=2.1.0 | [2019][FLEN: Leveraging Field for Scalable CTR Prediction]( https://arxiv.org/pdf/1911.04690.pdf) |
| Rank | [DeepRec](models/rank/deeprec/) | - | ✓ | ✓ | >=2.1.0 | [2017][Training Deep AutoEncoders for Collaborative Filtering](https://arxiv.org/pdf/1708.01715v3.pdf) |
| Rank | [AutoFIS](models/rank/autofis/) | - | ✓ | ✓ | >=2.1.0 | [KDD 2020][AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction](https://arxiv.org/pdf/2003.11235v3.pdf) |
| Rank | [DCN_V2](models/rank/dcn_v2/) | - | ✓ | ✓ | >=2.1.0 | [WWW 2021][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/pdf/2008.13535v2.pdf)
| Rank | [DCN_V2](models/rank/dcn_v2/) | - | ✓ | ✓ | >=2.1.0 | [WWW 2021][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/pdf/2008.13535v2.pdf)|
| Rank | [AITM](models/rank/aitm/) | - | ✓ | ✓ | >=2.1.0 | [KDD 2021][Modeling the Sequential Dependence among Audience Multi-step Conversions withMulti-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489v2.pdf) |

multitask

@frankwhzhang frankwhzhang merged commit 73c2277 into PaddlePaddle:master May 11, 2022
3 participants