
Implement some GB ops #1342

Merged

Conversation

@DoorKickers (Contributor) commented Aug 14, 2024

Motivation and Context

Description

1. Peking University national-standard (GB) operator support

Implemented the 24 national-standard operators that #1306 lists as pending completion/development (items 7.2.10.3 ~ 7.2.15.4 and 7.2.4.3 ~ 7.2.11.1).

2. Fixed the intermittent precision failures of the adam operator test in CI by aligning the adam implementation with torch; see https://github.com/pytorch/pytorch/blob/ba10259115b1c89292f3499cdba059ebb9ec6b4e/torch/optim/adam.py#L320

The change made in DIOPI is as follows:

diopiError_t diopiAdam(diopiContextHandle_t ctx, diopiTensorHandle_t param, diopiConstTensorHandle_t grad, diopiTensorHandle_t exp_avg,
                       diopiTensorHandle_t exp_avg_sq, diopiTensorHandle_t max_exp_avg_sq, float lr, float beta1, float beta2, float eps, float weight_decay,
                       int64_t step, bool amsgrad) {
    impl::aten::setCurStream(ctx);
    auto atParam = impl::aten::buildATen(param);
    auto atGrad = impl::aten::buildATen(grad);
    auto atExpAvg = impl::aten::buildATen(exp_avg);
    auto atExpAvgSq = impl::aten::buildATen(exp_avg_sq);
    auto atMaxExpAvgSq = impl::aten::buildATen(max_exp_avg_sq);

    auto grad_d = atGrad.data();
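    // Fold classic (non-decoupled) L2 weight decay into the gradient: grad = grad + weight_decay * param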
    if (weight_decay != 0) {
        grad_d = grad_d.add(atParam, weight_decay);
    }
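    // Biased first/second moment updates; lerp_ matches torch's single-tensor Adam and is
    // mathematically equivalent to the commented-out mul_/add_ form kept below for reference.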
    // atExpAvg.mul_(beta1).add_(grad_d, 1 - beta1);
    atExpAvg.lerp_(grad_d, 1 - beta1);
    atExpAvgSq.mul_(beta2).addcmul_(grad_d, grad_d.conj(), 1 - beta2);

    at::Tensor denom;
    auto bias_correction1 = 1 - pow(beta1, step);
    auto bias_correction2 = 1 - pow(beta2, step);
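    // denom = sqrt(second moment) / sqrt(bias_correction2) + eps; with amsgrad, the running
    // maximum of exp_avg_sq is used as the second moment.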
    if (amsgrad) {
        CALL_ATEN_CUDA_FUNC(maximum_out, atMaxExpAvgSq, atMaxExpAvgSq, atExpAvgSq);
        denom = atMaxExpAvgSq.sqrt().div_(sqrt(bias_correction2)).add_(eps);
    } else {
        denom = atExpAvgSq.sqrt().div_(sqrt(bias_correction2)).add_(eps);
    }
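    // Parameter update: param -= (lr / bias_correction1) * exp_avg / denom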
    auto stepSize = lr / bias_correction1;
    atParam.addcdiv_(atExpAvg, denom, -1 * stepSize);

    return diopiSuccess;
}
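
For reference, the code above implements the standard Adam update with bias correction, written here in conventional Adam notation rather than the DIOPI variable names:

    g_t   = grad + weight_decay * param
    m_t   = beta1 * m_{t-1} + (1 - beta1) * g_t
    v_t   = beta2 * v_{t-1} + (1 - beta2) * g_t^2
    param = param - (lr / (1 - beta1^t)) * m_t / (sqrt(v_t) / sqrt(1 - beta2^t) + eps)

With amsgrad enabled, v_t is replaced by the running maximum kept in max_exp_avg_sq. The visible change is the lerp_-based first-moment update (the previous mul_/add_ form is left as a comment); per the description above, aligning the implementation to the referenced torch single-tensor path is what removes the intermittent CI precision mismatch.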

Use cases (Optional)

BC-breaking (Optional)

Checklist

Before PR:

  • I have read and followed the workflow indicated in Contributors.md to create this PR.
  • Pre-commit or linting tools indicated in Contributors.md are used to fix potential lint issues.
  • Bug fixes are covered by unit tests; the case that caused the bug should be added to the unit tests.
  • New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

  • CLA has been signed and all committers have signed the CLA in this PR.

@DoorKickers DoorKickers marked this pull request as ready for review October 16, 2024 10:36
@caikun-pjlab caikun-pjlab merged commit 3e82955 into DeepLink-org:main Oct 18, 2024
14 of 17 checks passed