Incorrect results when running inference with Vulkan #5909

Open
XingRay opened this issue Feb 17, 2025 · 3 comments
XingRay commented Feb 17, 2025

error log

context

Windows 11

how to reproduce

Inference on the CPU produces correct results, but inference on the GPU returns wrong results. I dumped every blob at runtime and found that the outputs already differ after the first convolution layer.

more

The main code is as follows:

    bool useGpu = true;
    bool useDebugParam = true;

    SetConsoleOutputCP(CP_UTF8);
    LOG_I("Starting face detection test...");

    // File path configuration
    std::string param_path;
    if (useDebugParam) {
        param_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn_debug.param)";
    } else {
        param_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn.param)";
    }
    std::string bin_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn.bin)";

    std::string original_img_path = R"(D:\tmp\image\o\face_image_1080_1920.png)";
    std::string padded_image_save_path = R"(D:\tmp\image\face_detector_ncnn_padded.png)"; // change this to the path you need

    std::string output_img_path = R"(D:\tmp\image\face_detector_ncnn.png)";
    std::string original_with_detection_output_img_path = R"(D:\tmp\image\face_detector_ncnn_with_original.png)";

    // Load the image
    cv::Mat originalImg = cv::imread(original_img_path, cv::IMREAD_UNCHANGED);
    if (originalImg.empty()) {
        LOG_E("Image not found: %s", original_img_path.c_str());
        return -1;
    }

    // Convert channels: a 4-channel image goes from BGRA to RGB; otherwise from BGR to RGB
    if (originalImg.channels() == 4) {
        LOG_D("COLOR_BGRA2RGB");
        cv::cvtColor(originalImg, originalImg, cv::COLOR_BGRA2RGB);
    } else {
        LOG_D("COLOR_BGR2RGB");
        cv::cvtColor(originalImg, originalImg, cv::COLOR_BGR2RGB);
    }

    // 1. Letterbox the image to get the padded image: 128x128, RGB
    PaddingParams padding_params{};
    cv::Mat padded = letterbox_padding(originalImg, cv::Size(128, 128), padding_params);

    ncnn::Mat mat_in;
    cv::Mat padded_float;

    if (useDebugParam) {
        mat_in = ncnn::Mat::from_pixels(padded.data, ncnn::Mat::PIXEL_RGB, padded.cols, padded.rows);
        const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
        mat_in.substract_mean_normalize(0, norm_vals);
        mat_in.dims = 4;
    } else {
        padded.convertTo(padded_float, CV_32FC3, 1.0 / 255.0);
        mat_in = ncnn::Mat(3, 128, 128, 1, padded_float.data);
    }
    print_ncnn_mat_shape(mat_in, "mat_in");

    ncnn::Net net;
    if (useGpu) {
        int gpu_count = ncnn::get_gpu_count();
        LOG_D("gpu_count:%d", gpu_count);
        if (gpu_count <= 0) {
            LOG_E("gpu_count<=0");
            return -1;
        }

        LOG_D("use_vulkan_compute");
        net.opt.use_vulkan_compute = true;

        // set specified vulkan device before loading param and model
        // net.set_vulkan_device(0); // use device-0

        net.opt.use_fp16_packed = false;
        net.opt.use_fp16_storage = false;
        net.opt.use_fp16_arithmetic = false;
        net.opt.use_int8_storage = false;
        net.opt.use_int8_arithmetic = false;
    }

    LOG_I("load_param: %s", param_path.c_str());
    if (net.load_param(param_path.c_str()) != 0) {
        LOG_E("Failed to load param file");
        return -1;
    }
    LOG_I("load_model: %s", bin_path.c_str());
    if (net.load_model(bin_path.c_str()) != 0) {
        LOG_E("Failed to load bin file");
        return -1;
    }

    ncnn::Extractor ex = net.create_extractor();
    // Feed the input blob named "in0"
    LOG_D("ex.input");
    ex.input("in0", mat_in);

    // Run inference and extract the outputs "out0" and "out1"
    LOG_D("ex.extract");
    ncnn::Mat regressors, scores;
    ex.extract("out0", regressors);
    ex.extract("out1", scores);
    print_ncnn_mat_shape(regressors, "regressors");
    print_ncnn_mat_shape(scores, "scores");

    int num_regressors = regressors.w * regressors.h * regressors.c; // 896*16
    int num_scores = scores.w * scores.h * scores.c; // 896

    std::vector<float> reg_vec((float *) regressors.data, (float *) regressors.data + num_regressors);
    std::vector<float> score_vec((float *) scores.data, (float *) scores.data + num_scores);

    // Clip score_vec to [-100, 100], then apply sigmoid
    for (auto &s: score_vec) {
        if (s < -100.0f) s = -100.0f;
        if (s > 100.0f) s = 100.0f;
        s = 1.0f / (1.0f + std::exp(-s));
    }
    // Find the index of the highest score
    int max_index = std::distance(score_vec.begin(), std::max_element(score_vec.begin(), score_vec.end()));
    float max_score = score_vec[max_index];
    LOG_I("max score: %.4f, index: %d", max_score, max_index);

The useGpu flag switches between CPU and GPU inference:
bool useGpu = true;
The useDebugParam flag switches whether the manually adjusted param file is used:
bool useDebugParam = true;

The model is an ncnn model converted from ONNX with pnnx.
The input section of the model emitted by pnnx is:

Input                    in0                      0 1 in0
Permute                  permute_56               1 1 in0 1 0=4

With a manual adjustment, a tensor with a conventional shape can be passed in:

Input                    in0                      0 1 in0
Permute                  permute_56               1 1 in0 1 0=6

The only difference is the Permute type parameter, changed from 0=4 to 0=6.
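
For context, the two input paths in the code above differ in memory layout: `cv::Mat::convertTo` leaves an interleaved HWC buffer, while `ncnn::Mat::from_pixels` fills planar channels. A minimal sketch of that rearrangement in plain C++ follows. `hwc_to_chw` is a hypothetical helper, not part of ncnn, and it is only an assumption that the Permute layer in the pnnx output performs an equivalent reordering.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ncnn API): convert an interleaved HWC buffer
// (RGBRGB...) into a planar CHW buffer (RRR...GGG...BBB...).
std::vector<float> hwc_to_chw(const std::vector<float>& hwc, int h, int w, int c)
{
    assert(h > 0 && w > 0 && c > 0);
    assert(hwc.size() == static_cast<std::size_t>(h) * w * c);

    std::vector<float> chw(hwc.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            for (int k = 0; k < c; ++k) {
                // Source index: pixel-major, channel-minor.
                const std::size_t src = (static_cast<std::size_t>(y) * w + x) * c + k;
                // Destination index: channel-major, then row, then column.
                const std::size_t dst = (static_cast<std::size_t>(k) * h + y) * w + x;
                chw[dst] = hwc[src];
            }
        }
    }
    return chw;
}
```

Running both input paths through a reference conversion like this on the CPU is one way to check that the two param files really receive the same data.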

The current behavior is:
When useGpu = false, the output is correct whether useDebugParam is true or false.
When useGpu = true, output is produced whether useDebugParam is true or false, but the values are wrong.

The complete project is attached:
ncnn-test.zip

Some of the dumped blobs are attached below. The first two blobs are identical on CPU and GPU; the differences start at the third blob.
blob.zip
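
A per-blob comparison like the one described above can be sketched as follows. This is a minimal illustration, assuming each dumped blob has already been loaded into a flat `std::vector<float>`. `DiffSummary` and `summarize_diff` are hypothetical names, and the tolerance is an arbitrary choice rather than a value taken from ncnn.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of how far a GPU blob deviates from the CPU blob.
struct DiffSummary {
    float max_abs_diff;      // largest |cpu[i] - gpu[i]|
    std::size_t mismatches;  // number of elements whose difference exceeds tol
};

DiffSummary summarize_diff(const std::vector<float>& cpu,
                           const std::vector<float>& gpu,
                           float tol)
{
    assert(cpu.size() == gpu.size());

    DiffSummary s{0.0f, 0};
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const float d = std::fabs(cpu[i] - gpu[i]);
        s.max_abs_diff = std::max(s.max_abs_diff, d);
        if (d > tol) {
            ++s.mismatches;
        }
    }
    return s;
}
```

A blob with `mismatches == 0` can be treated as matching. The first blob with a non-zero count marks the layer where the CPU and GPU paths start to diverge.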

Member

nihui commented Feb 18, 2025

Author

XingRay commented Feb 18, 2025

https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-produce-wrong-result#disable-fp16 Try testing with fp16 disabled.

I have already tried both enabling and disabling the options below:

    if (gpu_count > 0) {
        LOG_D("use_vulkan_compute");
        net.opt.use_vulkan_compute = true;

        // set specified vulkan device before loading param and model
        net.set_vulkan_device(0); // use device-0

        net.opt.use_fp16_packed = false;
        net.opt.use_fp16_storage = false;
        net.opt.use_fp16_arithmetic = false;
        net.opt.use_int8_storage = false;
        net.opt.use_int8_arithmetic = false;
    }

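The result is the same. I found that a bit less than the first half of blob "3" matches, and the values diverge from around the middle. I compared them with a diff tool: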

Image

On the right you can see that the first part is identical:

Image

Blob "3" is the output of the operator shown in this figure:

Image
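
Since the first part of blob "3" matches and the rest does not, it may help to pin down exactly where the divergence begins. A minimal sketch follows, assuming the dumped blob is a densely packed CHW float array, so ncnn's per-channel alignment padding is ignored. `first_mismatch` and `to_chw_coord` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Coord {
    int c;  // channel
    int h;  // row
    int w;  // column
};

// Returns the index of the first element differing by more than tol, or -1 if none.
std::ptrdiff_t first_mismatch(const std::vector<float>& a,
                              const std::vector<float>& b,
                              float tol)
{
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}

// Maps a flat index into (channel, row, column), assuming a dense CHW layout.
Coord to_chw_coord(std::ptrdiff_t index, int h, int w)
{
    assert(index >= 0 && h > 0 && w > 0);
    const std::ptrdiff_t plane = static_cast<std::ptrdiff_t>(h) * w;
    return Coord{static_cast<int>(index / plane),
                 static_cast<int>((index % plane) / w),
                 static_cast<int>(index % w)};
}
```

If the first mismatch lands exactly on a channel boundary, that might point to a per-channel or packing-related problem rather than a purely numerical one, though this is only a heuristic.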

Author

XingRay commented Feb 19, 2025

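I ported the program from Windows to Android, and the behavior matches the Windows results:

CPU inference gives correct results
GPU inference returns results, but the data is wrong

The log from GPU inference is as follows: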

00:09:41.055 D COLOR_BGRA2RGB
00:09:41.072 D mat_in shape: c=3, d=1, h=128, w=128, dims=4
00:09:41.073 I QUALCOMM build : fdd61e0, I20154638fb
Build Date : 10/07/20
Shader Compiler Version : EV031.27.05.01
Local Branch :
Remote Branch : refs/tags/AU_LINUX_ANDROID_LA.UM.8.3.R1.10.00.00.520.058
Remote Branch : NONE
Reconstruct Branch : NOTHING
00:09:41.073 I Build Config : S P 8.0.11 AArch64
00:09:41.074 W [0 Adreno (TM) 630] queueC=0[3] queueG=0[3] queueT=0[3]
00:09:41.074 W [0 Adreno (TM) 630] bugsbn1=1 bugbilz=0 bugcopc=0 bugihfa=1
00:09:41.074 W [0 Adreno (TM) 630] fp16-p/s/u/a=1/0/0/0 int8-p/s/u/a=1/0/0/0
00:09:41.074 W [0 Adreno (TM) 630] subgroup=64 basic/vote/ballot/shuffle=1/1/0/0
00:09:41.074 W [0 Adreno (TM) 630] fp16-8x8x16/16x8x8/16x8x16/16x16x16=0/0/0/0
00:09:41.074 D gpu_count:1
00:09:41.074 D use_vulkan_compute
00:09:41.074 I load_param: /storage/emulated/0/test/face_detection/face_detector.ncnn_debug.param
00:09:41.079 I load_model: /storage/emulated/0/test/face_detection/face_detector.ncnn.bin
00:09:44.132 D ex.input
00:09:44.132 D ex.extract
00:09:44.286 D regressors shape: c=1, d=1, h=896, w=16, dims=2
00:09:44.286 D scores shape: c=1, d=1, h=896, w=1, dims=2
00:09:44.287 I max score: 0.3218, index: 691
00:09:44.290 I detection result saved to: /storage/emulated/0/test/output/face_detector_ncnn.png
00:09:44.465 I detection result on the original image saved to: /storage/emulated/0/test/output/face_detector_ncnn_with_original.png

The code that initializes the net is as follows:

        ncnn::Net net;
        if (useGpu) {
            int gpu_count = ncnn::get_gpu_count();
            LOG_D("gpu_count:%d", gpu_count);
            if (gpu_count <= 0) {
                LOG_E("gpu_count<=0");
                return;
            }

            LOG_D("use_vulkan_compute");
            net.opt.use_vulkan_compute = true;

            // set specified vulkan device before loading param and model
            // net.set_vulkan_device(0); // use device-0

            net.opt.use_fp16_packed = false;
            net.opt.use_fp16_storage = false;
            net.opt.use_fp16_arithmetic = false;
            net.opt.use_int8_storage = false;
            net.opt.use_int8_arithmetic = false;
        }
