Incorrect results when running inference with Vulkan #5909

XingRay · 2025-02-17T16:38:34Z

error log

context

Windows 11

how to reproduce

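Inference on the CPU gives correct results, but inference on the GPU returns wrong results. I dumped every blob at runtime and found that the outputs already differ right after the first convolution layer.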

more

The main code is as follows:

    bool useGpu = true;
    bool useDebugParam = true;

    SetConsoleOutputCP(CP_UTF8);
    LOG_I("Starting face detection test...");

    // File path configuration
    std::string param_path;
    if (useDebugParam) {
        param_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn_debug.param)";
    } else {
        param_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn.param)";
    }
    std::string bin_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn.bin)";

    std::string original_img_path = R"(D:\tmp\image\o\face_image_1080_1920.png)";
    std::string padded_image_save_path = R"(D:\tmp\image\face_detector_ncnn_padded.png)"; // change this to any path you need

    std::string output_img_path = R"(D:\tmp\image\face_detector_ncnn.png)";
    std::string original_with_detection_output_img_path = R"(D:\tmp\image\face_detector_ncnn_with_original.png)";

    // Load the image
    cv::Mat originalImg = cv::imread(original_img_path, cv::IMREAD_UNCHANGED);
    if (originalImg.empty()) {
        LOG_E("Image not found: %s", original_img_path.c_str());
        return -1;
    }

    // Convert channels: a 4-channel image goes BGRA -> RGB; anything else is treated as BGR -> RGB
    if (originalImg.channels() == 4) {
        LOG_D("COLOR_BGRA2RGB");
        cv::cvtColor(originalImg, originalImg, cv::COLOR_BGRA2RGB);
    } else {
        LOG_D("COLOR_BGR2RGB");
        cv::cvtColor(originalImg, originalImg, cv::COLOR_BGR2RGB);
    }

    // 1. Letterbox the image to obtain a 128x128 padded image in RGB format
    PaddingParams padding_params{};
    cv::Mat padded = letterbox_padding(originalImg, cv::Size(128, 128), padding_params);

    ncnn::Mat mat_in;
    cv::Mat padded_float;

    if (useDebugParam) {
        mat_in = ncnn::Mat::from_pixels(padded.data, ncnn::Mat::PIXEL_RGB, padded.cols, padded.rows);
        const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
        mat_in.substract_mean_normalize(0, norm_vals);
        mat_in.dims = 4;
    } else {
        padded.convertTo(padded_float, CV_32FC3, 1.0 / 255.0);
        mat_in = ncnn::Mat(3, 128, 128, 1, padded_float.data);
    }
    print_ncnn_mat_shape(mat_in, "mat_in");

    ncnn::Net net;
    if (useGpu) {
        int gpu_count = ncnn::get_gpu_count();
        LOG_D("gpu_count:%d", gpu_count);
        if (gpu_count <= 0) {
            LOG_E("gpu_count<=0");
            return -1;
        }

        LOG_D("use_vulkan_compute");
        net.opt.use_vulkan_compute = true;

        // set specified vulkan device before loading param and model
        // net.set_vulkan_device(0); // use device-0

        net.opt.use_fp16_packed = false;
        net.opt.use_fp16_storage = false;
        net.opt.use_fp16_arithmetic = false;
        net.opt.use_int8_storage = false;
        net.opt.use_int8_arithmetic = false;
    }

    LOG_I("load_param: %s", param_path.c_str());
    if (net.load_param(param_path.c_str()) != 0) {
        LOG_E("Failed to load param file");
        return -1;
    }
    LOG_I("load_model: %s", bin_path.c_str());
    if (net.load_model(bin_path.c_str()) != 0) {
        LOG_E("Failed to load bin file");
        return -1;
    }

    ncnn::Extractor ex = net.create_extractor();
    // Feed the input blob named "in0"
    LOG_D("ex.input");
    ex.input("in0", mat_in);

    // Run inference and extract the outputs "out0" and "out1"
    LOG_D("ex.extract");
    ncnn::Mat regressors, scores;
    ex.extract("out0", regressors);
    ex.extract("out1", scores);
    print_ncnn_mat_shape(regressors, "regressors");
    print_ncnn_mat_shape(scores, "scores");

    int num_regressors = regressors.w * regressors.h * regressors.c; // 896*16
    int num_scores = scores.w * scores.h * scores.c; // 896

    std::vector<float> reg_vec((float *) regressors.data, (float *) regressors.data + num_regressors);
    std::vector<float> score_vec((float *) scores.data, (float *) scores.data + num_scores);

    // Clip score_vec to [-100, 100], then apply sigmoid
    for (auto &s: score_vec) {
        if (s < -100.0f) s = -100.0f;
        if (s > 100.0f) s = 100.0f;
        s = 1.0f / (1.0f + std::exp(-s));
    }
    // Find the index of the highest score
    int max_index = std::distance(score_vec.begin(), std::max_element(score_vec.begin(), score_vec.end()));
    float max_score = score_vec[max_index];
    LOG_I("Max score: %.4f, index: %d", max_score, max_index);

The useGpu flag switches between CPU and GPU inference:
bool useGpu = true;
The useDebugParam flag switches between the original param file and the manually adjusted one:
bool useDebugParam = true;

The model is an ncnn model converted from ONNX with pnnx.
The input section of the param file as emitted by pnnx:

Input                    in0                      0 1 in0
Permute                  permute_56               1 1 in0 1 0=4

With a small manual adjustment, a tensor with the conventional shape can be passed in:

Input                    in0                      0 1 in0
Permute                  permute_56               1 1 in0 1 0=6

The only difference is the order type parameter of the Permute layer (0=4 changed to 0=6).
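
For context on why the two param files expect different input layouts: the non-debug branch wraps a CV_32FC3 buffer, which is interleaved (HWC), while ncnn::Mat::from_pixels produces planar channels (CHW). The following is a minimal plain C++ sketch of that layout conversion. The function name hwc_to_chw is hypothetical, and the sketch does not reproduce ncnn's Permute implementation or its order type numbering.

```cpp
#include <cstddef>
#include <vector>

// Convert interleaved HWC data (for example a CV_32FC3 buffer) to planar CHW.
// hwc must hold h * w * c floats; the result has the same number of elements.
// hwc_to_chw is a hypothetical helper, not part of ncnn or OpenCV.
std::vector<float> hwc_to_chw(const std::vector<float>& hwc,
                              std::size_t h, std::size_t w, std::size_t c) {
    std::vector<float> chw(hwc.size());
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            for (std::size_t k = 0; k < c; ++k) {
                // Source: pixel (y, x), channel k. Destination: plane k, pixel (y, x).
                chw[(k * h + y) * w + x] = hwc[(y * w + x) * c + k];
            }
        }
    }
    return chw;
}
```

Feeding the same image through both paths and comparing the resulting planar buffers is one way to rule out an input layout mismatch before looking at the Vulkan path.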

Current behavior:
With useGpu = false, the output is correct whether useDebugParam is true or false.
With useGpu = true, inference produces output whether useDebugParam is true or false, but the values are wrong.

The complete project is attached:
ncnn-test.zip

Some of the dumped blobs are attached below. The first two blobs are identical between CPU and GPU; the differences start at the third blob.
blob.zip
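
A comparison like this can also be done programmatically. Below is a minimal sketch, assuming each blob dump has already been read into a flat std::vector<float>; the dump format itself is not shown in this thread. The names BlobDiff and compare_blobs and the tolerance value are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of an element-wise comparison between two blob dumps.
struct BlobDiff {
    std::ptrdiff_t first_mismatch; // -1 when no element differs by more than tol
    float max_abs_diff;            // largest absolute difference seen
    std::size_t mismatch_count;    // number of elements differing by more than tol
};

// Compare two dumps element by element over their common length.
// NaN differences compare false against tol and are therefore not counted.
BlobDiff compare_blobs(const std::vector<float>& cpu,
                       const std::vector<float>& gpu, float tol) {
    BlobDiff d{-1, 0.0f, 0};
    const std::size_t n = cpu.size() < gpu.size() ? cpu.size() : gpu.size();
    for (std::size_t i = 0; i < n; ++i) {
        const float diff = std::fabs(cpu[i] - gpu[i]);
        if (diff > d.max_abs_diff) d.max_abs_diff = diff;
        if (diff > tol) {
            if (d.first_mismatch < 0) d.first_mismatch = static_cast<std::ptrdiff_t>(i);
            ++d.mismatch_count;
        }
    }
    return d;
}
```

A small non-zero tolerance (for example 1e-3f) is usually more informative than exact equality, since CPU and GPU kernels can legitimately differ in the last few bits.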


nihui · 2025-02-18T03:41:04Z

https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-produce-wrong-result#disable-fp16
Try disabling fp16 and test again.

XingRay · 2025-02-18T05:04:29Z

https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-produce-wrong-result#disable-fp16 Try disabling fp16 and test again.

I have already tried both enabling and disabling the options below:

    if (gpu_count > 0) {
        LOG_D("use_vulkan_compute");
        net.opt.use_vulkan_compute = true;

        // set specified vulkan device before loading param and model
        net.set_vulkan_device(0); // use device-0

        net.opt.use_fp16_packed = false;
        net.opt.use_fp16_storage = false;
        net.opt.use_fp16_arithmetic = false;
        net.opt.use_int8_storage = false;
        net.opt.use_int8_arithmetic = false;
    }

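The result is the same. I noticed that in blob "3" roughly the first half of the values match and the differences start from the middle. I compared the dumps with a diff tool: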

On the right you can see that the first part is identical:

Blob "3" is the output of the operator shown in this image:
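
When a dump matches at the start and diverges partway through, it can help to translate the flat index of the first differing element into a channel, row, and column, to see whether the divergence lines up with a channel boundary. This is a minimal sketch, assuming the dump is a densely packed CHW array with no per-channel padding; that is an assumption about the dump, not something confirmed in this thread. BlobCoord and flat_to_chw are hypothetical names.

```cpp
#include <cstddef>

// Position of one element inside a CHW blob.
struct BlobCoord {
    std::size_t c; // channel
    std::size_t y; // row
    std::size_t x; // column
};

// Map a flat element index to (channel, row, column) for a densely packed
// CHW dump with height h and width w. Assumes no padding between channels.
BlobCoord flat_to_chw(std::size_t index, std::size_t h, std::size_t w) {
    const std::size_t plane = h * w; // elements per channel
    return BlobCoord{index / plane, (index % plane) / w, index % w};
}
```

If the first mismatch lands at row 0, column 0 of some channel, the channels before it are fully correct, which would point toward channel packing or weight layout rather than the input data.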

XingRay · 2025-02-19T16:10:58Z

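I ported the program from Windows to Android, and the behavior matches what I see on Windows:

CPU inference produces correct results.
GPU inference returns results, but the values are wrong.

The log from a GPU inference run is as follows: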

00:09:41.055 D COLOR_BGRA2RGB
00:09:41.072 D mat_in shape: c=3, d=1, h=128, w=128, dims=4
00:09:41.073 I QUALCOMM build : fdd61e0, I20154638fb
Build Date : 10/07/20
Shader Compiler Version : EV031.27.05.01
Local Branch :
Remote Branch : refs/tags/AU_LINUX_ANDROID_LA.UM.8.3.R1.10.00.00.520.058
Remote Branch : NONE
Reconstruct Branch : NOTHING
00:09:41.073 I Build Config : S P 8.0.11 AArch64
00:09:41.074 W [0 Adreno (TM) 630] queueC=0[3] queueG=0[3] queueT=0[3]
00:09:41.074 W [0 Adreno (TM) 630] bugsbn1=1 bugbilz=0 bugcopc=0 bugihfa=1
00:09:41.074 W [0 Adreno (TM) 630] fp16-p/s/u/a=1/0/0/0 int8-p/s/u/a=1/0/0/0
00:09:41.074 W [0 Adreno (TM) 630] subgroup=64 basic/vote/ballot/shuffle=1/1/0/0
00:09:41.074 W [0 Adreno (TM) 630] fp16-8x8x16/16x8x8/16x8x16/16x16x16=0/0/0/0
00:09:41.074 D gpu_count:1
00:09:41.074 D use_vulkan_compute
00:09:41.074 I load_param: /storage/emulated/0/test/face_detection/face_detector.ncnn_debug.param
00:09:41.079 I load_model: /storage/emulated/0/test/face_detection/face_detector.ncnn.bin
00:09:44.132 D ex.input
00:09:44.132 D ex.extract
00:09:44.286 D regressors shape: c=1, d=1, h=896, w=16, dims=2
00:09:44.286 D scores shape: c=1, d=1, h=896, w=1, dims=2
00:09:44.287 I Max score: 0.3218, index: 691
00:09:44.290 I Detection result saved to: /storage/emulated/0/test/output/face_detector_ncnn.png
00:09:44.465 I Detection result on the original image saved to: /storage/emulated/0/test/output/face_detector_ncnn_with_original.png

The code that initializes the net is as follows:

        ncnn::Net net;
        if (useGpu) {
            int gpu_count = ncnn::get_gpu_count();
            LOG_D("gpu_count:%d", gpu_count);
            if (gpu_count <= 0) {
                LOG_E("gpu_count<=0");
                return;
            }

            LOG_D("use_vulkan_compute");
            net.opt.use_vulkan_compute = true;

            // set specified vulkan device before loading param and model
            // net.set_vulkan_device(0); // use device-0

            net.opt.use_fp16_packed = false;
            net.opt.use_fp16_storage = false;
            net.opt.use_fp16_arithmetic = false;
            net.opt.use_int8_storage = false;
            net.opt.use_int8_arithmetic = false;
        }
