MediaPipe Series 14: Inference Calculators, a Complete Guide to Integrating NCNN Models (IMS in Practice)

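Introduction: Why NCNN?

14.1 Core Advantages of NCNN

NCNN is a high-performance neural network inference framework open-sourced by Tencent and optimized for mobile and embedded CPUs: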

┌──────────────────────────────────────────────────────────────────┐
│                     Core Advantages of NCNN                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  NCNN vs. TFLite:                                        │   │
│   │                                                          │   │
│   │  Feature           NCNN              TFLite              │   │
│   │  ──────────────────────────────────────────────────────  │   │
│   │  Target platform   Mobile-optimized  General-purpose     │   │
│   │  Model format      .param + .bin     .tflite             │   │
│   │  Model size        Small             Medium              │   │
│   │  Quantization      INT8 / FP16       INT8 / FP16         │   │
│   │  GPU backend       Vulkan            OpenCL / OpenGL ES  │   │
│   │  ARM optimization  Strong            Moderate            │   │
│   │  Use in IMS        Widespread        Partial             │   │
│   │                                                          │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│   Why IMS DMS uses NCNN:                                         │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │                                                          │   │
│   │  1. Strong mobile performance                            │   │
│   │     • Deeply optimized for ARM NEON                      │   │
│   │     • Mature deployment on Qualcomm platforms            │   │
│   │     • 30+ FPS for infrared face detection                │   │
│   │                                                          │   │
│   │  2. Straightforward model deployment                     │   │
│   │     • One-command conversion from ONNX                   │   │
│   │     • No TensorFlow dependency                           │   │
│   │     • Small model files (easy to ship via OTA)           │   │
│   │                                                          │   │
│   │  3. Solid quantization support                           │   │
│   │     • FP16 storage (near-lossless accuracy)              │   │
│   │     • INT8 quantization (about 1/4 of FP32 size)         │   │
│   │     • Grayscale input path (suits infrared)              │   │
│   │                                                          │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

14.2 NCNN Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│                    NCNN Architecture Overview                    │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  Application layer                                       │   │
│   │  MediaPipe Calculator / direct calls                     │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                │                                 │
│                                ▼                                 │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  NCNN API                                                │   │
│   │  ncnn::Net, ncnn::Mat, ncnn::Extractor                   │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                │                                 │
│                                ▼                                 │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  Layer implementations                                   │   │
│   │  Convolution, Pooling, Activation, ...                   │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                │                                 │
│                                ▼                                 │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  Backend acceleration                                    │   │
│   │  CPU (ARM NEON) / GPU (Vulkan)                           │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│   Core classes:                                                  │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  ncnn::Net       = network container                     │   │
│   │  ncnn::Mat       = tensor data                           │   │
│   │  ncnn::Extractor = inference executor                    │   │
│   │  ncnn::Layer     = operator base class                   │   │
│   │  ncnn::Option    = configuration options                 │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

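15. Model Conversion and Quantization

15.1 Converting ONNX to NCNN

# ========== Converting ONNX to NCNN ==========

# Basic conversion
./onnx2ncnn model.onnx model.param model.bin

# Outputs
# model.param - network structure (text format)
# model.bin   - weight data (binary format)

# ========== Common Issues ==========

# 1. Unsupported operators
# onnx2ncnn prints a warning for each operator it cannot map;
# those operators need a custom Layer
# Warning: unsupported op type: SomeOp

# 2. Dynamic shapes
# onnx2ncnn handles dynamic-shape graphs poorly; export the ONNX model
# with a fixed input size
# For complex models, use the pnnx tool instead

# 3. Model optimization
# Use ncnnoptimize to fuse and simplify the network structure
./ncnnoptimize model.param model.bin model_opt.param model_opt.bin 1

# The last argument selects the weight storage type:
#   0 = FP32 (full precision)
#   1 = FP16 (recommended; accuracy is nearly unchanged and the .bin is about half the size)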

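15.2 Model Quantization

# ========== INT8 Quantization ==========

# Step 1: prepare calibration data
# Build a list of calibration images (100-500 images)
ls /data/calibration_images/*.jpg > images.txt

# Step 2: generate the calibration table
./ncnn2table model.param model.bin images.txt table.txt \
    mean=[127.5,127.5,127.5] \
    norm=[0.007843,0.007843,0.007843] \
    shape=[320,240,3] \
    pixel=RGB \
    method=kl

# Arguments:
# mean   = per-channel mean (subtracted)
# norm   = per-channel scale (multiplied); 0.007843 = 1/127.5
# shape  = input size as [width,height,channels]
# pixel  = pixel format (RGB/BGR/GRAY)
# method = calibration method
# Note: the key=[...] syntax is the one used by recent ncnn releases;
# check the usage output of the ncnn2table binary you built.

# Step 3: apply quantization
./ncnn2int8 model.param model.bin \
    model_int8.param model_int8.bin table.txt

# ========== FP16 Storage (recommended) ==========
# Use ncnnoptimize directly
./ncnnoptimize model.param model.bin \
    model_fp16.param model_fp16.bin 1

# FP16 vs. INT8 (rough figures; measure on the target device):
# FP16: small accuracy loss, roughly 20-30% faster
# INT8: larger accuracy loss, roughly 50-100% faster, about half the size of FP16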
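
To make the INT8 accuracy trade-off concrete, here is a minimal sketch of symmetric abs-max quantization. It is a simplified illustration only: the function names are hypothetical, and ncnn2table's real calibration uses a more elaborate, statistics-based method over the calibration images rather than a plain abs-max.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Scale that maps the largest magnitude in `values` to 127 (symmetric INT8).
// Hypothetical helper for illustration; not part of the ncnn API.
float AbsMaxScale(const std::vector<float>& values) {
  float abs_max = 0.f;
  for (float v : values) abs_max = std::max(abs_max, std::fabs(v));
  return abs_max > 0.f ? 127.f / abs_max : 1.f;
}

// Quantize one value: round(v * scale), clamped to [-127, 127].
std::int8_t QuantizeInt8(float v, float scale) {
  float q = std::round(v * scale);
  q = std::min(127.f, std::max(-127.f, q));
  return static_cast<std::int8_t>(q);
}

// Recover an approximate float from the INT8 code.
float DequantizeInt8(std::int8_t q, float scale) {
  return static_cast<float>(q) / scale;
}
```

The round trip loses at most half a quantization step (0.5 / scale) for values inside the calibrated range, which is where the accuracy loss of INT8 comes from; values outside the range are clipped.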

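15.3 Model Files

# ========== model.param File Format ==========

# Header
7767517        # magic number
210 232        # layer count, blob count

# Layer definitions
# Format: layer_type layer_name input_count output_count input_blobs output_blobs params
# Convolution keys: 0=num_output 1=kernel_size 2=dilation 3=stride 4=pad
#                   5=bias_term 6=weight_data_size
# Here 6=864 assumes a 3-channel input: 32 outputs x 3 inputs x 3 x 3
Convolution    conv1    1 1 input conv1 0=32 1=3 2=1 3=1 4=0 5=1 6=864
Pooling        pool1    1 1 conv1 pool1 0=0 1=2 2=2 3=0 4=0
ReLU           relu1    1 1 pool1 relu1
...

# ========== model.bin File Format ==========
# Binary file that stores the weights
# Do not edit it by hand

# ========== Inspecting a Model ==========
# Open the .param file in a model viewer that supports ncnn (Netron, for example),
# or parse the .param text file directly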
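
Because the .param file is plain text, a layer line can be split into its fields with a few lines of code. The sketch below follows the field order described above; `ParamLayer` and `ParseLayerLine` are hypothetical names for illustration and are not part of ncnn.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One parsed layer line from a .param file (illustrative type, not an ncnn class).
struct ParamLayer {
  std::string type;
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
  std::vector<std::string> params;  // raw "key=value" tokens
};

// Parses: layer_type layer_name input_count output_count inputs... outputs... params...
// Returns false if the line does not contain the fields it announces.
bool ParseLayerLine(const std::string& line, ParamLayer* layer) {
  std::istringstream in(line);
  int input_count = 0;
  int output_count = 0;
  if (!(in >> layer->type >> layer->name >> input_count >> output_count)) {
    return false;
  }
  if (input_count < 0 || output_count < 0) return false;
  layer->inputs.resize(input_count);
  layer->outputs.resize(output_count);
  for (auto& blob : layer->inputs) {
    if (!(in >> blob)) return false;
  }
  for (auto& blob : layer->outputs) {
    if (!(in >> blob)) return false;
  }
  layer->params.clear();
  for (std::string token; in >> token;) layer->params.push_back(token);
  return true;
}
```

A quick way to check the `input_name` / `output_name` options used later is to run this over the first and last layer lines and print their blob names.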

16. Bazel Integration

16.1 WORKSPACE Configuration

# WORKSPACE

# ========== NCNN dependency ==========
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

# Use only ONE of the two options below; both declare the repository name "ncnn".

# Option 1: download over HTTP
http_archive(
    name = "ncnn",
    urls = ["https://github.com/Tencent/ncnn/archive/20240102.tar.gz"],
    sha256 = "your-sha256-here",  # replace with the real checksum of the archive
    strip_prefix = "ncnn-20240102",
    build_file = "@//third_party:ncnn.BUILD",
)

# Option 2: fetch with Git
git_repository(
    name = "ncnn",
    remote = "https://github.com/Tencent/ncnn.git",
    tag = "20240102",
    build_file = "@//third_party:ncnn.BUILD",
)

# ========== Vulkan dependency (optional, for GPU acceleration) ==========
new_local_repository(
    name = "vulkan",
    build_file = "@//third_party:vulkan.BUILD",
    path = "/usr/local",  # or the Vulkan SDK install path
)

16.2 NCNN BUILD File

# third_party/ncnn.BUILD
#
# Note: upstream ncnn is configured through CMake, which generates platform.h and
# the layer registry headers (and, for Vulkan, the compiled shaders). Treat the
# globs and macros below as a starting point and verify them against the ncnn
# version you use; wrapping a CMake-built libncnn is a common alternative.

cc_library(
    name = "ncnn",
    srcs = glob([
        "src/*.cpp",
        "src/layer/*.cpp",
        "src/layer/arm/*.cpp",  # ARM-optimized kernels
    ]),
    hdrs = glob([
        "src/*.h",
        "src/layer/*.h",
        "src/layer/arm/*.h",
    ]),
    includes = ["src"],
    copts = [
        "-fno-rtti",
        "-fno-exceptions",
        "-DNCNN_USE_THREAD",  # multithreading support
    ],
    defines = select({
        "//conditions:default": [],
        "@//platform:android_arm64": [
            "NCNN_INTRINSICS",  # ARM NEON
        ],
    }),
    visibility = ["//visibility:public"],
)

# ========== Vulkan variant ==========
cc_library(
    name = "ncnn_vulkan",
    srcs = glob([
        "src/*.cpp",
        "src/layer/*.cpp",
        "src/layer/vulkan/*.cpp",  # Vulkan operators
    ]),
    hdrs = glob([
        "src/*.h",
        "src/layer/*.h",
        "src/layer/vulkan/*.h",
    ]),
    includes = ["src"],
    defines = ["NCNN_VULKAN=1"],
    deps = ["@vulkan//:vulkan"],
    visibility = ["//visibility:public"],
)

16.3 Calculator BUILD

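# mediapipe/calculators/ims/BUILD

load("//mediapipe/framework/port:build_config.bzl", "mediapipe_proto_library")

# Options proto from section 17.1; also generates :ncnn_inference_options_cc_proto
mediapipe_proto_library(
    name = "ncnn_inference_options_proto",
    srcs = ["ncnn_inference_options.proto"],
    deps = [
        "//mediapipe/framework:calculator_options_proto",
        "//mediapipe/framework:calculator_proto",
    ],
)

# Dependencies shared by both calculator targets
_NCNN_CALCULATOR_DEPS = [
    ":ncnn_inference_options_cc_proto",
    "//mediapipe/framework:calculator_framework",
    "//mediapipe/framework:calculator_options_cc_proto",
    "//mediapipe/framework/formats:detection_cc_proto",
    "//mediapipe/framework/formats:image_frame",
    "//mediapipe/framework/formats:image_frame_opencv",
    "//mediapipe/framework/port:logging",
    "//mediapipe/framework/port:opencv_imgproc",
    "//mediapipe/framework/port:ret_check",
    "//mediapipe/framework/port:status",
    "@ncnn//:ncnn",
    "@com_google_absl//absl/memory",
    "@com_google_absl//absl/strings",
]

cc_library(
    name = "ncnn_inference_calculator",
    srcs = ["ncnn_inference_calculator.cc"],
    hdrs = ["ncnn_inference_calculator.h"],
    visibility = ["//visibility:public"],
    deps = _NCNN_CALCULATOR_DEPS,
    alwayslink = 1,
)

# ========== Android variant ==========
# Compiles the same sources with extra defines. Link either this target or
# :ncnn_inference_calculator, never both: each one registers NCNNInferenceCalculator.
cc_library(
    name = "ncnn_inference_calculator_android",
    srcs = ["ncnn_inference_calculator.cc"],
    hdrs = ["ncnn_inference_calculator.h"],
    defines = [
        "NCNN_USE_THREAD",
        "NCNN_INTRINSICS",
    ],
    deps = _NCNN_CALCULATOR_DEPS,
    alwayslink = 1,
)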

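17. Complete NCNN Inference Calculator Implementation

17.1 Proto Definition

// ncnn_inference_options.proto
// proto2 is required: field defaults and the CalculatorOptions extension used by
// the graph configs ([mediapipe.NCNNInferenceOptions.ext]) are proto2 features.
syntax = "proto2";

package mediapipe;

import "mediapipe/framework/calculator.proto";

message NCNNInferenceOptions {
  extend CalculatorOptions {
    // The extension ID is a placeholder; pick one that is unique in your tree.
    optional NCNNInferenceOptions ext = 514514001;
  }

  // Input size
  optional int32 input_width = 1 [default = 320];
  optional int32 input_height = 2 [default = 320];

  // Normalization parameters
  repeated float mean_vals = 3;   // mean (subtracted)
  repeated float norm_vals = 4;   // scale (multiplied)

  // Input and output blob names
  optional string input_name = 5 [default = "input"];
  optional string output_name = 6 [default = "output"];

  // Performance settings
  optional int32 num_threads = 7 [default = 4];
  optional bool use_vulkan = 8 [default = false];
  optional bool use_fp16 = 9 [default = true];

  // Post-processing settings
  optional float score_threshold = 10 [default = 0.5];
  optional float nms_threshold = 11 [default = 0.45];
  optional int32 max_detections = 12 [default = 100];

  // Input pixel format
  enum PixelFormat {
    RGB = 0;
    BGR = 1;
    GRAY = 2;
  }
  optional PixelFormat pixel_format = 13 [default = RGB];
}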

17.2 Calculator Implementation

// ncnn_inference_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_NCNN_INFERENCE_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_NCNN_INFERENCE_CALCULATOR_H_

#include <string>
#include <vector>

#include "mediapipe/calculators/ims/ncnn_inference_options.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "net.h"  // NCNN header

namespace mediapipe {

class NCNNInferenceCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc);
  
  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;
  absl::Status Close(CalculatorContext* cc) override;

 private:
  // ========== NCNN objects ==========
  ncnn::Net net_;
  std::string input_name_;
  std::string output_name_;
  
  // ========== Configuration ==========
  int input_width_ = 320;
  int input_height_ = 320;
  int num_threads_ = 4;
  bool use_vulkan_ = false;
  bool use_fp16_ = true;
  
  float mean_vals_[3] = {0.f, 0.f, 0.f};
  float norm_vals_[3] = {1.f, 1.f, 1.f};
  int pixel_type_ = ncnn::Mat::PIXEL_RGB;
  
  float score_threshold_ = 0.5f;
  float nms_threshold_ = 0.45f;
  int max_detections_ = 100;
  
  // ========== Runtime state ==========
  bool model_loaded_ = false;
  int process_count_ = 0;
  
  // ========== Helpers ==========
  absl::Status LoadModel(const std::string& param_path, 
                         const std::string& bin_path);
  ncnn::Mat Preprocess(const ImageFrame& image);
  std::vector<Detection> Postprocess(const ncnn::Mat& output);
  std::vector<Detection> DecodeDetections(const ncnn::Mat& output);
  void NMS(std::vector<Detection>& detections);
};

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_NCNN_INFERENCE_CALCULATOR_H_

// ncnn_inference_calculator.cc
#include "mediapipe/calculators/ims/ncnn_inference_calculator.h"

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

#include "gpu.h"  // NCNN Vulkan helpers (contents are empty unless NCNN_VULKAN is enabled)
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/port/logging.h"
#include "mediapipe/framework/port/opencv_imgproc.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/status.h"

namespace mediapipe {

// ========== GetContract ==========
absl::Status NCNNInferenceCalculator::GetContract(CalculatorContract* cc) {
  // Input
  cc->Inputs().Tag("IMAGE").Set<ImageFrame>();
  
  // Side packets (model paths)
  cc->InputSidePackets().Tag("PARAM_PATH").Set<std::string>();
  cc->InputSidePackets().Tag("BIN_PATH").Set<std::string>();
  
  // Output
  cc->Outputs().Tag("DETECTIONS").Set<std::vector<Detection>>();
  
  // Options
  cc->Options<NCNNInferenceOptions>();
  
  return absl::OkStatus();
}

// ========== Open ==========
absl::Status NCNNInferenceCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<NCNNInferenceOptions>();
  
  // ========== Read configuration ==========
  input_width_ = options.input_width();
  input_height_ = options.input_height();
  input_name_ = options.input_name();
  output_name_ = options.output_name();
  num_threads_ = options.num_threads();
  use_vulkan_ = options.use_vulkan();
  use_fp16_ = options.use_fp16();
  score_threshold_ = options.score_threshold();
  nms_threshold_ = options.nms_threshold();
  max_detections_ = options.max_detections();
  
  // Mean and scale (applied only when all three channels are given)
  if (options.mean_vals_size() == 3) {
    mean_vals_[0] = options.mean_vals(0);
    mean_vals_[1] = options.mean_vals(1);
    mean_vals_[2] = options.mean_vals(2);
  }
  if (options.norm_vals_size() == 3) {
    norm_vals_[0] = options.norm_vals(0);
    norm_vals_[1] = options.norm_vals(1);
    norm_vals_[2] = options.norm_vals(2);
  }
  
  // Pixel format of the incoming ImageFrame
  switch (options.pixel_format()) {
    case NCNNInferenceOptions::RGB:
      pixel_type_ = ncnn::Mat::PIXEL_RGB;
      break;
    case NCNNInferenceOptions::BGR:
      pixel_type_ = ncnn::Mat::PIXEL_BGR;
      break;
    case NCNNInferenceOptions::GRAY:
      pixel_type_ = ncnn::Mat::PIXEL_GRAY;
      break;
  }
  
  // ========== Configure network options ==========
  // ncnn reads net_.opt while loading, so set every option before LoadModel().
  net_.opt.lightmode = true;              // recycle intermediate blobs
  net_.opt.num_threads = num_threads_;    // thread count
  net_.opt.use_fp16_packed = use_fp16_;   // FP16 packed layout
  net_.opt.use_fp16_storage = use_fp16_;  // FP16 storage
  net_.opt.use_shader_pack8 = true;       // pack8 shaders (Vulkan only)
  
  // ========== Vulkan GPU acceleration ==========
  if (use_vulkan_) {
#if NCNN_VULKAN
    ncnn::create_gpu_instance();
    if (ncnn::get_gpu_count() > 0) {
      int gpu_device = ncnn::get_default_gpu_index();
      net_.set_vulkan_device(gpu_device);
      net_.opt.use_vulkan_compute = true;
      LOG(INFO) << "Vulkan GPU enabled, device: " << gpu_device;
    } else {
      ncnn::destroy_gpu_instance();
      use_vulkan_ = false;
      LOG(WARNING) << "No Vulkan device found, falling back to CPU";
    }
#else
    use_vulkan_ = false;
    LOG(WARNING) << "ncnn was built without Vulkan, falling back to CPU";
#endif
  }
  
  // ========== Load the model ==========
  std::string param_path = cc->InputSidePackets().Tag("PARAM_PATH").Get<std::string>();
  std::string bin_path = cc->InputSidePackets().Tag("BIN_PATH").Get<std::string>();
  
  MP_RETURN_IF_ERROR(LoadModel(param_path, bin_path));
  
  LOG(INFO) << "NCNNInferenceCalculator initialized: "
            << "input_size=" << input_width_ << "x" << input_height_
            << ", threads=" << num_threads_
            << ", vulkan=" << use_vulkan_
            << ", fp16=" << use_fp16_;
  
  return absl::OkStatus();
}

// ========== Process ==========
absl::Status NCNNInferenceCalculator::Process(CalculatorContext* cc) {
  // Skip timestamps that carry no image
  if (cc->Inputs().Tag("IMAGE").IsEmpty()) {
    return absl::OkStatus();
  }
  
  // ========== 1. Get the input image ==========
  const ImageFrame& image = cc->Inputs().Tag("IMAGE").Get<ImageFrame>();
  
  // ========== 2. Preprocess ==========
  ncnn::Mat input = Preprocess(image);
  
  // ========== 3. Inference ==========
  // The extractor inherits lightmode and num_threads from net_.opt.
  ncnn::Extractor ex = net_.create_extractor();
  
  // Bind the input blob
  ex.input(input_name_.c_str(), input);
  
  // Run the network up to the output blob
  ncnn::Mat output;
  int ret = ex.extract(output_name_.c_str(), output);
  
  // ========== 4. Post-process ==========
  // On failure, emit an empty result so that downstream nodes (such as a
  // FlowLimiterCalculator back edge) still receive a packet for this timestamp.
  std::vector<Detection> detections;
  if (ret == 0) {
    detections = Postprocess(output);
  } else {
    LOG(WARNING) << "NCNN extract failed: " << ret;
  }
  
  // ========== 5. Output ==========
  cc->Outputs().Tag("DETECTIONS").AddPacket(
      MakePacket<std::vector<Detection>>(std::move(detections))
          .At(cc->InputTimestamp()));
  
  process_count_++;
  
  return absl::OkStatus();
}

// ========== Close ==========
absl::Status NCNNInferenceCalculator::Close(CalculatorContext* cc) {
  // Release the network (and its GPU resources) before tearing down Vulkan.
  net_.clear();
#if NCNN_VULKAN
  if (use_vulkan_) {
    // destroy_gpu_instance() is process-wide. If several calculators use Vulkan,
    // move this call to application shutdown instead.
    ncnn::destroy_gpu_instance();
  }
#endif
  
  LOG(INFO) << "NCNNInferenceCalculator closed, processed " 
            << process_count_ << " frames";
  
  return absl::OkStatus();
}

// ========== Load the model ==========
absl::Status NCNNInferenceCalculator::LoadModel(
    const std::string& param_path, const std::string& bin_path) {
  
  // Load the .param file (network structure)
  int ret = net_.load_param(param_path.c_str());
  if (ret != 0) {
    return absl::InvalidArgumentError(
        "Failed to load param file: " + param_path + ", error=" + std::to_string(ret));
  }
  
  // Load the .bin file (weights)
  ret = net_.load_model(bin_path.c_str());
  if (ret != 0) {
    return absl::InvalidArgumentError(
        "Failed to load model file: " + bin_path + ", error=" + std::to_string(ret));
  }
  
  model_loaded_ = true;
  
  LOG(INFO) << "NCNN model loaded: " << param_path;
  
  return absl::OkStatus();
}

// ========== Preprocess ==========
ncnn::Mat NCNNInferenceCalculator::Preprocess(const ImageFrame& image) {
  // Wrap the ImageFrame in an OpenCV view (no copy)
  cv::Mat mat = formats::MatView(&image);
  
  // Build the NCNN Mat, resizing to the network input size.
  // pixel_type_ must match the ImageFrame's actual channel layout.
  // The stride argument handles ImageFrame rows that are padded for alignment.
  ncnn::Mat input = ncnn::Mat::from_pixels_resize(
      mat.data,
      pixel_type_,
      mat.cols, mat.rows,
      static_cast<int>(mat.step),
      input_width_, input_height_);
  
  // Normalize: (pixel - mean) * norm ("substract" is ncnn's own spelling)
  input.substract_mean_normalize(mean_vals_, norm_vals_);
  
  return input;
}

// ========== Post-process ==========
std::vector<Detection> NCNNInferenceCalculator::Postprocess(
    const ncnn::Mat& output) {
  
  // Decode detections according to the output layout
  std::vector<Detection> detections = DecodeDetections(output);
  
  // Remove duplicates with NMS
  NMS(detections);
  
  // Cap the number of results
  const size_t limit = static_cast<size_t>(std::max(0, max_detections_));
  if (detections.size() > limit) {
    detections.resize(limit);
  }
  
  return detections;
}

// ========== Decode detections ==========
std::vector<Detection> NCNNInferenceCalculator::DecodeDetections(
    const ncnn::Mat& output) {
  
  std::vector<Detection> detections;
  
  // Assumed output layout [N, 6]: [x1, y1, x2, y2, score, class],
  // with coordinates normalized to [0, 1]. The real layout depends on the model.
  if (output.w < 6) {
    return detections;
  }
  
  const int num = output.h;  // number of candidate detections
  
  for (int i = 0; i < num; ++i) {
    const float* det = output.row(i);
    
    float score = det[4];
    if (score < score_threshold_) {
      continue;
    }
    
    // mediapipe::Detection stores scores and labels as repeated fields and
    // the box inside LocationData.
    Detection d;
    d.add_score(score);
    d.add_label_id(static_cast<int>(det[5]));
    LocationData* location = d.mutable_location_data();
    location->set_format(LocationData::RELATIVE_BOUNDING_BOX);
    auto* box = location->mutable_relative_bounding_box();
    box->set_xmin(det[0]);
    box->set_ymin(det[1]);
    box->set_width(det[2] - det[0]);
    box->set_height(det[3] - det[1]);
    
    detections.push_back(d);
  }
  
  return detections;
}

// ========== NMS ==========
void NCNNInferenceCalculator::NMS(std::vector<Detection>& detections) {
  // Sort by score, highest first
  std::sort(detections.begin(), detections.end(),
            [](const Detection& a, const Detection& b) {
              return a.score(0) > b.score(0);
            });
  
  std::vector<bool> suppressed(detections.size(), false);
  
  for (size_t i = 0; i < detections.size(); ++i) {
    if (suppressed[i]) continue;
    const auto& bi = detections[i].location_data().relative_bounding_box();
    
    for (size_t j = i + 1; j < detections.size(); ++j) {
      if (suppressed[j]) continue;
      const auto& bj = detections[j].location_data().relative_bounding_box();
      
      // Intersection rectangle
      float ix1 = std::max(bi.xmin(), bj.xmin());
      float iy1 = std::max(bi.ymin(), bj.ymin());
      float ix2 = std::min(bi.xmin() + bi.width(), bj.xmin() + bj.width());
      float iy2 = std::min(bi.ymin() + bi.height(), bj.ymin() + bj.height());
      
      float iw = std::max(0.f, ix2 - ix1);
      float ih = std::max(0.f, iy2 - iy1);
      float inter = iw * ih;
      
      // IoU, guarding against an empty union
      float uni = bi.width() * bi.height() + bj.width() * bj.height() - inter;
      float iou = uni > 0.f ? inter / uni : 0.f;
      
      if (iou > nms_threshold_) {
        suppressed[j] = true;
      }
    }
  }
  
  // Keep only the detections that were not suppressed
  std::vector<Detection> result;
  for (size_t i = 0; i < detections.size(); ++i) {
    if (!suppressed[i]) {
      result.push_back(detections[i]);
    }
  }
  
  detections = std::move(result);
}

REGISTER_CALCULATOR(NCNNInferenceCalculator);

}  // namespace mediapipe
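
The decode and NMS steps can be exercised without MediaPipe or NCNN, which is handy when tuning `score_threshold` and `nms_threshold`. The following is a minimal standalone sketch under the same assumed `[x1, y1, x2, y2, score, class]` row layout; the `Box` struct and the function names are hypothetical and only illustrate the logic.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Corner-format box with a score and class label (illustrative type).
struct Box {
  float x1, y1, x2, y2, score;
  int label;
};

// Intersection-over-union of two boxes; 0 when the union is empty.
float IoU(const Box& a, const Box& b) {
  const float iw = std::max(0.f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
  const float ih = std::max(0.f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
  const float inter = iw * ih;
  const float uni =
      (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
  return uni > 0.f ? inter / uni : 0.f;
}

// Decodes flat rows of 6 floats, dropping rows below the score threshold.
std::vector<Box> DecodeRows(const std::vector<float>& rows, float score_threshold) {
  std::vector<Box> boxes;
  for (std::size_t i = 0; i + 6 <= rows.size(); i += 6) {
    if (rows[i + 4] < score_threshold) continue;
    boxes.push_back({rows[i], rows[i + 1], rows[i + 2], rows[i + 3], rows[i + 4],
                     static_cast<int>(rows[i + 5])});
  }
  return boxes;
}

// Greedy class-agnostic NMS: visit boxes by descending score and keep a box
// only if it does not overlap an already kept box by more than the threshold.
std::vector<Box> Nms(std::vector<Box> boxes, float iou_threshold) {
  std::sort(boxes.begin(), boxes.end(),
            [](const Box& a, const Box& b) { return a.score > b.score; });
  std::vector<Box> kept;
  for (const Box& candidate : boxes) {
    bool suppressed = false;
    for (const Box& k : kept) {
      if (IoU(candidate, k) > iou_threshold) {
        suppressed = true;
        break;
      }
    }
    if (!suppressed) kept.push_back(candidate);
  }
  return kept;
}
```

Two boxes of size 10x10 offset by one pixel have an IoU of 81/119 (about 0.68), so with a 0.45 threshold the lower-scoring one is dropped, while a distant box survives.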

18. Graph Configuration Examples

18.1 IMS DMS Face Detection

# ims_dms_face_detection.pbtxt

input_stream: "IR_IMAGE:ir_image"
output_stream: "FACES:faces"

input_side_packet: "PARAM_PATH:param_path"
input_side_packet: "BIN_PATH:bin_path"

# ========== Flow limiting ==========
# The back edge must be tagged FINISHED so the limiter knows when a frame is done.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "ir_image"
  input_stream: "FINISHED:faces"
  input_stream_info: { tag_index: "FINISHED" back_edge: true }
  output_stream: "throttled_image"
  options {
    [mediapipe.FlowLimiterCalculatorOptions.ext] {
      max_in_flight: 1
      max_in_queue: 1
    }
  }
}

# ========== Image preprocessing ==========
node {
  calculator: "IRPreprocessCalculator"
  input_stream: "IR_IMAGE:throttled_image"
  output_stream: "PROCESSED:preprocessed"
  options {
    [mediapipe.IRPreprocessOptions.ext] {
      target_width: 320
      target_height: 240
      equalize_histogram: true
    }
  }
}

# ========== NCNN face detection ==========
node {
  calculator: "NCNNInferenceCalculator"
  input_stream: "IMAGE:preprocessed"
  input_side_packet: "PARAM_PATH:param_path"
  input_side_packet: "BIN_PATH:bin_path"
  output_stream: "DETECTIONS:faces"
  options {
    [mediapipe.NCNNInferenceOptions.ext] {
      input_width: 320
      input_height: 240
      mean_vals: 127.5
      mean_vals: 127.5
      mean_vals: 127.5
      norm_vals: 0.007843  # 1/127.5
      norm_vals: 0.007843
      norm_vals: 0.007843
      input_name: "input.1"
      output_name: "output.1"
      num_threads: 4
      use_vulkan: true
      use_fp16: true
      score_threshold: 0.6
      nms_threshold: 0.45
      max_detections: 50
      pixel_format: RGB
    }
  }
}
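
The `mean_vals` / `norm_vals` pair in the graph above encodes the usual `(pixel - mean) * norm` transform, which with 127.5 and 1/127.5 maps 8-bit pixels to roughly [-1, 1]. A minimal sketch of that arithmetic follows; `Normalize` is a hypothetical helper, not an ncnn function.

```cpp
// Per-pixel normalization: subtract the mean, then multiply by the scale.
// Hypothetical helper that mirrors the arithmetic only.
float Normalize(unsigned char pixel, float mean, float norm) {
  return (static_cast<float>(pixel) - mean) * norm;
}
```

With `mean = 127.5` and `norm = 0.007843`, pixel 0 maps to about -1, pixel 255 to about +1, and mid-gray 128 to just above 0. If the model was trained with a different normalization (for example division by 255 only), these options must be changed to match.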

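18.2 Running Multiple Models in Parallel

# ========== Executor configuration ==========
executor {
  name: "cpu_executor"
  type: "ThreadPoolExecutor"
  options {
    [mediapipe.ThreadPoolExecutorOptions.ext] { num_threads: 4 }
  }
}

executor {
  name: "gpu_executor"
  type: "ThreadPoolExecutor"
  options {
    [mediapipe.ThreadPoolExecutorOptions.ext] { num_threads: 2 }
  }
}

# ========== Face detection (CPU) ==========
node {
  calculator: "NCNNInferenceCalculator"
  input_stream: "IMAGE:image"
  # Side packet names are illustrative; each model needs its own pair of paths
  input_side_packet: "PARAM_PATH:face_param_path"
  input_side_packet: "BIN_PATH:face_bin_path"
  output_stream: "DETECTIONS:face_detections"
  executor: "cpu_executor"
  options {
    [mediapipe.NCNNInferenceOptions.ext] {
      input_width: 320
      input_height: 240
      num_threads: 4
      use_vulkan: false
    }
  }
}

# ========== Landmark detection (GPU) ==========
node {
  calculator: "NCNNInferenceCalculator"
  input_stream: "IMAGE:face_crop"
  input_side_packet: "PARAM_PATH:landmark_param_path"
  input_side_packet: "BIN_PATH:landmark_bin_path"
  # The calculator in section 17 only declares DETECTIONS; a LANDMARKS output
  # requires a variant that declares this tag and decodes landmarks
  output_stream: "LANDMARKS:landmarks"
  executor: "gpu_executor"
  options {
    [mediapipe.NCNNInferenceOptions.ext] {
      input_width: 64
      input_height: 64
      num_threads: 2
      use_vulkan: true
    }
  }
}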

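19. Performance Optimization Tips

19.1 Model Optimization

┌──────────────────────────────────────────────────────────────────┐
│                     Model Optimization Tips                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   1. Quantization                                                │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  FP16: small accuracy loss, roughly 20-30% faster        │   │
│   │  INT8: moderate accuracy loss, roughly 50-100% faster    │   │
│   │                                                          │   │
│   │  Recommended: FP16 (balances accuracy and speed)         │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│   2. Input size                                                  │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  320x240: common for infrared face detection             │   │
│   │  64x64: landmark detection                               │   │
│   │                                                          │   │
│   │  Rule: as small as accuracy requirements allow           │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│   3. Grayscale input                                             │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  Infrared frames are single-channel grayscale            │   │
│   │  Use the GRAY pixel format instead of RGB                │   │
│   │  (requires a model trained on 1-channel input)           │   │
│   │  Input data shrinks by about 67% (3 channels to 1)       │   │
│   │  Reported speedup: 30-40%                                │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│   4. Pruning and lighter architectures                           │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  Remove redundant channels                               │   │
│   │  Reduce convolution parameters                           │   │
│   │  Use depthwise separable convolutions                    │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘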
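
The input-size and grayscale points come down to simple arithmetic on the number of bytes per frame. A minimal sketch, with hypothetical helper names, makes the saving explicit:

```cpp
#include <cstddef>

// Bytes in one 8-bit input frame, before conversion to float.
constexpr std::size_t FrameBytes(int width, int height, int channels) {
  return static_cast<std::size_t>(width) * height * channels;
}

// Fraction of input data saved when going from `from_channels` to `to_channels`.
constexpr double ChannelSaving(int from_channels, int to_channels) {
  return 1.0 - static_cast<double>(to_channels) / from_channels;
}
```

A 320x240 RGB frame is 230,400 bytes, the same frame in GRAY is 76,800 bytes, and the saving is two thirds of the input data. The effect on end-to-end latency is smaller than that, since only the first layer sees fewer input channels.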

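19.2 Runtime Optimization

// ========== Runtime optimization settings ==========
// Set every net_.opt field before calling load_param() / load_model().

// 1. Thread count
// Choose based on the number of CPU cores; hardware_concurrency() may return 0,
// and it returns unsigned, so both std::min arguments must be unsigned
int optimal_threads = static_cast<int>(
    std::min(4u, std::max(1u, std::thread::hardware_concurrency())));

// 2. Light mode
// Recycles intermediate blobs as soon as they are no longer needed,
// which lowers peak memory use
net_.opt.lightmode = true;

// 3. FP16 computation
// Use FP16 for storage and arithmetic
net_.opt.use_fp16_packed = true;
net_.opt.use_fp16_storage = true;
net_.opt.use_fp16_arithmetic = true;  // needs CPU FP16 arithmetic support (e.g. ARMv8.2-A)

// 4. Vulkan GPU
// Often effective on Qualcomm platforms; benchmark against the CPU path
net_.opt.use_vulkan_compute = true;

// 5. Winograd optimization (3x3 convolutions)
net_.opt.use_winograd_convolution = true;

// 6. Packed memory layout
net_.opt.use_packing_layout = true;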
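
Thread selection is easy to get subtly wrong, because `std::thread::hardware_concurrency()` returns an unsigned value and is allowed to return 0 when the core count is unknown. The sketch below isolates the clamping in a testable helper; `ClampThreads` and `OptimalThreads` are hypothetical names, and the default of 4 simply mirrors the value used above.

```cpp
#include <algorithm>
#include <thread>

// Clamp a requested thread count to [1, hardware cores].
// `hardware` is the value reported by std::thread::hardware_concurrency(),
// which may be 0 when the core count cannot be determined.
int ClampThreads(int requested, unsigned hardware) {
  const int cores = hardware == 0 ? 1 : static_cast<int>(hardware);
  return std::max(1, std::min(requested, cores));
}

// Convenience wrapper using the machine's reported core count.
int OptimalThreads(int requested = 4) {
  return ClampThreads(requested, std::thread::hardware_concurrency());
}
```

On big.LITTLE SoCs the reported core count includes the slow cores, so the best value is often lower than the total; treat the result as an upper bound and benchmark on the target device.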

20. Summary

Key point	Description
Model conversion	ONNX → NCNN (onnx2ncnn)
Model quantization	FP16 (recommended) / INT8
Model loading	`load_param()` + `load_model()`
Preprocessing	`ncnn::Mat::from_pixels_resize()`
Inference	`Extractor::extract()`
GPU acceleration	Vulkan `use_vulkan_compute`
Thread tuning	`num_threads` setting
FP16 optimization	`use_fp16_packed` / `use_fp16_storage`

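Next in the Series

MediaPipe Series 15: Inference Calculators, Integrating QNN Models (Qualcomm Platforms)

A detailed look at integrating Qualcomm's QNN framework to take full advantage of NPU acceleration.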


Series progress: 14/55
Last updated: 2026-03-12


https://dapalm.com/2026/03/13/MediaPipe系列14-推理Calculator：集成NCNN模型（IMS实战）/

Author: Mars

Published: 2026-03-13
