MediaPipe Series 15: Inference Calculator: Integrating QNN Models, a Complete Guide (Qualcomm Platforms)

Preface: Why QNN?

15.1 Core Advantages of QNN

QNN (the Qualcomm Neural Network SDK, also branded Qualcomm AI Engine Direct) is Qualcomm's inference framework, optimized for on-device AI on its SoCs:

Core advantages of QNN:

QNN vs NCNN vs TFLite:

  Feature             QNN             NCNN           TFLite
  ────────────────────────────────────────────────────────────
  Target platform     Qualcomm only   Generic        Generic
  DSP acceleration    Yes             No             Delegate only
  HTP/NPU accel.      Yes             No             No
  Inference latency   Very low        -              -
  Power draw          Very low        -              -
  Model format        .so/.bin        .param+.bin    .tflite
  Role in IMS         Primary         Secondary      Secondary

Performance on Qualcomm hardware:

  Platform: Snapdragon 8295 (SA8295P)

  Face detection model (320x240):
    CPU (Kryo):      45 ms
    GPU (Adreno):    12 ms
    DSP (Hexagon):    8 ms
    HTP (NPU):        3 ms   <- QNN recommended

  Power draw:
    CPU: 1500 mW
    GPU:  800 mW
    DSP:  400 mW
    HTP:  200 mW   <- QNN recommended
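The latency and power figures above suggest a simple selection policy: prefer HTP, fall back to DSP, then CPU. A minimal sketch of that policy (the boolean arguments are stand-ins for a real capability probe, such as attempting to dlopen the corresponding backend library):

```cpp
// Preference order derived from the benchmark above: HTP beats DSP beats CPU
// on both latency and power.
enum class QnnBackendKind { kHtp, kDsp, kCpu };

// `htp_available` / `dsp_available` are hypothetical probe results; a real
// implementation would check whether libQnnHtp.so / libQnnDsp.so load.
QnnBackendKind SelectBackend(bool htp_available, bool dsp_available) {
  if (htp_available) return QnnBackendKind::kHtp;  // lowest latency and power
  if (dsp_available) return QnnBackendKind::kDsp;
  return QnnBackendKind::kCpu;                     // always-available fallback
}
```

In the Calculator below this choice is exposed instead through the `backend` option, so the graph author decides; a fallback chain like this is useful when the same binary must run across several SoC generations.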

15.2 QNN Architecture Overview

QNN Architecture Overview

  Application layer
    MediaPipe Calculator / QNN Runtime API
          │
          ▼
  QNN Runtime
    Graph / Context / Tensor / Backend
          │
          ▼
  Backends
    CPU (Kryo):     fallback
    DSP (Hexagon):  recommended on older platforms
    HTP (NPU):      optimal

Key concepts:
  Qnn_Handle    = QNN instance handle
  Qnn_Context   = inference context
  Qnn_Graph     = compute graph
  Qnn_Tensor    = tensor data
  Qnn_Backend   = compute backend
  Qnn_ErrorCode = error code
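All of these handles are obtained through a backend library that is loaded at runtime. A minimal sketch of the first step, loading the library and resolving its entry symbol with dlopen/dlsym (`QnnInterface_getProviders` is the entry point declared in QnnInterface.h; verify the exact signature against your SDK version before casting the pointer):

```cpp
#include <dlfcn.h>
#include <cstdio>

// Loads a QNN backend library and resolves its interface-lookup symbol.
// Returns the dlopen handle on success (caller must dlclose it), or nullptr.
void* LoadQnnBackend(const char* lib_path, void** get_providers_out) {
  void* handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
  if (!handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return nullptr;
  }
  // Entry symbol exported by QNN backend libraries (see QnnInterface.h).
  *get_providers_out = dlsym(handle, "QnnInterface_getProviders");
  if (!*get_providers_out) {
    std::fprintf(stderr, "symbol not found: %s\n", dlerror());
    dlclose(handle);
    return nullptr;
  }
  return handle;
}
```

The Calculator implementation in section 18 follows the same pattern, then resolves further backend functions from the same handle.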

16. QNN SDK Environment Setup

16.1 QNN SDK Layout

# ========== QNN SDK directory layout ==========

Qualcomm_AI_STACK/
├── QNN/
│   ├── include/                      # headers
│   │   ├── QnnInterface.h            # core API
│   │   ├── QnnContext.h              # context API
│   │   ├── QnnGraph.h                # graph API
│   │   ├── QnnTensor.h               # tensor API
│   │   ├── QnnBackend.h              # backend API
│   │   └── QnnTypes.h                # type definitions
│   │
│   ├── lib/                          # libraries
│   │   ├── cpu/                      # CPU backend
│   │   │   └── libQnnCpu.so
│   │   ├── dsp/                      # DSP backend (older platforms)
│   │   │   └── libQnnDsp.so
│   │   ├── htp/                      # HTP backend (newer platforms, recommended)
│   │   │   ├── libQnnHtp.so
│   │   │   ├── libQnnHtpPrepare.so
│   │   │   └── libQnnHtpV68Skel.so   # Hexagon v68
│   │   └── QnnSystem.so              # system library
│   │
│   ├── bin/                          # tools
│   │   ├── qnn-onnx-converter        # ONNX conversion
│   │   ├── qnn-tflite-converter      # TFLite conversion
│   │   ├── qnn-net-run               # inference testing
│   │   └── qnn-profile-viewer        # profiling
│   │
│   └── examples/                     # sample code
│       ├── QnnSampleApp/
│       └── QnnModelPal/
│
├── models/                           # sample models
│   └── inception_v3/
│
└── tools/                            # helper tools
    └── op_package_generator/

16.2 Bazel WORKSPACE Configuration

# WORKSPACE

# ========== QNN SDK dependency ==========
new_local_repository(
    name = "qnn_sdk",
    path = "/path/to/Qualcomm_AI_STACK/QNN",
    build_file = "@//third_party:qnn.BUILD",
)

# ========== Hexagon SDK dependency (for DSP development) ==========
new_local_repository(
    name = "hexagon_sdk",
    path = "/path/to/Qualcomm_AI_STACK/Hexagon_SDK",
    build_file = "@//third_party:hexagon.BUILD",
)

16.3 The qnn.BUILD File

# third_party/qnn.BUILD

# ========== QNN core headers ==========
cc_library(
    name = "qnn_interface",
    hdrs = glob([
        "include/*.h",
        "include/**/*.h",
    ]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

# ========== CPU backend ==========
cc_library(
    name = "qnn_cpu",
    srcs = glob([
        "lib/cpu/*.so",
    ]),
    hdrs = glob([
        "include/*.h",
        "include/**/*.h",
    ]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

# ========== HTP backend (recommended) ==========
cc_library(
    name = "qnn_htp",
    srcs = glob([
        "lib/htp/*.so",
        "lib/QnnSystem.so",
    ]),
    hdrs = glob([
        "include/*.h",
        "include/**/*.h",
    ]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

# ========== DSP backend ==========
cc_library(
    name = "qnn_dsp",
    srcs = glob([
        "lib/dsp/*.so",
        "lib/QnnSystem.so",
    ]),
    hdrs = glob([
        "include/*.h",
        "include/**/*.h",
    ]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

16.4 Calculator BUILD

# mediapipe/calculators/ims/BUILD

cc_library(
    name = "qnn_inference_calculator",
    srcs = ["qnn_inference_calculator.cc"],
    hdrs = ["qnn_inference_calculator.h"],
    visibility = ["//visibility:public"],
    deps = [
        ":qnn_inference_options_cc_proto",  # generated from qnn_inference_options.proto
        "//mediapipe/framework:calculator_framework",
        "//mediapipe/framework:calculator_options_cc_proto",
        "//mediapipe/framework/formats:image_frame",
        "//mediapipe/framework/formats:image_frame_opencv",
        "//mediapipe/framework/port:opencv_imgproc",
        "//mediapipe/framework/port:ret_check",
        "//mediapipe/framework/port:status",
        "@qnn_sdk//:qnn_interface",
        "@qnn_sdk//:qnn_htp",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/strings",
    ],
    alwayslink = 1,
)

17. Model Conversion and Compilation

17.1 Converting ONNX to QNN

# ========== ONNX to QNN ==========

# Step 1: convert the model
qnn-onnx-converter \
    --input_model model.onnx \
    --output_path model.cpp \
    --input_dimensions input:1,3,240,320 \
    --output_names output

# Parameters:
#   --input_model        path to the ONNX model
#   --output_path        path of the generated C++ file
#   --input_dimensions   input shape (format: name:batch,channels,height,width)
#   --output_names       output node names

# Step 2: compile the model (produces a .so shared library)
qnn-model-lib-generator \
    --model model.cpp \
    --output_dir output/ \
    --lib_name face_detection

# Output:
#   output/libface_detection.so   # model shared library

# ========== TFLite to QNN ==========

qnn-tflite-converter \
    --input_model model.tflite \
    --output_path model.cpp \
    --input_dimensions input:1,3,240,320

qnn-model-lib-generator \
    --model model.cpp \
    --output_dir output/ \
    --lib_name face_detection

17.2 Model Quantization

# ========== INT8 quantization ==========

# Step 1: prepare calibration data (raw format)
python3 generate_calibration_data.py \
    --images calibration_images/ \
    --output calibration.raw

# Step 2: quantized conversion
qnn-onnx-converter \
    --input_model model.onnx \
    --output_path model_quantized.cpp \
    --input_dimensions input:1,3,240,320 \
    --quantization_overrides quantization_config.json \
    --input_data_type input:INT8 \
    --output_data_type output:INT8

# Example quantization_config.json:
# {
#   "activation_encodings": {
#     "input": {"bitwidth": 8, "is_symmetric": false}
#   },
#   "param_encodings": {
#     "weights": {"bitwidth": 8, "is_symmetric": true}
#   }
# }

# Step 3: compile the quantized model
qnn-model-lib-generator \
    --model model_quantized.cpp \
    --output_dir output/ \
    --lib_name face_detection_int8
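Calibration data exists so that the converter can observe per-tensor value ranges. Conceptually, the asymmetric 8-bit activation encoding requested above ("is_symmetric": false) maps an observed [min, max] range onto [0, 255] like this; a sketch of the math, not the converter's actual code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct QuantParams {
  float scale;
  int32_t zero_point;
};

// Asymmetric uint8 quantization parameters from a calibrated [min, max].
// The range is widened to include 0 so that zero quantizes exactly.
QuantParams ComputeQuantParams(float min_val, float max_val) {
  min_val = std::min(min_val, 0.0f);
  max_val = std::max(max_val, 0.0f);
  QuantParams p;
  p.scale = (max_val - min_val) / 255.0f;
  p.zero_point = static_cast<int32_t>(std::round(-min_val / p.scale));
  return p;
}

// real = (quantized - zero_point) * scale; this is the inverse mapping.
uint8_t Quantize(float x, const QuantParams& p) {
  int32_t q = static_cast<int32_t>(std::round(x / p.scale)) + p.zero_point;
  return static_cast<uint8_t>(std::clamp(q, 0, 255));
}
```

Symmetric encodings (used for weights above) are the special case where the zero point is fixed at the midpoint and the range is forced symmetric around zero.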

17.3 Model Optimization

# ========== Model optimization ==========

# 1. Graph optimization
qnn-model-validator \
    --model model.cpp \
    --optimize

# 2. Backend-specific optimization
# Generate a model optimized for a specific backend
qnn-model-lib-generator \
    --model model.cpp \
    --output_dir output/ \
    --lib_name face_detection_htp \
    --backend HTP

# 3. Profiling
qnn-net-run \
    --model output/libface_detection.so \
    --backend HTP \
    --input input.raw \
    --profile

18. Complete QNN Calculator Implementation

18.1 Proto Definition

# qnn_inference_options.proto
syntax = "proto2";  # proto2: `optional` with [default = ...] and extensions require it

package mediapipe;

import "mediapipe/framework/calculator.proto";

message QNNInferenceOptions {
  # Register as a CalculatorOptions extension so graphs can use
  # [mediapipe.QNNInferenceOptions.ext]; the field number is a placeholder
  # and must be unique across your project.
  extend CalculatorOptions {
    optional QNNInferenceOptions ext = 350607999;
  }

  // ========== Backend configuration ==========
  enum Backend {
    CPU = 0;
    DSP = 1;
    HTP = 2;  // recommended
  }
  optional Backend backend = 1 [default = HTP];

  // ========== Input configuration ==========
  optional int32 input_width = 2 [default = 320];
  optional int32 input_height = 3 [default = 240];
  optional int32 input_channels = 4 [default = 3];

  // ========== Tensor names ==========
  optional string input_tensor_name = 5 [default = "input"];
  optional string output_tensor_name = 6 [default = "output"];

  // ========== Post-processing ==========
  optional float score_threshold = 7 [default = 0.5];
  optional float nms_threshold = 8 [default = 0.45];
  optional int32 max_detections = 9 [default = 100];

  // ========== Performance ==========
  optional int32 num_threads = 10 [default = 4];  // CPU backend thread count
  optional bool enable_profiling = 11 [default = false];

  // ========== Debugging ==========
  optional bool debug_output = 12 [default = false];
}

18.2 Calculator Header

// qnn_inference_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_QNN_INFERENCE_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_QNN_INFERENCE_CALCULATOR_H_

#include <string>
#include <vector>

#include "mediapipe/calculators/ims/qnn_inference_options.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/image_frame.h"

// QNN headers
#include "QnnBackend.h"
#include "QnnContext.h"
#include "QnnGraph.h"
#include "QnnInterface.h"
#include "QnnTensor.h"
#include "QnnTypes.h"

namespace mediapipe {

class QNNInferenceCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc);

  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;
  absl::Status Close(CalculatorContext* cc) override;

 private:
  // ========== QNN resources ==========
  void* backend_handle_ = nullptr;  // backend library handle (dlopen)
  void* model_handle_ = nullptr;    // model library handle (dlopen)
  Qnn_BackendHandle_t backend_ = nullptr;
  Qnn_ContextHandle_t context_ = nullptr;
  Qnn_GraphHandle_t graph_ = nullptr;

  // ========== Tensors ==========
  std::vector<Qnn_Tensor_t> input_tensors_;
  std::vector<Qnn_Tensor_t> output_tensors_;

  // ========== Configuration ==========
  QNNInferenceOptions::Backend backend_type_ = QNNInferenceOptions::HTP;
  int input_width_ = 320;
  int input_height_ = 240;
  int input_channels_ = 3;
  std::string input_tensor_name_ = "input";
  std::string output_tensor_name_ = "output";
  float score_threshold_ = 0.5f;
  float nms_threshold_ = 0.45f;
  int max_detections_ = 100;

  // ========== Runtime state ==========
  bool initialized_ = false;
  int process_count_ = 0;

  // ========== Methods ==========
  absl::Status InitializeBackend(const std::string& model_path);
  absl::Status LoadModel(const std::string& model_path);
  absl::Status CreateGraph();
  absl::Status CreateTensors();
  absl::Status AllocateTensorBuffers();
  void FreeTensorBuffers();

  std::vector<uint8_t> Preprocess(const ImageFrame& image);
  std::vector<Detection> Postprocess(const std::vector<Qnn_Tensor_t>& outputs);

  std::string GetBackendLibraryPath(QNNInferenceOptions::Backend backend);
};

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_QNN_INFERENCE_CALCULATOR_H_

18.3 Calculator Implementation

// qnn_inference_calculator.cc
#include "qnn_inference_calculator.h"

#include <dlfcn.h>  // dlopen, dlsym

#include <cstring>

#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/port/opencv_imgproc.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/status.h"

namespace mediapipe {

// ========== GetContract ==========
absl::Status QNNInferenceCalculator::GetContract(CalculatorContract* cc) {
  cc->Inputs().Tag("IMAGE").Set<ImageFrame>();
  cc->InputSidePackets().Tag("MODEL_PATH").Set<std::string>();
  cc->Outputs().Tag("DETECTIONS").Set<std::vector<Detection>>();
  cc->Options<QNNInferenceOptions>();
  return absl::OkStatus();
}

// ========== Open ==========
absl::Status QNNInferenceCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<QNNInferenceOptions>();

  // ========== Read configuration ==========
  backend_type_ = options.backend();
  input_width_ = options.input_width();
  input_height_ = options.input_height();
  input_channels_ = options.input_channels();
  input_tensor_name_ = options.input_tensor_name();
  output_tensor_name_ = options.output_tensor_name();
  score_threshold_ = options.score_threshold();
  nms_threshold_ = options.nms_threshold();
  max_detections_ = options.max_detections();

  // ========== Get the model path ==========
  std::string model_path =
      cc->InputSidePackets().Tag("MODEL_PATH").Get<std::string>();

  // ========== Initialize the backend ==========
  MP_RETURN_IF_ERROR(InitializeBackend(model_path));

  // ========== Load the model ==========
  MP_RETURN_IF_ERROR(LoadModel(model_path));

  // ========== Create the graph ==========
  MP_RETURN_IF_ERROR(CreateGraph());

  // ========== Create tensors ==========
  MP_RETURN_IF_ERROR(CreateTensors());

  initialized_ = true;

  LOG(INFO) << "QNNInferenceCalculator initialized: "
            << "backend=" << QNNInferenceOptions::Backend_Name(backend_type_)
            << ", input_size=" << input_width_ << "x" << input_height_;

  return absl::OkStatus();
}

// ========== InitializeBackend ==========
absl::Status QNNInferenceCalculator::InitializeBackend(
    const std::string& model_path) {
  // ========== 1. Load the backend library ==========
  std::string backend_lib = GetBackendLibraryPath(backend_type_);

  backend_handle_ = dlopen(backend_lib.c_str(), RTLD_NOW);
  RET_CHECK(backend_handle_ != nullptr)
      << "Failed to load QNN backend: " << backend_lib
      << ", error: " << dlerror();

  // ========== 2. Resolve backend functions ==========
  typedef Qnn_ErrorCode_t (*QnnBackendInitFunc)(const QnnBackend_Config_t*);
  auto backend_init =
      (QnnBackendInitFunc)dlsym(backend_handle_, "QnnBackend_initialize");
  RET_CHECK(backend_init != nullptr) << "QnnBackend_initialize not found";

  // ========== 3. Initialize the backend ==========
  Qnn_ErrorCode_t err = backend_init(nullptr);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to initialize backend: " << err;

  // ========== 4. Query the backend ID ==========
  typedef Qnn_ErrorCode_t (*QnnBackendGetIdFunc)(QnnBackend_Id_t*);
  auto get_id = (QnnBackendGetIdFunc)dlsym(backend_handle_, "QnnBackend_getId");
  RET_CHECK(get_id != nullptr) << "QnnBackend_getId not found";

  QnnBackend_Id_t backend_id = 0;
  err = get_id(&backend_id);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to get backend ID: " << err;

  LOG(INFO) << "QNN backend initialized: " << backend_lib;

  return absl::OkStatus();
}

// ========== LoadModel ==========
absl::Status QNNInferenceCalculator::LoadModel(const std::string& model_path) {
  // ========== 1. Load the model shared library ==========
  model_handle_ = dlopen(model_path.c_str(), RTLD_NOW);
  RET_CHECK(model_handle_ != nullptr)
      << "Failed to load model: " << model_path
      << ", error: " << dlerror();

  // ========== 2. Resolve the model entry point ==========
  typedef Qnn_ErrorCode_t (*ModelComposeFunc)(
      Qnn_BackendHandle_t, Qnn_ContextHandle_t, Qnn_GraphHandle_t*);

  auto model_compose =
      (ModelComposeFunc)dlsym(model_handle_, "QnnModel_composeGraphs");
  RET_CHECK(model_compose != nullptr) << "QnnModel_composeGraphs not found";

  // ========== 3. Create the context ==========
  Qnn_ErrorCode_t err = QnnContext_create(backend_, &context_);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to create context: " << err;

  // ========== 4. Compose the graph ==========
  err = model_compose(backend_, context_, &graph_);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to compose graph: " << err;

  LOG(INFO) << "QNN model loaded: " << model_path;

  return absl::OkStatus();
}

// ========== CreateGraph ==========
absl::Status QNNInferenceCalculator::CreateGraph() {
  // If the model library already composed the graph in LoadModel(), creating
  // a second graph here would overwrite and leak it; only create one when
  // none exists yet.
  if (graph_ == nullptr) {
    QnnGraph_Config_t graph_config;
    memset(&graph_config, 0, sizeof(graph_config));
    graph_config.option = QNN_GRAPH_CONFIG_OPTION_NAME;
    graph_config.name = "dms_graph";

    Qnn_ErrorCode_t err = QnnGraph_create(context_, &graph_config, &graph_);
    RET_CHECK(err == QNN_SUCCESS) << "Failed to create graph: " << err;
  }

  // Finalize the graph (required before execution)
  Qnn_ErrorCode_t err = QnnGraph_finalize(graph_);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to finalize graph: " << err;

  return absl::OkStatus();
}

// ========== CreateTensors ==========
absl::Status QNNInferenceCalculator::CreateTensors() {
  // ========== 1. Create the input tensor ==========
  Qnn_Tensor_t input_tensor;
  memset(&input_tensor, 0, sizeof(input_tensor));

  input_tensor.version = QNN_TENSOR_VERSION_1;
  input_tensor.v1.id = 0;
  strncpy(input_tensor.v1.name, input_tensor_name_.c_str(), QNN_MAX_NAME_LEN - 1);
  input_tensor.v1.type = QNN_TENSOR_TYPE_APP_WRITE;
  input_tensor.v1.dataType = QNN_DATATYPE_UFIXED_POINT_8;

  // Shape: NHWC
  input_tensor.v1.shape.rank = 4;
  input_tensor.v1.shape.dimensions[0] = 1;  // batch
  input_tensor.v1.shape.dimensions[1] = input_height_;
  input_tensor.v1.shape.dimensions[2] = input_width_;
  input_tensor.v1.shape.dimensions[3] = input_channels_;

  input_tensor.v1.memType = QNN_TENSORMEMTYPE_RAW;

  // Allocate the buffer (uint8 elements, 1 byte each)
  size_t input_size = 1 * input_height_ * input_width_ * input_channels_;
  input_tensor.v1.mem.raw.memSize = input_size;
  input_tensor.v1.mem.raw.data = malloc(input_size);
  RET_CHECK(input_tensor.v1.mem.raw.data != nullptr)
      << "Failed to allocate input buffer";

  input_tensors_.push_back(input_tensor);

  // ========== 2. Create the output tensor ==========
  // The shape must match the model's actual output.
  Qnn_Tensor_t output_tensor;
  memset(&output_tensor, 0, sizeof(output_tensor));

  output_tensor.version = QNN_TENSOR_VERSION_1;
  output_tensor.v1.id = 1;
  strncpy(output_tensor.v1.name, output_tensor_name_.c_str(), QNN_MAX_NAME_LEN - 1);
  output_tensor.v1.type = QNN_TENSOR_TYPE_APP_READ;
  output_tensor.v1.dataType = QNN_DATATYPE_FLOAT_32;

  output_tensor.v1.shape.rank = 2;
  output_tensor.v1.shape.dimensions[0] = max_detections_;
  output_tensor.v1.shape.dimensions[1] = 6;  // [x1, y1, x2, y2, score, class]

  output_tensor.v1.memType = QNN_TENSORMEMTYPE_RAW;

  size_t output_size = max_detections_ * 6 * sizeof(float);
  output_tensor.v1.mem.raw.memSize = output_size;
  output_tensor.v1.mem.raw.data = malloc(output_size);
  RET_CHECK(output_tensor.v1.mem.raw.data != nullptr)
      << "Failed to allocate output buffer";

  output_tensors_.push_back(output_tensor);

  return absl::OkStatus();
}

// ========== Process ==========
absl::Status QNNInferenceCalculator::Process(CalculatorContext* cc) {
  if (!initialized_) {
    return absl::InternalError("Calculator not initialized");
  }

  if (cc->Inputs().Tag("IMAGE").IsEmpty()) {
    return absl::OkStatus();
  }

  // ========== 1. Get the input image ==========
  const ImageFrame& image = cc->Inputs().Tag("IMAGE").Get<ImageFrame>();

  // ========== 2. Preprocess ==========
  std::vector<uint8_t> input_data = Preprocess(image);

  // ========== 3. Copy into the input tensor ==========
  std::memcpy(input_tensors_[0].v1.mem.raw.data,
              input_data.data(),
              input_data.size());

  // ========== 4. Run inference ==========
  Qnn_ErrorCode_t err = QnnGraph_execute(
      graph_,
      input_tensors_.data(), input_tensors_.size(),
      output_tensors_.data(), output_tensors_.size(),
      nullptr, nullptr);

  if (err != QNN_SUCCESS) {
    LOG(WARNING) << "QNN execution failed: " << err;
    return absl::OkStatus();  // drop this frame but keep the graph running
  }

  // ========== 5. Postprocess ==========
  std::vector<Detection> detections = Postprocess(output_tensors_);

  // ========== 6. Emit output ==========
  cc->Outputs().Tag("DETECTIONS").AddPacket(
      MakePacket<std::vector<Detection>>(detections).At(cc->InputTimestamp()));

  process_count_++;

  return absl::OkStatus();
}

// ========== Close ==========
absl::Status QNNInferenceCalculator::Close(CalculatorContext* cc) {
  // ========== 1. Free tensor buffers ==========
  FreeTensorBuffers();

  // ========== 2. Release the graph ==========
  if (graph_) {
    QnnGraph_free(graph_);
    graph_ = nullptr;
  }

  // ========== 3. Release the context ==========
  if (context_) {
    QnnContext_free(context_);
    context_ = nullptr;
  }

  // ========== 4. Unload the model library ==========
  if (model_handle_) {
    dlclose(model_handle_);
    model_handle_ = nullptr;
  }

  // ========== 5. Unload the backend library ==========
  if (backend_handle_) {
    dlclose(backend_handle_);
    backend_handle_ = nullptr;
  }

  LOG(INFO) << "QNNInferenceCalculator closed, processed "
            << process_count_ << " frames";

  return absl::OkStatus();
}

// ========== Helper methods ==========

void QNNInferenceCalculator::FreeTensorBuffers() {
  for (auto& tensor : input_tensors_) {
    if (tensor.v1.mem.raw.data) {
      free(tensor.v1.mem.raw.data);
      tensor.v1.mem.raw.data = nullptr;
    }
  }

  for (auto& tensor : output_tensors_) {
    if (tensor.v1.mem.raw.data) {
      free(tensor.v1.mem.raw.data);
      tensor.v1.mem.raw.data = nullptr;
    }
  }
}

std::string QNNInferenceCalculator::GetBackendLibraryPath(
    QNNInferenceOptions::Backend backend) {
  switch (backend) {
    case QNNInferenceOptions::CPU:
      return "/vendor/lib/libQnnCpu.so";
    case QNNInferenceOptions::DSP:
      return "/vendor/lib/libQnnDsp.so";
    case QNNInferenceOptions::HTP:
    default:
      return "/vendor/lib/libQnnHtp.so";
  }
}

std::vector<uint8_t> QNNInferenceCalculator::Preprocess(const ImageFrame& image) {
  cv::Mat mat = formats::MatView(&image);

  // Resize to the model's input size
  cv::Mat resized;
  cv::resize(mat, resized, cv::Size(input_width_, input_height_));

  // Match the model's channel count (IR frames are often single-channel)
  if (resized.channels() == 1 && input_channels_ == 3) {
    cv::cvtColor(resized, resized, cv::COLOR_GRAY2BGR);
  }

  // Copy into a uint8 vector
  std::vector<uint8_t> data(resized.total() * resized.elemSize());
  std::memcpy(data.data(), resized.data, data.size());

  return data;
}

std::vector<Detection> QNNInferenceCalculator::Postprocess(
    const std::vector<Qnn_Tensor_t>& outputs) {
  std::vector<Detection> detections;

  if (outputs.empty()) return detections;

  const float* data = (const float*)outputs[0].v1.mem.raw.data;
  int num = outputs[0].v1.shape.dimensions[0];

  for (int i = 0; i < num; ++i) {
    const float* det = data + i * 6;  // [x1, y1, x2, y2, score, class]

    float score = det[4];
    if (score < score_threshold_) continue;

    // mediapipe::Detection stores the box in LocationData, not as direct
    // xmin/ymin fields; the width/height form is what downstream
    // calculators expect.
    Detection d;
    d.add_score(score);
    d.add_label_id(static_cast<int>(det[5]));
    auto* location = d.mutable_location_data();
    location->set_format(LocationData::RELATIVE_BOUNDING_BOX);
    auto* box = location->mutable_relative_bounding_box();
    box->set_xmin(det[0]);
    box->set_ymin(det[1]);
    box->set_width(det[2] - det[0]);
    box->set_height(det[3] - det[1]);

    detections.push_back(d);
  }

  return detections;
}

REGISTER_CALCULATOR(QNNInferenceCalculator);

}  // namespace mediapipe
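Note that Postprocess filters by score but does not yet apply the configured nms_threshold_. Greedy IoU-based non-maximum suppression over the decoded boxes looks like this (Box is a simplified stand-in for mediapipe::Detection, kept plain so the logic is visible):

```cpp
#include <algorithm>
#include <vector>

struct Box { float x1, y1, x2, y2, score; };

// Intersection-over-union of two axis-aligned boxes.
static float IoU(const Box& a, const Box& b) {
  float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
  float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
  float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
  float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
  float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
  float uni = area_a + area_b - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: keep the highest-scoring box, drop any later box whose IoU
// with an already-kept box exceeds the threshold (e.g. nms_threshold_ = 0.45).
std::vector<Box> NonMaxSuppression(std::vector<Box> boxes, float iou_threshold) {
  std::sort(boxes.begin(), boxes.end(),
            [](const Box& a, const Box& b) { return a.score > b.score; });
  std::vector<Box> kept;
  for (const Box& candidate : boxes) {
    bool suppressed = false;
    for (const Box& k : kept) {
      if (IoU(candidate, k) > iou_threshold) { suppressed = true; break; }
    }
    if (!suppressed) kept.push_back(candidate);
  }
  return kept;
}
```

In the Calculator this would run inside Postprocess, between the score filter and the Detection construction; the next post in the series covers NMS in more depth.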

19. Graph Configuration Example

19.1 Complete IMS DMS Configuration

# ims_dms_qnn_graph.pbtxt

input_stream: "IR_IMAGE:ir_image"
output_stream: "FACES:faces"

input_side_packet: "MODEL_PATH:model_path"

# ========== Executor configuration ==========
executor {
  name: "htp_executor"
  type: "ThreadPool"
  options { num_threads: 2 }
}

# ========== Flow limiting ==========
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "ir_image"
  input_stream: "FINISHED:faces"
  input_stream_info: { tag_index: "FINISHED" back_edge: true }
  output_stream: "throttled_image"
}

# ========== QNN face detection ==========
node {
  calculator: "QNNInferenceCalculator"
  input_stream: "IMAGE:throttled_image"
  input_side_packet: "MODEL_PATH:model_path"
  output_stream: "DETECTIONS:faces"
  executor: "htp_executor"
  options {
    [mediapipe.QNNInferenceOptions.ext] {
      backend: HTP
      input_width: 320
      input_height: 240
      input_channels: 3
      input_tensor_name: "input.1"
      output_tensor_name: "output.1"
      score_threshold: 0.6
      nms_threshold: 0.45
      max_detections: 50
    }
  }
}

20. Troubleshooting DSP Session Issues

20.1 Common Errors and Fixes

Common error troubleshooting

Error 1: QNN_DSP_SESSION_OPEN_FAILED (0x6b)
  Cause: fastrpc_shell_unsigned_3 is missing.
  Fix:
    adb push fastrpc_shell_unsigned_3 \
        /data_fota/ds_ims/models/qnn/.../
    adb shell chmod 755 fastrpc_shell_*

Error 2: QNN_BACKEND_UNSUPPORTED (0x03)
  Cause: the backend is unsupported or its library is missing.
  Fix:
    - Check whether the platform supports HTP.
    - Check that libQnnHtp.so exists.
    - Try the DSP backend instead.

Error 3: QNN_GRAPH_FINALIZE_FAILED (0x41)
  Cause: graph creation failed (unsupported operator).
  Fix:
    - Check the model for unsupported operators.
    - Validate the model with qnn-model-validator.
    - Simplify the model structure.

Error 4: Out of memory
  Cause: insufficient DSP memory.
  Fix:
    - Reduce the input size.
    - Use INT8 quantization.
    - Inspect DSP memory usage.
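When these codes show up in logcat as bare hex values, a small lookup helper makes the logs readable. The names below mirror the troubleshooting table above and are illustrative; the authoritative enum values live in the QNN SDK headers (QnnTypes.h), so extend the switch from there:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Maps the error codes from the table above to readable names.
// Unlisted codes fall through to a hex placeholder.
std::string QnnErrorName(uint32_t code) {
  switch (code) {
    case 0x03: return "QNN_BACKEND_UNSUPPORTED";
    case 0x41: return "QNN_GRAPH_FINALIZE_FAILED";
    case 0x6b: return "QNN_DSP_SESSION_OPEN_FAILED";
    default: {
      char buf[32];
      std::snprintf(buf, sizeof(buf), "UNKNOWN_QNN_ERROR(0x%x)", code);
      return buf;
    }
  }
}
```

Wiring this into the Calculator's LOG(WARNING) lines turns "QNN execution failed: 107" into a message that points straight at the fastrpc_shell fix.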

20.2 Debug Commands

# ========== Check the QNN environment ==========

# Check backend libraries
adb shell ls -la /vendor/lib/libQnn*

# Check DSP status
adb shell cat /sys/kernel/debug/ion/heaps/system

# Check FastRPC
adb shell ls -la /dev/ion
adb shell ls -la /dev/fastrpc*

# Check QNN logs
adb logcat -s QNN

# ========== Profiling ==========

# Test with qnn-net-run
adb push qnn-net-run /data/local/tmp/
adb push model.so /data/local/tmp/
adb push input.raw /data/local/tmp/

adb shell /data/local/tmp/qnn-net-run \
    --model /data/local/tmp/model.so \
    --backend HTP \
    --input /data/local/tmp/input.raw \
    --profile

# ========== Memory analysis ==========
adb shell cat /proc/meminfo | grep -i ion

21. Summary

Topic               Notes
─────────────────────────────────────────────────────
QNN SDK             Qualcomm's AI inference framework
Backend selection   HTP > DSP > CPU
Model conversion    ONNX/TFLite → .so
Performance tuning  INT8 quantization, input-size reduction
Debugging           qnn-net-run, logcat
Common issues       missing fastrpc_shell, out of memory

Next in the Series

MediaPipe Series 16: Post-processing Calculator: Parsing Model Outputs

An in-depth look at parsing model outputs, NMS de-duplication, coordinate transforms, and other post-processing techniques.


References

  1. Qualcomm. QNN SDK Documentation
  2. Qualcomm. Hexagon DSP SDK
  3. Google AI Edge. MediaPipe Calculator Framework

Series progress: 15/55
Last updated: 2026-03-12

Author: Mars · Published: March 13, 2026
https://dapalm.com/2026/03/13/MediaPipe系列15-推理Calculator:集成QNN模型(高通平台)/