MediaPipe Series 15: Inference Calculator: Integrating QNN Models, a Complete Guide (Qualcomm Platforms)

Preface: Why QNN?

15.1 Core Advantages of QNN

QNN (the Qualcomm Neural Network SDK, also branded Qualcomm AI Engine Direct) is Qualcomm's inference framework, optimized for on-device AI on its SoCs:

Core advantages of QNN:

QNN vs NCNN vs TFLite:

  Feature             QNN             NCNN           TFLite
  ────────────────────────────────────────────────────────────
  Target platform     Qualcomm only   Generic        Generic
  DSP acceleration    Yes             No             Delegate only
  HTP/NPU accel.      Yes             No             No
  Inference latency   Very low        -              -
  Power draw          Very low        -              -
  Model format        .so/.bin        .param+.bin    .tflite
  Role in IMS         Primary         Secondary      Secondary

Performance on Qualcomm hardware:

  Platform: Snapdragon 8295 (SA8295P)

  Face detection model (320x240):
    CPU (Kryo):      45 ms
    GPU (Adreno):    12 ms
    DSP (Hexagon):    8 ms
    HTP (NPU):        3 ms   <- QNN recommended

  Power draw:
    CPU: 1500 mW
    GPU:  800 mW
    DSP:  400 mW
    HTP:  200 mW   <- QNN recommended
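The latency and power figures above suggest a simple selection policy: prefer HTP, fall back to DSP, then CPU. A minimal sketch of that policy (the boolean arguments are stand-ins for a real capability probe, such as attempting to dlopen the corresponding backend library):

```cpp
// Preference order derived from the benchmark above: HTP beats DSP beats CPU
// on both latency and power.
enum class QnnBackendKind { kHtp, kDsp, kCpu };

// `htp_available` / `dsp_available` are hypothetical probe results; a real
// implementation would check whether libQnnHtp.so / libQnnDsp.so load.
QnnBackendKind SelectBackend(bool htp_available, bool dsp_available) {
  if (htp_available) return QnnBackendKind::kHtp;  // lowest latency and power
  if (dsp_available) return QnnBackendKind::kDsp;
  return QnnBackendKind::kCpu;                     // always-available fallback
}
```

In the Calculator below this choice is exposed instead through the `backend` option, so the graph author decides; a fallback chain like this is useful when the same binary must run across several SoC generations.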

15.2 QNN Architecture Overview

QNN Architecture Overview

  Application layer
    MediaPipe Calculator / QNN Runtime API
          │
          ▼
  QNN Runtime
    Graph / Context / Tensor / Backend
          │
          ▼
  Backends
    CPU (Kryo):     fallback
    DSP (Hexagon):  recommended on older platforms
    HTP (NPU):      optimal

Key concepts:
  Qnn_Handle    = QNN instance handle
  Qnn_Context   = inference context
  Qnn_Graph     = compute graph
  Qnn_Tensor    = tensor data
  Qnn_Backend   = compute backend
  Qnn_ErrorCode = error code
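All of these handles are obtained through a backend library that is loaded at runtime. A minimal sketch of the first step, loading the library and resolving its entry symbol with dlopen/dlsym (`QnnInterface_getProviders` is the entry point declared in QnnInterface.h; verify the exact signature against your SDK version before casting the pointer):

```cpp
#include <dlfcn.h>
#include <cstdio>

// Loads a QNN backend library and resolves its interface-lookup symbol.
// Returns the dlopen handle on success (caller must dlclose it), or nullptr.
void* LoadQnnBackend(const char* lib_path, void** get_providers_out) {
  void* handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
  if (!handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return nullptr;
  }
  // Entry symbol exported by QNN backend libraries (see QnnInterface.h).
  *get_providers_out = dlsym(handle, "QnnInterface_getProviders");
  if (!*get_providers_out) {
    std::fprintf(stderr, "symbol not found: %s\n", dlerror());
    dlclose(handle);
    return nullptr;
  }
  return handle;
}
```

The Calculator implementation in section 18 follows the same pattern, then resolves further backend functions from the same handle.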

16. QNN SDK Environment Setup

16.1 QNN SDK Layout

# ========== QNN SDK directory layout ==========

Qualcomm_AI_STACK/
├── QNN/
│   ├── include/                      # headers
│   │   ├── QnnInterface.h            # core API
│   │   ├── QnnContext.h              # context API
│   │   ├── QnnGraph.h                # graph API
│   │   ├── QnnTensor.h               # tensor API
│   │   ├── QnnBackend.h              # backend API
│   │   └── QnnTypes.h                # type definitions
│   │
│   ├── lib/                          # libraries
│   │   ├── cpu/                      # CPU backend
│   │   │   └── libQnnCpu.so
│   │   ├── dsp/                      # DSP backend (older platforms)
│   │   │   └── libQnnDsp.so
│   │   ├── htp/                      # HTP backend (newer platforms, recommended)
│   │   │   ├── libQnnHtp.so
│   │   │   ├── libQnnHtpPrepare.so
│   │   │   └── libQnnHtpV68Skel.so   # Hexagon v68
│   │   └── QnnSystem.so              # system library
│   │
│   ├── bin/                          # tools
│   │   ├── qnn-onnx-converter        # ONNX conversion
│   │   ├── qnn-tflite-converter      # TFLite conversion
│   │   ├── qnn-net-run               # inference testing
│   │   └── qnn-profile-viewer        # profiling
│   │
│   └── examples/                     # sample code
│       ├── QnnSampleApp/
│       └── QnnModelPal/
│
├── models/                           # sample models
│   └── inception_v3/
│
└── tools/                            # helper tools
    └── op_package_generator/

16.2 Bazel WORKSPACE Configuration

# WORKSPACE

# ========== QNN SDK dependency ==========
new_local_repository(
    name = "qnn_sdk",
    path = "/path/to/Qualcomm_AI_STACK/QNN",
    build_file = "@//third_party:qnn.BUILD",
)

# ========== Hexagon SDK dependency (for DSP development) ==========
new_local_repository(
    name = "hexagon_sdk",
    path = "/path/to/Qualcomm_AI_STACK/Hexagon_SDK",
    build_file = "@//third_party:hexagon.BUILD",
)

16.3 The qnn.BUILD File

# third_party/qnn.BUILD

# ========== QNN core headers ==========
cc_library(
    name = "qnn_interface",
    hdrs = glob([
        "include/*.h",
        "include/**/*.h",
    ]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

# ========== CPU backend ==========
cc_library(
    name = "qnn_cpu",
    srcs = glob([
        "lib/cpu/*.so",
    ]),
    hdrs = glob([
        "include/*.h",
        "include/**/*.h",
    ]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

# ========== HTP backend (recommended) ==========
cc_library(
    name = "qnn_htp",
    srcs = glob([
        "lib/htp/*.so",
        "lib/QnnSystem.so",
    ]),
    hdrs = glob([
        "include/*.h",
        "include/**/*.h",
    ]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

# ========== DSP backend ==========
cc_library(
    name = "qnn_dsp",
    srcs = glob([
        "lib/dsp/*.so",
        "lib/QnnSystem.so",
    ]),
    hdrs = glob([
        "include/*.h",
        "include/**/*.h",
    ]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

16.4 Calculator BUILD

# mediapipe/calculators/ims/BUILD

cc_library(
    name = "qnn_inference_calculator",
    srcs = ["qnn_inference_calculator.cc"],
    hdrs = ["qnn_inference_calculator.h"],
    visibility = ["//visibility:public"],
    deps = [
        ":qnn_inference_options_cc_proto",  # generated from qnn_inference_options.proto
        "//mediapipe/framework:calculator_framework",
        "//mediapipe/framework:calculator_options_cc_proto",
        "//mediapipe/framework/formats:image_frame",
        "//mediapipe/framework/formats:image_frame_opencv",
        "//mediapipe/framework/port:opencv_imgproc",
        "//mediapipe/framework/port:ret_check",
        "//mediapipe/framework/port:status",
        "@qnn_sdk//:qnn_interface",
        "@qnn_sdk//:qnn_htp",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/strings",
    ],
    alwayslink = 1,
)

17. Model Conversion and Compilation

17.1 Converting ONNX to QNN

# ========== ONNX to QNN ==========

# Step 1: convert the model
qnn-onnx-converter \
    --input_model model.onnx \
    --output_path model.cpp \
    --input_dimensions input:1,3,240,320 \
    --output_names output

# Parameters:
#   --input_model        path to the ONNX model
#   --output_path        path of the generated C++ file
#   --input_dimensions   input shape (format: name:batch,channels,height,width)
#   --output_names       output node names

# Step 2: compile the model (produces a .so shared library)
qnn-model-lib-generator \
    --model model.cpp \
    --output_dir output/ \
    --lib_name face_detection

# Output:
#   output/libface_detection.so   # model shared library

# ========== TFLite to QNN ==========

qnn-tflite-converter \
    --input_model model.tflite \
    --output_path model.cpp \
    --input_dimensions input:1,3,240,320

qnn-model-lib-generator \
    --model model.cpp \
    --output_dir output/ \
    --lib_name face_detection

17.2 Model Quantization

# ========== INT8 quantization ==========

# Step 1: prepare calibration data (raw format)
python3 generate_calibration_data.py \
    --images calibration_images/ \
    --output calibration.raw

# Step 2: quantized conversion
qnn-onnx-converter \
    --input_model model.onnx \
    --output_path model_quantized.cpp \
    --input_dimensions input:1,3,240,320 \
    --quantization_overrides quantization_config.json \
    --input_data_type input:INT8 \
    --output_data_type output:INT8

# Example quantization_config.json:
# {
#   "activation_encodings": {
#     "input": {"bitwidth": 8, "is_symmetric": false}
#   },
#   "param_encodings": {
#     "weights": {"bitwidth": 8, "is_symmetric": true}
#   }
# }

# Step 3: compile the quantized model
qnn-model-lib-generator \
    --model model_quantized.cpp \
    --output_dir output/ \
    --lib_name face_detection_int8
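Calibration data exists so that the converter can observe per-tensor value ranges. Conceptually, the asymmetric 8-bit activation encoding requested above ("is_symmetric": false) maps an observed [min, max] range onto [0, 255] like this; a sketch of the math, not the converter's actual code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct QuantParams {
  float scale;
  int32_t zero_point;
};

// Asymmetric uint8 quantization parameters from a calibrated [min, max].
// The range is widened to include 0 so that zero quantizes exactly.
QuantParams ComputeQuantParams(float min_val, float max_val) {
  min_val = std::min(min_val, 0.0f);
  max_val = std::max(max_val, 0.0f);
  QuantParams p;
  p.scale = (max_val - min_val) / 255.0f;
  p.zero_point = static_cast<int32_t>(std::round(-min_val / p.scale));
  return p;
}

// real = (quantized - zero_point) * scale; this is the inverse mapping.
uint8_t Quantize(float x, const QuantParams& p) {
  int32_t q = static_cast<int32_t>(std::round(x / p.scale)) + p.zero_point;
  return static_cast<uint8_t>(std::clamp(q, 0, 255));
}
```

Symmetric encodings (used for weights above) are the special case where the zero point is fixed at the midpoint and the range is forced symmetric around zero.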

17.3 Model Optimization

# ========== Model optimization ==========

# 1. Graph optimization
qnn-model-validator \
    --model model.cpp \
    --optimize

# 2. Backend-specific optimization
# Generate a model optimized for a specific backend
qnn-model-lib-generator \
    --model model.cpp \
    --output_dir output/ \
    --lib_name face_detection_htp \
    --backend HTP

# 3. Profiling
qnn-net-run \
    --model output/libface_detection.so \
    --backend HTP \
    --input input.raw \
    --profile

18. Complete QNN Calculator Implementation

18.1 Proto Definition

# qnn_inference_options.proto
syntax = "proto2";  # proto2: `optional` with [default = ...] and extensions require it

package mediapipe;

import "mediapipe/framework/calculator.proto";

message QNNInferenceOptions {
  # Register as a CalculatorOptions extension so graphs can use
  # [mediapipe.QNNInferenceOptions.ext]; the field number is a placeholder
  # and must be unique across your project.
  extend CalculatorOptions {
    optional QNNInferenceOptions ext = 350607999;
  }

  // ========== Backend configuration ==========
  enum Backend {
    CPU = 0;
    DSP = 1;
    HTP = 2;  // recommended
  }
  optional Backend backend = 1 [default = HTP];

  // ========== Input configuration ==========
  optional int32 input_width = 2 [default = 320];
  optional int32 input_height = 3 [default = 240];
  optional int32 input_channels = 4 [default = 3];

  // ========== Tensor names ==========
  optional string input_tensor_name = 5 [default = "input"];
  optional string output_tensor_name = 6 [default = "output"];

  // ========== Post-processing ==========
  optional float score_threshold = 7 [default = 0.5];
  optional float nms_threshold = 8 [default = 0.45];
  optional int32 max_detections = 9 [default = 100];

  // ========== Performance ==========
  optional int32 num_threads = 10 [default = 4];  // CPU backend thread count
  optional bool enable_profiling = 11 [default = false];

  // ========== Debugging ==========
  optional bool debug_output = 12 [default = false];
}

18.2 Calculator Header

// qnn_inference_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_QNN_INFERENCE_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_QNN_INFERENCE_CALCULATOR_H_

#include <string>
#include <vector>

#include "mediapipe/calculators/ims/qnn_inference_options.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/image_frame.h"

// QNN headers
#include "QnnBackend.h"
#include "QnnContext.h"
#include "QnnGraph.h"
#include "QnnInterface.h"
#include "QnnTensor.h"
#include "QnnTypes.h"

namespace mediapipe {

class QNNInferenceCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc);

  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;
  absl::Status Close(CalculatorContext* cc) override;

 private:
  // ========== QNN resources ==========
  void* backend_handle_ = nullptr;  // backend library handle (dlopen)
  void* model_handle_ = nullptr;    // model library handle (dlopen)
  Qnn_BackendHandle_t backend_ = nullptr;
  Qnn_ContextHandle_t context_ = nullptr;
  Qnn_GraphHandle_t graph_ = nullptr;

  // ========== Tensors ==========
  std::vector<Qnn_Tensor_t> input_tensors_;
  std::vector<Qnn_Tensor_t> output_tensors_;

  // ========== Configuration ==========
  QNNInferenceOptions::Backend backend_type_ = QNNInferenceOptions::HTP;
  int input_width_ = 320;
  int input_height_ = 240;
  int input_channels_ = 3;
  std::string input_tensor_name_ = "input";
  std::string output_tensor_name_ = "output";
  float score_threshold_ = 0.5f;
  float nms_threshold_ = 0.45f;
  int max_detections_ = 100;

  // ========== Runtime state ==========
  bool initialized_ = false;
  int process_count_ = 0;

  // ========== Methods ==========
  absl::Status InitializeBackend(const std::string& model_path);
  absl::Status LoadModel(const std::string& model_path);
  absl::Status CreateGraph();
  absl::Status CreateTensors();
  absl::Status AllocateTensorBuffers();
  void FreeTensorBuffers();

  std::vector<uint8_t> Preprocess(const ImageFrame& image);
  std::vector<Detection> Postprocess(const std::vector<Qnn_Tensor_t>& outputs);

  std::string GetBackendLibraryPath(QNNInferenceOptions::Backend backend);
};

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_QNN_INFERENCE_CALCULATOR_H_

18.3 Calculator Implementation

// qnn_inference_calculator.cc
#include "qnn_inference_calculator.h"

#include <dlfcn.h>  // dlopen, dlsym

#include <cstring>

#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/port/opencv_imgproc.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/status.h"

namespace mediapipe {

// ========== GetContract ==========
absl::Status QNNInferenceCalculator::GetContract(CalculatorContract* cc) {
  cc->Inputs().Tag("IMAGE").Set<ImageFrame>();
  cc->InputSidePackets().Tag("MODEL_PATH").Set<std::string>();
  cc->Outputs().Tag("DETECTIONS").Set<std::vector<Detection>>();
  cc->Options<QNNInferenceOptions>();
  return absl::OkStatus();
}

// ========== Open ==========
absl::Status QNNInferenceCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<QNNInferenceOptions>();

  // ========== Read configuration ==========
  backend_type_ = options.backend();
  input_width_ = options.input_width();
  input_height_ = options.input_height();
  input_channels_ = options.input_channels();
  input_tensor_name_ = options.input_tensor_name();
  output_tensor_name_ = options.output_tensor_name();
  score_threshold_ = options.score_threshold();
  nms_threshold_ = options.nms_threshold();
  max_detections_ = options.max_detections();

  // ========== Get the model path ==========
  std::string model_path =
      cc->InputSidePackets().Tag("MODEL_PATH").Get<std::string>();

  // ========== Initialize the backend ==========
  MP_RETURN_IF_ERROR(InitializeBackend(model_path));

  // ========== Load the model ==========
  MP_RETURN_IF_ERROR(LoadModel(model_path));

  // ========== Create the graph ==========
  MP_RETURN_IF_ERROR(CreateGraph());

  // ========== Create tensors ==========
  MP_RETURN_IF_ERROR(CreateTensors());

  initialized_ = true;

  LOG(INFO) << "QNNInferenceCalculator initialized: "
            << "backend=" << QNNInferenceOptions::Backend_Name(backend_type_)
            << ", input_size=" << input_width_ << "x" << input_height_;

  return absl::OkStatus();
}

// ========== InitializeBackend ==========
absl::Status QNNInferenceCalculator::InitializeBackend(
    const std::string& model_path) {
  // ========== 1. Load the backend library ==========
  std::string backend_lib = GetBackendLibraryPath(backend_type_);

  backend_handle_ = dlopen(backend_lib.c_str(), RTLD_NOW);
  RET_CHECK(backend_handle_ != nullptr)
      << "Failed to load QNN backend: " << backend_lib
      << ", error: " << dlerror();

  // ========== 2. Resolve backend functions ==========
  typedef Qnn_ErrorCode_t (*QnnBackendInitFunc)(const QnnBackend_Config_t*);
  auto backend_init =
      (QnnBackendInitFunc)dlsym(backend_handle_, "QnnBackend_initialize");
  RET_CHECK(backend_init != nullptr) << "QnnBackend_initialize not found";

  // ========== 3. Initialize the backend ==========
  Qnn_ErrorCode_t err = backend_init(nullptr);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to initialize backend: " << err;

  // ========== 4. Query the backend ID ==========
  typedef Qnn_ErrorCode_t (*QnnBackendGetIdFunc)(QnnBackend_Id_t*);
  auto get_id = (QnnBackendGetIdFunc)dlsym(backend_handle_, "QnnBackend_getId");
  RET_CHECK(get_id != nullptr) << "QnnBackend_getId not found";

  QnnBackend_Id_t backend_id = 0;
  err = get_id(&backend_id);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to get backend ID: " << err;

  LOG(INFO) << "QNN backend initialized: " << backend_lib;

  return absl::OkStatus();
}

// ========== LoadModel ==========
absl::Status QNNInferenceCalculator::LoadModel(const std::string& model_path) {
  // ========== 1. Load the model shared library ==========
  model_handle_ = dlopen(model_path.c_str(), RTLD_NOW);
  RET_CHECK(model_handle_ != nullptr)
      << "Failed to load model: " << model_path
      << ", error: " << dlerror();

  // ========== 2. Resolve the model entry point ==========
  typedef Qnn_ErrorCode_t (*ModelComposeFunc)(
      Qnn_BackendHandle_t, Qnn_ContextHandle_t, Qnn_GraphHandle_t*);

  auto model_compose =
      (ModelComposeFunc)dlsym(model_handle_, "QnnModel_composeGraphs");
  RET_CHECK(model_compose != nullptr) << "QnnModel_composeGraphs not found";

  // ========== 3. Create the context ==========
  Qnn_ErrorCode_t err = QnnContext_create(backend_, &context_);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to create context: " << err;

  // ========== 4. Compose the graph ==========
  err = model_compose(backend_, context_, &graph_);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to compose graph: " << err;

  LOG(INFO) << "QNN model loaded: " << model_path;

  return absl::OkStatus();
}

// ========== CreateGraph ==========
absl::Status QNNInferenceCalculator::CreateGraph() {
  // If the model library already composed the graph in LoadModel(), creating
  // a second graph here would overwrite and leak it; only create one when
  // none exists yet.
  if (graph_ == nullptr) {
    QnnGraph_Config_t graph_config;
    memset(&graph_config, 0, sizeof(graph_config));
    graph_config.option = QNN_GRAPH_CONFIG_OPTION_NAME;
    graph_config.name = "dms_graph";

    Qnn_ErrorCode_t err = QnnGraph_create(context_, &graph_config, &graph_);
    RET_CHECK(err == QNN_SUCCESS) << "Failed to create graph: " << err;
  }

  // Finalize the graph (required before execution)
  Qnn_ErrorCode_t err = QnnGraph_finalize(graph_);
  RET_CHECK(err == QNN_SUCCESS) << "Failed to finalize graph: " << err;

  return absl::OkStatus();
}

// ========== CreateTensors ==========
absl::Status QNNInferenceCalculator::CreateTensors() {
  // ========== 1. Create the input tensor ==========
  Qnn_Tensor_t input_tensor;
  memset(&input_tensor, 0, sizeof(input_tensor));

  input_tensor.version = QNN_TENSOR_VERSION_1;
  input_tensor.v1.id = 0;
  strncpy(input_tensor.v1.name, input_tensor_name_.c_str(), QNN_MAX_NAME_LEN - 1);
  input_tensor.v1.type = QNN_TENSOR_TYPE_APP_WRITE;
  input_tensor.v1.dataType = QNN_DATATYPE_UFIXED_POINT_8;

  // Shape: NHWC
  input_tensor.v1.shape.rank = 4;
  input_tensor.v1.shape.dimensions[0] = 1;  // batch
  input_tensor.v1.shape.dimensions[1] = input_height_;
  input_tensor.v1.shape.dimensions[2] = input_width_;
  input_tensor.v1.shape.dimensions[3] = input_channels_;

  input_tensor.v1.memType = QNN_TENSORMEMTYPE_RAW;

  // Allocate the buffer (uint8 elements, 1 byte each)
  size_t input_size = 1 * input_height_ * input_width_ * input_channels_;
  input_tensor.v1.mem.raw.memSize = input_size;
  input_tensor.v1.mem.raw.data = malloc(input_size);
  RET_CHECK(input_tensor.v1.mem.raw.data != nullptr)
      << "Failed to allocate input buffer";

  input_tensors_.push_back(input_tensor);

  // ========== 2. Create the output tensor ==========
  // The shape must match the model's actual output.
  Qnn_Tensor_t output_tensor;
  memset(&output_tensor, 0, sizeof(output_tensor));

  output_tensor.version = QNN_TENSOR_VERSION_1;
  output_tensor.v1.id = 1;
  strncpy(output_tensor.v1.name, output_tensor_name_.c_str(), QNN_MAX_NAME_LEN - 1);
  output_tensor.v1.type = QNN_TENSOR_TYPE_APP_READ;
  output_tensor.v1.dataType = QNN_DATATYPE_FLOAT_32;

  output_tensor.v1.shape.rank = 2;
  output_tensor.v1.shape.dimensions[0] = max_detections_;
  output_tensor.v1.shape.dimensions[1] = 6;  // [x1, y1, x2, y2, score, class]

  output_tensor.v1.memType = QNN_TENSORMEMTYPE_RAW;

  size_t output_size = max_detections_ * 6 * sizeof(float);
  output_tensor.v1.mem.raw.memSize = output_size;
  output_tensor.v1.mem.raw.data = malloc(output_size);
  RET_CHECK(output_tensor.v1.mem.raw.data != nullptr)
      << "Failed to allocate output buffer";

  output_tensors_.push_back(output_tensor);

  return absl::OkStatus();
}

// ========== Process ==========
absl::Status QNNInferenceCalculator::Process(CalculatorContext* cc) {
  if (!initialized_) {
    return absl::InternalError("Calculator not initialized");
  }

  if (cc->Inputs().Tag("IMAGE").IsEmpty()) {
    return absl::OkStatus();
  }

  // ========== 1. Get the input image ==========
  const ImageFrame& image = cc->Inputs().Tag("IMAGE").Get<ImageFrame>();

  // ========== 2. Preprocess ==========
  std::vector<uint8_t> input_data = Preprocess(image);

  // ========== 3. Copy into the input tensor ==========
  std::memcpy(input_tensors_[0].v1.mem.raw.data,
              input_data.data(),
              input_data.size());

  // ========== 4. Run inference ==========
  Qnn_ErrorCode_t err = QnnGraph_execute(
      graph_,
      input_tensors_.data(), input_tensors_.size(),
      output_tensors_.data(), output_tensors_.size(),
      nullptr, nullptr);

  if (err != QNN_SUCCESS) {
    LOG(WARNING) << "QNN execution failed: " << err;
    return absl::OkStatus();  // drop this frame but keep the graph running
  }

  // ========== 5. Postprocess ==========
  std::vector<Detection> detections = Postprocess(output_tensors_);

  // ========== 6. Emit output ==========
  cc->Outputs().Tag("DETECTIONS").AddPacket(
      MakePacket<std::vector<Detection>>(detections).At(cc->InputTimestamp()));

  process_count_++;

  return absl::OkStatus();
}

// ========== Close ==========
absl::Status QNNInferenceCalculator::Close(CalculatorContext* cc) {
  // ========== 1. Free tensor buffers ==========
  FreeTensorBuffers();

  // ========== 2. Release the graph ==========
  if (graph_) {
    QnnGraph_free(graph_);
    graph_ = nullptr;
  }

  // ========== 3. Release the context ==========
  if (context_) {
    QnnContext_free(context_);
    context_ = nullptr;
  }

  // ========== 4. Unload the model library ==========
  if (model_handle_) {
    dlclose(model_handle_);
    model_handle_ = nullptr;
  }

  // ========== 5. Unload the backend library ==========
  if (backend_handle_) {
    dlclose(backend_handle_);
    backend_handle_ = nullptr;
  }

  LOG(INFO) << "QNNInferenceCalculator closed, processed "
            << process_count_ << " frames";

  return absl::OkStatus();
}

// ========== Helper methods ==========

void QNNInferenceCalculator::FreeTensorBuffers() {
  for (auto& tensor : input_tensors_) {
    if (tensor.v1.mem.raw.data) {
      free(tensor.v1.mem.raw.data);
      tensor.v1.mem.raw.data = nullptr;
    }
  }

  for (auto& tensor : output_tensors_) {
    if (tensor.v1.mem.raw.data) {
      free(tensor.v1.mem.raw.data);
      tensor.v1.mem.raw.data = nullptr;
    }
  }
}

std::string QNNInferenceCalculator::GetBackendLibraryPath(
    QNNInferenceOptions::Backend backend) {
  switch (backend) {
    case QNNInferenceOptions::CPU:
      return "/vendor/lib/libQnnCpu.so";
    case QNNInferenceOptions::DSP:
      return "/vendor/lib/libQnnDsp.so";
    case QNNInferenceOptions::HTP:
    default:
      return "/vendor/lib/libQnnHtp.so";
  }
}

std::vector<uint8_t> QNNInferenceCalculator::Preprocess(const ImageFrame& image) {
  cv::Mat mat = formats::MatView(&image);

  // Resize to the model's input size
  cv::Mat resized;
  cv::resize(mat, resized, cv::Size(input_width_, input_height_));

  // Match the model's channel count (IR frames are often single-channel)
  if (resized.channels() == 1 && input_channels_ == 3) {
    cv::cvtColor(resized, resized, cv::COLOR_GRAY2BGR);
  }

  // Copy into a uint8 vector
  std::vector<uint8_t> data(resized.total() * resized.elemSize());
  std::memcpy(data.data(), resized.data, data.size());

  return data;
}

std::vector<Detection> QNNInferenceCalculator::Postprocess(
    const std::vector<Qnn_Tensor_t>& outputs) {
  std::vector<Detection> detections;

  if (outputs.empty()) return detections;

  const float* data = (const float*)outputs[0].v1.mem.raw.data;
  int num = outputs[0].v1.shape.dimensions[0];

  for (int i = 0; i < num; ++i) {
    const float* det = data + i * 6;  // [x1, y1, x2, y2, score, class]

    float score = det[4];
    if (score < score_threshold_) continue;

    // mediapipe::Detection stores the box in LocationData, not as direct
    // xmin/ymin fields; the width/height form is what downstream
    // calculators expect.
    Detection d;
    d.add_score(score);
    d.add_label_id(static_cast<int>(det[5]));
    auto* location = d.mutable_location_data();
    location->set_format(LocationData::RELATIVE_BOUNDING_BOX);
    auto* box = location->mutable_relative_bounding_box();
    box->set_xmin(det[0]);
    box->set_ymin(det[1]);
    box->set_width(det[2] - det[0]);
    box->set_height(det[3] - det[1]);

    detections.push_back(d);
  }

  return detections;
}

REGISTER_CALCULATOR(QNNInferenceCalculator);

}  // namespace mediapipe
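Note that Postprocess filters by score but does not yet apply the configured nms_threshold_. Greedy IoU-based non-maximum suppression over the decoded boxes looks like this (Box is a simplified stand-in for mediapipe::Detection, kept plain so the logic is visible):

```cpp
#include <algorithm>
#include <vector>

struct Box { float x1, y1, x2, y2, score; };

// Intersection-over-union of two axis-aligned boxes.
static float IoU(const Box& a, const Box& b) {
  float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
  float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
  float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
  float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
  float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
  float uni = area_a + area_b - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: keep the highest-scoring box, drop any later box whose IoU
// with an already-kept box exceeds the threshold (e.g. nms_threshold_ = 0.45).
std::vector<Box> NonMaxSuppression(std::vector<Box> boxes, float iou_threshold) {
  std::sort(boxes.begin(), boxes.end(),
            [](const Box& a, const Box& b) { return a.score > b.score; });
  std::vector<Box> kept;
  for (const Box& candidate : boxes) {
    bool suppressed = false;
    for (const Box& k : kept) {
      if (IoU(candidate, k) > iou_threshold) { suppressed = true; break; }
    }
    if (!suppressed) kept.push_back(candidate);
  }
  return kept;
}
```

In the Calculator this would run inside Postprocess, between the score filter and the Detection construction; the next post in the series covers NMS in more depth.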

19. Graph Configuration Example

19.1 Complete IMS DMS Configuration

# ims_dms_qnn_graph.pbtxt

input_stream: "IR_IMAGE:ir_image"
output_stream: "FACES:faces"

input_side_packet: "MODEL_PATH:model_path"

# ========== Executor configuration ==========
executor {
  name: "htp_executor"
  type: "ThreadPool"
  options { num_threads: 2 }
}

# ========== Flow limiting ==========
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "ir_image"
  input_stream: "FINISHED:faces"
  input_stream_info: { tag_index: "FINISHED" back_edge: true }
  output_stream: "throttled_image"
}

# ========== QNN face detection ==========
node {
  calculator: "QNNInferenceCalculator"
  input_stream: "IMAGE:throttled_image"
  input_side_packet: "MODEL_PATH:model_path"
  output_stream: "DETECTIONS:faces"
  executor: "htp_executor"
  options {
    [mediapipe.QNNInferenceOptions.ext] {
      backend: HTP
      input_width: 320
      input_height: 240
      input_channels: 3
      input_tensor_name: "input.1"
      output_tensor_name: "output.1"
      score_threshold: 0.6
      nms_threshold: 0.45
      max_detections: 50
    }
  }
}

20. Troubleshooting DSP Session Issues

20.1 Common Errors and Fixes

Common error troubleshooting

Error 1: QNN_DSP_SESSION_OPEN_FAILED (0x6b)
  Cause: fastrpc_shell_unsigned_3 is missing.
  Fix:
    adb push fastrpc_shell_unsigned_3 \
        /data_fota/ds_ims/models/qnn/.../
    adb shell chmod 755 fastrpc_shell_*

Error 2: QNN_BACKEND_UNSUPPORTED (0x03)
  Cause: the backend is unsupported or its library is missing.
  Fix:
    - Check whether the platform supports HTP.
    - Check that libQnnHtp.so exists.
    - Try the DSP backend instead.

Error 3: QNN_GRAPH_FINALIZE_FAILED (0x41)
  Cause: graph creation failed (unsupported operator).
  Fix:
    - Check the model for unsupported operators.
    - Validate the model with qnn-model-validator.
    - Simplify the model structure.

Error 4: Out of memory
  Cause: insufficient DSP memory.
  Fix:
    - Reduce the input size.
    - Use INT8 quantization.
    - Inspect DSP memory usage.
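When these codes show up in logcat as bare hex values, a small lookup helper makes the logs readable. The names below mirror the troubleshooting table above and are illustrative; the authoritative enum values live in the QNN SDK headers (QnnTypes.h), so extend the switch from there:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Maps the error codes from the table above to readable names.
// Unlisted codes fall through to a hex placeholder.
std::string QnnErrorName(uint32_t code) {
  switch (code) {
    case 0x03: return "QNN_BACKEND_UNSUPPORTED";
    case 0x41: return "QNN_GRAPH_FINALIZE_FAILED";
    case 0x6b: return "QNN_DSP_SESSION_OPEN_FAILED";
    default: {
      char buf[32];
      std::snprintf(buf, sizeof(buf), "UNKNOWN_QNN_ERROR(0x%x)", code);
      return buf;
    }
  }
}
```

Wiring this into the Calculator's LOG(WARNING) lines turns "QNN execution failed: 107" into a message that points straight at the fastrpc_shell fix.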

20.2 Debug Commands

# ========== Check the QNN environment ==========

# Check backend libraries
adb shell ls -la /vendor/lib/libQnn*

# Check DSP status
adb shell cat /sys/kernel/debug/ion/heaps/system

# Check FastRPC
adb shell ls -la /dev/ion
adb shell ls -la /dev/fastrpc*

# Check QNN logs
adb logcat -s QNN

# ========== Profiling ==========

# Test with qnn-net-run
adb push qnn-net-run /data/local/tmp/
adb push model.so /data/local/tmp/
adb push input.raw /data/local/tmp/

adb shell /data/local/tmp/qnn-net-run \
    --model /data/local/tmp/model.so \
    --backend HTP \
    --input /data/local/tmp/input.raw \
    --profile

# ========== Memory analysis ==========
adb shell cat /proc/meminfo | grep -i ion

21. Summary

Topic               Notes
─────────────────────────────────────────────────────
QNN SDK             Qualcomm's AI inference framework
Backend selection   HTP > DSP > CPU
Model conversion    ONNX/TFLite → .so
Performance tuning  INT8 quantization, input-size reduction
Debugging           qnn-net-run, logcat
Common issues       missing fastrpc_shell, out of memory

Next in the Series

MediaPipe Series 16: Post-processing Calculator: Parsing Model Outputs

An in-depth look at parsing model outputs, NMS de-duplication, coordinate transforms, and other post-processing techniques.


References

  1. Qualcomm. QNN SDK Documentation
  2. Qualcomm. Hexagon DSP SDK
  3. Google AI Edge. MediaPipe Calculator Framework

Series progress: 15/55
Last updated: 2026-03-12

Author: Mars · Published: March 13, 2026
https://dapalm.com/2026/03/13/MediaPipe系列15-推理Calculator:集成QNN模型(高通平台)/