MediaPipe Series 16: Post-Processing Calculators, a Complete Guide to Parsing Model Output

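Preface: Why Is Post-Processing Needed?

16.1 Why Post-Processing Matters

The raw output of model inference has to go through post-processing before it becomes a usable result: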

Why post-processing matters

Problem: how do raw model outputs become usable detection results?

  Challenges:
    • The model outputs raw numbers, not detection results
    • Bounding-box coordinates have to be decoded
    • Duplicates have to be removed (NMS)
    • Keypoints have to be extracted
    • Results have to be converted into a format the IMS can consume

Solution: post-processing Calculators

  1. Detection box decoding Calculator
    • Anchor decoding
    • Coordinate conversion

  2. NMS Calculator
    • Non-maximum suppression
    • IoU computation

  3. Keypoint parsing Calculator
    • Coordinate normalization
    • ROI mapping

  4. Fatigue-detection post-processing Calculator
    • Eye state parsing
    • EAR computation

16.2 Common Post-Processing Tasks

Common post-processing tasks

  1. Detection box decoding
       Input:  [x_center, y_center, w, h]
       Output: [x1, y1, x2, y2]
         │
         ▼
  2. NMS filtering
       Input:  multiple overlapping detection boxes
       Output: deduplicated detection boxes
         │
         ▼
  3. Keypoint parsing
       Input:  normalized coordinates [0, 1]
       Output: original-image coordinates [pixel]
         │
         ▼
  4. Classification result parsing
       Input:  [class0, class1, ...]
       Output: [class_id, score]
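
Tasks 1 and 4 above are small enough to sketch as free functions. The helper names `CenterToCorners` and `ArgMaxClass` are hypothetical and do not come from MediaPipe; the sketch only assumes the input and output layouts listed in the diagram.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: converts [x_center, y_center, w, h] into
// [x1, y1, x2, y2] (top-left and bottom-right corners).
std::array<float, 4> CenterToCorners(const std::array<float, 4>& box) {
  assert(box[2] >= 0.0f && box[3] >= 0.0f);  // width and height are sizes
  const float half_w = box[2] / 2.0f;
  const float half_h = box[3] / 2.0f;
  return {box[0] - half_w, box[1] - half_h, box[0] + half_w, box[1] + half_h};
}

// Hypothetical helper: turns [class0, class1, ...] into {class_id, score}
// by picking the highest score (the first one wins on ties).
std::pair<int, float> ArgMaxClass(const std::vector<float>& scores) {
  assert(!scores.empty());
  std::size_t best = 0;
  for (std::size_t c = 1; c < scores.size(); ++c) {
    if (scores[c] > scores[best]) best = c;
  }
  return {static_cast<int>(best), scores[best]};
}
```

For instance, a box centered at (0.5, 0.5) with width 0.25 and height 0.5 becomes the corner pair (0.375, 0.25) and (0.625, 0.75).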

17. Post-Processing Flow

Post-processing flow

  Model output tensors
       [batch, num_anchors, 4]            # box coordinates
       [batch, num_anchors, num_classes]  # scores
       [batch, num_landmarks, 2]          # keypoints
         │
         ▼
  Step 1: Detection box decoding
       Anchor decoding
       [x_center, y_center, w, h] → [x1, y1, x2, y2]
         │
         ▼
  Step 2: Confidence filtering
       score < threshold → discard
         │
         ▼
  Step 3: NMS filtering
       IoU > threshold → suppress
         │
         ▼
  Step 4: Keypoint parsing
       normalized coordinates → original-image coordinates
         │
         ▼
  Final detection results
       [Detection, Detection, ...]
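
Step 2 reads the score tensor as a flat buffer, so the row-major indexing is worth spelling out. The following is a minimal sketch, assuming a single batch and a score buffer of shape [num_anchors, num_classes]; the function name `KeepConfidentAnchors` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of the anchors whose best class score reaches
// `threshold`. `scores` is a flat row-major buffer of shape
// [num_anchors, num_classes], so the score of class c for anchor i lives at
// scores[i * num_classes + c].
std::vector<int> KeepConfidentAnchors(const std::vector<float>& scores,
                                      int num_classes, float threshold) {
  assert(num_classes > 0);
  const std::size_t stride = static_cast<std::size_t>(num_classes);
  assert(scores.size() % stride == 0);
  const std::size_t num_anchors = scores.size() / stride;

  std::vector<int> kept;
  for (std::size_t i = 0; i < num_anchors; ++i) {
    float best = scores[i * stride];
    for (std::size_t c = 1; c < stride; ++c) {
      const float s = scores[i * stride + c];
      if (s > best) best = s;
    }
    if (best >= threshold) kept.push_back(static_cast<int>(i));
  }
  return kept;
}
```

Filtering before NMS keeps the quadratic IoU loop in step 3 small, which is why the flow applies the threshold first.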

18. Detection Box Decoding Calculator

18.1 SSD Anchor Decoding

// detection_decode_calculator.h
//
// Note: for brevity this listing uses simplified accessors such as
// Tensor::data<T>() and Detection::set_xmin(). The upstream mediapipe::Tensor
// exposes its buffer through GetCpuReadView().buffer<T>(), and the upstream
// Detection proto stores boxes in location_data, so adapt these calls to the
// types used in your build.
#ifndef MEDIAPIPE_CALCULATORS_DETECTION_DETECTION_DECODE_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_DETECTION_DETECTION_DECODE_CALCULATOR_H_

#include <algorithm>
#include <cmath>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/formats/tensor.h"

namespace mediapipe {

// ========== Proto options ==========
/*
syntax = "proto2";  // "optional" with [default = ...] is proto2 syntax
package mediapipe;

import "mediapipe/framework/calculator.proto";

message DetectionDecodeOptions {
  extend CalculatorOptions {
    // Placeholder extension number; pick one that is unique in your project.
    optional DetectionDecodeOptions ext = 1000001;
  }

  optional float score_threshold = 1 [default = 0.5];
  optional int32 num_classes = 2 [default = 1];

  enum AnchorFormat {
    SSD = 0;   // SSD format
    YOLO = 1;  // YOLO format
  }
  optional AnchorFormat anchor_format = 3 [default = SSD];

  // SSD anchor configuration
  message SSDAnchor {
    optional int32 num_layers = 1 [default = 6];
    optional int32 num_anchors_per_layer = 2 [default = 6];
    repeated float strides = 3;
    repeated float scales = 4;
    repeated float aspect_ratios = 5;
  }
  optional SSDAnchor ssd_anchor = 6;
}
*/

// ========== Anchor ==========
struct Anchor {
  float x_center;
  float y_center;
  float width;
  float height;
  float x_scale;
  float y_scale;
  float w_scale;
  float h_scale;
};

// ========== Detection box decoding Calculator ==========
class DetectionDecodeCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("TENSORS").Set<std::vector<Tensor>>();
    cc->Inputs().Tag("ANCHORS").Set<std::vector<Anchor>>();
    cc->Outputs().Tag("DETECTIONS").Set<std::vector<Detection>>();
    cc->Options<DetectionDecodeOptions>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) override {
    const auto& options = cc->Options<DetectionDecodeOptions>();

    score_threshold_ = options.score_threshold();
    num_classes_ = options.num_classes();
    anchor_format_ = options.anchor_format();

    LOG(INFO) << "DetectionDecodeCalculator initialized: "
              << "threshold=" << score_threshold_
              << ", classes=" << num_classes_;

    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    if (cc->Inputs().Tag("TENSORS").IsEmpty() ||
        cc->Inputs().Tag("ANCHORS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& tensors =
        cc->Inputs().Tag("TENSORS").Get<std::vector<Tensor>>();
    const auto& anchors =
        cc->Inputs().Tag("ANCHORS").Get<std::vector<Anchor>>();

    // Both the box tensor (index 0) and the score tensor (index 1) are needed.
    if (tensors.size() < 2) {
      return absl::InvalidArgumentError(
          "Expected a box tensor and a score tensor.");
    }

    const float* box_data = tensors[0].data<float>();
    const float* score_data = tensors[1].data<float>();

    // Only the first batch entry is decoded. Never read past the anchor list.
    const int num_anchors = std::min(tensors[0].shape().dims[1],
                                     static_cast<int>(anchors.size()));

    LOG(INFO) << "Decoding " << num_anchors << " anchors";

    std::vector<Detection> detections;

    // Decode every anchor.
    for (int i = 0; i < num_anchors; ++i) {
      // ========== 1. Find the highest class score ==========
      int max_class = 0;
      float max_score = score_data[i * num_classes_];

      for (int c = 1; c < num_classes_; ++c) {
        const float score = score_data[i * num_classes_ + c];
        if (score > max_score) {
          max_score = score;
          max_class = c;
        }
      }

      // ========== 2. Confidence filtering ==========
      if (max_score < score_threshold_) {
        continue;
      }

      // ========== 3. Fetch the anchor ==========
      const Anchor& anchor = anchors[i];

      // ========== 4. Decode the bounding box (SSD format) ==========
      // Raw layout per anchor: [dy, dx, dh, dw]. Center offsets are relative
      // to the anchor size; width and height are log-scale factors.
      const float y_center =
          box_data[i * 4 + 0] / anchor.y_scale * anchor.height +
          anchor.y_center;
      const float x_center =
          box_data[i * 4 + 1] / anchor.x_scale * anchor.width +
          anchor.x_center;
      const float h =
          std::exp(box_data[i * 4 + 2] / anchor.h_scale) * anchor.height;
      const float w =
          std::exp(box_data[i * 4 + 3] / anchor.w_scale) * anchor.width;

      // Convert to xmin, ymin, xmax, ymax.
      const float ymin = y_center - h / 2;
      const float xmin = x_center - w / 2;
      const float ymax = y_center + h / 2;
      const float xmax = x_center + w / 2;

      // ========== 5. Build the detection ==========
      Detection det;
      det.set_xmin(xmin);
      det.set_ymin(ymin);
      det.set_xmax(xmax);
      det.set_ymax(ymax);
      det.set_score(max_score);
      det.set_label_id(max_class);

      detections.push_back(det);
    }

    LOG(INFO) << "Decoded " << detections.size() << " detections";

    cc->Outputs().Tag("DETECTIONS").AddPacket(
        MakePacket<std::vector<Detection>>(detections)
            .At(cc->InputTimestamp()));

    return absl::OkStatus();
  }

 private:
  float score_threshold_ = 0.5f;
  int num_classes_ = 1;
  DetectionDecodeOptions::AnchorFormat anchor_format_ =
      DetectionDecodeOptions::SSD;
};

REGISTER_CALCULATOR(DetectionDecodeCalculator);

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_DETECTION_DETECTION_DECODE_CALCULATOR_H_
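
To check the decoding arithmetic in isolation, the per-anchor step can be pulled out of the calculator. This is a minimal sketch, assuming the common SSD convention in which center offsets are scaled by the anchor size and width and height are log-scale factors. The type names `AnchorBox`, `Scales` and `Box`, and the default scale values 10/10/5/5, are illustrative assumptions rather than part of the listing.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plain structs, used only for this sketch.
struct AnchorBox { float x_center, y_center, width, height; };
struct Scales { float x = 10.0f, y = 10.0f, w = 5.0f, h = 5.0f; };
struct Box { float xmin, ymin, xmax, ymax; };

// Decodes one raw SSD regression row [dy, dx, dh, dw] against its anchor.
Box DecodeSsdBox(const float raw[4], const AnchorBox& anchor,
                 const Scales& scales = Scales()) {
  assert(scales.x != 0.0f && scales.y != 0.0f);
  assert(scales.w != 0.0f && scales.h != 0.0f);
  // Center offsets are expressed in units of the anchor size.
  const float y_center = raw[0] / scales.y * anchor.height + anchor.y_center;
  const float x_center = raw[1] / scales.x * anchor.width + anchor.x_center;
  // Sizes are predicted in log space, hence the exponential.
  const float h = std::exp(raw[2] / scales.h) * anchor.height;
  const float w = std::exp(raw[3] / scales.w) * anchor.width;
  return {x_center - w / 2, y_center - h / 2,
          x_center + w / 2, y_center + h / 2};
}
```

With an anchor centered at (0.5, 0.5) of size 0.2 × 0.2 and a raw row of [1, -1, 0, 0], the center moves to (0.48, 0.52) and the size stays 0.2, giving a box of roughly (0.38, 0.42) to (0.58, 0.62).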

18.2 YOLO-Format Decoding

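// YOLO-format decoding example.
// Assumes a single output tensor of shape [batch, num_anchors, 5 + num_classes]
// where each row is [x, y, w, h, objectness, class scores...].
// Note: Tensor::data<T>() is a simplified accessor; the upstream
// mediapipe::Tensor exposes its buffer via GetCpuReadView().buffer<T>().
absl::Status Process(CalculatorContext* cc) override {
  if (cc->Inputs().Tag("TENSORS").IsEmpty()) {
    return absl::OkStatus();
  }

  const auto& tensors = cc->Inputs().Tag("TENSORS").Get<std::vector<Tensor>>();
  if (tensors.empty()) {
    return absl::OkStatus();
  }
  const float* data = tensors[0].data<float>();

  const int num_anchors = tensors[0].shape().dims[1];
  const int stride = tensors[0].shape().dims[2];  // 5 + num_classes
  const int num_classes = stride - 5;             // 5 = x, y, w, h, conf
  if (num_classes <= 0) {
    return absl::InvalidArgumentError("Tensor has no class scores.");
  }

  std::vector<Detection> detections;

  for (int i = 0; i < num_anchors; ++i) {
    const float* row = data + i * stride;
    const float x = row[0];
    const float y = row[1];
    const float w = row[2];
    const float h = row[3];
    const float conf = row[4];  // objectness

    // Find the best class.
    int max_class = 0;
    float max_class_score = row[5];
    for (int c = 1; c < num_classes; ++c) {
      const float score = row[5 + c];
      if (score > max_class_score) {
        max_class_score = score;
        max_class = c;
      }
    }

    // Final score = objectness * best class score.
    const float score = conf * max_class_score;

    // Confidence filtering.
    if (score < score_threshold_) continue;

    // Convert to xmin, ymin, xmax, ymax.
    Detection det;
    det.set_xmin(x - w / 2);
    det.set_ymin(y - h / 2);
    det.set_xmax(x + w / 2);
    det.set_ymax(y + h / 2);
    det.set_score(score);
    det.set_label_id(max_class);

    detections.push_back(det);
  }

  cc->Outputs().Tag("DETECTIONS").AddPacket(
      MakePacket<std::vector<Detection>>(detections)
          .At(cc->InputTimestamp()));

  return absl::OkStatus();
}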
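
The scoring rule for a single YOLO row can be checked on its own. The sketch below assumes the [x, y, w, h, objectness, class scores...] layout used in the listing and a YOLOv5-style final score of objectness times the best class score; exports that omit the objectness channel need a different rule. `ScoredClass` and `ScoreYoloRow` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct ScoredClass {
  int class_id;
  float score;
};

// Scores one YOLO output row laid out as
// [x, y, w, h, objectness, class_0, class_1, ...].
ScoredClass ScoreYoloRow(const std::vector<float>& row) {
  assert(row.size() > 5);  // at least one class score is required
  const float objectness = row[4];
  std::size_t best = 5;
  for (std::size_t c = 6; c < row.size(); ++c) {
    if (row[c] > row[best]) best = c;
  }
  // Assumed convention: final confidence = objectness * best class score.
  return {static_cast<int>(best - 5), objectness * row[best]};
}
```

A row with objectness 0.8 and class scores {0.1, 0.9} therefore yields class 1 with a score of about 0.72, which is then compared against the score threshold.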

19. NMS Calculator

19.1 Non-Maximum Suppression Implementation

// nms_calculator.h
//
// Note: Detection::score(), xmin(), xmax() and similar calls are simplified
// accessors; the upstream Detection proto keeps scores in a repeated field and
// boxes in location_data. NMSOptions (not shown) is expected to provide
// iou_threshold, max_detections and sort_by (SCORE, AREA or NONE).
#ifndef MEDIAPIPE_CALCULATORS_DETECTION_NMS_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_DETECTION_NMS_CALCULATOR_H_

#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"

namespace mediapipe {

// ========== NMS Calculator ==========
class NMSCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("DETECTIONS").Set<std::vector<Detection>>();
    cc->Outputs().Tag("DETECTIONS").Set<std::vector<Detection>>();
    cc->Options<NMSOptions>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) override {
    const auto& options = cc->Options<NMSOptions>();

    iou_threshold_ = options.iou_threshold();
    max_detections_ = options.max_detections();
    sort_by_ = options.sort_by();

    LOG(INFO) << "NMSCalculator initialized: "
              << "iou_threshold=" << iou_threshold_
              << ", max_detections=" << max_detections_;

    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    if (cc->Inputs().Tag("DETECTIONS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& detections =
        cc->Inputs().Tag("DETECTIONS").Get<std::vector<Detection>>();

    if (detections.empty()) {
      return absl::OkStatus();
    }

    // ========== 1. Sort ==========
    std::vector<int> indices(detections.size());
    std::iota(indices.begin(), indices.end(), 0);

    switch (sort_by_) {
      case NMSOptions::SCORE:
        std::sort(indices.begin(), indices.end(),
                  [&detections](int a, int b) {
                    return detections[a].score() > detections[b].score();
                  });
        break;
      case NMSOptions::AREA:
        std::sort(indices.begin(), indices.end(),
                  [&detections](int a, int b) {
                    return Area(detections[a]) > Area(detections[b]);
                  });
        break;
      case NMSOptions::NONE:
        // Keep the input order.
        break;
    }

    // ========== 2. NMS filtering ==========
    std::vector<bool> suppressed(detections.size(), false);
    std::vector<Detection> result;

    for (int idx : indices) {
      if (suppressed[idx]) continue;

      // Keep this detection.
      result.push_back(detections[idx]);

      // Cap the number of results.
      if (static_cast<int>(result.size()) >= max_detections_) {
        break;
      }

      // Suppress the remaining detections that overlap it too much.
      for (int j : indices) {
        if (j == idx || suppressed[j]) continue;

        const float iou = ComputeIOU(detections[idx], detections[j]);
        if (iou > iou_threshold_) {
          suppressed[j] = true;
        }
      }
    }

    LOG(INFO) << "NMS: " << detections.size() << " → " << result.size();

    cc->Outputs().Tag("DETECTIONS").AddPacket(
        MakePacket<std::vector<Detection>>(result).At(cc->InputTimestamp()));

    return absl::OkStatus();
  }

 private:
  float iou_threshold_ = 0.45f;
  int max_detections_ = 100;
  NMSOptions::SortBy sort_by_ = NMSOptions::SCORE;

  static float Area(const Detection& d) {
    return (d.xmax() - d.xmin()) * (d.ymax() - d.ymin());
  }

  // ========== IoU computation ==========
  static float ComputeIOU(const Detection& a, const Detection& b) {
    // Intersection rectangle.
    const float x1 = std::max(a.xmin(), b.xmin());
    const float y1 = std::max(a.ymin(), b.ymin());
    const float x2 = std::min(a.xmax(), b.xmax());
    const float y2 = std::min(a.ymax(), b.ymax());

    const float intersection =
        std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);

    // Union = sum of both areas minus the intersection.
    const float union_area = Area(a) + Area(b) - intersection;

    // IoU
    if (union_area <= 0.0f) return 0.0f;
    return intersection / union_area;
  }

  // ========== Soft-NMS (optional, Gaussian decay) ==========
  // Instead of discarding overlapping detections, decays their scores.
  // Returns the surviving detections in selection order; detections whose
  // score falls below score_threshold are dropped.
  static std::vector<Detection> SoftNMS(std::vector<Detection> detections,
                                        float iou_threshold,
                                        float sigma = 0.5f,
                                        float score_threshold = 0.5f) {
    std::vector<Detection> result;

    while (!detections.empty()) {
      // Pick the detection with the highest current score.
      auto best = std::max_element(
          detections.begin(), detections.end(),
          [](const Detection& a, const Detection& b) {
            return a.score() < b.score();
          });
      if (best->score() < score_threshold) break;

      const Detection kept = *best;
      detections.erase(best);

      // Decay the scores of the remaining detections that overlap it.
      for (auto& det : detections) {
        const float iou = ComputeIOU(kept, det);
        if (iou > iou_threshold) {
          det.set_score(det.score() * std::exp(-(iou * iou) / sigma));
        }
      }

      result.push_back(kept);
    }

    return result;
  }
};

REGISTER_CALCULATOR(NMSCalculator);

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_DETECTION_NMS_CALCULATOR_H_
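
Greedy NMS is easier to test without the framework around it. The following sketch uses a hypothetical `ScoredBox` struct instead of the `Detection` proto and returns the indices of the kept boxes. It implements plain score-sorted greedy suppression, not the calculator's sort options or Soft-NMS.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical box type for this sketch.
struct ScoredBox { float xmin, ymin, xmax, ymax, score; };

float IoU(const ScoredBox& a, const ScoredBox& b) {
  const float iw = std::max(0.0f, std::min(a.xmax, b.xmax) -
                                      std::max(a.xmin, b.xmin));
  const float ih = std::max(0.0f, std::min(a.ymax, b.ymax) -
                                      std::max(a.ymin, b.ymin));
  const float inter = iw * ih;
  const float uni = (a.xmax - a.xmin) * (a.ymax - a.ymin) +
                    (b.xmax - b.xmin) * (b.ymax - b.ymin) - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}

// Returns the indices of the boxes that survive greedy NMS, best score first.
std::vector<int> GreedyNms(const std::vector<ScoredBox>& boxes,
                           float iou_threshold) {
  assert(iou_threshold >= 0.0f && iou_threshold <= 1.0f);
  std::vector<int> order(boxes.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&boxes](int a, int b) {
    return boxes[a].score > boxes[b].score;
  });

  std::vector<int> kept;
  for (int idx : order) {
    bool overlaps_kept = false;
    for (int k : kept) {
      if (IoU(boxes[idx], boxes[k]) > iou_threshold) {
        overlaps_kept = true;
        break;
      }
    }
    if (!overlaps_kept) kept.push_back(idx);
  }
  return kept;
}
```

Two 2 × 2 boxes offset by one unit share an intersection of 2 and a union of 6, so their IoU is 1/3: the lower-scoring one is suppressed at a threshold of 0.3 but survives at 0.45.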

20. Keypoint Parsing Calculator

20.1 Face Landmark Parsing

// landmark_decode_calculator.h
//
// Note: Tensor::data<T>() is a simplified accessor; the upstream
// mediapipe::Tensor exposes its buffer via GetCpuReadView().buffer<T>().
#ifndef MEDIAPIPE_CALCULATORS_DETECTION_LANDMARK_DECODE_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_DETECTION_LANDMARK_DECODE_CALCULATOR_H_

#include <algorithm>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/formats/rect.pb.h"
#include "mediapipe/framework/formats/tensor.h"

namespace mediapipe {

// ========== Proto options ==========
/*
syntax = "proto2";  // "optional" with [default = ...] is proto2 syntax
package mediapipe;

import "mediapipe/framework/calculator.proto";

message LandmarkDecodeOptions {
  extend CalculatorOptions {
    // Placeholder extension number; pick one that is unique in your project.
    optional LandmarkDecodeOptions ext = 1000002;
  }

  optional int32 num_landmarks = 1 [default = 468];  // FaceMesh
  optional int32 input_width = 2 [default = 320];
  optional int32 input_height = 3 [default = 240];
  optional bool normalize = 4 [default = true];
}
*/

// ========== Keypoint parsing Calculator ==========
class LandmarkDecodeCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("TENSORS").Set<std::vector<Tensor>>();
    cc->Inputs().Tag("ROIS").Set<std::vector<Rect>>();
    cc->Outputs().Tag("LANDMARKS").Set<std::vector<LandmarkList>>();
    cc->Options<LandmarkDecodeOptions>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) override {
    const auto& options = cc->Options<LandmarkDecodeOptions>();

    num_landmarks_ = options.num_landmarks();
    input_width_ = options.input_width();
    input_height_ = options.input_height();
    normalize_ = options.normalize();

    LOG(INFO) << "LandmarkDecodeCalculator initialized: "
              << "num_landmarks=" << num_landmarks_
              << ", input_size=" << input_width_ << "x" << input_height_;

    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    if (cc->Inputs().Tag("TENSORS").IsEmpty() ||
        cc->Inputs().Tag("ROIS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& tensors =
        cc->Inputs().Tag("TENSORS").Get<std::vector<Tensor>>();
    const auto& rois = cc->Inputs().Tag("ROIS").Get<std::vector<Rect>>();

    if (tensors.empty()) {
      return absl::OkStatus();
    }

    // Expected layout: [num_faces, num_landmarks * 2], x and y interleaved.
    const float* landmark_data = tensors[0].data<float>();
    const int num_landmark_pairs = tensors[0].shape().dims[1] / 2;

    // Never read more faces than the tensor actually holds.
    const int num_faces = std::min(static_cast<int>(rois.size()),
                                   tensors[0].shape().dims[0]);

    std::vector<LandmarkList> all_landmarks;

    for (int f = 0; f < num_faces; ++f) {
      const Rect& roi = rois[f];

      // Rect stores its center; derive the top-left corner (rotation ignored).
      const float roi_left = roi.x_center() - roi.width() / 2.0f;
      const float roi_top = roi.y_center() - roi.height() / 2.0f;

      LandmarkList landmarks;

      for (int l = 0; l < num_landmarks_; ++l) {
        if (l >= num_landmark_pairs) break;

        // Read the coordinates produced by the model.
        const int base = f * num_landmark_pairs * 2 + l * 2;
        const float x_norm = landmark_data[base + 0];
        const float y_norm = landmark_data[base + 1];

        Landmark landmark;

        if (normalize_) {
          // Normalized ROI coordinates -> original-image coordinates.
          landmark.set_x(x_norm * roi.width() + roi_left);
          landmark.set_y(y_norm * roi.height() + roi_top);
        } else {
          // Already pixel coordinates.
          landmark.set_x(x_norm);
          landmark.set_y(y_norm);
        }

        landmarks.add_landmark()->CopyFrom(landmark);
      }

      all_landmarks.push_back(landmarks);
    }

    cc->Outputs().Tag("LANDMARKS").AddPacket(
        MakePacket<std::vector<LandmarkList>>(all_landmarks)
            .At(cc->InputTimestamp()));

    return absl::OkStatus();
  }

 private:
  int num_landmarks_ = 468;
  int input_width_ = 320;
  int input_height_ = 240;
  bool normalize_ = true;
};

REGISTER_CALCULATOR(LandmarkDecodeCalculator);

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_DETECTION_LANDMARK_DECODE_CALCULATOR_H_
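
The ROI mapping is a scale followed by an offset, which a small worked example makes concrete. This sketch assumes an axis-aligned ROI described by its center and size in pixels, with rotation deliberately ignored. The `Roi` and `Point` structs are hypothetical stand-ins for the proto types.

```cpp
#include <cassert>

// Hypothetical plain structs for this sketch.
struct Roi { float x_center, y_center, width, height; };
struct Point { float x, y; };

// Maps a landmark given in normalized ROI coordinates ([0, 1] on both axes)
// to pixel coordinates in the original image.
Point RoiToImage(float x_norm, float y_norm, const Roi& roi) {
  assert(roi.width >= 0.0f && roi.height >= 0.0f);
  const float left = roi.x_center - roi.width / 2.0f;
  const float top = roi.y_center - roi.height / 2.0f;
  return {left + x_norm * roi.width, top + y_norm * roi.height};
}
```

For a 100 × 80 ROI centered at (200, 150), the normalized point (0, 0) lands on the top-left corner (150, 110), (0.5, 0.5) on the center, and (1, 1) on (250, 190).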

21. Segmentation Mask Parsing

21.1 Implementation Example

// segmentation_decode_calculator.h
// Requires <cstring> (std::memcpy) and <vector>.
// Note: Tensor::data<T>() is a simplified accessor; the upstream
// mediapipe::Tensor exposes its buffer via GetCpuReadView().buffer<T>().
class SegmentationDecodeCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("TENSORS").Set<std::vector<Tensor>>();
    cc->Outputs().Tag("MASKS").Set<std::vector<std::vector<float>>>();
    cc->Options<SegmentationDecodeOptions>();
    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    if (cc->Inputs().Tag("TENSORS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& tensors =
        cc->Inputs().Tag("TENSORS").Get<std::vector<Tensor>>();

    // Expected layout: [batch, num_masks, height, width].
    if (tensors.empty() || tensors[0].shape().dims.size() != 4) {
      return absl::InvalidArgumentError(
          "Expected one tensor of shape [batch, num_masks, height, width].");
    }

    const float* mask_data = tensors[0].data<float>();

    const int num_masks = tensors[0].shape().dims[1];
    const int mask_height = tensors[0].shape().dims[2];
    const int mask_width = tensors[0].shape().dims[3];

    std::vector<std::vector<float>> masks;
    masks.reserve(num_masks);

    // Copy each mask of the first batch entry into its own buffer.
    for (int m = 0; m < num_masks; ++m) {
      std::vector<float> mask(mask_height * mask_width);

      const float* mask_ptr = mask_data + m * mask_height * mask_width;
      std::memcpy(mask.data(), mask_ptr, mask.size() * sizeof(float));

      masks.push_back(std::move(mask));
    }

    cc->Outputs().Tag("MASKS").AddPacket(
        MakePacket<std::vector<std::vector<float>>>(masks)
            .At(cc->InputTimestamp()));

    return absl::OkStatus();
  }
};

REGISTER_CALCULATOR(SegmentationDecodeCalculator);
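
The calculator above only copies the per-class masks; consumers usually still need one label per pixel. One common follow-up, shown here as an assumption rather than something the listing does, is a per-pixel argmax over the masks. `ArgMaxLabelMap` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collapses per-class masks (each of size height * width) into one label map
// holding, for every pixel, the index of the mask with the highest value.
// Ties go to the lower index.
std::vector<int> ArgMaxLabelMap(const std::vector<std::vector<float>>& masks) {
  if (masks.empty()) return {};
  const std::size_t num_pixels = masks[0].size();
  for (const auto& mask : masks) {
    assert(mask.size() == num_pixels);  // all masks must share one size
  }

  std::vector<int> labels(num_pixels, 0);
  for (std::size_t p = 0; p < num_pixels; ++p) {
    for (std::size_t m = 1; m < masks.size(); ++m) {
      if (masks[m][p] > masks[labels[p]][p]) {
        labels[p] = static_cast<int>(m);
      }
    }
  }
  return labels;
}
```

With two three-pixel masks {0.9, 0.2, 0.4} and {0.1, 0.8, 0.4}, the label map is {0, 1, 0}: the last pixel is a tie and keeps the lower index.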

22. IMS in Practice: Fatigue Detection Post-Processing

22.1 Eye State Parsing

// eye_state_decode_calculator.h
//
// Note: Tensor::data<T>() is a simplified accessor; the upstream
// mediapipe::Tensor exposes its buffer via GetCpuReadView().buffer<T>().
#ifndef MEDIAPIPE_CALCULATORS_FATIGUE_EYE_STATE_DECODE_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_FATIGUE_EYE_STATE_DECODE_CALCULATOR_H_

#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/tensor.h"

namespace mediapipe {

// ========== Eye state message (belongs in a .proto file, not in C++) ==========
/*
syntax = "proto3";
package mediapipe;

message EyeState {
  bool left_eye_open = 1;
  bool right_eye_open = 2;
  float left_score = 3;
  float right_score = 4;
  float ear_left = 5;
  float ear_right = 6;
}
*/

// ========== Eye state decoding Calculator ==========
class EyeStateDecodeCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("TENSORS").Set<std::vector<Tensor>>();
    cc->Outputs().Tag("EYE_STATE").Set<EyeState>();
    cc->Options<EyeStateDecodeOptions>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) override {
    const auto& options = cc->Options<EyeStateDecodeOptions>();

    // Read from the options and logged, but not applied in Process(): the
    // open/closed decision below uses the classifier score only.
    ear_threshold_ = options.ear_threshold();

    LOG(INFO) << "EyeStateDecodeCalculator initialized: "
              << "ear_threshold=" << ear_threshold_;

    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    if (cc->Inputs().Tag("TENSORS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& tensors =
        cc->Inputs().Tag("TENSORS").Get<std::vector<Tensor>>();
    if (tensors.empty()) {
      return absl::OkStatus();
    }
    const float* data = tensors[0].data<float>();

    // Model output format:
    //   [left_eye_open_score, right_eye_open_score]
    // or
    //   [left_ear, right_ear, left_pupil_ratio, right_pupil_ratio]
    // This calculator reads the first two values as open scores.
    const float left_score = data[0];
    const float right_score = data[1];

    // Decide whether each eye is open.
    const bool left_eye_open = left_score > kOpenScoreThreshold;
    const bool right_eye_open = right_score > kOpenScoreThreshold;

    // EAR (Eye Aspect Ratio) values, read from an optional second tensor.
    float ear_left = 0.0f;
    float ear_right = 0.0f;

    if (tensors.size() > 1) {
      const float* ear_data = tensors[1].data<float>();
      ear_left = ear_data[0];
      ear_right = ear_data[1];
    }

    // Build the eye state.
    EyeState state;
    state.set_left_eye_open(left_eye_open);
    state.set_right_eye_open(right_eye_open);
    state.set_left_score(left_score);
    state.set_right_score(right_score);
    state.set_ear_left(ear_left);
    state.set_ear_right(ear_right);

    LOG(INFO) << "Eye state: left=" << left_eye_open
              << " (" << left_score << "), right=" << right_eye_open
              << " (" << right_score << ")"
              << ", EAR: " << ear_left << ", " << ear_right;

    cc->Outputs().Tag("EYE_STATE").AddPacket(
        MakePacket<EyeState>(state).At(cc->InputTimestamp()));

    return absl::OkStatus();
  }

 private:
  static constexpr float kOpenScoreThreshold = 0.5f;
  float ear_threshold_ = 0.2f;
};

REGISTER_CALCULATOR(EyeStateDecodeCalculator);

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_FATIGUE_EYE_STATE_DECODE_CALCULATOR_H_
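
The listing reads EAR values straight from a tensor. When only eye landmarks are available, EAR can be computed from six points per eye with the widely used formula EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|). The sketch below assumes that p1 and p4 are the eye corners, p2 and p3 the upper lid, and p5 and p6 the lower lid. Which landmark indices map to these six points depends on the landmark model and is not specified here. The 0.2 default mirrors the `ear_threshold_` default in the listing.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 2D point type for this sketch.
struct Point2D { float x, y; };

inline float Distance(const Point2D& a, const Point2D& b) {
  return std::hypot(a.x - b.x, a.y - b.y);
}

// Eye Aspect Ratio from six eye landmarks ordered p1..p6:
// p[0], p[3] are the eye corners, p[1], p[2] the upper lid,
// p[4], p[5] the lower lid.
float EyeAspectRatio(const std::array<Point2D, 6>& p) {
  const float horizontal = Distance(p[0], p[3]);
  assert(horizontal > 0.0f);  // degenerate eye: corners coincide
  const float vertical = Distance(p[1], p[5]) + Distance(p[2], p[4]);
  return vertical / (2.0f * horizontal);
}

// An eye counts as open when its EAR exceeds the threshold.
bool IsEyeOpen(float ear, float ear_threshold = 0.2f) {
  return ear > ear_threshold;
}
```

An eye 4 units wide with a lid opening of 2 units has an EAR of 0.5 and counts as open; squeezing the opening to 0.4 units drops the EAR to about 0.1, below the 0.2 threshold.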

22.2 Head Pose Parsing

// head_pose_decode_calculator.h
// HeadPose is a proto message with pitch, yaw and roll fields; its definition
// and HeadPoseDecodeOptions are not shown here.
// Note: Tensor::data<T>() is a simplified accessor; the upstream
// mediapipe::Tensor exposes its buffer via GetCpuReadView().buffer<T>().
class HeadPoseDecodeCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("TENSORS").Set<std::vector<Tensor>>();
    cc->Outputs().Tag("HEAD_POSE").Set<HeadPose>();
    cc->Options<HeadPoseDecodeOptions>();
    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    if (cc->Inputs().Tag("TENSORS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& tensors =
        cc->Inputs().Tag("TENSORS").Get<std::vector<Tensor>>();
    if (tensors.empty()) {
      return absl::OkStatus();
    }
    const float* data = tensors[0].data<float>();

    // Model output: [pitch, yaw, roll]
    const float pitch = data[0];
    const float yaw = data[1];
    const float roll = data[2];

    // Build the head pose.
    HeadPose pose;
    pose.set_pitch(pitch);
    pose.set_yaw(yaw);
    pose.set_roll(roll);

    cc->Outputs().Tag("HEAD_POSE").AddPacket(
        MakePacket<HeadPose>(pose).At(cc->InputTimestamp()));

    return absl::OkStatus();
  }
};

REGISTER_CALCULATOR(HeadPoseDecodeCalculator);

23. Complete Post-Processing Pipeline

23.1 Graph Configuration

# face_detection_with_landmarks_graph.pbtxt

# ========== Inputs and outputs ==========
input_stream: "IMAGE:image"
output_stream: "DETECTIONS:faces"
output_stream: "LANDMARKS:landmarks"

# ========== Flow limiting ==========
# Drops incoming frames while an earlier frame is still being processed.
# The back edge is fed by the final detection stream.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "image"
  input_stream: "FINISHED:faces"
  input_stream_info: { tag_index: "FINISHED" back_edge: true }
  output_stream: "throttled_image"
}

# ========== Inference (not shown) ==========
# The nodes that turn throttled_image into raw_tensors, anchors and
# landmark_tensors (for example a FaceDetectionGraph subgraph) are omitted.
# Every stream must have exactly one producer, so they must not write to
# "faces" or "landmarks", which are produced by the nodes below.

# ========== Step 1: Detection box decoding ==========
node {
  calculator: "DetectionDecodeCalculator"
  input_stream: "TENSORS:raw_tensors"
  input_stream: "ANCHORS:anchors"
  output_stream: "DETECTIONS:raw_detections"
  options {
    [mediapipe.DetectionDecodeOptions.ext] {
      score_threshold: 0.5
      num_classes: 1
    }
  }
}

# ========== Step 2: NMS filtering ==========
node {
  calculator: "NMSCalculator"
  input_stream: "DETECTIONS:raw_detections"
  output_stream: "DETECTIONS:faces"
  options {
    [mediapipe.NMSOptions.ext] {
      iou_threshold: 0.45
      max_detections: 50
      sort_by: SCORE
    }
  }
}

# ========== Step 3: ROI extraction ==========
node {
  calculator: "RoiFromDetectionCalculator"
  input_stream: "DETECTIONS:faces"
  output_stream: "ROIS:rois"
}

# ========== Step 4: Landmark decoding ==========
node {
  calculator: "LandmarkDecodeCalculator"
  input_stream: "TENSORS:landmark_tensors"
  input_stream: "ROIS:rois"
  output_stream: "LANDMARKS:landmarks"
  options {
    [mediapipe.LandmarkDecodeOptions.ext] {
      num_landmarks: 468
      input_width: 320
      input_height: 240
      normalize: true
    }
  }
}

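24. Summary

Post-processing task     Calculator                  Purpose
Detection box decoding   DetectionDecodeCalculator   Anchor decoding
NMS filtering            NMSCalculator               Deduplication
Keypoint parsing         LandmarkDecodeCalculator    Coordinate conversion
Eye state                EyeStateDecodeCalculator    Fatigue detection
Head pose                HeadPoseDecodeCalculator    Pose estimation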

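Next in the Series

MediaPipe Series 17: Data Aggregation Calculators and Multi-Stream Synchronization

A detailed look at how to synchronize multiple input streams, aggregate data, and handle time-series data.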



Series progress: 16/55
Last updated: 2026-03-12


Source: https://dapalm.com/2026/03/13/MediaPipe系列16-后处理Calculator:解析模型输出/
Author: Mars
Published: March 13, 2026