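MediaPipe Series 45: IMS DMS Architecture, a Complete Guide to Dangerous Behavior Detection

Preface: Why Dangerous Behavior Detection Matters

45.1 Euro NCAP Requirements

Euro NCAP 2026 requirements for dangerous behavior detection: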

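┌────────────────────────────────────────────────────────────────────────┐
│  Euro NCAP 2026 dangerous behavior detection requirements              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  Detection scenarios:                                                  │
│  ├── Phone use: handheld phone held close to the head                  │
│  ├── Smoking: hand near the mouth, smoking motion detected             │
│  ├── Eating/drinking: handheld object near the mouth                   │
│  ├── Reaching: torso leans far forward or sideways                     │
│  └── Hands off wheel: both hands leave the steering wheel              │
│                                                                        │
│  Detection requirements:                                               │
│  ├── Detection time: < 3 s                                             │
│  ├── False positive rate: < 5%                                         │
│  ├── False negative rate: < 10%                                        │
│  └── Conditions covered: day / night / tunnel / backlight              │
│                                                                        │
│  Alert strategy:                                                       │
│  ├── Phone use: immediate alert                                        │
│  ├── Smoking/drinking: delayed alert (3 s confirmation)                │
│  └── Hands off wheel: immediate alert (L2+ scenarios)                  │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘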
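
The alert strategy is stated in seconds, while a per-frame detector counts frames. The sketch below shows one way to convert between the two. The enum, the function names, and the 0 s default for reaching (the table gives no strategy for it) are assumptions made for illustration, not part of any Euro NCAP text.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical labels mirroring the scenarios listed above.
enum class Behavior { kPhoneUse, kSmoking, kEatingDrinking, kReaching, kHandsOffWheel };

// Confirmation delay in seconds before an alert is raised.
// 0 s stands for "immediate"; 3 s is the delayed-alert window from the table.
// Reaching has no entry in the table, so 0 s for it is an assumption.
inline double ConfirmationSeconds(Behavior b) {
  switch (b) {
    case Behavior::kSmoking:
    case Behavior::kEatingDrinking:
      return 3.0;
    default:
      return 0.0;
  }
}

// Consecutive frames needed to cover the delay at the given frame rate.
// At least one frame is always required.
inline int ConfirmationFrames(Behavior b, double fps) {
  assert(fps > 0.0);
  const int frames = static_cast<int>(std::ceil(ConfirmationSeconds(b) * fps));
  return frames < 1 ? 1 : frames;
}
```

With a 30 fps camera, the 3 s window corresponds to 90 consecutive frames, so a fixed frame count only matches the table at one specific frame rate.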

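45.2 Dangerous Behavior Classification

Behavior          Risk level   Detection difficulty   Euro NCAP weight
Phone use         🔴 High
Hands off wheel   🔴 High
Reaching          🟡 Medium
Smoking           🟡 Medium
Eating/drinking   🟢 Low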

46. Detection Method Overview

46.1 Method Comparison

┌────────────────────────────────────────────────────────────────────────┐
│  Dangerous behavior detection: method comparison                       │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  Method 1: rule-based                                                  │
│  ┌──────────────────────────────────────────────────┐                  │
│  │ Pros:                                            │                  │
│  │   • Easy to interpret                            │                  │
│  │   • Low compute cost                             │                  │
│  │   • Easy to debug and adjust                     │                  │
│  │                                                  │                  │
│  │ Cons:                                            │                  │
│  │   • Generalizes poorly                           │                  │
│  │   • Needs heavy manual tuning                    │                  │
│  │   • Low accuracy in complex scenes               │                  │
│  └──────────────────────────────────────────────────┘                  │
│                                                                        │
│  Method 2: deep learning                                               │
│  ┌──────────────────────────────────────────────────┐                  │
│  │ Pros:                                            │                  │
│  │   • Generalizes well                             │                  │
│  │   • Learns features automatically                │                  │
│  │   • Handles complex scenes well                  │                  │
│  │                                                  │                  │
│  │ Cons:                                            │                  │
│  │   • Needs large labeled datasets                 │                  │
│  │   • High compute cost                            │                  │
│  │   • Black box, hard to interpret                 │                  │
│  └──────────────────────────────────────────────────┘                  │
│                                                                        │
│  Hybrid: rules + deep learning                                         │
│  ┌──────────────────────────────────────────────────┐                  │
│  │ • Rules pre-filter (fast, interpretable)         │                  │
│  │ • Deep learning refines (accurate, general)      │                  │
│  │ • Combines the strengths of both                 │                  │
│  └──────────────────────────────────────────────────┘                  │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
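
The hybrid scheme can be read as a two-stage cascade: a cheap rule decides whether a frame is worth a closer look, and only then does the heavier model run. The following is a minimal sketch of that control flow. `FrameFeatures`, `RulePrefilter`, `HybridScore`, and the default thresholds are hypothetical names and values chosen for this illustration; the classifier is injected as a callable so that any model, or a stub, can stand in for it.

```cpp
#include <cassert>
#include <functional>

// Hypothetical per-frame features produced by the rule stage.
struct FrameFeatures {
  float hand_to_ear;    // normalized hand-to-ear distance
  float hand_to_mouth;  // normalized hand-to-mouth distance
};

// Stage 1: cheap, explainable rule. Passes only candidate frames.
inline bool RulePrefilter(const FrameFeatures& f, float ear_thr, float mouth_thr) {
  return f.hand_to_ear < ear_thr || f.hand_to_mouth < mouth_thr;
}

// Stage 2 is any callable that maps features to a score in [0, 1].
using Classifier = std::function<float(const FrameFeatures&)>;

// Returns the classifier score for candidate frames and 0 for rejected ones.
inline float HybridScore(const FrameFeatures& f, const Classifier& model,
                         float ear_thr = 0.15f, float mouth_thr = 0.12f) {
  assert(model && "a classifier must be provided");
  if (!RulePrefilter(f, ear_thr, mouth_thr)) return 0.0f;  // model is skipped
  return model(f);
}
```

The design choice here is that the model never runs on frames the rule rejects, which is where the compute saving of the hybrid approach comes from.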

46.2 Detection Pipeline

┌────────────────────────────────────────────────────────────────────────┐
│  Dangerous behavior detection pipeline                                 │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  Input: IR/RGB camera                                                  │
│          │                                                             │
│          ▼                                                             │
│  ┌─────────────────┐                                                   │
│  │ Holistic        │  → Face + Pose + Hand keypoints                   │
│  │ Detection       │                                                   │
│  └─────────────────┘                                                   │
│          │                                                             │
│          ▼                                                             │
│  ┌─────────────────┐                                                   │
│  │ Keypoint        │  → Drop low-confidence points                     │
│  │ Validation      │    3D coordinate conversion                       │
│  └─────────────────┘                                                   │
│          │                                                             │
│          ▼                                                             │
│  ┌─────────────────┐                                                   │
│  │ Spatial         │  → Spatial relations between keypoints            │
│  │ Analysis        │    Hand-to-face/mouth/ear distance                │
│  └─────────────────┘    Torso lean angle                               │
│          │                                                             │
│          ▼                                                             │
│  ┌─────────────────┐                                                   │
│  │ Rule-Based      │  → Preliminary decision                           │
│  │ Pre-filter      │    Phone use: hand at the ear?                    │
│  └─────────────────┘    Drinking: hand at the mouth?                   │
│          │                                                             │
│          ▼                                                             │
│  ┌─────────────────┐                                                   │
│  │ Sequence        │  → Temporal confirmation                          │
│  │ Classifier      │    Avoids single-frame false alarms               │
│  └─────────────────┘    Checks duration                                │
│          │                                                             │
│          ▼                                                             │
│  Output: behavior type + confidence                                    │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
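
The keypoint validation stage is not spelled out in this article, so here is one possible reading of "drop low-confidence points": treat a landmark set as usable only when a large enough share of its points is visible. The `Keypoint` struct is a plain stand-in for illustration (MediaPipe's `NormalizedLandmark` carries comparable fields), and both default thresholds are assumptions to be tuned on real footage.

```cpp
#include <cassert>
#include <vector>

// Plain stand-in for a landmark, used for illustration only.
struct Keypoint {
  float x, y, z;
  float visibility;  // confidence in [0, 1]
};

// Fraction of keypoints whose visibility reaches min_visibility.
inline float VisibleFraction(const std::vector<Keypoint>& pts, float min_visibility) {
  assert(min_visibility >= 0.0f && min_visibility <= 1.0f);
  if (pts.empty()) return 0.0f;
  int visible = 0;
  for (const Keypoint& p : pts) {
    if (p.visibility >= min_visibility) ++visible;
  }
  return static_cast<float>(visible) / static_cast<float>(pts.size());
}

// A landmark set is usable when enough of its points can be trusted.
// Both defaults are hypothetical values, not taken from the article.
inline bool IsUsable(const std::vector<Keypoint>& pts,
                     float min_visibility = 0.5f, float min_fraction = 0.8f) {
  return VisibleFraction(pts, min_visibility) >= min_fraction;
}
```

Rejecting a whole set, rather than deleting individual points, keeps landmark indices stable for the later stages that look points up by index.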

47. Keypoint Spatial Relationship Analysis

47.1 Keypoint Definitions

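// ========== Face keypoints (Face Mesh) ==========
// Landmark indices. Face Mesh has no ear landmarks, so the two "ear" entries
// are the face-contour points closest to the ears. "Left"/"right" follow the
// image side as seen by the viewer; MediaPipe's own connection sets label
// index 33 as the subject's right eye and 263 as the left eye.
constexpr int NOSE_TIP = 1;        // Nose tip
constexpr int LEFT_EAR = 234;      // Contour point next to the left ear
constexpr int RIGHT_EAR = 454;     // Contour point next to the right ear
constexpr int MOUTH_CENTER = 13;   // Mouth center (inner upper lip)
constexpr int LEFT_EYE = 33;       // Left eye (outer corner)
constexpr int RIGHT_EYE = 263;     // Right eye (outer corner)
constexpr int FOREHEAD = 10;       // Forehead
constexpr int CHIN = 152;          // Chin

// ========== Hand keypoints (Hand Tracking) ==========
// Landmark indices
constexpr int WRIST = 0;           // Wrist
constexpr int THUMB_TIP = 4;       // Thumb tip
constexpr int INDEX_TIP = 8;       // Index fingertip
constexpr int MIDDLE_TIP = 12;     // Middle fingertip
constexpr int RING_TIP = 16;       // Ring fingertip
constexpr int PINKY_TIP = 20;      // Pinky fingertip

// ========== Body keypoints (Pose) ==========
// Landmark indices
constexpr int LEFT_SHOULDER = 11;  // Left shoulder
constexpr int RIGHT_SHOULDER = 12; // Right shoulder
constexpr int LEFT_ELBOW = 13;     // Left elbow
constexpr int RIGHT_ELBOW = 14;    // Right elbow
constexpr int LEFT_WRIST = 15;     // Left wrist (pose landmark)
constexpr int RIGHT_WRIST = 16;    // Right wrist (pose landmark)
constexpr int LEFT_HIP = 23;       // Left hip
constexpr int RIGHT_HIP = 24;      // Right hip
constexpr int NOSE = 0;            // Nose (pose landmark)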
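
Because the three models share one integer index space, it is easy to use a face index on a hand list by mistake. A small compile-time check, sketched below, can catch out-of-range indices early. The landmark counts are those of the standard MediaPipe models; the helper name `IsValidIndex` and the constant names are hypothetical.

```cpp
// Landmark counts of the three MediaPipe models used in this article.
constexpr int kFaceMeshLandmarkCount = 468;  // 478 with refined (iris) landmarks
constexpr int kHandLandmarkCount = 21;
constexpr int kPoseLandmarkCount = 33;

// True when `index` addresses a valid entry in a list of `count` landmarks.
constexpr bool IsValidIndex(int index, int count) {
  return index >= 0 && index < count;
}

// Compile-time checks for the largest index used from each model.
static_assert(IsValidIndex(454, kFaceMeshLandmarkCount), "face index out of range");
static_assert(IsValidIndex(20, kHandLandmarkCount), "hand index out of range");
static_assert(IsValidIndex(24, kPoseLandmarkCount), "pose index out of range");
```

The same predicate can guard runtime lookups when a model returns fewer landmarks than expected.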

47.2 Spatial Distance Computation

// spatial_analyzer.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_SPATIAL_ANALYZER_H_
#define MEDIAPIPE_CALCULATORS_IMS_SPATIAL_ANALYZER_H_

#include <algorithm>  // std::min
#include <cmath>

#include "mediapipe/framework/formats/landmark.pb.h"

// The keypoint index constants from section 47.1 (WRIST, INDEX_TIP, ...) are
// assumed to be visible here, for example through a shared header.

namespace mediapipe {
namespace ims {

// M_PI is not part of standard C++, so the constant is defined locally.
constexpr float kPi = 3.14159265358979323846f;

// ========== 3D point ==========
struct Point3D {
  float x, y, z;

  Point3D() : x(0), y(0), z(0) {}
  Point3D(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}

  // Euclidean distance.
  static float Distance(const Point3D& a, const Point3D& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
  }

  // 2D distance (ignores z).
  static float Distance2D(const Point3D& a, const Point3D& b) {
    const float dx = a.x - b.x, dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
  }

  // Vector arithmetic.
  Point3D operator-(const Point3D& other) const {
    return Point3D(x - other.x, y - other.y, z - other.z);
  }

  Point3D operator+(const Point3D& other) const {
    return Point3D(x + other.x, y + other.y, z + other.z);
  }

  Point3D operator*(float scalar) const {
    return Point3D(x * scalar, y * scalar, z * scalar);
  }

  float Length() const { return std::sqrt(x * x + y * y + z * z); }

  // Returns the zero vector when the length is too small to divide by.
  Point3D Normalize() const {
    float len = Length();
    if (len < 1e-6f) return Point3D();
    return Point3D(x / len, y / len, z / len);
  }
};

// ========== Spatial analyzer ==========
class SpatialAnalyzer {
 public:
  // Extracts a Point3D from a NormalizedLandmarkList.
  // Returns the origin when the index is out of range.
  static Point3D GetPoint(const NormalizedLandmarkList& landmarks, int index) {
    if (index < 0 || index >= landmarks.landmark_size()) {
      return Point3D();
    }
    const auto& lm = landmarks.landmark(index);
    return Point3D(lm.x(), lm.y(), lm.z());
  }

  // ========== Phone use ==========
  // Checks whether the hand is close to either ear.
  // The threshold is in normalized image coordinates.
  static bool IsHandNearEar(const Point3D& hand_wrist,
                            const Point3D& hand_index_tip,
                            const Point3D& left_ear,
                            const Point3D& right_ear,
                            float threshold = 0.15f) {
    // Approximate the palm center as the midpoint of wrist and index fingertip.
    Point3D hand_center = (hand_wrist + hand_index_tip) * 0.5f;

    // Distance to each ear; the nearer one decides.
    float dist_left = Point3D::Distance2D(hand_center, left_ear);
    float dist_right = Point3D::Distance2D(hand_center, right_ear);
    float min_dist = std::min(dist_left, dist_right);

    return min_dist < threshold;
  }

  // ========== Smoking / drinking ==========
  // Checks whether the hand is close to the mouth.
  static bool IsHandNearMouth(const Point3D& hand_wrist,
                              const Point3D& hand_index_tip,
                              const Point3D& mouth_center,
                              const Point3D& nose_tip,
                              float threshold = 0.12f) {
    // Approximate palm center.
    Point3D hand_center = (hand_wrist + hand_index_tip) * 0.5f;

    // Distance to the mouth.
    float dist_mouth = Point3D::Distance2D(hand_center, mouth_center);

    // Extra condition: the hand is below the nose, which filters out some
    // face-touching gestures. Image y grows downward, so "below" means a
    // larger y value.
    bool is_below_nose = hand_center.y > nose_tip.y;

    return dist_mouth < threshold && is_below_nose;
  }

  // ========== Reaching ==========
  // Checks whether the torso leans far forward. The threshold is in degrees.
  static bool IsBodyLeaningForward(const Point3D& nose,  // not used yet
                                   const Point3D& left_shoulder,
                                   const Point3D& right_shoulder,
                                   const Point3D& left_hip,
                                   const Point3D& right_hip,
                                   float threshold = 15.0f) {
    // Shoulder center.
    Point3D shoulder_center = (left_shoulder + right_shoulder) * 0.5f;

    // Hip center.
    Point3D hip_center = (left_hip + right_hip) * 0.5f;

    // Torso vector, pointing from the hips to the shoulders.
    Point3D torso = shoulder_center - hip_center;

    // Lean angle in the y-z plane, where z is the depth axis. Image y grows
    // downward, so -torso.y is the upward extent of the torso and an upright
    // torso gives an angle near 0 degrees. (Using torso.y directly would put
    // an upright torso near 180 degrees and trigger on every frame.)
    float lean_angle = std::atan2(torso.z, -torso.y) * 180.0f / kPi;

    return std::abs(lean_angle) > threshold;
  }

  // ========== Hands off wheel ==========
  // Checks whether both hands have left the steering wheel region.
  static bool AreHandsOffWheel(const Point3D& left_wrist,
                               const Point3D& right_wrist,
                               const Point3D& left_shoulder,
                               const Point3D& right_shoulder,
                               float min_height_ratio = 0.3f,
                               float max_width_ratio = 0.8f) {
    // Shoulder width, used as the scale reference.
    float shoulder_width = Point3D::Distance2D(left_shoulder, right_shoulder);

    // Wrist height relative to the shoulder. Image y grows downward, so this
    // value is negative whenever the wrist is below the shoulder. Calibrate
    // min_height_ratio for the camera mounting before relying on this rule.
    float left_wrist_height = left_shoulder.y - left_wrist.y;
    float right_wrist_height = right_shoulder.y - right_wrist.y;

    // Distance between the wrists.
    float wrist_distance = Point3D::Distance2D(left_wrist, right_wrist);

    // Hands are considered off the wheel when either holds:
    // 1. Both wrists are lower than the height limit (hands dropped).
    // 2. The wrists are too far apart (arms spread).
    bool hands_too_low =
        (left_wrist_height < shoulder_width * min_height_ratio) &&
        (right_wrist_height < shoulder_width * min_height_ratio);

    bool hands_too_wide = wrist_distance > shoulder_width * max_width_ratio;

    return hands_too_low || hands_too_wide;
  }

  // ========== Gesture recognition ==========
  // Checks for a phone-call hand shape: pinky extended while the index,
  // middle, and ring fingers are curled. The thumb is not checked here.
  static bool IsPhoneCallGesture(const NormalizedLandmarkList& hand_landmarks) {
    if (hand_landmarks.landmark_size() < 21) {
      return false;
    }

    // Fingertips.
    Point3D index_tip = GetPoint(hand_landmarks, INDEX_TIP);
    Point3D middle_tip = GetPoint(hand_landmarks, MIDDLE_TIP);
    Point3D ring_tip = GetPoint(hand_landmarks, RING_TIP);
    Point3D pinky_tip = GetPoint(hand_landmarks, PINKY_TIP);

    // MCP joints (finger bases), used to decide whether a finger is curled.
    Point3D index_mcp = GetPoint(hand_landmarks, 5);
    Point3D middle_mcp = GetPoint(hand_landmarks, 9);
    Point3D ring_mcp = GetPoint(hand_landmarks, 13);
    Point3D pinky_mcp = GetPoint(hand_landmarks, 17);

    // A finger counts as extended when its tip is above its MCP joint in the
    // image. This assumes the hand is roughly upright in the frame.
    bool index_extended = index_tip.y < index_mcp.y;
    bool middle_extended = middle_tip.y < middle_mcp.y;
    bool ring_extended = ring_tip.y < ring_mcp.y;
    bool pinky_extended = pinky_tip.y < pinky_mcp.y;

    return pinky_extended && !index_extended && !middle_extended &&
           !ring_extended;
  }
};

}  // namespace ims
}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_SPATIAL_ANALYZER_H_
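
The distance thresholds above are expressed in normalized image coordinates, so the same physical gesture yields a smaller number when the driver sits farther from the camera. One common remedy, sketched here as an assumption rather than as part of the original analyzer, is to divide by the face width so the threshold becomes "fractions of a face". `Vec2`, `Dist`, and `HandToEarInFaceWidths` are hypothetical names.

```cpp
#include <cmath>

// Minimal 2D point for this sketch.
struct Vec2 {
  float x, y;
};

inline float Dist(const Vec2& a, const Vec2& b) {
  return std::hypot(a.x - b.x, a.y - b.y);
}

// Distance from the hand to the nearer ear, measured in face widths instead
// of image fractions. Returns a negative value when the two ear points
// coincide and the face width cannot be used as a scale.
inline float HandToEarInFaceWidths(const Vec2& hand, const Vec2& left_ear,
                                   const Vec2& right_ear) {
  const float face_width = Dist(left_ear, right_ear);
  if (face_width < 1e-6f) return -1.0f;
  const float nearest = std::fmin(Dist(hand, left_ear), Dist(hand, right_ear));
  return nearest / face_width;
}
```

For ears at (0.4, 0.5) and (0.6, 0.5) and a hand at (0.6, 0.6), the face width is 0.2 and the nearest ear is 0.1 away, so the result is about 0.5 face widths regardless of how large the face appears in the frame.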

48. Dangerous Behavior Calculator Implementation

48.1 Full Calculator

// dangerous_behavior_result.proto
// (File name assumed. A protobuf message cannot be declared inside a C++
// header, so the result type goes into its own .proto file.)
syntax = "proto3";

package mediapipe;

// ========== Dangerous behavior message ==========
message DangerousBehaviorResult {
  enum BehaviorType {
    NONE = 0;
    PHONE_CALL = 1;
    SMOKING = 2;
    DRINKING = 3;
    REACHING = 4;
    HANDS_OFF_WHEEL = 5;
  }

  BehaviorType behavior = 1;
  float confidence = 2;
  uint64 timestamp_ms = 3;
  int32 confirmation_frames = 4;
}

// dangerous_behavior_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_DANGEROUS_BEHAVIOR_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_DANGEROUS_BEHAVIOR_CALCULATOR_H_

#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
// Generated protobuf headers; both paths are assumed. DangerousBehaviorOptions
// is the calculator options message, which this article does not list.
#include "dangerous_behavior_options.pb.h"
#include "dangerous_behavior_result.pb.h"
#include "spatial_analyzer.h"

namespace mediapipe {

// ========== Dangerous Behavior Calculator ==========
class DangerousBehaviorCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    // The stock holistic graph emits one NormalizedLandmarkList per stream.
    // The vector type used here assumes the upstream filter wraps each list.
    cc->Inputs().Tag("FACE_LANDMARKS").Set<std::vector<NormalizedLandmarkList>>();
    cc->Inputs().Tag("POSE_LANDMARKS").Set<std::vector<NormalizedLandmarkList>>();
    cc->Inputs().Tag("LEFT_HAND_LANDMARKS").Set<std::vector<NormalizedLandmarkList>>();
    cc->Inputs().Tag("RIGHT_HAND_LANDMARKS").Set<std::vector<NormalizedLandmarkList>>();
    cc->Outputs().Tag("BEHAVIOR").Set<DangerousBehaviorResult>();
    cc->Outputs().Tag("ALERT").Set<bool>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) override {
    const auto& options = cc->Options<DangerousBehaviorOptions>();

    // Thresholds.
    phone_call_threshold_ = options.phone_call_threshold();
    smoking_threshold_ = options.smoking_threshold();
    drinking_threshold_ = options.drinking_threshold();
    reaching_threshold_ = options.reaching_threshold();
    hands_off_threshold_ = options.hands_off_threshold();

    // Number of consecutive frames needed to confirm a behavior.
    confirmation_frames_ = options.confirmation_frames();

    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    using namespace ims;

    DangerousBehaviorResult result;
    // MediaPipe timestamps are in microseconds.
    result.set_timestamp_ms(cc->InputTimestamp().Value() / 1000);

    // ========== Collect keypoints ==========
    const std::vector<Point3D> face_points = ExtractPoints(cc, "FACE_LANDMARKS");
    const std::vector<Point3D> pose_points = ExtractPoints(cc, "POSE_LANDMARKS");
    const std::vector<Point3D> left_hand_points =
        ExtractPoints(cc, "LEFT_HAND_LANDMARKS");
    const std::vector<Point3D> right_hand_points =
        ExtractPoints(cc, "RIGHT_HAND_LANDMARKS");

    // The right hand is checked first, then the left.
    const std::vector<Point3D>* hands[] = {&right_hand_points, &left_hand_points};
    const bool has_hand =
        right_hand_points.size() >= 21 || left_hand_points.size() >= 21;

    // ========== Behavior detection ==========
    // The checks run in priority order; the first match wins.
    DangerousBehaviorResult::BehaviorType detected_behavior =
        DangerousBehaviorResult::NONE;
    float confidence = 0.0f;

    // 1. Phone use: either hand close to either ear.
    if (face_points.size() > RIGHT_EAR && has_hand) {
      const Point3D left_ear = face_points[LEFT_EAR];
      const Point3D right_ear = face_points[RIGHT_EAR];
      for (const auto* hand : hands) {
        if (hand->size() < 21) continue;
        if (SpatialAnalyzer::IsHandNearEar((*hand)[WRIST], (*hand)[INDEX_TIP],
                                           left_ear, right_ear,
                                           phone_call_threshold_)) {
          detected_behavior = DangerousBehaviorResult::PHONE_CALL;
          confidence = 0.8f;
          break;
        }
      }
    }

    // 2. Smoking / drinking: either hand close to the mouth.
    // Telling smoking from drinking needs sequence analysis, so every match
    // is reported as DRINKING for now. SMOKING is never emitted and
    // drinking_threshold_ is not used yet.
    if (detected_behavior == DangerousBehaviorResult::NONE &&
        face_points.size() > MOUTH_CENTER && has_hand) {
      const Point3D mouth_center = face_points[MOUTH_CENTER];
      const Point3D nose_tip = face_points[NOSE_TIP];
      for (const auto* hand : hands) {
        if (hand->size() < 21) continue;
        if (SpatialAnalyzer::IsHandNearMouth((*hand)[WRIST], (*hand)[INDEX_TIP],
                                             mouth_center, nose_tip,
                                             smoking_threshold_)) {
          detected_behavior = DangerousBehaviorResult::DRINKING;
          confidence = 0.6f;
          break;
        }
      }
    }

    // 3. Reaching: torso leaning forward.
    if (detected_behavior == DangerousBehaviorResult::NONE &&
        pose_points.size() > RIGHT_HIP) {
      if (SpatialAnalyzer::IsBodyLeaningForward(
              pose_points[NOSE], pose_points[LEFT_SHOULDER],
              pose_points[RIGHT_SHOULDER], pose_points[LEFT_HIP],
              pose_points[RIGHT_HIP], reaching_threshold_)) {
        detected_behavior = DangerousBehaviorResult::REACHING;
        confidence = 0.7f;
      }
    }

    // 4. Hands off wheel, judged from the pose wrists and shoulders.
    if (detected_behavior == DangerousBehaviorResult::NONE &&
        pose_points.size() > RIGHT_HIP) {
      if (SpatialAnalyzer::AreHandsOffWheel(
              pose_points[LEFT_WRIST], pose_points[RIGHT_WRIST],
              pose_points[LEFT_SHOULDER], pose_points[RIGHT_SHOULDER],
              hands_off_threshold_)) {
        detected_behavior = DangerousBehaviorResult::HANDS_OFF_WHEEL;
        confidence = 0.9f;
      }
    }

    // ========== Consecutive-frame confirmation ==========
    if (detected_behavior == current_behavior_) {
      confirmation_counter_++;
    } else {
      current_behavior_ = detected_behavior;
      confirmation_counter_ = 1;
    }

    result.set_behavior(detected_behavior);
    result.set_confidence(confidence);
    result.set_confirmation_frames(confirmation_counter_);

    // ========== Alert decision ==========
    // REACHING is reported in the result but does not raise an alert here.
    bool alert = false;
    if (confirmation_counter_ >= confirmation_frames_) {
      if (detected_behavior == DangerousBehaviorResult::PHONE_CALL ||
          detected_behavior == DangerousBehaviorResult::HANDS_OFF_WHEEL) {
        alert = true;
      } else if (detected_behavior == DangerousBehaviorResult::SMOKING ||
                 detected_behavior == DangerousBehaviorResult::DRINKING) {
        // Smoking and drinking need twice the confirmation window.
        alert = confirmation_counter_ >= confirmation_frames_ * 2;
      }
    }

    cc->Outputs().Tag("BEHAVIOR").AddPacket(
        MakePacket<DangerousBehaviorResult>(result).At(cc->InputTimestamp()));
    cc->Outputs().Tag("ALERT").AddPacket(
        MakePacket<bool>(alert).At(cc->InputTimestamp()));

    return absl::OkStatus();
  }

 private:
  // Copies the first landmark list of the tagged stream into Point3D form.
  // Returns an empty vector when the stream has no packet at this timestamp.
  static std::vector<ims::Point3D> ExtractPoints(CalculatorContext* cc,
                                                 const char* tag) {
    std::vector<ims::Point3D> points;
    if (cc->Inputs().Tag(tag).IsEmpty()) return points;
    const auto& lists =
        cc->Inputs().Tag(tag).Get<std::vector<NormalizedLandmarkList>>();
    if (lists.empty()) return points;
    points.reserve(lists[0].landmark_size());
    for (int i = 0; i < lists[0].landmark_size(); ++i) {
      points.push_back(ims::SpatialAnalyzer::GetPoint(lists[0], i));
    }
    return points;
  }

  // Thresholds.
  float phone_call_threshold_ = 0.15f;
  float smoking_threshold_ = 0.12f;
  float drinking_threshold_ = 0.12f;
  float reaching_threshold_ = 15.0f;
  float hands_off_threshold_ = 0.3f;

  // Confirmation frame count.
  int confirmation_frames_ = 5;

  // State carried across frames.
  DangerousBehaviorResult::BehaviorType current_behavior_ =
      DangerousBehaviorResult::NONE;
  int confirmation_counter_ = 0;
};

REGISTER_CALCULATOR(DangerousBehaviorCalculator);

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_DANGEROUS_BEHAVIOR_CALCULATOR_H_
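
The calculator confirms a behavior by counting frames, which ties the alert latency to the camera frame rate. A possible alternative, shown here as a sketch and not as the calculator's actual logic, is to confirm by elapsed time using the packet timestamps. `DurationConfirmer` is a hypothetical class, and label 0 is assumed to mean "no behavior", matching `NONE = 0` above.

```cpp
#include <cassert>
#include <cstdint>

// Tracks how long the same label has been observed without interruption.
class DurationConfirmer {
 public:
  explicit DurationConfirmer(std::int64_t required_ms) : required_ms_(required_ms) {
    assert(required_ms >= 0);
  }

  // Feeds one observation. Returns true once a non-zero label has persisted
  // for at least required_ms. Any change of label restarts the clock.
  bool Update(int label, std::int64_t timestamp_ms) {
    if (label != current_label_) {
      current_label_ = label;
      start_ms_ = timestamp_ms;
    }
    return label != 0 && (timestamp_ms - start_ms_) >= required_ms_;
  }

 private:
  std::int64_t required_ms_;
  int current_label_ = 0;
  std::int64_t start_ms_ = 0;
};
```

A confirmer built with `required_ms = 3000` fires after three seconds of an uninterrupted label at any frame rate, while `required_ms = 0` gives the immediate behavior on the first matching frame.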

49. Graph Configuration

49.1 Full Dangerous Behavior Detection Graph

# ims_dangerous_behavior_detection_graph.pbtxt

input_stream: "IMAGE:ir_image"
output_stream: "BEHAVIOR:behavior_result"
output_stream: "ALERT:alert"

# ========== 1. Holistic detection (Face + Pose + Hands) ==========
# "HolisticGpu" and HolisticOptions appear to be project-specific. The stock
# MediaPipe subgraph is HolisticLandmarkGpu, which takes settings such as
# MODEL_COMPLEXITY and REFINE_FACE_LANDMARKS as input side packets.
node {
  calculator: "HolisticGpu"
  input_stream: "IMAGE:ir_image"
  output_stream: "FACE_LANDMARKS:face_landmarks"
  output_stream: "POSE_LANDMARKS:pose_landmarks"
  output_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  output_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
  options {
    [mediapipe.HolisticOptions.ext] {
      enable_segmentation: false
      refine_face_landmarks: true
      model_complexity: 1
    }
  }
}

# ========== 2. Keypoint validity check ==========
node {
  calculator: "LandmarkQualityFilterCalculator"
  input_stream: "FACE_LANDMARKS:face_landmarks"
  input_stream: "POSE_LANDMARKS:pose_landmarks"
  input_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  input_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
  output_stream: "FILTERED_FACE:face"
  output_stream: "FILTERED_POSE:pose"
  output_stream: "FILTERED_LEFT_HAND:left_hand"
  output_stream: "FILTERED_RIGHT_HAND:right_hand"
  options {
    [mediapipe.LandmarkQualityFilterOptions.ext] {
      min_visibility: 0.5
      min_presence: 0.5
    }
  }
}

# ========== 3. Dangerous behavior detection ==========
node {
  calculator: "DangerousBehaviorCalculator"
  input_stream: "FACE_LANDMARKS:face"
  input_stream: "POSE_LANDMARKS:pose"
  input_stream: "LEFT_HAND_LANDMARKS:left_hand"
  input_stream: "RIGHT_HAND_LANDMARKS:right_hand"
  output_stream: "BEHAVIOR:behavior_result"
  output_stream: "ALERT:alert"
  options {
    [mediapipe.DangerousBehaviorOptions.ext] {
      phone_call_threshold: 0.15
      smoking_threshold: 0.12
      drinking_threshold: 0.12
      reaching_threshold: 15.0
      hands_off_threshold: 0.3
      confirmation_frames: 5
    }
  }
}

# ========== 4. Alert management ==========
node {
  calculator: "AlertManagerCalculator"
  input_stream: "ALERT:alert"
  input_stream: "BEHAVIOR:behavior_result"
  output_stream: "MANAGED_ALERT:final_alert"
  options {
    [mediapipe.AlertManagerOptions.ext] {
      cooldown_ms: 5000
      max_alerts_per_minute: 10
    }
  }
}
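
The graph refers to an `AlertManagerCalculator` with `cooldown_ms` and `max_alerts_per_minute` options, but its implementation is not listed. The sketch below shows one plausible interpretation of those two options as plain C++, outside the MediaPipe framework. `AlertLimiter` and its behavior are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Decides whether a raw alert should be forwarded to the driver.
class AlertLimiter {
 public:
  AlertLimiter(std::int64_t cooldown_ms, int max_per_minute)
      : cooldown_ms_(cooldown_ms), max_per_minute_(max_per_minute) {
    assert(cooldown_ms >= 0 && max_per_minute > 0);
  }

  // Call once per raw alert. Returns true when the alert may be forwarded.
  bool Allow(std::int64_t now_ms) {
    // Forget alerts that were sent a minute or more ago.
    while (!sent_.empty() && now_ms - sent_.front() >= 60000) sent_.pop_front();
    // Cooldown: enforce a minimum gap since the last forwarded alert.
    if (!sent_.empty() && now_ms - sent_.back() < cooldown_ms_) return false;
    // Rate limit: cap the number of alerts inside the sliding minute.
    if (static_cast<int>(sent_.size()) >= max_per_minute_) return false;
    sent_.push_back(now_ms);
    return true;
  }

 private:
  std::int64_t cooldown_ms_;
  int max_per_minute_;
  std::deque<std::int64_t> sent_;  // timestamps of forwarded alerts
};
```

With the values from the graph (`cooldown_ms: 5000`, `max_alerts_per_minute: 10`), the cooldown alone would permit up to 12 alerts per minute, so the per-minute cap is the tighter of the two limits under sustained alerts.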

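50. Summary

Key point                   Description
Euro NCAP requirements      Phone use, smoking, drinking, reaching, hands off wheel
Detection method            Hybrid of rules and deep learning
Keypoint spatial analysis   Hand-to-face/ear/mouth distance, torso lean angle
Temporal confirmation       Consecutive-frame confirmation to avoid false alarms

Next Up

MediaPipe Series 46: IMS OMS Architecture, the Occupant Detection Pipeline

An in-depth look at OMS occupant detection, child presence detection (CPD), and the Euro NCAP 2026 requirements.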


References

  1. Euro NCAP. “Assessment Protocol - Safe Driving” (2026)
  2. MediaPipe. Holistic Solution

Series progress: 45/55
Last updated: 2026-03-12


https://dapalm.com/2026/03/13/MediaPipe系列45-IMS-DMS架构:危险行为检测/
Author: Mars
Published: March 13, 2026
License