MediaPipe Series 46: IMS OMS Architecture, a Complete Occupant Detection Pipeline Implementation

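1. OMS Background

1.1 Why OMS?

An OMS (Occupant Monitoring System) is part of the Euro NCAP 2025+ assessment. Euro NCAP is a consumer rating programme, so these are scoring requirements rather than legal mandates. Typical functions:

  • Child safety: detecting children left behind in the rear seats (CPD, Child Presence Detection)
  • Seat belt reminder: detecting whether each passenger is wearing a seat belt
  • Airbag control: adapting the airbag deployment strategy to occupant position and body size
  • Occupant counting: vehicle overload detection and smart lock reminders

1.2 Euro NCAP OMS Requirements

Test scenario          Requirement                   Detection threshold
Empty front seat       Detect no occupant            Accuracy ≥ 95%
Adult in front seat    Detect the occupant           Accuracy ≥ 95%
Child in rear seat     Detect the child              Accuracy ≥ 90%
Multiple occupants     Count the occupants           Accuracy ≥ 85%
Child left behind      Child left in the vehicle     Response time < 30 s

This article focuses on the occupant detection pipeline: detecting the position, number, and type of every occupant in the cabin.

1.3 Detection Metrics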

Occupant detection metrics

Level 1 (direct observation)
├── Occupant Count: number of occupants
├── Occupant Position: front row / rear row
├── Occupant Type: adult / child / infant
└── Occupant Confidence: detection confidence

Level 2 (body analysis)
├── Face Bounding Box: face bounding box
├── Body Bounding Box: body bounding box
├── Keypoints: shoulders, waist, knees
├── Body Size: estimated height / weight
└── Seat Position: seat position

Level 3 (fused decisions)
├── Seat Occupancy: seat occupancy state
├── Occupant ID: occupant tracking ID
├── Presence Duration: time spent in the seat
└── CPD Alert: child-left-behind alert

2. Occupant Detection Principles

2.1 Detection Flow

Occupant detection flow

Step 1: Body detection (coarse localization)
    [ 1 ]  [ 2 ]  [ 3 ]
    Detect every body in the image
        │
        ▼
Step 2: Face detection (fine localization)
    [ ● ]  [ ● ]  [   ]
    Detect the face inside each body box
        │
        ▼
Step 3: Occupant classification (adult / child)
    [ A ]    [ A ]    [ C ]
    (adult)  (adult)  (child)
        │
        ▼
Step 4: Seat mapping (position)
    Front row:  [ 1 ] driver       [ 2 ] front passenger
    Rear row:   [ 3 ] rear left    [   ] rear right

2.2 Body Detection

Bodies are detected with YOLOv8 or SSD:

Body detection model

Input image: 640×480 or 1280×720

Body detection output:
    [x, y, width, height, confidence] × N

Model options:
  - YOLOv8-nano: fastest, suited to embedded targets
  - YOLOv8-small: balances speed and accuracy
  - MobileNetV3-SSD: lightweight, suited to mobile devices

Detection thresholds:
  - confidence_threshold: 0.5
  - iou_threshold: 0.45 (NMS)
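
To make the two thresholds concrete, here is a minimal Python sketch of the post-processing they imply: drop boxes below `confidence_threshold`, then run greedy non-maximum suppression with `iou_threshold`. The function names `iou` and `filter_detections` are illustrative and not part of any library; a real YOLOv8 or SSD deployment usually ships its own NMS.

```python
def iou(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, confidence_threshold=0.5, iou_threshold=0.45):
    """Drop low-confidence boxes, then apply greedy NMS.

    Each detection is [x, y, width, height, confidence].
    """
    candidates = sorted(
        (d for d in detections if d[4] >= confidence_threshold),
        key=lambda d: d[4],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box only if it does not overlap a stronger kept box too much.
        if all(iou(det[:4], k[:4]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


detections = [
    [100, 100, 200, 300, 0.90],  # driver
    [110, 105, 200, 300, 0.80],  # near-duplicate of the driver box
    [600, 120, 180, 280, 0.70],  # front passenger
    [50, 50, 40, 40, 0.30],      # below the confidence threshold
]
print(filter_detections(detections))
```

With these inputs the near-duplicate and the low-confidence box are removed, leaving the driver and front-passenger boxes.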

2.3 Face Detection

Faces are detected inside each body bounding box:

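def detect_faces_in_person(person_bbox, image, face_detector):
    """
    Detect faces inside a person bounding box.

    Args:
        person_bbox: person bounding box [x, y, width, height] in pixels
        image: input image as an array of shape (height, width, ...)
        face_detector: detector whose detect(roi) returns objects with
            x, y, width, height attributes in ROI pixel coordinates
            (passed in explicitly here instead of being a global)

    Returns:
        List of face bounding boxes [x, y, width, height] in image coordinates
    """
    # Crop the person region, expanded by 20% on each side
    x, y, w, h = person_bbox
    padding = 0.2

    crop_x = max(0, int(x - padding * w))
    crop_y = max(0, int(y - padding * h))
    crop_w = min(image.shape[1] - crop_x, int(w * (1 + 2 * padding)))
    crop_h = min(image.shape[0] - crop_y, int(h * (1 + 2 * padding)))

    person_roi = image[crop_y:crop_y + crop_h, crop_x:crop_x + crop_w]

    # Run face detection on the crop
    face_detections = face_detector.detect(person_roi)

    # Convert the results back to full-image coordinates
    faces = []
    for face in face_detections:
        face_bbox = [
            crop_x + face.x,
            crop_y + face.y,
            face.width,
            face.height,
        ]
        faces.append(face_bbox)

    return faces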

2.4 Occupant Type Classification

The occupant type is inferred from body size:

Occupant type decision

Extracted features:
├── Face height (face_height)
├── Shoulder width (shoulder_width)
├── Head-to-shoulder ratio (head_to_shoulder_ratio)
└── Body height (body_height)

Classification rules:

Adult:
├── face_height > 0.15 * image_height
├── shoulder_width > 0.30 * image_width
└── head_to_shoulder_ratio < 0.4

Child (6-12 years):
├── face_height: 0.10-0.15 * image_height
├── shoulder_width: 0.20-0.30 * image_width
└── head_to_shoulder_ratio: 0.4-0.5

Infant (0-5 years):
├── face_height < 0.10 * image_height
├── shoulder_width < 0.20 * image_width
└── head_to_shoulder_ratio > 0.5

Seat detection (optional):
└── Child seat detected → classify as child
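
The rules above can be sketched as a small Python function. This is only an illustration under stated assumptions: it uses the face-height rule alone (the shoulder-width and head-to-shoulder rules are left out), treats a ratio of exactly 0.15 as adult, and the name `classify_occupant` and its string labels are made up for the example.

```python
def classify_occupant(face_height_ratio, child_seat_detected=False,
                      infant_max=0.10, adult_min=0.15):
    """Classify an occupant from face height relative to image height.

    face_height_ratio: face height / image height, or None if no face was found.
    child_seat_detected: optional override from a child-seat detector.
    """
    if child_seat_detected:
        return "CHILD"
    if face_height_ratio is None:
        return "UNKNOWN"
    if face_height_ratio < infant_max:
        return "INFANT"
    if face_height_ratio < adult_min:
        return "CHILD"
    return "ADULT"


for ratio in (0.18, 0.12, 0.08, None):
    print(ratio, classify_occupant(ratio))
```

Because the ratio depends on how far the occupant sits from the camera, thresholds like these would likely need per-seat calibration in practice.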

3. Seat Mapping Logic

3.1 Cabin Region Layout

Cabin seat layout

                     Windshield

    Driver seat                      Front passenger seat
    (POSITION_0)                     (POSITION_1)
      [ ● ]                            [ ● ]

    Rear left        Rear center     Rear right
    (POSITION_2)     (POSITION_3)    (POSITION_4)
      [ ● ]            [ ● ]           [ ● ]

Seat definitions:
  - POSITION_0: driver seat (front left)
  - POSITION_1: front passenger seat (front right)
  - POSITION_2: rear left
  - POSITION_3: rear center
  - POSITION_4: rear right

3.2 Seat Assignment Algorithm

#include <map>

enum SeatPosition {
  SEAT_DRIVER = 0,           // Driver seat
  SEAT_FRONT_PASSENGER = 1,  // Front passenger seat
  SEAT_REAR_LEFT = 2,        // Rear left
  SEAT_REAR_CENTER = 3,      // Rear center
  SEAT_REAR_RIGHT = 4,       // Rear right
  SEAT_UNKNOWN = 5           // Center falls outside every seat region
};

struct SeatRegion {
  float x_min;  // Normalized X range [0, 1]
  float x_max;
  float y_min;  // Normalized Y range [0, 1]
  float y_max;
};

// Seat regions in normalized image coordinates
const std::map<SeatPosition, SeatRegion> kSeatRegions = {
  {SEAT_DRIVER,          {0.05f, 0.35f, 0.10f, 0.60f}},
  {SEAT_FRONT_PASSENGER, {0.65f, 0.95f, 0.10f, 0.60f}},
  {SEAT_REAR_LEFT,       {0.05f, 0.35f, 0.65f, 0.95f}},
  {SEAT_REAR_CENTER,     {0.35f, 0.65f, 0.65f, 0.95f}},
  {SEAT_REAR_RIGHT,      {0.65f, 0.95f, 0.65f, 0.95f}}
};

// bbox is given in pixels here; config carries the image size.
SeatPosition DetermineSeatPosition(
    const BoundingBox& bbox,
    const CameraConfig& config) {

  // Center of the bounding box
  float center_x = bbox.x + bbox.width / 2.0f;
  float center_y = bbox.y + bbox.height / 2.0f;

  // Normalize to [0, 1]
  float normalized_x = center_x / config.image_width;
  float normalized_y = center_y / config.image_height;

  // Return the first seat region that contains the center
  for (const auto& [seat, region] : kSeatRegions) {
    if (normalized_x >= region.x_min &&
        normalized_x <= region.x_max &&
        normalized_y >= region.y_min &&
        normalized_y <= region.y_max) {
      return seat;
    }
  }

  // No region matched. Returning SEAT_DRIVER here would report a phantom
  // driver, so report the position as unknown instead.
  return SEAT_UNKNOWN;
}

3.3 Multi-Camera Fusion

Multi-camera layout

    Camera 1 (OMS Front)         Camera 2 (OMS Rear)
    Covers the front seats       Covers the rear seats
      [ ● ]                        [ ● ]
      [ ● ]                        [ ● ]

Fusion strategy:
  1. Each camera runs detection independently
  2. Merge all detection results
  3. Deduplicate (IoU-based)
  4. Assign a unique Occupant ID
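
The four fusion steps can be sketched in Python as follows. One assumption is doing a lot of work here: IoU between boxes from different cameras only makes sense once each box has been projected into a shared cabin coordinate frame, so the sketch assumes that projection has already happened. The `Detection` dataclass, the `fuse` function, and the 0.5 threshold are illustrative choices, not taken from the original pipeline.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    camera: str        # which camera produced the box
    bbox: tuple        # (x, y, w, h) in a shared, normalized cabin frame
    confidence: float


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def fuse(per_camera, iou_threshold=0.5):
    """Merge per-camera detections, drop cross-camera duplicates, assign IDs."""
    # Steps 1-2: every camera has already detected independently; merge.
    merged = sorted(
        (d for dets in per_camera for d in dets),
        key=lambda d: d.confidence,
        reverse=True,
    )
    # Step 3: keep the highest-confidence box among cross-camera overlaps.
    fused = []
    for det in merged:
        duplicate = any(
            k.camera != det.camera and iou(k.bbox, det.bbox) > iou_threshold
            for k in fused
        )
        if not duplicate:
            fused.append(det)
    # Step 4: assign unique occupant IDs starting at 1.
    return [(i + 1, d) for i, d in enumerate(fused)]


front = [Detection("front", (0.10, 0.10, 0.20, 0.40), 0.90),
         Detection("front", (0.70, 0.10, 0.20, 0.40), 0.85)]
rear = [Detection("rear", (0.11, 0.10, 0.20, 0.40), 0.60),   # same person as front[0]
        Detection("rear", (0.40, 0.60, 0.20, 0.30), 0.80)]

for occupant_id, det in fuse([front, rear]):
    print(occupant_id, det.camera, det.confidence)
```

The rear camera's weaker view of the driver is dropped, so three occupants remain.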

4. Complete Pipeline Architecture

4.1 Architecture Diagram

IMS OMS occupant detection pipeline

Input layer
    OMS Camera: 1280×720 @ 30 fps (wide angle, covers the whole cabin)
        │
        ▼
Detection layer
    Body Detection (YOLOv8) ────▶ Face Detection (BlazeFace)
        │                              │
        ▼                              ▼
    Person BBoxes [x,y,w,h]×N      Face BBoxes [x,y,w,h]×M
        │
        ▼
Matching layer
    Face-Person Matcher
      - Associates each face with a body
      - Based on containment
      - Output: PersonWithFace
        │
        ▼
    Occupant (person + face)
      [person_bbox, face_bbox, confidence]
        │
        ▼
Analysis layer
    Occupant Type Classifier ────▶ Seat Position Mapper ────▶ Occupant Tracker (ID assignment)
        │                              │                          │
        ▼                              ▼                          ▼
    Type (Adult/Child)             Seat ID (0-4)              Occupant ID (unique)
        │
        ▼
Fusion layer
    Occupant Aggregator
      - Occupant count
      - Seat occupancy state
      - Child detection alert
        │
        ▼
Output layer
    OccupancyResult {
      occupant_count: 3,
      occupants: [
        {id: 1, seat: 0, type: ADULT, confidence: 0.92},
        {id: 2, seat: 1, type: ADULT, confidence: 0.88},
        {id: 3, seat: 4, type: CHILD, confidence: 0.95}
      ],
      seat_occupancy: [true, true, false, false, true],
      child_detected: true,
      cpd_alert: false
    }
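
As a rough illustration of the fusion layer, the following Python sketch derives the count, the per-seat occupancy flags, and the child flag from a list of occupants, using the same example as the output layer above. The `aggregate` function and its dictionary layout are assumptions for this example; `cpd_alert` is left out because it depends on timing rather than on a single frame.

```python
def aggregate(occupants, num_seats=5):
    """Build a per-frame occupancy summary from tracked occupants.

    Each occupant is a dict with "id", "seat" (0-4), "type", and "confidence".
    """
    seat_occupancy = [False] * num_seats
    for occupant in occupants:
        # Ignore occupants whose seat could not be determined.
        if 0 <= occupant["seat"] < num_seats:
            seat_occupancy[occupant["seat"]] = True
    return {
        "occupant_count": len(occupants),
        "occupants": occupants,
        "seat_occupancy": seat_occupancy,
        "child_detected": any(
            o["type"] in ("CHILD", "INFANT") for o in occupants
        ),
    }


occupants = [
    {"id": 1, "seat": 0, "type": "ADULT", "confidence": 0.92},
    {"id": 2, "seat": 1, "type": "ADULT", "confidence": 0.88},
    {"id": 3, "seat": 4, "type": "CHILD", "confidence": 0.95},
]
print(aggregate(occupants))
```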

5. Complete Graph Configuration

# mediapipe/graphs/ims/oms_occupant_graph.pbtxt

# ============== Inputs and outputs ==============
input_stream: "OMS_IMAGE:oms_image"
input_stream: "TIMESTAMP:timestamp"

output_stream: "OCCUPANCY_RESULT:occupancy_result"
output_stream: "ALERT:alert"

# ============== 1. Body detection ==============
node {
  calculator: "ObjectDetectionCalculator"
  input_stream: "IMAGE:oms_image"
  output_stream: "DETECTIONS:person_detections"
  options {
    [mediapipe.ObjectDetectionOptions.ext] {
      model_path: "/models/yolov8n_person.tflite"
      label_map: "person"
      score_threshold: 0.5
      iou_threshold: 0.45
      max_detections: 10
    }
  }
}

# ============== 2. Face detection (inside body regions) ==============
node {
  calculator: "MultiRegionFaceDetectionCalculator"
  input_stream: "IMAGE:oms_image"
  input_stream: "PERSON_DETECTIONS:person_detections"
  output_stream: "FACE_DETECTIONS:face_detections"
  output_stream: "PERSON_FACES:person_faces"
  options {
    [mediapipe.MultiRegionFaceDetectionOptions.ext] {
      model_path: "/models/blazeface.tflite"
      score_threshold: 0.5
      max_detections_per_person: 2
    }
  }
}

# ============== 3. Occupant detector ==============
node {
  calculator: "OccupantDetectorCalculator"
  input_stream: "PERSON_DETECTIONS:person_detections"
  input_stream: "PERSON_FACES:person_faces"
  input_stream: "IMAGE:oms_image"
  output_stream: "OCCUPANTS:occupants"
  output_stream: "OCCUPANT_COUNT:occupant_count"
  options {
    [mediapipe.OccupantDetectorOptions.ext] {
      # Image size
      image_width: 1280
      image_height: 720

      # Classification thresholds (face height / image height)
      adult_min_face_height_ratio: 0.15
      child_min_face_height_ratio: 0.10
      child_max_face_height_ratio: 0.15
      infant_max_face_height_ratio: 0.10

      # Confidence thresholds
      min_person_confidence: 0.5
      min_face_confidence: 0.5
      min_occupant_confidence: 0.6
    }
  }
}

# ============== 4. Occupant tracking ==============
node {
  calculator: "OccupantTrackerCalculator"
  input_stream: "OCCUPANTS:occupants"
  input_stream: "TIMESTAMP:timestamp"
  output_stream: "TRACKED_OCCUPANTS:tracked_occupants"
  options {
    [mediapipe.OccupantTrackerOptions.ext] {
      # Tracking configuration
      max_distance: 0.2  # Maximum center displacement (normalized)
      max_age: 30        # Maximum number of consecutive missed frames
      min_hits: 3        # Minimum number of hits to confirm a track
    }
  }
}

# ============== 5. Seat occupancy ==============
node {
  calculator: "SeatOccupancyCalculator"
  input_stream: "TRACKED_OCCUPANTS:tracked_occupants"
  output_stream: "SEAT_OCCUPANCY:seat_occupancy"
  output_stream: "OCCUPANT_BY_SEAT:occupant_by_seat"
  options {
    [mediapipe.SeatOccupancyOptions.ext] {
      # Seat regions (normalized coordinates)
      seat_0_x_min: 0.05
      seat_0_x_max: 0.35
      seat_0_y_min: 0.10
      seat_0_y_max: 0.60

      seat_1_x_min: 0.65
      seat_1_x_max: 0.95
      seat_1_y_min: 0.10
      seat_1_y_max: 0.60

      seat_2_x_min: 0.05
      seat_2_x_max: 0.35
      seat_2_y_min: 0.65
      seat_2_y_max: 0.95

      seat_3_x_min: 0.35
      seat_3_x_max: 0.65
      seat_3_y_min: 0.65
      seat_3_y_max: 0.95

      seat_4_x_min: 0.65
      seat_4_x_max: 0.95
      seat_4_y_min: 0.65
      seat_4_y_max: 0.95
    }
  }
}

# ============== 6. Child presence alert ==============
node {
  calculator: "ChildPresenceDetectionCalculator"
  input_stream: "TRACKED_OCCUPANTS:tracked_occupants"
  input_stream: "TIMESTAMP:timestamp"
  output_stream: "CPD_ALERT:cpd_alert"
  options {
    [mediapipe.ChildPresenceDetectionOptions.ext] {
      # Alert configuration
      alert_threshold_seconds: 30  # Alert after a child has been alone in the car for 30 s
      require_no_adult: true       # Only alert when no adult is present
    }
  }
}

# ============== 7. Result aggregation ==============
node {
  calculator: "OccupancyResultAggregatorCalculator"
  input_stream: "TRACKED_OCCUPANTS:tracked_occupants"
  input_stream: "OCCUPANT_COUNT:occupant_count"
  input_stream: "SEAT_OCCUPANCY:seat_occupancy"
  input_stream: "CPD_ALERT:cpd_alert"
  output_stream: "OCCUPANCY_RESULT:occupancy_result"
  output_stream: "ALERT:alert"
}
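
The `ChildPresenceDetectionCalculator` is configured above but not implemented in this article. The Python sketch below shows one way its two options, `alert_threshold_seconds` and `require_no_adult`, could interact. The class `ChildPresenceMonitor` and its `update` method are hypothetical, and a production CPD system would likely also consider signals such as ignition and door-lock state.

```python
class ChildPresenceMonitor:
    """Raise an alert once a child has been at risk for long enough."""

    def __init__(self, alert_threshold_seconds=30.0, require_no_adult=True):
        self.alert_threshold_seconds = alert_threshold_seconds
        self.require_no_adult = require_no_adult
        self._at_risk_since = None  # timestamp when the current episode began

    def update(self, occupant_types, timestamp_s):
        """Feed one frame; returns True while the alert condition holds."""
        child_present = any(t in ("CHILD", "INFANT") for t in occupant_types)
        adult_present = "ADULT" in occupant_types
        at_risk = child_present and not (self.require_no_adult and adult_present)

        if not at_risk:
            self._at_risk_since = None  # episode over, reset the timer
            return False
        if self._at_risk_since is None:
            self._at_risk_since = timestamp_s
        return timestamp_s - self._at_risk_since >= self.alert_threshold_seconds


monitor = ChildPresenceMonitor()
frames = [
    (["ADULT", "CHILD"], 0.0),   # adult present
    (["CHILD"], 10.0),           # adult has left, timer starts
    (["CHILD"], 39.0),           # 29 s alone
    (["CHILD"], 40.0),           # 30 s alone
    (["ADULT", "CHILD"], 41.0),  # adult returns, timer resets
]
for types, t in frames:
    print(t, monitor.update(types, t))
```

Only the frame at 40.0 s reports an alert, because the timer starts when the adult leaves at 10.0 s.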

6. Core Calculator Implementation

6.1 OccupantDetectorCalculator

// mediapipe/calculators/ims/occupant_detector_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_DETECTOR_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_DETECTOR_CALCULATOR_H_

#include <cstdint>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/calculators/ims/occupant_detector_options.pb.h"

namespace mediapipe {

// Plain bounding box used by the IMS calculators. The original listing uses
// this type without defining it; this definition is an assumed helper, not a
// MediaPipe class. Values are normalized to [0, 1].
struct BoundingBox {
  float x = 0.0f;
  float y = 0.0f;
  float width = 0.0f;
  float height = 0.0f;
};

// Occupant type
enum OccupantType {
  OCCUPANT_ADULT = 0,   // Adult
  OCCUPANT_CHILD = 1,   // Child (6-12 years)
  OCCUPANT_INFANT = 2,  // Infant (0-5 years)
  OCCUPANT_UNKNOWN = 3  // Unknown
};

// Seat position
enum SeatPosition {
  SEAT_DRIVER = 0,
  SEAT_FRONT_PASSENGER = 1,
  SEAT_REAR_LEFT = 2,
  SEAT_REAR_CENTER = 3,
  SEAT_REAR_RIGHT = 4,
  SEAT_UNKNOWN = 5
};

// Occupant record. Default initializers avoid reading indeterminate values
// when a field is never filled in (for example when no face is found).
struct Occupant {
  int id = 0;                            // Occupant ID
  OccupantType type = OCCUPANT_UNKNOWN;  // Occupant type
  SeatPosition seat = SEAT_UNKNOWN;      // Seat position
  BoundingBox person_bbox;               // Body bounding box
  BoundingBox face_bbox;                 // Face bounding box
  float confidence = 0.0f;               // Confidence
  float face_height_ratio = 0.0f;        // Face height / image height
  float body_height_ratio = 0.0f;        // Body height / image height
  bool has_face = false;                 // Whether a face was detected
  int64_t timestamp = 0;
};

class OccupantDetectorCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc);

  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;

 private:
  // Associate faces with bodies
  void MatchFacesToPersons(
      const std::vector<Detection>& person_detections,
      const std::vector<std::vector<Detection>>& person_faces,
      std::vector<Occupant>* occupants);

  // Classify the occupant type
  OccupantType ClassifyOccupantType(
      const Occupant& occupant,
      const ImageFrame& image);

  // Determine the seat position
  SeatPosition DetermineSeatPosition(
      const BoundingBox& bbox,
      const ImageFrame& image);

  // Compute per-occupant features
  void CalculateFeatures(
      const BoundingBox& person_bbox,
      const BoundingBox& face_bbox,
      const ImageFrame& image,
      Occupant* occupant);

  // Configuration
  int image_width_;
  int image_height_;
  float adult_min_face_height_ratio_;
  float child_min_face_height_ratio_;
  float child_max_face_height_ratio_;
  float infant_max_face_height_ratio_;
  float min_person_confidence_;
  float min_face_confidence_;
  float min_occupant_confidence_;

  // ID counter
  int next_occupant_id_ = 1;
};

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_DETECTOR_CALCULATOR_H_

// mediapipe/calculators/ims/occupant_detector_calculator.cc
#include "mediapipe/calculators/ims/occupant_detector_calculator.h"

#include <algorithm>
#include <vector>

#include "mediapipe/framework/port/logging.h"
#include "mediapipe/framework/formats/landmark.pb.h"

namespace mediapipe {

using mediapipe::Detection;
using mediapipe::ImageFrame;

absl::Status OccupantDetectorCalculator::GetContract(CalculatorContract* cc) {
  cc->Inputs().Tag("PERSON_DETECTIONS").Set<std::vector<Detection>>();
  cc->Inputs().Tag("PERSON_FACES").Set<std::vector<std::vector<Detection>>>();
  cc->Inputs().Tag("IMAGE").Set<ImageFrame>();

  cc->Outputs().Tag("OCCUPANTS").Set<std::vector<Occupant>>();
  cc->Outputs().Tag("OCCUPANT_COUNT").Set<int>();

  cc->Options<OccupantDetectorOptions>();

  return absl::OkStatus();
}

absl::Status OccupantDetectorCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<OccupantDetectorOptions>();

  // Copy the graph options into member variables
  image_width_ = options.image_width();
  image_height_ = options.image_height();
  adult_min_face_height_ratio_ = options.adult_min_face_height_ratio();
  child_min_face_height_ratio_ = options.child_min_face_height_ratio();
  child_max_face_height_ratio_ = options.child_max_face_height_ratio();
  infant_max_face_height_ratio_ = options.infant_max_face_height_ratio();
  min_person_confidence_ = options.min_person_confidence();
  min_face_confidence_ = options.min_face_confidence();
  min_occupant_confidence_ = options.min_occupant_confidence();

  LOG(INFO) << "OccupantDetectorCalculator initialized";

  return absl::OkStatus();
}

absl::Status OccupantDetectorCalculator::Process(CalculatorContext* cc) {
  // Skip this timestamp if any input is missing
  if (cc->Inputs().Tag("PERSON_DETECTIONS").IsEmpty() ||
      cc->Inputs().Tag("PERSON_FACES").IsEmpty() ||
      cc->Inputs().Tag("IMAGE").IsEmpty()) {
    return absl::OkStatus();
  }

  const auto& person_detections =
      cc->Inputs().Tag("PERSON_DETECTIONS").Get<std::vector<Detection>>();

  const auto& person_faces =
      cc->Inputs().Tag("PERSON_FACES").Get<std::vector<std::vector<Detection>>>();

  const auto& image = cc->Inputs().Tag("IMAGE").Get<ImageFrame>();

  // Associate faces with bodies
  std::vector<Occupant> occupants;
  MatchFacesToPersons(person_detections, person_faces, &occupants);

  // Analyze each occupant
  for (auto& occupant : occupants) {
    if (occupant.has_face) {
      // Compute features, then classify the occupant type
      CalculateFeatures(occupant.person_bbox, occupant.face_bbox, image, &occupant);
      occupant.type = ClassifyOccupantType(occupant, image);
    } else {
      // No face: only body features are computed and the type stays unknown
      CalculateFeatures(occupant.person_bbox, {}, image, &occupant);
      occupant.type = OCCUPANT_UNKNOWN;
    }

    // Determine the seat position
    occupant.seat = DetermineSeatPosition(occupant.person_bbox, image);

    // Assign a per-detection ID. The tracker downstream replaces it with a
    // stable track ID.
    occupant.id = next_occupant_id_++;
    occupant.timestamp = cc->InputTimestamp().Value();

    VLOG(1) << "Occupant " << occupant.id
            << ": type=" << occupant.type
            << ", seat=" << occupant.seat
            << ", confidence=" << occupant.confidence;
  }

  // Drop low-confidence occupants
  std::vector<Occupant> filtered_occupants;
  for (const auto& occupant : occupants) {
    if (occupant.confidence >= min_occupant_confidence_) {
      filtered_occupants.push_back(occupant);
    }
  }

  // Emit the outputs
  cc->Outputs().Tag("OCCUPANTS").AddPacket(
      MakePacket<std::vector<Occupant>>(filtered_occupants)
          .At(cc->InputTimestamp()));

  cc->Outputs().Tag("OCCUPANT_COUNT").AddPacket(
      MakePacket<int>(static_cast<int>(filtered_occupants.size()))
          .At(cc->InputTimestamp()));

  return absl::OkStatus();
}

void OccupantDetectorCalculator::MatchFacesToPersons(
    const std::vector<Detection>& person_detections,
    const std::vector<std::vector<Detection>>& person_faces,
    std::vector<Occupant>* occupants) {

  // person_faces[i] holds the faces found inside person_detections[i].
  // Iterate over every person so that a body without a face entry is still
  // kept (with has_face = false) instead of being silently dropped.
  const int num_persons = static_cast<int>(person_detections.size());
  const int num_person_faces = static_cast<int>(person_faces.size());

  for (int i = 0; i < num_persons; ++i) {
    const auto& person = person_detections[i];

    // Skip bodies with no score or a low confidence
    if (person.score_size() == 0 ||
        person.score(0) < min_person_confidence_) {
      continue;
    }

    const auto& person_box = person.location_data().relative_bounding_box();

    Occupant occupant;
    occupant.person_bbox = {
        person_box.xmin(), person_box.ymin(),
        person_box.width(), person_box.height()};
    occupant.confidence = person.score(0);
    occupant.has_face = false;

    // Match a face to this body
    if (i < num_person_faces && !person_faces[i].empty()) {
      // Pick the face with the highest confidence
      const Detection* best_face = nullptr;
      float best_score = 0.0f;

      for (const auto& face : person_faces[i]) {
        if (face.score_size() > 0 &&
            (best_face == nullptr || face.score(0) > best_score)) {
          best_score = face.score(0);
          best_face = &face;
        }
      }

      if (best_face != nullptr && best_score >= min_face_confidence_) {
        const auto& face_box =
            best_face->location_data().relative_bounding_box();
        occupant.face_bbox = {
            face_box.xmin(), face_box.ymin(),
            face_box.width(), face_box.height()};
        occupant.has_face = true;
      }
    }

    occupants->push_back(occupant);
  }
}

void OccupantDetectorCalculator::CalculateFeatures(
    const BoundingBox& person_bbox,
    const BoundingBox& face_bbox,
    const ImageFrame& image,
    Occupant* occupant) {

  // Body feature (the box is already normalized to the image height)
  occupant->body_height_ratio = person_bbox.height;

  // Face features
  if (occupant->has_face) {
    occupant->face_height_ratio = face_bbox.height;

    float face_center_x = face_bbox.x + face_bbox.width / 2.0f;
    float face_center_y = face_bbox.y + face_bbox.height / 2.0f;

    float person_min_x = person_bbox.x;
    float person_max_x = person_bbox.x + person_bbox.width;
    float person_min_y = person_bbox.y;
    float person_max_y = person_bbox.y + person_bbox.height;

    // Check whether the face center lies inside the body box
    if (face_center_x >= person_min_x && face_center_x <= person_max_x &&
        face_center_y >= person_min_y && face_center_y <= person_max_y) {
      // A consistent face/body pair raises the confidence, capped at 1.0
      occupant->confidence = std::min(1.0f, occupant->confidence * 1.1f);
    }
  }
}

OccupantType OccupantDetectorCalculator::ClassifyOccupantType(
    const Occupant& occupant,
    const ImageFrame& image) {

  if (!occupant.has_face) {
    return OCCUPANT_UNKNOWN;
  }

  float face_ratio = occupant.face_height_ratio;

  // Infant (0-5 years): face height ratio < 10%
  if (face_ratio < infant_max_face_height_ratio_) {
    return OCCUPANT_INFANT;
  }

  // Child (6-12 years): 10% <= face height ratio < 15%
  if (face_ratio >= child_min_face_height_ratio_ &&
      face_ratio < child_max_face_height_ratio_) {
    return OCCUPANT_CHILD;
  }

  // Adult: face height ratio >= 15%
  if (face_ratio >= adult_min_face_height_ratio_) {
    return OCCUPANT_ADULT;
  }

  return OCCUPANT_UNKNOWN;
}

SeatPosition OccupantDetectorCalculator::DetermineSeatPosition(
    const BoundingBox& bbox,
    const ImageFrame& image) {

  // Center of the bounding box. The box comes from relative_bounding_box(),
  // so it is already normalized to [0, 1]. Dividing by the pixel size again
  // would push every center to roughly 0 and always yield SEAT_UNKNOWN.
  float normalized_x = bbox.x + bbox.width / 2.0f;
  float normalized_y = bbox.y + bbox.height / 2.0f;

  // Driver seat (front left)
  if (normalized_x >= 0.05f && normalized_x <= 0.35f &&
      normalized_y >= 0.10f && normalized_y <= 0.60f) {
    return SEAT_DRIVER;
  }

  // Front passenger seat (front right)
  if (normalized_x >= 0.65f && normalized_x <= 0.95f &&
      normalized_y >= 0.10f && normalized_y <= 0.60f) {
    return SEAT_FRONT_PASSENGER;
  }

  // Rear left
  if (normalized_x >= 0.05f && normalized_x <= 0.35f &&
      normalized_y >= 0.65f && normalized_y <= 0.95f) {
    return SEAT_REAR_LEFT;
  }

  // Rear center
  if (normalized_x >= 0.35f && normalized_x <= 0.65f &&
      normalized_y >= 0.65f && normalized_y <= 0.95f) {
    return SEAT_REAR_CENTER;
  }

  // Rear right
  if (normalized_x >= 0.65f && normalized_x <= 0.95f &&
      normalized_y >= 0.65f && normalized_y <= 0.95f) {
    return SEAT_REAR_RIGHT;
  }

  return SEAT_UNKNOWN;
}

REGISTER_CALCULATOR(OccupantDetectorCalculator);

} // namespace mediapipe

6.2 OccupantTrackerCalculator

// mediapipe/calculators/ims/occupant_tracker_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_TRACKER_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_TRACKER_CALCULATOR_H_

#include <deque>
#include <map>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/calculators/ims/occupant_detector_calculator.h"

namespace mediapipe {

// Track state
enum TrackState {
  STATE_TENTATIVE = 0,  // Tentative (needs more frames to confirm)
  STATE_CONFIRMED = 1,  // Confirmed
  STATE_DELETED = 2     // Deleted
};

// Tracked object
struct Track {
  int id;                                // Track ID
  Occupant occupant;                     // Occupant record
  TrackState state;                      // Track state
  int hits;                              // Number of matched frames
  int age;                               // Consecutive missed frames
  BoundingBox smooth_bbox;               // Smoothed bounding box
  std::deque<BoundingBox> bbox_history;  // Recent bounding boxes
};

class OccupantTrackerCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc);

  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;

 private:
  // Match detections to existing tracks
  void MatchDetectionsToTracks(
      const std::vector<Occupant>& detections,
      std::map<int, Track>* tracks);

  // Distance between two box centers
  float CalculateDistance(
      const BoundingBox& bbox1,
      const BoundingBox& bbox2);

  // Update a track with a matched detection
  void UpdateTrack(Track* track, const Occupant& occupant);

  // Create a new track
  Track CreateTrack(const Occupant& occupant);

  // Remove deleted tracks
  void DeleteOldTracks(std::map<int, Track>* tracks);

  // Configuration
  float max_distance_;  // Maximum matching distance
  int max_age_;         // Maximum consecutive missed frames
  int min_hits_;        // Minimum hits to confirm a track
  int smooth_window_;   // Smoothing window size

  // Tracking state
  std::map<int, Track> tracks_;
  int next_track_id_ = 1;
};

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_TRACKER_CALCULATOR_H_

// mediapipe/calculators/ims/occupant_tracker_calculator.cc
#include "mediapipe/calculators/ims/occupant_tracker_calculator.h"

#include <cmath>
#include <limits>

#include "mediapipe/framework/port/logging.h"

namespace mediapipe {

absl::Status OccupantTrackerCalculator::GetContract(CalculatorContract* cc) {
  cc->Inputs().Tag("OCCUPANTS").Set<std::vector<Occupant>>();
  cc->Inputs().Tag("TIMESTAMP").Set<int64_t>();

  cc->Outputs().Tag("TRACKED_OCCUPANTS").Set<std::vector<Occupant>>();

  cc->Options<OccupantTrackerOptions>();

  return absl::OkStatus();
}

absl::Status OccupantTrackerCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<OccupantTrackerOptions>();

  max_distance_ = options.max_distance();
  max_age_ = options.max_age();
  min_hits_ = options.min_hits();
  smooth_window_ = 5;

  LOG(INFO) << "OccupantTrackerCalculator initialized";

  return absl::OkStatus();
}

absl::Status OccupantTrackerCalculator::Process(CalculatorContext* cc) {
  if (cc->Inputs().Tag("OCCUPANTS").IsEmpty()) {
    return absl::OkStatus();
  }

  const auto& detections =
      cc->Inputs().Tag("OCCUPANTS").Get<std::vector<Occupant>>();

  // Match detections to tracks
  MatchDetectionsToTracks(detections, &tracks_);

  // Remove deleted tracks
  DeleteOldTracks(&tracks_);

  // Collect the confirmed tracks
  std::vector<Occupant> tracked_occupants;
  for (const auto& [id, track] : tracks_) {
    if (track.state == STATE_CONFIRMED) {
      Occupant occupant = track.occupant;
      occupant.id = id;  // Use the stable track ID
      tracked_occupants.push_back(occupant);
    }
  }

  // Emit the output
  cc->Outputs().Tag("TRACKED_OCCUPANTS").AddPacket(
      MakePacket<std::vector<Occupant>>(tracked_occupants)
          .At(cc->InputTimestamp()));

  VLOG(1) << "Tracked " << tracked_occupants.size() << " occupants";

  return absl::OkStatus();
}

void OccupantTrackerCalculator::MatchDetectionsToTracks(
    const std::vector<Occupant>& detections,
    std::map<int, Track>* tracks) {

  std::vector<bool> detection_matched(detections.size(), false);

  // Step 1: match every live track, tentative or confirmed. If only
  // confirmed tracks were matched, a tentative track could never collect
  // enough hits to be confirmed, and every frame would spawn new tracks.
  for (auto& [id, track] : *tracks) {
    if (track.state == STATE_DELETED) continue;

    float best_distance = max_distance_;
    int best_idx = -1;

    // Greedy nearest-neighbor search over the unmatched detections
    for (size_t i = 0; i < detections.size(); ++i) {
      if (detection_matched[i]) continue;

      float distance = CalculateDistance(
          track.occupant.person_bbox,
          detections[i].person_bbox);

      if (distance < best_distance) {
        best_distance = distance;
        best_idx = static_cast<int>(i);
      }
    }

    if (best_idx >= 0) {
      UpdateTrack(&track, detections[best_idx]);
      detection_matched[best_idx] = true;
    } else {
      // No match: count one more missed frame
      track.age++;
      if (track.age > max_age_) {
        track.state = STATE_DELETED;
      }
    }
  }

  // Step 2: create a new track for each unmatched detection
  for (size_t i = 0; i < detections.size(); ++i) {
    if (detection_matched[i]) continue;

    Track new_track = CreateTrack(detections[i]);
    tracks->emplace(new_track.id, new_track);
  }
}

float OccupantTrackerCalculator::CalculateDistance(
    const BoundingBox& bbox1,
    const BoundingBox& bbox2) {

  // Euclidean distance between the two box centers
  float center1_x = bbox1.x + bbox1.width / 2.0f;
  float center1_y = bbox1.y + bbox1.height / 2.0f;

  float center2_x = bbox2.x + bbox2.width / 2.0f;
  float center2_y = bbox2.y + bbox2.height / 2.0f;

  float dx = center1_x - center2_x;
  float dy = center1_y - center2_y;

  return std::sqrt(dx * dx + dy * dy);
}

void OccupantTrackerCalculator::UpdateTrack(
    Track* track,
    const Occupant& occupant) {

  track->occupant = occupant;
  track->hits++;
  track->age = 0;

  // Update the bounding box history
  track->bbox_history.push_back(occupant.person_bbox);
  while (static_cast<int>(track->bbox_history.size()) > smooth_window_) {
    track->bbox_history.pop_front();
  }

  // Smooth the bounding box with a moving average over the history
  track->smooth_bbox.x = 0.0f;
  track->smooth_bbox.y = 0.0f;
  track->smooth_bbox.width = 0.0f;
  track->smooth_bbox.height = 0.0f;

  for (const auto& bbox : track->bbox_history) {
    track->smooth_bbox.x += bbox.x;
    track->smooth_bbox.y += bbox.y;
    track->smooth_bbox.width += bbox.width;
    track->smooth_bbox.height += bbox.height;
  }

  const float count = static_cast<float>(track->bbox_history.size());
  track->smooth_bbox.x /= count;
  track->smooth_bbox.y /= count;
  track->smooth_bbox.width /= count;
  track->smooth_bbox.height /= count;

  // Promote the track once it has enough hits
  if (track->state == STATE_TENTATIVE && track->hits >= min_hits_) {
    track->state = STATE_CONFIRMED;
  }
}

Track OccupantTrackerCalculator::CreateTrack(const Occupant& occupant) {
  Track track;
  track.id = next_track_id_++;
  track.occupant = occupant;
  track.state = STATE_TENTATIVE;
  track.hits = 1;
  track.age = 0;
  track.smooth_bbox = occupant.person_bbox;
  track.bbox_history.push_back(occupant.person_bbox);

  return track;
}

void OccupantTrackerCalculator::DeleteOldTracks(std::map<int, Track>* tracks) {
  // Erase every track that has been marked as deleted
  auto it = tracks->begin();
  while (it != tracks->end()) {
    if (it->second.state == STATE_DELETED) {
      it = tracks->erase(it);
    } else {
      ++it;
    }
  }
}

REGISTER_CALCULATOR(OccupantTrackerCalculator);

} // namespace mediapipe
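
The track lifecycle driven by `min_hits` and `max_age` is easier to see in isolation. The Python sketch below models only the state machine (TENTATIVE, CONFIRMED, DELETED) with the values from the graph configuration; the class and method names are illustrative and the matching step is left out.

```python
class Track:
    """Minimal model of the tracker's per-track state machine."""

    def __init__(self, track_id, min_hits=3, max_age=30):
        self.id = track_id
        self.min_hits = min_hits
        self.max_age = max_age
        self.hits = 1          # the detection that created the track
        self.age = 0           # consecutive missed frames
        self.state = "TENTATIVE"

    def mark_matched(self):
        """A detection was matched to this track in the current frame."""
        self.hits += 1
        self.age = 0
        if self.state == "TENTATIVE" and self.hits >= self.min_hits:
            self.state = "CONFIRMED"

    def mark_missed(self):
        """No detection was matched to this track in the current frame."""
        self.age += 1
        if self.age > self.max_age:
            self.state = "DELETED"


track = Track(track_id=1)
print(track.state)            # created from the first detection
track.mark_matched()
track.mark_matched()
print(track.state)            # third hit reaches min_hits

for _ in range(30):
    track.mark_missed()
print(track.state)            # 30 misses is still within max_age
track.mark_missed()
print(track.state)            # the 31st miss exceeds max_age
```

At 30 fps, `max_age: 30` lets a confirmed occupant stay occluded for about one second before the track is dropped, and `min_hits: 3` delays a new occupant's first appearance in the output by two frames.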

7. Testing and Validation

7.1 Unit Tests

// These tests assume the calculator's private helpers are reachable from the
// test (for example through a friend fixture or free-function wrappers) and
// that the thresholds match the graph configuration in section 5.
TEST(OccupantDetectorTest, DetectsAdultCorrectly) {
  // Adult body detection
  Detection person;
  auto* bbox = person.mutable_location_data()
                   ->mutable_relative_bounding_box();
  bbox->set_xmin(0.1f);
  bbox->set_ymin(0.1f);
  bbox->set_width(0.2f);
  bbox->set_height(0.4f);
  person.add_score(0.95f);

  // Face detection
  Detection face;
  auto* face_bbox = face.mutable_location_data()
                        ->mutable_relative_bounding_box();
  face_bbox->set_xmin(0.15f);
  face_bbox->set_ymin(0.12f);
  face_bbox->set_width(0.1f);    // Face width: 10% of the image width
  face_bbox->set_height(0.18f);  // Face height: 18% of the image height (adult)
  face.add_score(0.98f);

  // Build the occupant
  std::vector<Detection> persons = {person};
  std::vector<std::vector<Detection>> faces = {{face}};

  std::vector<Occupant> occupants;
  MatchFacesToPersons(persons, faces, &occupants);
  ASSERT_EQ(occupants.size(), 1u);

  // MatchFacesToPersons does not classify, so run the feature and
  // classification steps explicitly before checking the type.
  ImageFrame image;
  CalculateFeatures(occupants[0].person_bbox, occupants[0].face_bbox,
                    image, &occupants[0]);
  occupants[0].type = ClassifyOccupantType(occupants[0], image);

  EXPECT_EQ(occupants[0].type, OCCUPANT_ADULT);
  EXPECT_TRUE(occupants[0].has_face);
  EXPECT_GT(occupants[0].confidence, 0.9f);
}

TEST(OccupantDetectorTest, DetectsChildCorrectly) {
  // Child body detection
  Detection person;
  auto* bbox = person.mutable_location_data()
                   ->mutable_relative_bounding_box();
  bbox->set_xmin(0.65f);
  bbox->set_ymin(0.7f);
  bbox->set_width(0.15f);
  bbox->set_height(0.2f);
  person.add_score(0.9f);

  // Face detection (smaller face)
  Detection face;
  auto* face_bbox = face.mutable_location_data()
                        ->mutable_relative_bounding_box();
  face_bbox->set_xmin(0.68f);
  face_bbox->set_ymin(0.72f);
  face_bbox->set_width(0.06f);
  face_bbox->set_height(0.12f);  // Face height: 12% of the image height (child)
  face.add_score(0.95f);

  // Build the occupant
  std::vector<Detection> persons = {person};
  std::vector<std::vector<Detection>> faces = {{face}};

  std::vector<Occupant> occupants;
  MatchFacesToPersons(persons, faces, &occupants);
  ASSERT_EQ(occupants.size(), 1u);

  // Run the feature and classification steps explicitly
  ImageFrame image;
  CalculateFeatures(occupants[0].person_bbox, occupants[0].face_bbox,
                    image, &occupants[0]);
  occupants[0].type = ClassifyOccupantType(occupants[0], image);

  EXPECT_EQ(occupants[0].type, OCCUPANT_CHILD);
  EXPECT_TRUE(occupants[0].has_face);
}

TEST(SeatPositionTest, MapsDriverSeatCorrectly) {
  // Normalized box whose center (0.20, 0.35) lies in the driver region
  BoundingBox bbox;
  bbox.x = 0.1f;
  bbox.y = 0.2f;
  bbox.width = 0.2f;
  bbox.height = 0.3f;

  ImageFrame image;
  SeatPosition seat = DetermineSeatPosition(bbox, image);

  EXPECT_EQ(seat, SEAT_DRIVER);
}

TEST(SeatPositionTest, MapsRearRightCorrectly) {
  // Normalized box whose center (0.775, 0.80) lies in the rear-right region
  BoundingBox bbox;
  bbox.x = 0.7f;
  bbox.y = 0.7f;
  bbox.width = 0.15f;
  bbox.height = 0.2f;

  ImageFrame image;
  SeatPosition seat = DetermineSeatPosition(bbox, image);

  EXPECT_EQ(seat, SEAT_REAR_RIGHT);
}

7.2 Integration Tests

# Scenario 1: driver and front passenger
python3 test_occupant_detection.py --scenario driver_and_passenger

# Expected output:
# Occupant Count: 2
# Occupant 0: type=ADULT, seat=DRIVER, confidence=0.92
# Occupant 1: type=ADULT, seat=FRONT_PASSENGER, confidence=0.88

# Scenario 2: three adults
python3 test_occupant_detection.py --scenario three_adults

# Expected output:
# Occupant Count: 3
# Occupant 0: type=ADULT, seat=DRIVER, confidence=0.91
# Occupant 1: type=ADULT, seat=FRONT_PASSENGER, confidence=0.89
# Occupant 2: type=ADULT, seat=REAR_RIGHT, confidence=0.85

# Scenario 3: child in the rear row
python3 test_occupant_detection.py --scenario child_in_rear

# Expected output:
# Occupant Count: 3
# Occupant 0: type=ADULT, seat=DRIVER, confidence=0.93
# Occupant 1: type=ADULT, seat=FRONT_PASSENGER, confidence=0.87
# Occupant 2: type=CHILD, seat=REAR_RIGHT, confidence=0.95

# Performance benchmark
python3 benchmark_occupant_detection.py --resolution 1280x720 --fps 30

# Expected output:
# Average Latency: 45.2 ms
# Throughput: 22.1 FPS
# CPU Usage: 45%
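
The latency and throughput figures above are related by simple arithmetic, which the short Python sketch below works through. It assumes frames are processed one at a time with no pipelining; the helper names are made up for the example.

```python
def frame_budget_ms(target_fps):
    """Time available per frame at the camera's frame rate."""
    return 1000.0 / target_fps


def throughput_fps(latency_ms):
    """Frames per second when frames are processed sequentially."""
    return 1000.0 / latency_ms


latency_ms = 45.2   # average latency from the expected output above
target_fps = 30     # camera frame rate

print(f"frame budget: {frame_budget_ms(target_fps):.1f} ms")
print(f"throughput:   {throughput_fps(latency_ms):.1f} FPS")
print("keeps up with camera:", latency_ms <= frame_budget_ms(target_fps))
```

A 45.2 ms latency gives 22.1 FPS, which matches the reported throughput but exceeds the 33.3 ms budget of a 30 fps camera, so under this assumption some frames would have to be dropped or the stages pipelined.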

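8. Summary

Component            Function
Body Detection       Body detection (YOLOv8)
Face Detection       Face detection (BlazeFace)
Occupant Detector    Occupant classification (adult / child / infant)
Occupant Tracker     Occupant tracking (ID assignment)
Seat Mapper          Seat mapping
CPD Alert            Child-left-behind alert

Series progress: 46/55
Last updated: 2026-03-12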


Original post: https://dapalm.com/2026/03/12/MediaPipe系列46-IMS-OMS架构:乘员检测流水线/
Author: Mars
Published: 2026-03-12