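MediaPipe Series 46: IMS OMS Architecture and a Complete Occupant Detection Pipeline Implementation

1. OMS Business Background

1.1 Why is an OMS needed?

An OMS (Occupant Monitoring System) is required under Euro NCAP's 2025+ protocols and covers:

Child safety: detecting children left behind in the rear seats (CPD, Child Presence Detection)
Seat belt reminder: detecting whether each passenger is wearing a seat belt
Airbag control: adapting the airbag deployment strategy to occupant position and body size
Occupant counting: vehicle overload detection and smart lock reminders

1.2 Euro NCAP OMS requirements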

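Test scenario          Requirement               Pass criterion
Empty front seat       Detect no occupant        Accuracy ≥ 95%
Adult in front seat    Detect the occupant       Accuracy ≥ 95%
Child in rear seat     Detect the child          Accuracy ≥ 90%
Multiple occupants     Count the occupants       Accuracy ≥ 85%
Child left behind      Child left in the cabin   Response time < 30 s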

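This article focuses on the occupant detection pipeline: detecting the position, number, and type of every occupant in the cabin.

1.3 Detection metrics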

┌─────────────────────────────────────────────────────────────┐
│                 Occupant Detection Metrics                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Tier 1 (direct observations)                               │
│  ├── Occupant Count: number of occupants                    │
│  ├── Occupant Position: front row / rear row                │
│  ├── Occupant Type: adult / child / infant                  │
│  └── Occupant Confidence: detection confidence              │
│                                                             │
│  Tier 2 (body analysis)                                     │
│  ├── Face Bounding Box                                      │
│  ├── Body Bounding Box                                      │
│  ├── Keypoints: shoulders, waist, knees                     │
│  ├── Body Size: estimated height / weight                   │
│  └── Seat Position                                          │
│                                                             │
│  Tier 3 (fused decisions)                                   │
│  ├── Seat Occupancy: per-seat occupancy state               │
│  ├── Occupant ID: tracking ID                               │
│  ├── Presence Duration: time spent in the cabin             │
│  └── CPD Alert: child-left-behind alert                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

2. Occupant Detection Principles

2.1 Detection flow

┌─────────────────────────────────────────────────────────────┐
│                   Occupant Detection Flow                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Step 1: body detection (coarse localization)              │
│   ┌───────────────────────────────────┐                     │
│   │   ┌─────┐   ┌─────┐   ┌─────┐     │                     │
│   │   │  1  │   │  2  │   │  3  │     │                     │
│   │   └─────┘   └─────┘   └─────┘     │                     │
│   │   Detect every body in the image  │                     │
│   └───────────────────────────────────┘                     │
│                  │                                          │
│                  ▼                                          │
│   Step 2: face detection (fine localization)                │
│   ┌───────────────────────────────────┐                     │
│   │   ┌─────┐   ┌─────┐   ┌─────┐     │                     │
│   │   │  ●  │   │  ●  │   │     │     │                     │
│   │   └─────┘   └─────┘   └─────┘     │                     │
│   │   Detect the face in each body    │                     │
│   └───────────────────────────────────┘                     │
│                  │                                          │
│                  ▼                                          │
│   Step 3: occupant classification (adult / child)           │
│   ┌───────────────────────────────────┐                     │
│   │   ┌─────┐   ┌─────┐   ┌─────┐     │                     │
│   │   │  A  │   │  A  │   │  C  │     │                     │
│   │   └─────┘   └─────┘   └─────┘     │                     │
│   │   (adult)   (adult)   (child)     │                     │
│   └───────────────────────────────────┘                     │
│                  │                                          │
│                  ▼                                          │
│   Step 4: seat mapping (position assignment)                │
│   ┌───────────────────────────────────┐                     │
│   │   Front:  driver, passenger       │                     │
│   │   ┌─────┐ ┌─────┐                 │                     │
│   │   │  1  │ │  2  │                 │                     │
│   │   └─────┘ └─────┘                 │                     │
│   │   Rear:   left, right             │                     │
│   │   ┌─────┐ ┌─────┐                 │                     │
│   │   │  3  │ │     │                 │                     │
│   │   └─────┘ └─────┘                 │                     │
│   └───────────────────────────────────┘                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

2.2 Body detection

Body detection uses YOLOv8 or SSD:

┌─────────────────────────────────────────────────────────────┐
│                    Body Detection Model                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Input image: 640×480 or 1280×720                          │
│                                                             │
│   ┌───────────────────────────────────────────┐             │
│   │ Body detection output                     │             │
│   │ [x, y, width, height, confidence] × N     │             │
│   └───────────────────────────────────────────┘             │
│                                                             │
│   Model options:                                            │
│   - YOLOv8-nano: fastest, suited to embedded targets        │
│   - YOLOv8-small: balances speed and accuracy               │
│   - MobileNetV3-SSD: lightweight, suited to mobile          │
│                                                             │
│   Detection thresholds:                                     │
│   - confidence_threshold: 0.5                               │
│   - iou_threshold: 0.45 (NMS)                               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
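
The two thresholds above can be read as a two-stage filter: drop boxes whose score is below confidence_threshold, then apply non-maximum suppression (NMS) with iou_threshold. The following is a minimal pure-Python sketch of that idea. The function names `iou` and `nms` and the `(x, y, w, h, score)` tuple layout are assumptions made for illustration; they are not an API of YOLOv8 or MediaPipe.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(detections, confidence_threshold=0.5, iou_threshold=0.45):
    """Greedy NMS over (x, y, w, h, score) tuples, highest score first."""
    # Stage 1: drop low-confidence boxes.
    candidates = [d for d in detections if d[4] >= confidence_threshold]
    candidates.sort(key=lambda d: d[4], reverse=True)

    # Stage 2: keep a box only if it does not overlap a kept box too much.
    kept = []
    for det in candidates:
        if all(iou(det[:4], k[:4]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections in pixel coordinates.
detections = [
    (100, 100, 200, 300, 0.90),
    (110, 105, 200, 300, 0.80),  # near-duplicate of the first box
    (600, 120, 180, 280, 0.70),
    (900, 100, 150, 250, 0.30),  # below confidence_threshold
]
for det in nms(detections):
    print(det)
```

In this example the second box is suppressed as a duplicate of the first and the fourth is dropped for low confidence, so two boxes remain.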

2.3 Face detection

Faces are detected inside each body bounding box:

def detect_faces_in_person(person_bbox, image, face_detector):
    """
    Detect faces inside a body bounding box.

    Args:
        person_bbox: body bounding box [x, y, width, height] in pixels
        image: input image as an array of shape (H, W, C)
        face_detector: object whose detect(roi) returns faces exposing
            x, y, width, height in ROI pixel coordinates

    Returns:
        List of face boxes [x, y, width, height] in full-image coordinates
    """
    # Crop the body region, expanded by 20% on every side
    x, y, w, h = person_bbox
    padding = 0.2
    img_h, img_w = image.shape[:2]

    # Clamp both corners so the padded crop stays inside the image
    x0 = max(0, int(x - padding * w))
    y0 = max(0, int(y - padding * h))
    x1 = min(img_w, int(x + w + padding * w))
    y1 = min(img_h, int(y + h + padding * h))

    person_roi = image[y0:y1, x0:x1]

    # Run face detection on the crop
    face_detections = face_detector.detect(person_roi)

    # Map each face back to full-image coordinates
    faces = []
    for face in face_detections:
        faces.append([x0 + face.x, y0 + face.y, face.width, face.height])

    return faces

2.4 Occupant type classification

The occupant type is inferred from body size:

┌─────────────────────────────────────────────────────────────┐
│                     Occupant Type Rules                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Features:                                                 │
│   ├── face height (face_height)                             │
│   ├── shoulder width (shoulder_width)                       │
│   ├── head-to-shoulder ratio (head_to_shoulder_ratio)       │
│   └── body height (body_height)                             │
│                                                             │
│   Classification rules:                                     │
│                                                             │
│   Adult:                                                    │
│   ├── face_height > 0.15 * image_height                     │
│   ├── shoulder_width > 0.30 * image_width                   │
│   └── head_to_shoulder_ratio < 0.4                          │
│                                                             │
│   Child (6-12 years):                                       │
│   ├── face_height: 0.10-0.15 * image_height                 │
│   ├── shoulder_width: 0.20-0.30 * image_width               │
│   └── head_to_shoulder_ratio: 0.4-0.5                       │
│                                                             │
│   Infant (0-5 years):                                       │
│   ├── face_height < 0.10 * image_height                     │
│   ├── shoulder_width < 0.20 * image_width                   │
│   └── head_to_shoulder_ratio > 0.5                          │
│                                                             │
│   Child seat detection (optional):                          │
│   └── child seat detected → classify as child               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
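
The rule table above can be written as a small function. This is a sketch under assumptions the article does not state: all three conditions of a class must hold together, the range bounds for the child class are inclusive, a detected child seat overrides the size rules, and anything that fits no class is reported as "unknown". The function name `classify_occupant` is hypothetical.

```python
def classify_occupant(face_height, shoulder_width, head_to_shoulder_ratio,
                      image_width, image_height, child_seat_detected=False):
    """Return 'adult', 'child', 'infant', or 'unknown'."""
    # Optional rule: a detected child seat decides the class directly.
    if child_seat_detected:
        return "child"

    # Convert pixel measurements to fractions of the image size.
    fh = face_height / image_height
    sw = shoulder_width / image_width
    hs = head_to_shoulder_ratio

    if fh > 0.15 and sw > 0.30 and hs < 0.4:
        return "adult"
    if 0.10 <= fh <= 0.15 and 0.20 <= sw <= 0.30 and 0.4 <= hs <= 0.5:
        return "child"
    if fh < 0.10 and sw < 0.20 and hs > 0.5:
        return "infant"
    # Mixed evidence (for example an adult-sized face with narrow shoulders).
    return "unknown"


# Hypothetical measurements on a 1280x720 frame.
print(classify_occupant(130, 420, 0.35, 1280, 720))
print(classify_occupant(90, 320, 0.45, 1280, 720))
print(classify_occupant(60, 200, 0.55, 1280, 720))
print(classify_occupant(130, 200, 0.35, 1280, 720))
```

Requiring all three conditions is the strictest reading of the table; a voting scheme over the three features would reduce the number of "unknown" results at the cost of more misclassifications.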

3. Seat Mapping Logic

3.1 Cabin zones

┌─────────────────────────────────────────────────────────────┐
│                      Cabin Seat Layout                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                       Windshield                            │
│                    ┌───────────────┐                        │
│                    │               │                        │
│     Driver         │               │    Front passenger     │
│   (POSITION_0)     │               │     (POSITION_1)       │
│   ┌─────────┐      │               │      ┌─────────┐       │
│   │         │      │               │      │         │       │
│   │    ●    │      │               │      │    ●    │       │
│   │         │      │               │      │         │       │
│   └─────────┘      │               │      └─────────┘       │
│                    │               │                        │
│                    │               │                        │
│                    └───────────────┘                        │
│                                                             │
│   Rear left        Rear center      Rear right              │
│   (POSITION_2)     (POSITION_3)     (POSITION_4)            │
│   ┌─────────┐      ┌─────────┐      ┌─────────┐             │
│   │         │      │         │      │         │             │
│   │    ●    │      │    ●    │      │    ●    │             │
│   │         │      │         │      │         │             │
│   └─────────┘      └─────────┘      └─────────┘             │
│                                                             │
│   Seat definitions:                                         │
│   - POSITION_0: driver seat (front left)                    │
│   - POSITION_1: front passenger seat (front right)          │
│   - POSITION_2: rear left                                   │
│   - POSITION_3: rear center                                 │
│   - POSITION_4: rear right                                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

3.2 Seat assignment algorithm

#include <map>

enum SeatPosition {
  SEAT_DRIVER = 0,          // driver seat
  SEAT_FRONT_PASSENGER = 1, // front passenger seat
  SEAT_REAR_LEFT = 2,       // rear left
  SEAT_REAR_CENTER = 3,     // rear center
  SEAT_REAR_RIGHT = 4,      // rear right
  SEAT_UNKNOWN = 5          // center lies outside every seat region
};

struct SeatRegion {
  float x_min;  // normalized X range [0, 1]
  float x_max;
  float y_min;  // normalized Y range [0, 1]
  float y_max;
};

// Seat regions in normalized image coordinates
const std::map<SeatPosition, SeatRegion> kSeatRegions = {
  {SEAT_DRIVER,          {0.05f, 0.35f, 0.10f, 0.60f}},
  {SEAT_FRONT_PASSENGER, {0.65f, 0.95f, 0.10f, 0.60f}},
  {SEAT_REAR_LEFT,       {0.05f, 0.35f, 0.65f, 0.95f}},
  {SEAT_REAR_CENTER,     {0.35f, 0.65f, 0.65f, 0.95f}},
  {SEAT_REAR_RIGHT,      {0.65f, 0.95f, 0.65f, 0.95f}}
};

SeatPosition DetermineSeatPosition(
    const BoundingBox& bbox,
    const CameraConfig& config) {
  
  // Center of the bounding box, in pixels
  float center_x = bbox.x + bbox.width / 2.0f;
  float center_y = bbox.y + bbox.height / 2.0f;
  
  // Normalize to [0, 1]
  float normalized_x = center_x / config.image_width;
  float normalized_y = center_y / config.image_height;
  
  // Return the first seat region that contains the center
  for (const auto& [seat, region] : kSeatRegions) {
    if (normalized_x >= region.x_min &&
        normalized_x <= region.x_max &&
        normalized_y >= region.y_min &&
        normalized_y <= region.y_max) {
      return seat;
    }
  }
  
  // No region matched: report unknown instead of guessing the driver seat
  return SEAT_UNKNOWN;
}
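
To make the mapping concrete, here is a small Python rendition of the same center-in-region test with a few worked inputs. The region values are copied from the table in the listing; the function name `determine_seat` is hypothetical, and this sketch returns None when the center falls outside every region.

```python
# Seat regions as (x_min, x_max, y_min, y_max) in normalized coordinates.
SEAT_REGIONS = {
    "SEAT_DRIVER":          (0.05, 0.35, 0.10, 0.60),
    "SEAT_FRONT_PASSENGER": (0.65, 0.95, 0.10, 0.60),
    "SEAT_REAR_LEFT":       (0.05, 0.35, 0.65, 0.95),
    "SEAT_REAR_CENTER":     (0.35, 0.65, 0.65, 0.95),
    "SEAT_REAR_RIGHT":      (0.65, 0.95, 0.65, 0.95),
}


def determine_seat(bbox, image_width, image_height):
    """Map a pixel-space [x, y, w, h] box to a seat name, or None."""
    x, y, w, h = bbox
    # Normalized center of the box.
    nx = (x + w / 2.0) / image_width
    ny = (y + h / 2.0) / image_height
    for seat, (x_min, x_max, y_min, y_max) in SEAT_REGIONS.items():
        if x_min <= nx <= x_max and y_min <= ny <= y_max:
            return seat
    return None


# Center (200, 250) -> normalized (0.156, 0.347): inside the driver region.
print(determine_seat((100, 150, 200, 200), 1280, 720))
# Center (1000, 575) -> normalized (0.781, 0.799): rear right.
print(determine_seat((900, 500, 200, 150), 1280, 720))
# Center (640, 200) -> normalized (0.5, 0.278): no front region covers x = 0.5.
print(determine_seat((540, 100, 200, 200), 1280, 720))
```

The third case shows that the regions do not tile the image: the band 0.35 < x < 0.65 in the front row and the band 0.60 < y < 0.65 between rows belong to no seat, so callers need a defined result for unmatched boxes.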

3.3 Multi-camera fusion

┌─────────────────────────────────────────────────────────────┐
│                     Multi-Camera Layout                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Camera 1 (OMS Front)    Camera 2 (OMS Rear)               │
│   Covers front seats      Covers rear seats                 │
│                                                             │
│   ┌─────────────┐         ┌─────────────┐                   │
│   │   ┌───┐     │         │    ┌───┐    │                   │
│   │   │ ● │     │         │    │ ● │    │                   │
│   │   └───┘     │         │    └───┘    │                   │
│   │   ┌───┐     │         │    ┌───┐    │                   │
│   │   │ ● │     │         │    │ ● │    │                   │
│   │   └───┘     │         │    └───┘    │                   │
│   └─────────────┘         └─────────────┘                   │
│                                                             │
│   Fusion strategy:                                          │
│   1. Each camera runs detection independently               │
│   2. Merge all detection results                            │
│   3. Deduplicate (IoU-based)                                │
│   4. Assign a unique Occupant ID                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
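
The four fusion steps can be sketched as follows. One assumption deserves emphasis: IoU between boxes from different cameras is only meaningful once both sets of boxes have been projected into a shared coordinate frame, and the article does not describe that projection, so this sketch simply takes boxes that are already in a common normalized cabin frame. The function `fuse_cameras`, the dict fields, and the 0.5 deduplication threshold are all hypothetical.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def fuse_cameras(per_camera_detections, iou_threshold=0.5):
    """Fuse per-camera detection lists into one list with occupant IDs.

    Step 1 (independent detection per camera) is the input: one list of
    {'bbox': [x, y, w, h], 'score': float} dicts per camera.
    """
    # Step 2: merge all detections.
    merged = [det for dets in per_camera_detections for det in dets]

    # Step 3: deduplicate, keeping the higher-scoring box of each overlap.
    merged.sort(key=lambda d: d["score"], reverse=True)
    unique = []
    for det in merged:
        if all(iou(det["bbox"], u["bbox"]) < iou_threshold for u in unique):
            unique.append(det)

    # Step 4: assign a unique occupant ID to each surviving detection.
    ids = count(1)
    return [dict(det, occupant_id=next(ids)) for det in unique]


front = [{"bbox": [0.10, 0.20, 0.20, 0.30], "score": 0.90}]
rear = [
    {"bbox": [0.11, 0.21, 0.20, 0.30], "score": 0.80},  # same person as above
    {"bbox": [0.70, 0.70, 0.20, 0.20], "score": 0.85},
]
for occupant in fuse_cameras([front, rear]):
    print(occupant)
```

If the two cameras cover disjoint seat rows, deduplicating by seat assignment rather than by IoU may be a simpler alternative, since it needs no shared coordinate frame.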

4. Full Pipeline Architecture

4.1 Architecture diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                   IMS OMS Occupant Detection Pipeline                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Input layer                                                            │
│  ┌─────────────┐                                                        │
│  │ OMS Camera  │ → 1280×720 @ 30fps (wide-angle, full cabin)            │
│  └─────────────┘                                                        │
│         │                                                               │
│         ▼                                                               │
│  Detection layer                                                        │
│  ┌─────────────┐     ┌─────────────┐                                    │
│  │Body         │────▶│Face         │                                    │
│  │Detection    │     │Detection    │                                    │
│  │(YOLOv8)     │     │(BlazeFace)  │                                    │
│  └─────────────┘     └─────────────┘                                    │
│         │                   │                                           │
│         ▼                   ▼                                           │
│  ┌─────────────┐     ┌─────────────┐                                    │
│  │Person       │     │Face         │                                    │
│  │BBoxes       │     │BBoxes       │                                    │
│  │[x,y,w,h]×N  │     │[x,y,w,h]×M  │                                    │
│  └─────────────┘     └─────────────┘                                    │
│                                                                         │
│  Matching layer                                                         │
│  ┌─────────────────────────────────────────────┐                        │
│  │        Face-Person Matcher                  │                        │
│  │  - Associates each face with a body         │                        │
│  │  - Based on containment                     │                        │
│  │  - Output: PersonWithFace                   │                        │
│  └─────────────────────────────────────────────┘                        │
│                          │                                              │
│                          ▼                                              │
│  ┌─────────────────────────────────────────────┐                        │
│  │     Occupant (person + face)                │                        │
│  │     [person_bbox, face_bbox, confidence]    │                        │
│  └─────────────────────────────────────────────┘                        │
│                                                                         │
│  Analysis layer                                                         │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                │
│  │Occupant     │────▶│Seat         │────▶│Occupant     │                │
│  │Type         │     │Position     │     │Tracker      │                │
│  │Classifier   │     │Mapper       │     │(ID assign)  │                │
│  └─────────────┘     └─────────────┘     └─────────────┘                │
│         │                   │                   │                       │
│         ▼                   ▼                   ▼                       │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                │
│  │Type         │     │Seat ID      │     │Occupant ID  │                │
│  │(Adult/Child)│     │(0-4)        │     │(unique)     │                │
│  └─────────────┘     └─────────────┘     └─────────────┘                │
│                                                                         │
│  Fusion layer                                                           │
│  ┌─────────────────────────────────────────────────────┐                │
│  │           Occupant Aggregator                       │                │
│  │  - Counts occupants                                 │                │
│  │  - Seat occupancy state                             │                │
│  │  - Child detection alert                            │                │
│  └─────────────────────────────────────────────────────┘                │
│                          │                                              │
│                          ▼                                              │
│  Output layer                                                           │
│  ┌─────────────────────────────────────────────────────────┐            │
│  │ OccupancyResult {                                       │            │
│  │   occupant_count: 3,                                    │            │
│  │   occupants: [                                          │            │
│  │     {id: 1, seat: 0, type: ADULT, confidence: 0.92},    │            │
│  │     {id: 2, seat: 1, type: ADULT, confidence: 0.88},    │            │
│  │     {id: 3, seat: 4, type: CHILD, confidence: 0.95}     │            │
│  │   ],                                                    │            │
│  │   seat_occupancy: [true, true, false, false, true],     │            │
│  │   child_detected: true,                                 │            │
│  │   cpd_alert: false                                      │            │
│  │ }                                                       │            │
│  └─────────────────────────────────────────────────────────┘            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
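
The matching layer in the diagram associates faces with bodies "based on containment". One plausible reading, sketched below, is that a face belongs to a body when the face center lies inside the body box. The tie-breaking choices are assumptions, not taken from the article: when several bodies contain the face, the smallest body box wins, and each body keeps at most the first face matched to it. The names `contains` and `match_faces_to_bodies` are hypothetical.

```python
def contains(body, point):
    """True if point (px, py) lies inside the [x, y, w, h] body box."""
    x, y, w, h = body
    px, py = point
    return x <= px <= x + w and y <= py <= y + h


def match_faces_to_bodies(bodies, faces):
    """Return one (body, face_or_None) pair per body, in input order."""
    assigned = {}  # body index -> face box
    for face in faces:
        fx, fy, fw, fh = face
        center = (fx + fw / 2.0, fy + fh / 2.0)
        # Bodies that contain the face center and have no face yet.
        candidates = [i for i, body in enumerate(bodies)
                      if i not in assigned and contains(body, center)]
        if candidates:
            # Prefer the tightest (smallest-area) containing body.
            best = min(candidates, key=lambda i: bodies[i][2] * bodies[i][3])
            assigned[best] = face
    return [(body, assigned.get(i)) for i, body in enumerate(bodies)]


# Three bodies, two faces: the third body has no visible face.
bodies = [(100, 100, 200, 400), (500, 120, 180, 380), (900, 300, 150, 250)]
faces = [(160, 120, 80, 100), (550, 140, 70, 90)]
for body, face in match_faces_to_bodies(bodies, faces):
    print(body, "->", face)
```

Bodies without a face are kept rather than discarded, which matches the pipeline's has_face flag: an occupant turned away from the camera should still count toward seat occupancy.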

5. Complete Graph Configuration

# mediapipe/graphs/ims/oms_occupant_graph.pbtxt

# ============== Inputs and outputs ==============
input_stream: "OMS_IMAGE:oms_image"
input_stream: "TIMESTAMP:timestamp"

output_stream: "OCCUPANCY_RESULT:occupancy_result"
output_stream: "ALERT:alert"

# ============== 1. Body detection ==============
node {
  calculator: "ObjectDetectionCalculator"
  input_stream: "IMAGE:oms_image"
  output_stream: "DETECTIONS:person_detections"
  options {
    [mediapipe.ObjectDetectionOptions.ext] {
      model_path: "/models/yolov8n_person.tflite"
      label_map: "person"
      score_threshold: 0.5
      iou_threshold: 0.45
      max_detections: 10
    }
  }
}

# ============== 2. Face detection (inside body regions) ==============
node {
  calculator: "MultiRegionFaceDetectionCalculator"
  input_stream: "IMAGE:oms_image"
  input_stream: "PERSON_DETECTIONS:person_detections"
  output_stream: "FACE_DETECTIONS:face_detections"
  output_stream: "PERSON_FACES:person_faces"
  options {
    [mediapipe.MultiRegionFaceDetectionOptions.ext] {
      model_path: "/models/blazeface.tflite"
      score_threshold: 0.5
      max_detections_per_person: 2
    }
  }
}

# ============== 3. Occupant detector ==============
node {
  calculator: "OccupantDetectorCalculator"
  input_stream: "PERSON_DETECTIONS:person_detections"
  input_stream: "PERSON_FACES:person_faces"
  input_stream: "IMAGE:oms_image"
  output_stream: "OCCUPANTS:occupants"
  output_stream: "OCCUPANT_COUNT:occupant_count"
  options {
    [mediapipe.OccupantDetectorOptions.ext] {
      # Image size
      image_width: 1280
      image_height: 720
      
      # Classification thresholds (face height / image height)
      adult_min_face_height_ratio: 0.15
      child_min_face_height_ratio: 0.10
      child_max_face_height_ratio: 0.15
      infant_max_face_height_ratio: 0.10
      
      # Confidence thresholds
      min_person_confidence: 0.5
      min_face_confidence: 0.5
      min_occupant_confidence: 0.6
    }
  }
}

# ============== 4. Occupant tracking ==============
node {
  calculator: "OccupantTrackerCalculator"
  input_stream: "OCCUPANTS:occupants"
  input_stream: "TIMESTAMP:timestamp"
  output_stream: "TRACKED_OCCUPANTS:tracked_occupants"
  options {
    [mediapipe.OccupantTrackerOptions.ext] {
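      # Tracking configuration
      max_distance: 0.2  # maximum displacement between frames (normalized)
      max_age: 30        # maximum number of frames a track may stay lost
      min_hits: 3        # minimum number of frames needed to confirm a track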
    }
  }
}

# ============== 5. Seat occupancy state ==============
node {
  calculator: "SeatOccupancyCalculator"
  input_stream: "TRACKED_OCCUPANTS:tracked_occupants"
  output_stream: "SEAT_OCCUPANCY:seat_occupancy"
  output_stream: "OCCUPANT_BY_SEAT:occupant_by_seat"
  options {
    [mediapipe.SeatOccupancyOptions.ext] {
      # Seat regions (normalized coordinates)
      seat_0_x_min: 0.05
      seat_0_x_max: 0.35
      seat_0_y_min: 0.10
      seat_0_y_max: 0.60
      
      seat_1_x_min: 0.65
      seat_1_x_max: 0.95
      seat_1_y_min: 0.10
      seat_1_y_max: 0.60
      
      seat_2_x_min: 0.05
      seat_2_x_max: 0.35
      seat_2_y_min: 0.65
      seat_2_y_max: 0.95
      
      seat_3_x_min: 0.35
      seat_3_x_max: 0.65
      seat_3_y_min: 0.65
      seat_3_y_max: 0.95
      
      seat_4_x_min: 0.65
      seat_4_x_max: 0.95
      seat_4_y_min: 0.65
      seat_4_y_max: 0.95
    }
  }
}

# ============== 6. Child presence alert ==============
node {
  calculator: "ChildPresenceDetectionCalculator"
  input_stream: "TRACKED_OCCUPANTS:tracked_occupants"
  input_stream: "TIMESTAMP:timestamp"
  output_stream: "CPD_ALERT:cpd_alert"
  options {
    [mediapipe.ChildPresenceDetectionOptions.ext] {
      # Alert configuration
      alert_threshold_seconds: 30  # alert once a child has been alone in the cabin for 30 s
      require_no_adult: true        # only alert when no adult is present
    }
  }
}

# ============== 7. Result aggregation ==============
node {
  calculator: "OccupancyResultAggregatorCalculator"
  input_stream: "TRACKED_OCCUPANTS:tracked_occupants"
  input_stream: "OCCUPANT_COUNT:occupant_count"
  input_stream: "SEAT_OCCUPANCY:seat_occupancy"
  input_stream: "CPD_ALERT:cpd_alert"
  output_stream: "OCCUPANCY_RESULT:occupancy_result"
  output_stream: "ALERT:alert"
}
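
The two options on node 6 (alert_threshold_seconds and require_no_adult) suggest a simple timer: start counting when a child is present without an adult, reset when that condition ends, and raise the alert once the threshold is reached. The sketch below illustrates that assumed logic in Python. It is not the implementation of ChildPresenceDetectionCalculator, which the article has not shown; the class `CpdTimer` and its string type labels are hypothetical.

```python
class CpdTimer:
    """Tracks how long a child has been in the cabin without an adult."""

    def __init__(self, alert_threshold_seconds=30.0, require_no_adult=True):
        self.alert_threshold_seconds = alert_threshold_seconds
        self.require_no_adult = require_no_adult
        self._alone_since = None  # timestamp at which the at-risk state began

    def update(self, occupant_types, timestamp_seconds):
        """Feed one frame; return True when the alert should be active."""
        types = list(occupant_types)
        child_present = any(t in ("child", "infant") for t in types)
        adult_present = any(t == "adult" for t in types)

        # With require_no_adult, an adult in the cabin cancels the risk.
        at_risk = child_present and not (self.require_no_adult and adult_present)
        if not at_risk:
            self._alone_since = None
            return False

        if self._alone_since is None:
            self._alone_since = timestamp_seconds
        elapsed = timestamp_seconds - self._alone_since
        return elapsed >= self.alert_threshold_seconds


timer = CpdTimer()
print(timer.update(["adult", "child"], 0.0))   # adult present: no alert
print(timer.update(["child"], 10.0))           # child alone, timer starts
print(timer.update(["child"], 40.0))           # alone for 30 s: alert
print(timer.update(["adult", "child"], 41.0))  # adult returns: alert clears
```

A production system would likely also debounce brief misdetections before resetting the timer, since a single frame in which the child is missed would otherwise restart the 30-second count.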

6. Core Calculator Implementation

6.1 OccupantDetectorCalculator

// mediapipe/calculators/ims/occupant_detector_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_DETECTOR_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_DETECTOR_CALCULATOR_H_

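#include <cstdint>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/calculators/ims/occupant_detector_options.pb.h"

namespace mediapipe {

// Occupant type
enum OccupantType {
  OCCUPANT_ADULT = 0,   // adult
  OCCUPANT_CHILD = 1,   // child (6-12 years)
  OCCUPANT_INFANT = 2,  // infant (0-5 years)
  OCCUPANT_UNKNOWN = 3  // unknown
};

// Seat position
enum SeatPosition {
  SEAT_DRIVER = 0,
  SEAT_FRONT_PASSENGER = 1,
  SEAT_REAR_LEFT = 2,
  SEAT_REAR_CENTER = 3,
  SEAT_REAR_RIGHT = 4,
  SEAT_UNKNOWN = 5
};

// Axis-aligned box. The article does not show where BoundingBox is
// declared, so this minimal definition is an assumption; in this
// calculator it holds normalized [0, 1] coordinates.
struct BoundingBox {
  float x = 0.0f;
  float y = 0.0f;
  float width = 0.0f;
  float height = 0.0f;
};

// Per-occupant result
struct Occupant {
  int id = 0;                            // occupant ID
  OccupantType type = OCCUPANT_UNKNOWN;  // occupant type
  SeatPosition seat = SEAT_UNKNOWN;      // seat position
  BoundingBox person_bbox;               // body bounding box
  BoundingBox face_bbox;                 // face bounding box
  float confidence = 0.0f;               // confidence
  float face_height_ratio = 0.0f;        // face height / image height
  float body_height_ratio = 0.0f;        // body height / image height
  bool has_face = false;                 // whether a face was detected
  int64_t timestamp = 0;
};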

class OccupantDetectorCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc);
  
  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;

 private:
  // Pair each body with its face
  void MatchFacesToPersons(
      const std::vector<Detection>& person_detections,
      const std::vector<std::vector<Detection>>& person_faces,
      std::vector<Occupant>* occupants);
  
  // Classify the occupant type
  OccupantType ClassifyOccupantType(
      const Occupant& occupant,
      const ImageFrame& image);
  
  // Determine the seat position
  SeatPosition DetermineSeatPosition(
      const BoundingBox& bbox,
      const ImageFrame& image);
  
  // Compute features
  void CalculateFeatures(
      const BoundingBox& person_bbox,
      const BoundingBox& face_bbox,
      const ImageFrame& image,
      Occupant* occupant);
  
  // Configuration
  int image_width_;
  int image_height_;
  float adult_min_face_height_ratio_;
  float child_min_face_height_ratio_;
  float child_max_face_height_ratio_;
  float infant_max_face_height_ratio_;
  float min_person_confidence_;
  float min_face_confidence_;
  float min_occupant_confidence_;
  
  // ID counter
  int next_occupant_id_ = 1;
};

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_DETECTOR_CALCULATOR_H_

// mediapipe/calculators/ims/occupant_detector_calculator.cc
#include "mediapipe/calculators/ims/occupant_detector_calculator.h"

#include <algorithm>
#include <vector>

#include "mediapipe/framework/port/logging.h"
#include "mediapipe/framework/formats/landmark.pb.h"

namespace mediapipe {

using mediapipe::Detection;
using mediapipe::ImageFrame;

absl::Status OccupantDetectorCalculator::GetContract(CalculatorContract* cc) {
  cc->Inputs().Tag("PERSON_DETECTIONS").Set<std::vector<Detection>>();
  cc->Inputs().Tag("PERSON_FACES").Set<std::vector<std::vector<Detection>>>();
  cc->Inputs().Tag("IMAGE").Set<ImageFrame>();
  
  cc->Outputs().Tag("OCCUPANTS").Set<std::vector<Occupant>>();
  cc->Outputs().Tag("OCCUPANT_COUNT").Set<int>();
  
  cc->Options<OccupantDetectorOptions>();
  
  return absl::OkStatus();
}

absl::Status OccupantDetectorCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<OccupantDetectorOptions>();
  
  image_width_ = options.image_width();
  image_height_ = options.image_height();
  adult_min_face_height_ratio_ = options.adult_min_face_height_ratio();
  child_min_face_height_ratio_ = options.child_min_face_height_ratio();
  child_max_face_height_ratio_ = options.child_max_face_height_ratio();
  infant_max_face_height_ratio_ = options.infant_max_face_height_ratio();
  min_person_confidence_ = options.min_person_confidence();
  min_face_confidence_ = options.min_face_confidence();
  min_occupant_confidence_ = options.min_occupant_confidence();
  
  LOG(INFO) << "OccupantDetectorCalculator initialized";
  
  return absl::OkStatus();
}

absl::Status OccupantDetectorCalculator::Process(CalculatorContext* cc) {
  if (cc->Inputs().Tag("PERSON_DETECTIONS").IsEmpty() ||
      cc->Inputs().Tag("PERSON_FACES").IsEmpty() ||
      cc->Inputs().Tag("IMAGE").IsEmpty()) {
    return absl::OkStatus();
  }
  
  const auto& person_detections = 
      cc->Inputs().Tag("PERSON_DETECTIONS").Get<std::vector<Detection>>();
  
  const auto& person_faces = 
      cc->Inputs().Tag("PERSON_FACES").Get<std::vector<std::vector<Detection>>>();
  
  const auto& image = cc->Inputs().Tag("IMAGE").Get<ImageFrame>();
  
  // Pair each body with its face
  std::vector<Occupant> occupants;
  MatchFacesToPersons(person_detections, person_faces, &occupants);
  
  // Analyze each occupant
  for (auto& occupant : occupants) {
    // Compute features
    if (occupant.has_face) {
      CalculateFeatures(occupant.person_bbox, occupant.face_bbox, image, &occupant);
      
      // Classify the occupant type
      occupant.type = ClassifyOccupantType(occupant, image);
    } else {
      // No face: record body features only and leave the type unknown
      CalculateFeatures(occupant.person_bbox, {}, image, &occupant);
      occupant.type = OCCUPANT_UNKNOWN;
    }
    
    // Determine the seat position
    occupant.seat = DetermineSeatPosition(occupant.person_bbox, image);
    
    // Assign a per-detection ID. The counter advances on every frame, so
    // this ID is not stable over time; stable IDs are expected to come
    // from OccupantTrackerCalculator downstream.
    occupant.id = next_occupant_id_++;
    occupant.timestamp = cc->InputTimestamp().Value();
    
    VLOG(1) << "Occupant " << occupant.id 
            << ": type=" << occupant.type
            << ", seat=" << occupant.seat
            << ", confidence=" << occupant.confidence;
  }
  
  // Filter out low-confidence occupants
  std::vector<Occupant> filtered_occupants;
  for (const auto& occupant : occupants) {
    if (occupant.confidence >= min_occupant_confidence_) {
      filtered_occupants.push_back(occupant);
    }
  }
  
  // Emit outputs
  cc->Outputs().Tag("OCCUPANTS").AddPacket(
      MakePacket<std::vector<Occupant>>(filtered_occupants)
          .At(cc->InputTimestamp()));
  
  cc->Outputs().Tag("OCCUPANT_COUNT").AddPacket(
      MakePacket<int>(static_cast<int>(filtered_occupants.size()))
          .At(cc->InputTimestamp()));
  
  return absl::OkStatus();
}

void OccupantDetectorCalculator::MatchFacesToPersons(
    const std::vector<Detection>& person_detections,
    const std::vector<std::vector<Detection>>& person_faces,
    std::vector<Occupant>* occupants) {
  
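  // Bodies and face lists are paired by index; if the two inputs differ
  // in length, only the common prefix is processed
  int num_persons = static_cast<int>(person_detections.size());
  int num_person_faces = static_cast<int>(person_faces.size());
  int min_count = std::min(num_persons, num_person_faces);
  
  for (int i = 0; i < min_count; ++i) {
    const auto& person = person_detections[i];
    
    // Skip bodies that carry no score or a low score
    if (person.score_size() == 0 ||
        person.score(0) < min_person_confidence_) {
      continue;
    }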
    
    Occupant occupant;
    occupant.person_bbox = {
      person.location_data().relative_bounding_box().xmin(),
      person.location_data().relative_bounding_box().ymin(),
      person.location_data().relative_bounding_box().width(),
      person.location_data().relative_bounding_box().height()
    };
    occupant.confidence = person.score()[0];
    occupant.has_face = false;
    
    // Attach a face, if any
    if (i < num_person_faces && !person_faces[i].empty()) {
      const auto& faces = person_faces[i];
      
      // Pick the best face (highest confidence)
      const auto* best_face = &faces[0];
      float best_score = faces[0].score()[0];
      
      for (const auto& face : faces) {
        if (face.score()[0] > best_score) {
          best_score = face.score()[0];
          best_face = &face;
        }
      }
      
      if (best_score >= min_face_confidence_) {
        occupant.face_bbox = {
          best_face->location_data().relative_bounding_box().xmin(),
          best_face->location_data().relative_bounding_box().ymin(),
          best_face->location_data().relative_bounding_box().width(),
          best_face->location_data().relative_bounding_box().height()
        };
        occupant.has_face = true;
      }
    }
    
    occupants->push_back(occupant);
  }
}

void OccupantDetectorCalculator::CalculateFeatures(
    const BoundingBox& person_bbox,
    const BoundingBox& face_bbox,
    const ImageFrame& image,
    Occupant* occupant) {
  
  // Body features (the box is normalized, so its height is already a ratio)
  occupant->body_height_ratio = person_bbox.height;
  
  // Face features
  if (occupant->has_face) {
    occupant->face_height_ratio = face_bbox.height;
    
    // Center of the face box
    float face_center_x = face_bbox.x + face_bbox.width / 2.0f;
    float face_center_y = face_bbox.y + face_bbox.height / 2.0f;
    
    float person_min_x = person_bbox.x;
    float person_max_x = person_bbox.x + person_bbox.width;
    float person_min_y = person_bbox.y;
    float person_max_y = person_bbox.y + person_bbox.height;
    
    // If the face center lies inside the body box, raise the confidence,
    // capped at 1.0 so it stays a valid score
    if (face_center_x >= person_min_x && face_center_x <= person_max_x &&
        face_center_y >= person_min_y && face_center_y <= person_max_y) {
      occupant->confidence = std::min(1.0f, occupant->confidence * 1.1f);
    }
  }
}

OccupantType OccupantDetectorCalculator::ClassifyOccupantType(
    const Occupant& occupant,
    const ImageFrame& image) {
  
  if (!occupant.has_face) {
    return OCCUPANT_UNKNOWN;
  }
  
  float face_ratio = occupant.face_height_ratio;
  
  // Infant (ages 0-5): face height ratio < 10%
  if (face_ratio < infant_max_face_height_ratio_) {
    return OCCUPANT_INFANT;
  }
  
  // Child (ages 6-12): 10% <= face height ratio < 15%
  if (face_ratio >= child_min_face_height_ratio_ &&
      face_ratio < child_max_face_height_ratio_) {
    return OCCUPANT_CHILD;
  }
  
  // Adult: face height ratio >= 15%
  if (face_ratio >= adult_min_face_height_ratio_) {
    return OCCUPANT_ADULT;
  }
  
  return OCCUPANT_UNKNOWN;
}

SeatPosition OccupantDetectorCalculator::DetermineSeatPosition(
    const BoundingBox& bbox,
    const ImageFrame& image) {
  
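  // Center of the bounding box
  float center_x = bbox.x + bbox.width / 2.0f;
  float center_y = bbox.y + bbox.height / 2.0f;
  
  // bbox is filled from relative_bounding_box(), so it is already normalized
  // to [0, 1]. Dividing by the image size again would push every center
  // toward 0 and always return SEAT_UNKNOWN.
  float normalized_x = center_x;
  float normalized_y = center_y;
  
  // Driver seat (front left)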
  if (normalized_x >= 0.05f && normalized_x <= 0.35f &&
      normalized_y >= 0.10f && normalized_y <= 0.60f) {
    return SEAT_DRIVER;
  }
  
  // Front passenger seat (front right)
  if (normalized_x >= 0.65f && normalized_x <= 0.95f &&
      normalized_y >= 0.10f && normalized_y <= 0.60f) {
    return SEAT_FRONT_PASSENGER;
  }
  
  // Rear left
  if (normalized_x >= 0.05f && normalized_x <= 0.35f &&
      normalized_y >= 0.65f && normalized_y <= 0.95f) {
    return SEAT_REAR_LEFT;
  }
  
  // Rear center
  if (normalized_x >= 0.35f && normalized_x <= 0.65f &&
      normalized_y >= 0.65f && normalized_y <= 0.95f) {
    return SEAT_REAR_CENTER;
  }
  
  // Rear right
  if (normalized_x >= 0.65f && normalized_x <= 0.95f &&
      normalized_y >= 0.65f && normalized_y <= 0.95f) {
    return SEAT_REAR_RIGHT;
  }
  
  return SEAT_UNKNOWN;
}

REGISTER_CALCULATOR(OccupantDetectorCalculator);

}  // namespace mediapipe
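
The two heuristics above (type classification by face height ratio, seat lookup by bounding-box center) can be tried outside MediaPipe. The following is a minimal sketch, not the calculator itself: the enum and function names are hypothetical stand-ins, the 10% and 15% thresholds are taken from the comments in ClassifyOccupantType, and the seat zones copy the rectangles in DetermineSeatPosition. It assumes normalized [0, 1] coordinates and a left-hand-drive layout.

```cpp
#include <array>

// Hypothetical stand-ins for the article's OccupantType / SeatPosition enums.
enum class OccupantType { kUnknown, kInfant, kChild, kAdult };
enum class Seat {
  kUnknown, kDriver, kFrontPassenger, kRearLeft, kRearCenter, kRearRight
};

// Assumed thresholds, read off the comments in ClassifyOccupantType.
constexpr float kChildMinFaceRatio = 0.10f;
constexpr float kAdultMinFaceRatio = 0.15f;

// Classifies by face height as a fraction of image height.
OccupantType ClassifyByFaceRatio(bool has_face, float face_height_ratio) {
  if (!has_face) return OccupantType::kUnknown;
  if (face_height_ratio < kChildMinFaceRatio) return OccupantType::kInfant;
  if (face_height_ratio < kAdultMinFaceRatio) return OccupantType::kChild;
  return OccupantType::kAdult;
}

// One rectangular zone per seat, in normalized image coordinates.
struct SeatZone {
  Seat seat;
  float x_min, x_max, y_min, y_max;
};

constexpr std::array<SeatZone, 5> kSeatZones = {{
    {Seat::kDriver,         0.05f, 0.35f, 0.10f, 0.60f},
    {Seat::kFrontPassenger, 0.65f, 0.95f, 0.10f, 0.60f},
    {Seat::kRearLeft,       0.05f, 0.35f, 0.65f, 0.95f},
    {Seat::kRearCenter,     0.35f, 0.65f, 0.65f, 0.95f},
    {Seat::kRearRight,      0.65f, 0.95f, 0.65f, 0.95f},
}};

// Returns the first zone containing the bbox center, or kUnknown.
Seat SeatForCenter(float center_x, float center_y) {
  for (const SeatZone& zone : kSeatZones) {
    if (center_x >= zone.x_min && center_x <= zone.x_max &&
        center_y >= zone.y_min && center_y <= zone.y_max) {
      return zone.seat;
    }
  }
  return Seat::kUnknown;
}
```

Keeping the zones in a table makes them easier to calibrate per camera mounting than a chain of if statements. Note that the face height ratio depends on the distance between the occupant and the camera, so fixed thresholds like these would likely need tuning per vehicle.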

6.2 OccupantTrackerCalculator

// mediapipe/calculators/ims/occupant_tracker_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_TRACKER_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_TRACKER_CALCULATOR_H_

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/calculators/ims/occupant_detector_calculator.h"
#include <deque>
#include <map>
#include <vector>

namespace mediapipe {

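// Track lifecycle state
enum TrackState {
  STATE_TENTATIVE = 0,  // Tentative (needs more frames to confirm)
  STATE_CONFIRMED = 1,  // Confirmed
  STATE_DELETED = 2     // Marked for deletion
};

// A tracked occupant
struct Track {
  int id;                          // Track ID
  Occupant occupant;                // Latest occupant data
  TrackState state;                 // Lifecycle state
  int hits;                         // Frames with a matched detection
  int age;                          // Consecutive frames without a match
  BoundingBox smooth_bbox;          // Smoothed bounding box
  std::deque<BoundingBox> bbox_history;  // Recent bounding boxes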
};

class OccupantTrackerCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc);
  
  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;

 private:
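  // Associate detections with existing tracks
  void MatchDetectionsToTracks(
      const std::vector<Occupant>& detections,
      std::map<int, Track>* tracks);
  
  // Euclidean distance between bounding-box centers
  float CalculateDistance(
      const BoundingBox& bbox1,
      const BoundingBox& bbox2);
  
  // Update a track with its matched detection
  void UpdateTrack(Track* track, const Occupant& occupant);
  
  // Create a new tentative track
  Track CreateTrack(const Occupant& occupant);
  
  // Erase tracks marked STATE_DELETED
  void DeleteOldTracks(std::map<int, Track>* tracks);
  
  // Configuration
  float max_distance_;      // Maximum match distance
  int max_age_;             // Maximum consecutive missed frames
  int min_hits_;            // Minimum hits to confirm a track
  int smooth_window_;       // Smoothing window size (frames)
  
  // Tracker state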
  std::map<int, Track> tracks_;
  int next_track_id_ = 1;
};

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_OCCUPANT_TRACKER_CALCULATOR_H_
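
The TrackState enum together with the hits and age counters describes a small state machine: a track starts tentative, is confirmed after enough matched frames, and is deleted after too many consecutive misses. The sketch below isolates that lifecycle under stated assumptions; `TrackLife`, `Step`, and the field name `misses` are hypothetical names used only for illustration (`misses` plays the role of `age` in the header above).

```cpp
// Hypothetical, framework-free model of the track lifecycle.
enum class State { kTentative, kConfirmed, kDeleted };

struct TrackLife {
  State state = State::kTentative;
  int hits = 1;    // Frames with a matched detection (creation counts as one)
  int misses = 0;  // Consecutive frames without a match
};

// Advances the lifecycle by one frame.
// matched:  whether a detection was associated with the track this frame.
// min_hits: hits required to promote a tentative track.
// max_age:  consecutive misses tolerated before deletion.
void Step(TrackLife& track, bool matched, int min_hits, int max_age) {
  if (track.state == State::kDeleted) return;
  if (matched) {
    ++track.hits;
    track.misses = 0;
    if (track.state == State::kTentative && track.hits >= min_hits) {
      track.state = State::kConfirmed;
    }
  } else if (++track.misses > max_age) {
    track.state = State::kDeleted;
  }
}
```

With min_hits = 3 and max_age = 2, a new track is confirmed on its second consecutive match after creation and is deleted on the third consecutive frame without a match.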

// mediapipe/calculators/ims/occupant_tracker_calculator.cc
#include "mediapipe/calculators/ims/occupant_tracker_calculator.h"
#include "mediapipe/framework/port/logging.h"
#include <cmath>  // std::sqrt

namespace mediapipe {

absl::Status OccupantTrackerCalculator::GetContract(CalculatorContract* cc) {
  cc->Inputs().Tag("OCCUPANTS").Set<std::vector<Occupant>>();
  cc->Inputs().Tag("TIMESTAMP").Set<int64_t>();
  
  cc->Outputs().Tag("TRACKED_OCCUPANTS").Set<std::vector<Occupant>>();
  
  cc->Options<OccupantTrackerOptions>();
  
  return absl::OkStatus();
}

absl::Status OccupantTrackerCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<OccupantTrackerOptions>();
  
  max_distance_ = options.max_distance();
  max_age_ = options.max_age();
  min_hits_ = options.min_hits();
  smooth_window_ = 5;
  
  LOG(INFO) << "OccupantTrackerCalculator initialized";
  
  return absl::OkStatus();
}

absl::Status OccupantTrackerCalculator::Process(CalculatorContext* cc) {
  if (cc->Inputs().Tag("OCCUPANTS").IsEmpty()) {
    return absl::OkStatus();
  }
  
  const auto& detections = cc->Inputs().Tag("OCCUPANTS").Get<std::vector<Occupant>>();
  
  // Associate detections with tracks
  MatchDetectionsToTracks(detections, &tracks_);
  
  // Erase tracks marked as deleted
  DeleteOldTracks(&tracks_);
  
  // Collect confirmed tracks
  std::vector<Occupant> tracked_occupants;
  for (const auto& [id, track] : tracks_) {
    if (track.state == STATE_CONFIRMED) {
      Occupant occupant = track.occupant;
      occupant.id = id;  // Use the track ID
      tracked_occupants.push_back(occupant);
    }
  }
  
  // Emit the result
  cc->Outputs().Tag("TRACKED_OCCUPANTS").AddPacket(
      MakePacket<std::vector<Occupant>>(tracked_occupants)
          .At(cc->InputTimestamp()));
  
  VLOG(1) << "Tracked " << tracked_occupants.size() << " occupants";
  
  return absl::OkStatus();
}

void OccupantTrackerCalculator::MatchDetectionsToTracks(
    const std::vector<Occupant>& detections,
    std::map<int, Track>* tracks) {
  
  std::vector<bool> detection_matched(detections.size(), false);
  
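  // Step 1: match every live track, tentative or confirmed. Tentative tracks
  // must be matched too; otherwise they never accumulate hits, are never
  // promoted to STATE_CONFIRMED, and a new track is created on every frame.
  for (auto& [id, track] : *tracks) {
    if (track.state == STATE_DELETED) continue;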
    
    float best_distance = max_distance_;
    int best_idx = -1;
    
    for (size_t i = 0; i < detections.size(); ++i) {
      if (detection_matched[i]) continue;
      
      float distance = CalculateDistance(
          track.occupant.person_bbox,
          detections[i].person_bbox);
      
      if (distance < best_distance) {
        best_distance = distance;
        best_idx = static_cast<int>(i);
      }
    }
    
    if (best_idx >= 0) {
      UpdateTrack(&track, detections[best_idx]);
      detection_matched[best_idx] = true;
    } else {
      // No match: count one more missed frame
      track.age++;
      if (track.age > max_age_) {
        track.state = STATE_DELETED;
      }
    }
  }
  
  // Step 2: start a new track for each unmatched detection
  for (size_t i = 0; i < detections.size(); ++i) {
    if (detection_matched[i]) continue;
    
    Track new_track = CreateTrack(detections[i]);
    tracks->emplace(new_track.id, new_track);
  }
}

float OccupantTrackerCalculator::CalculateDistance(
    const BoundingBox& bbox1,
    const BoundingBox& bbox2) {
  
  float center1_x = bbox1.x + bbox1.width / 2.0f;
  float center1_y = bbox1.y + bbox1.height / 2.0f;
  
  float center2_x = bbox2.x + bbox2.width / 2.0f;
  float center2_y = bbox2.y + bbox2.height / 2.0f;
  
  float dx = center1_x - center2_x;
  float dy = center1_y - center2_y;
  
  return std::sqrt(dx * dx + dy * dy);
}

void OccupantTrackerCalculator::UpdateTrack(
    Track* track,
    const Occupant& occupant) {
  
  track->occupant = occupant;
  track->hits++;
  track->age = 0;
  
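  // Append to the bbox history, keeping at most smooth_window_ entries
  track->bbox_history.push_back(occupant.person_bbox);
  while (track->bbox_history.size() > static_cast<size_t>(smooth_window_)) {
    track->bbox_history.pop_front();
  }
  
  // Smooth the bounding box with a moving average over the history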
  track->smooth_bbox.x = 0.0f;
  track->smooth_bbox.y = 0.0f;
  track->smooth_bbox.width = 0.0f;
  track->smooth_bbox.height = 0.0f;
  
  for (const auto& bbox : track->bbox_history) {
    track->smooth_bbox.x += bbox.x;
    track->smooth_bbox.y += bbox.y;
    track->smooth_bbox.width += bbox.width;
    track->smooth_bbox.height += bbox.height;
  }
  
  track->smooth_bbox.x /= track->bbox_history.size();
  track->smooth_bbox.y /= track->bbox_history.size();
  track->smooth_bbox.width /= track->bbox_history.size();
  track->smooth_bbox.height /= track->bbox_history.size();
  
  // Promote a tentative track once it has enough hits
  if (track->state == STATE_TENTATIVE && track->hits >= min_hits_) {
    track->state = STATE_CONFIRMED;
  }
}

Track OccupantTrackerCalculator::CreateTrack(const Occupant& occupant) {
  Track track;
  track.id = next_track_id_++;
  track.occupant = occupant;
  track.state = STATE_TENTATIVE;
  track.hits = 1;
  track.age = 0;
  track.smooth_bbox = occupant.person_bbox;
  track.bbox_history.push_back(occupant.person_bbox);
  
  return track;
}

void OccupantTrackerCalculator::DeleteOldTracks(std::map<int, Track>* tracks) {
  auto it = tracks->begin();
  while (it != tracks->end()) {
    if (it->second.state == STATE_DELETED) {
      it = tracks->erase(it);
    } else {
      ++it;
    }
  }
}

REGISTER_CALCULATOR(OccupantTrackerCalculator);

}  // namespace mediapipe
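
The association step in MatchDetectionsToTracks is a greedy nearest-neighbour match on bounding-box centers. The sketch below shows that technique on plain points, without MediaPipe types; `Point` and `GreedyMatch` are hypothetical names, and the distance threshold is in the same normalized units as the boxes.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bounding-box center.
struct Point {
  float x;
  float y;
};

// For each track, returns the index of the nearest unused detection that is
// closer than max_distance, or -1 if there is none. Tracks are processed in
// order, so an earlier track can claim a detection a later one is closer to.
std::vector<int> GreedyMatch(const std::vector<Point>& tracks,
                             const std::vector<Point>& detections,
                             float max_distance) {
  std::vector<int> assignment(tracks.size(), -1);
  std::vector<bool> used(detections.size(), false);
  for (std::size_t t = 0; t < tracks.size(); ++t) {
    float best = max_distance;
    for (std::size_t d = 0; d < detections.size(); ++d) {
      if (used[d]) continue;
      const float dist = std::hypot(tracks[t].x - detections[d].x,
                                    tracks[t].y - detections[d].y);
      if (dist < best) {
        best = dist;
        assignment[t] = static_cast<int>(d);
      }
    }
    if (assignment[t] >= 0) used[assignment[t]] = true;
  }
  return assignment;
}
```

Greedy matching is order-dependent and not globally optimal. With at most a handful of seats per cabin that is probably acceptable; a globally optimal assignment (for example the Hungarian algorithm) or an IoU-based cost would be the usual next step if ID switches show up.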

7. Testing and Validation

7.1 Unit Tests

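// Note: these tests are sketches. They call the calculator's helper methods
// as if they were free functions, and they assume the occupant type has been
// assigned (via CalculateFeatures and ClassifyOccupantType) before the checks.
TEST(OccupantDetectorTest, DetectsAdultCorrectly) {
  // Build a person detection for an adult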
  Detection person;
  auto* bbox = person.mutable_location_data()
                  ->mutable_relative_bounding_box();
  bbox->set_xmin(0.1f);
  bbox->set_ymin(0.1f);
  bbox->set_width(0.2f);
  bbox->set_height(0.4f);
  person.add_score(0.95f);
  
  // Build the face detection
  Detection face;
  auto* face_bbox = face.mutable_location_data()
                       ->mutable_relative_bounding_box();
  face_bbox->set_xmin(0.15f);
  face_bbox->set_ymin(0.12f);
  face_bbox->set_width(0.1f);  // Face width: 10% of image width
  face_bbox->set_height(0.18f);  // Face height: 18% of image height (adult)
  face.add_score(0.98f);
  
  // Run occupant matching
  std::vector<Detection> persons = {person};
  std::vector<std::vector<Detection>> faces = {{face}};
  
  std::vector<Occupant> occupants;
  MatchFacesToPersons(persons, faces, &occupants);
  
  ASSERT_EQ(occupants.size(), 1);
  EXPECT_EQ(occupants[0].type, OCCUPANT_ADULT);
  EXPECT_TRUE(occupants[0].has_face);
  EXPECT_GT(occupants[0].confidence, 0.9f);
}

TEST(OccupantDetectorTest, DetectsChildCorrectly) {
  // Build a person detection for a child
  Detection person;
  auto* bbox = person.mutable_location_data()
                  ->mutable_relative_bounding_box();
  bbox->set_xmin(0.65f);
  bbox->set_ymin(0.7f);
  bbox->set_width(0.15f);
  bbox->set_height(0.2f);
  person.add_score(0.9f);
  
  // Build the face detection (smaller face)
  Detection face;
  auto* face_bbox = face.mutable_location_data()
                       ->mutable_relative_bounding_box();
  face_bbox->set_xmin(0.68f);
  face_bbox->set_ymin(0.72f);
  face_bbox->set_width(0.06f);
  face_bbox->set_height(0.12f);  // Face height: 12% of image height (child)
  face.add_score(0.95f);
  
  // Run occupant matching
  std::vector<Detection> persons = {person};
  std::vector<std::vector<Detection>> faces = {{face}};
  
  std::vector<Occupant> occupants;
  MatchFacesToPersons(persons, faces, &occupants);
  
  ASSERT_EQ(occupants.size(), 1);
  EXPECT_EQ(occupants[0].type, OCCUPANT_CHILD);
  EXPECT_TRUE(occupants[0].has_face);
}

TEST(SeatPositionTest, MapsDriverSeatCorrectly) {
  BoundingBox bbox;
  bbox.x = 0.1f;
  bbox.y = 0.2f;
  bbox.width = 0.2f;
  bbox.height = 0.3f;
  
  ImageFrame image;  // Placeholder frame to match the (bbox, image) signature
  SeatPosition seat = DetermineSeatPosition(bbox, image);
  
  EXPECT_EQ(seat, SEAT_DRIVER);
}

TEST(SeatPositionTest, MapsRearRightCorrectly) {
  BoundingBox bbox;
  bbox.x = 0.7f;
  bbox.y = 0.7f;
  bbox.width = 0.15f;
  bbox.height = 0.2f;
  
  ImageFrame image;  // Placeholder frame to match the (bbox, image) signature
  SeatPosition seat = DetermineSeatPosition(bbox, image);
  
  EXPECT_EQ(seat, SEAT_REAR_RIGHT);
}

7.2 Integration Tests

# Scenario 1: driver and front passenger
python3 test_occupant_detection.py --scenario driver_and_passenger

# Expected output:
# Occupant Count: 2
# Occupant 0: type=ADULT, seat=DRIVER, confidence=0.92
# Occupant 1: type=ADULT, seat=FRONT_PASSENGER, confidence=0.88

# Scenario 2: three adults
python3 test_occupant_detection.py --scenario three_adults

# Expected output:
# Occupant Count: 3
# Occupant 0: type=ADULT, seat=DRIVER, confidence=0.91
# Occupant 1: type=ADULT, seat=FRONT_PASSENGER, confidence=0.89
# Occupant 2: type=ADULT, seat=REAR_RIGHT, confidence=0.85

# Scenario 3: child in the rear row
python3 test_occupant_detection.py --scenario child_in_rear

# Expected output:
# Occupant Count: 3
# Occupant 0: type=ADULT, seat=DRIVER, confidence=0.93
# Occupant 1: type=ADULT, seat=FRONT_PASSENGER, confidence=0.87
# Occupant 2: type=CHILD, seat=REAR_RIGHT, confidence=0.95

# Performance benchmark
python3 benchmark_occupant_detection.py --resolution 1280x720 --fps 30

# Expected output:
# Average Latency: 45.2 ms
# Throughput: 22.1 FPS
# CPU Usage: 45%

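8. Summary

Component	Function
Body Detection	Person detection (YOLOv8)
Face Detection	Face detection (BlazeFace)
Occupant Detector	Occupant classification (adult/child/infant)
Occupant Tracker	Occupant tracking (ID assignment)
Seat Mapper	Seat mapping
CPD Alert	Child Presence Detection (child left in vehicle) alert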

Series progress: 46/55
Last updated: 2026-03-12


https://dapalm.com/2026/03/12/MediaPipe系列46-IMS-OMS架构：乘员检测流水线/

Author: Mars

Published: March 12, 2026
