Automotive Innovation, 2021, Vol. 4, Issue 3: 338-349. doi: 10.1007/s42154-021-00142-4

A Computer Graphics-Based Framework for 3D Pose Estimation of Pedestrians

Jisi Tang & Qing Zhou    

  1. State Key Laboratory of Automotive Safety and Energy, School of Vehicle and Mobility, Tsinghua University
  • Online: 2021-08-16 Published: 2021-08-16

Abstract:

In pedestrian-to-vehicle collisions, adapting safety measures ahead of time based on the actual pose of the pedestrian is one of the core objectives of integrated safety. It can significantly enhance the performance of passive safety systems when active safety maneuvers fail to avoid an accident. This study proposes a deep learning model to estimate the 3D pose of pedestrians from images. Since conventional pedestrian image datasets do not provide pose features to work with, a computer graphics (CG)-based framework is established to train the system with synthetic images. Biofidelic 3D meshes of standing males are first transformed into several walking poses and then rendered as images from multiple view angles. Subsequently, a matrix of 50 anthropometries, 10 gaits and 12 views is built, for a total of 6000 images. A two-branch convolutional neural network (CNN) is trained on the synthetic dataset. The model simultaneously predicts 16 joint landmarks and 14 joint angles of the pedestrian for each image with high accuracy: the mean errors of the predictions are 0.54 pixels and −0.06°, respectively. Any specific pose can then be completely reconstructed from the outputs. Overall, the study establishes a CG-based pipeline to generate photorealistic images with the desired features for training, and it demonstrates the feasibility of leveraging a CNN to estimate the pose of a walking pedestrian from synthesized images. The proposed framework provides a starting point for vehicles to infer pedestrian poses and then adapt protection measures accordingly for an imminent impact to minimize pedestrian injuries.
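
The dataset size quoted in the abstract follows from a full factorial combination of anthropometries, gaits and views. The sketch below is illustrative only and is not the authors' pipeline: it enumerates such a render matrix and checks the image count. The record layout, the function name `build_render_matrix`, the even 30° spacing of views, and the treatment of each landmark as a 2D pixel coordinate are all assumptions made for this example; only the counts (50, 10, 12, 16, 14) come from the abstract.

```python
from itertools import product

# Counts taken from the abstract.
N_SUBJECTS = 50   # anthropometries (body shapes)
N_GAITS = 10      # walking poses per subject
N_VIEWS = 12      # rendered view angles per pose

# Assumption: the 12 views are evenly spaced around the pedestrian.
VIEW_STEP_DEG = 360 // N_VIEWS

# Label sizes per image, from the abstract.
N_LANDMARKS = 16
N_ANGLES = 14
# Assumption: each landmark is a 2D pixel coordinate (x, y).
LANDMARK_VECTOR_LEN = N_LANDMARKS * 2


def build_render_matrix():
    """Return one record per (subject, gait, view) combination."""
    return [
        {"subject": s, "gait": g, "view_deg": v * VIEW_STEP_DEG}
        for s, g, v in product(range(N_SUBJECTS), range(N_GAITS), range(N_VIEWS))
    ]


matrix = build_render_matrix()
print(len(matrix))  # 6000
```

Under these assumptions, each of the 6000 records would be paired with one rendered image and two label vectors, one per branch of the network: 32 landmark values and 14 joint angles.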
