3D hand pose estimation and reconstruction play an important role in computer vision and human-computer interaction. Accurate and efficient pose estimation enables natural, intuitive interaction in applications such as virtual reality, augmented reality, intelligent driving, and robotics. In virtual reality, for example, capturing hand poses accurately in real time lets users interact naturally with the virtual environment, e.g., grasping and manipulating virtual objects, which enhances immersion. In robotics, accurate hand pose estimation helps robots better understand human gesture commands, enabling efficient human-robot collaboration. In addition, 3D hand reconstruction provides high-quality 3D hand models for fields such as medicine and animation, supporting medical diagnosis and character motion generation. Our research focuses on improving the accuracy, speed, and robustness of 3D hand pose estimation algorithms based on the RGB and depth modalities.
3D Hand Pose Estimation from RGB Images
-
A3-Net: Calibration-Free Multi-View 3D Hand Reconstruction for Enhanced Musical Instrument Learning
Geng Chen, Xufeng Jian*, Yuchen Chen, Pengfei Ren†, Jingyu Wang†, Haifeng Sun, Qi Qi, Jing Wang, Jianxin Liao
-
Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction
Kaixin Fan, Pengfei Ren*, Jingyu Wang†, Haifeng Sun, Qi Qi, Zirui Zhuang, Jianxin Liao
-
Region-Aware Dynamic Filtering Network for 3D Hand Reconstruction
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications Paper Abstract 3D hand reconstruction from RGB images has attracted a lot of attention due to its crucial role in human-computer interaction. Nevertheless, it is still challenging to perform 3D hand reconstruction under conditions of hand-object interaction due to severe mutual […]
-
SMR: Spatial-Guided Model-Based Regression for 3D Hand Pose and Mesh Reconstruction
Paper Abstract 3D hand reconstruction is an important technique for human-computer interaction. Interactive experience depends on the accuracy, efficiency, and robustness of the algorithm. Therefore, in this paper, we first propose a balanced framework called spatial-aware regression (SAR) to achieve precise and fast reconstruction. SAR can bridge convolutional networks and graph-structure networks more effectively than […]
-
SAR: Spatial-Aware Regression for 3D Hand Pose and Mesh Reconstruction from a Monocular RGB Image
Paper Code Illustration of 3D hand reconstruction from a monocular RGB image input. From the camera input (left), we reconstruct 3D hand mesh (upper right) and 3D hand pose (lower right). Because of our good balance of accuracy and efficiency, our method has more potential for real-world applications in VR/AR scenarios. Abstract 3D hand reconstruction […]
3D Hand Pose Estimation from Depth Images
-
Keypoint Fusion for RGB-D Based 3D Hand Pose Estimation
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications Paper Code Abstract Previous 3D hand pose estimation methods primarily rely on a single modality, either RGB or depth, and the comprehensive utilization of the dual modalities has not been extensively explored. RGB and depth data provide complementary information and thus […]
-
Two Heads are Better than One: Image-Point Cloud Network for Depth-Based 3D Hand Pose Estimation
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications Paper Code IPNet utilizes a 2D CNN for visual feature extraction and initial hand pose estimation. Then, IPNet obtains the initial point cloud features through a 2D-3D projection module. Finally, IPNet iteratively updates point features and refines hand pose in the […]
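IPNet's 2D-3D projection module lifts depth-image features into point-cloud space. The core operation any such module builds on is back-projecting depth pixels through the pinhole camera model; below is a minimal numpy sketch of that lifting step (an illustration of the standard inverse projection, not IPNet's actual implementation; the function name and toy intrinsics are made up for the example).

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) into a camera-frame point cloud.

    Standard pinhole inverse projection: x = (u - cx) * z / fx,
    y = (v - cy) * z / fy. Zero-depth (invalid) pixels are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep valid pixels only

# toy 2x2 depth map with one invalid (zero-depth) pixel
depth = np.array([[1.0, 0.0],
                  [2.0, 1.0]])
pts = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts.shape)  # (3, 3): three valid pixels lifted to 3D
```

Point features obtained this way inherit a one-to-one correspondence with image pixels, which is what allows the 2D CNN features to be transferred onto the point cloud before iterative refinement.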
-
SA-Fusion: Multimodal Fusion Approach for Web-based Human-Computer Interaction in the Wild
1Beijing University of Posts and Telecommunications, 2State Key Laboratory of Networking and Switching Technology, 3China Mobile Research Institute Paper Abstract Web-based AR technology has broadened human-computer interaction scenes from traditional mechanical devices and flat screens to the real world, resulting in unconstrained environmental challenges such as complex backgrounds, extreme illumination, depth range differences, and hand-object […]
-
Pose-Guided Hierarchical Graph Reasoning for 3-D Hand Pose Estimation From a Single Depth Image
Paper Due to the self-similarity of the fingers and severe self-occlusion, it is difficult to predict the correct joint position from local evidence (as shown in Fig. a). By incorporating context information (as shown in Fig. b, where adjacent joints can be accurately predicted), forming enhanced feature maps through pose-guided hierarchical graph (PHG), ambiguity is […]
-
Spatial-Aware Stacked Regression Network for Real-Time 3D Hand Pose Estimation
Paper Refer to SRN for more details. Bibtex
-
AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation
1State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, 2EBUPT Information Technology Co., Ltd. Paper Code We introduce adaptive weighting regression (AWR) method. The weight distribution in weight maps can be adjusted adaptively to achieve more accurate and robust performance under the guidance of joint supervision. Top row: When the […]
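AWR's key idea is that the joint position is a weighted average of dense per-pixel votes, with the weight map learned under direct joint supervision. The sketch below illustrates that aggregation in numpy (a simplified illustration of dense weighted regression, not the released AWR code; the function names and toy values are invented for the example).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dense_weighted_regression(coords, offsets, weight_logits):
    """Aggregate dense per-pixel predictions into one joint position.

    coords:        (N, 3) 3D coordinates of the depth pixels
    offsets:       (N, 3) predicted pixel-to-joint offset vectors
    weight_logits: (N,)   unnormalized per-pixel weights

    Each pixel casts a vote (coords + offsets) for the joint; the
    softmax-normalized weight map blends the votes, so the whole
    aggregation is differentiable and the weights can adapt under
    joint supervision to focus on informative pixels.
    """
    w = softmax(weight_logits)        # normalized weight map
    votes = coords + offsets          # per-pixel joint hypotheses
    return (w[:, None] * votes).sum(axis=0)

coords = np.array([[0.0, 0.0, 0.5], [0.1, 0.0, 0.5], [0.0, 0.1, 0.6]])
offsets = np.array([[0.02, 0.0, 0.0], [-0.08, 0.0, 0.0], [0.02, -0.1, -0.1]])
logits = np.array([2.0, 2.0, -4.0])   # third pixel is down-weighted
joint = dense_weighted_regression(coords, offsets, logits)
```

Compared with taking an argmax over a heatmap, this weighted formulation keeps gradients flowing to every pixel while still letting the network suppress unreliable regions.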
-
SRN: Stacked Regression Network for Real-time 3D Hand Pose Estimation
Paper Code Normal hand Small hand The demos above are real-time results from a Kinect V2, using models trained on the Hands17 dataset (captured with an Intel RealSense SR300). Abstract Recently, most state-of-the-art methods have been based on 3D input data, because 3D data captures more spatial information than the 2D depth image. However, these methods either require a complex network structure or time-consuming […]
Self-Supervised 3D Hand Reconstruction
Strongly supervised 3D hand pose estimation methods generalize poorly in complex real-world environments, often suffering from temporal jitter and degraded accuracy. The lack of diverse, high-quality annotated hand data is the core cause of this poor generalization. Self-supervised 3D hand reconstruction addresses this by exploiting the structural information inherent in massive amounts of unlabeled data, constructing pseudo labels and prediction tasks so that the model learns robust hand features without manual annotation, effectively mitigating overfitting. Our work adopts a multi-view self-supervised strategy that builds a consistent latent representation space shared across views, enabling knowledge transfer and fully exploiting the complementary information between views. This not only improves recognition of self-occluded hands and complex poses, but also enhances robustness and accuracy in real-world scenarios. Without any annotated data, our research is the first to achieve a 3D hand reconstruction error of 10 mm.
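The cross-view supervision idea above can be sketched in a few lines: per-view predictions are mapped into a shared world frame, their consensus serves as a pseudo label, and each view is penalized for deviating from it. This is a minimal numpy illustration of the general multi-view consistency principle, not the training objective of any specific paper listed here; the function name and toy camera setup are invented for the example.

```python
import numpy as np

def multiview_consistency_loss(poses, view_rotations, view_translations):
    """Cross-view self-supervision on per-view 3D hand pose estimates.

    poses:             (V, J, 3) per-view joint predictions, camera frames
    view_rotations:    (V, 3, 3) camera-to-world rotations
    view_translations: (V, 3)    camera-to-world translations

    Each view's prediction is mapped into the shared world frame; the
    mean across views acts as a pseudo label, and the loss is the mean
    deviation from it. No ground-truth annotation is involved.
    """
    world = (np.einsum('vij,vkj->vki', view_rotations, poses)
             + view_translations[:, None, :])
    pseudo_label = world.mean(axis=0)          # cross-view consensus
    return np.mean(np.linalg.norm(world - pseudo_label, axis=-1))

# two views sharing the world frame (identity extrinsics for simplicity)
R = np.stack([np.eye(3), np.eye(3)])
t = np.zeros((2, 3))
agree = np.tile([[0.0, 0.0, 0.4]], (2, 1, 1))          # views agree
disagree = np.array([[[0.0, 0.0, 0.4]], [[0.0, 0.0, 0.5]]])
loss_agree = multiview_consistency_loss(agree, R, t)       # 0.0
loss_disagree = multiview_consistency_loss(disagree, R, t)  # 0.05
```

Minimizing such a loss pushes all views toward a single coherent 3D hypothesis, which is how complementary viewpoints can correct each other's self-occlusion errors without labels.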
-
HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning
1PICO IDL ByteDance, 2Beijing University of Posts and Telecommunications Paper arXiv Code Our method takes multi-view images with 2D pseudo labels for training. Our method can estimate accurate 3D hand pose from a single view or an arbitrary number of views. Abstract Recent advancements in 3D hand pose estimation have shown promising results, but its effectiveness has primarily relied on the […]
-
Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation
Paper Code Abstract In this work, we study the cross-view information fusion problem in the task of self-supervised 3D hand pose estimation from the depth image. Previous methods usually adopt a hand-crafted rule to generate pseudo labels from multi-view estimations in order to supervise the network training in each view. However, these methods ignore the […]
-
A Dual-Branch Self-Boosting Framework for Self-Supervised 3D Hand Pose Estimation
Paper Code For ICVL and MSRA, we show that DSF is able to generate more reasonable hand poses than annotations. We use red circles to locate errors in annotations. Overview Through image-to-image translation technology, our framework can make better use of synthetic data for pre-training. The dual-branch design allows our framework to adopt a part-aware […]
Two-Hand Reconstruction
-
Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image
Pengfei Ren, Chao Wen, Xiaozheng Zheng, Zhou Xue, Haifeng Sun, Qi Qi, Jingyu Wang, Jianxin Liao
Hand-Object Joint Reconstruction
Accurately reconstructing hand-object interaction empowers virtual reality, intelligent manufacturing, robotic manipulation, and more. Hand-object interaction data contains spatio-temporal information describing dynamic hand poses and object poses, such as interaction sequences. Our research focuses on hand-object pose estimation, fine-grained hand-object mesh reconstruction, and photorealistic hand-object rendering, aiming to improve the fidelity and physical plausibility of interaction details. The resulting platform has been deployed in human-computer interaction systems across multiple scenarios. For example, in a bare-hand interaction system, hands and objects can be precisely reconstructed in 3D without extra sensors, improving the naturalness and flexibility of interaction; in a VR platform, hand-object interaction reconstruction enables highly immersive experiences. In addition, the results have been widely applied to realistic scene rendering, where high-precision hand-object modeling and rendering achieve photorealistic reproduction of the interaction process, providing core technical support for film production, digital twins, and industrial simulation.
-
Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image
Xingyu Liu, Pengfei Ren*, Jingyu Wang, Haifeng Sun, Qi Qi, Zirui Zhuang, Jianxin Liao