TrackingWorld is a novel approach for dense, world-centric 3D tracking from monocular videos. Our method estimates accurate camera poses and disentangles the 3D trajectories of both static and dynamic components, rather than being limited to a single foreground object. It supports dense tracking of nearly all pixels, enabling robust 3D scene understanding from monocular input.
- Estimates accurate camera poses for consistent anchoring in a 3D world coordinate system.
- Separates the 3D motion of the static background from that of dynamic foreground components.
- Tracks nearly all pixels, moving beyond sparse keypoints.
Figure: Overview of TrackingWorld Framework.
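World-centric tracking means per-frame 3D points are expressed in a single shared world frame rather than in each camera's frame. As a minimal illustrative sketch (not the paper's actual pipeline), assuming estimated camera-to-world rotations `R_wc` and camera positions `t_wc` are available, anchoring amounts to a rigid transform per frame:

```python
import numpy as np

def camera_to_world(points_cam, R_wc, t_wc):
    """Map per-frame 3D points from camera coordinates to world coordinates.

    points_cam: (N, 3) array of 3D points in the camera frame.
    R_wc:       (3, 3) camera-to-world rotation matrix.
    t_wc:       (3,) camera position in world coordinates.
    """
    # x_world = R_wc @ x_cam + t_wc, applied row-wise
    return points_cam @ R_wc.T + t_wc

# Example: a point 1 m in front of a camera placed at z = 1 in the world
pts = camera_to_world(np.array([[0.0, 0.0, 1.0]]), np.eye(3), np.array([0.0, 0.0, 1.0]))
```

Points on static geometry stay fixed under this mapping across frames, while dynamic points trace out their own world-space trajectories, which is what allows the two to be disentangled.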
If you find TrackingWorld useful for your research or applications, please consider citing our paper:
@inproceedings{lu2025trackingworld,
  title={TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels},
  author={Jiahao Lu and Weitao Xiong and Jiacheng Deng and Peng Li and Tianyu Huang and Zhiyang Dou and Cheng Lin and Sai-Kit Yeung and Yuan Liu},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=vDV912fa3t}
}