We mark missing regions in pink and use ObjFiller-3D to complete these areas.
3D inpainting often relies on multi-view 2D image inpainting, where the inherent inconsistencies across different inpainted views can result in blurred textures, spatial discontinuities, and distracting visual artifacts. These inconsistencies pose significant challenges for accurate and realistic 3D object completion, particularly in applications that demand high fidelity and structural coherence. To overcome these limitations, we propose ObjFiller-3D, a novel method for completing and editing 3D objects with high quality and consistency. Instead of employing a conventional 2D image inpainting model, our approach leverages a carefully selected state-of-the-art video editing model to fill in the masked regions of 3D objects. We analyze the representation gap between 3D and videos, and propose an adaptation of a video inpainting model for 3D scene inpainting. In addition, we introduce a reference-based 3D inpainting method to further enhance reconstruction quality. Experiments across diverse datasets show that, compared to previous methods, ObjFiller-3D produces more faithful and fine-grained reconstructions (PSNR of 26.6 vs. 15.9 for NeRFiller; LPIPS of 0.19 vs. 0.25 for Instant3dit). Moreover, it demonstrates strong potential for practical deployment in real-world 3D editing applications.
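The comparison above uses PSNR (higher is better) and LPIPS (lower is better). For reference, here is a minimal sketch of the standard PSNR definition, PSNR = 10 log10(MAX² / MSE), assuming images are flattened to lists of floats in [0, 1]. The function name `psnr` is hypothetical; this is not the authors' evaluation code.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


print(psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]))  # roughly 21.76 dB
```

Because PSNR is logarithmic in the MSE, a 10.7 dB gap (26.6 vs. 15.9) corresponds to about a 10^1.07 ≈ 11.7× difference in mean squared error, assuming both numbers are measured on the same data.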
ObjFiller-3D performs 3D object completion through three integrated stages: generating multi-view masks via trajectory-based rendering, producing coherent inpainted frames using a fine-tuned VACE model, and reconstructing the object with NeRF, 3DGS, or LRM for high-fidelity synthesis.
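The first stage renders the masked object from a sequence of viewpoints so that the views can be treated as video frames. This page does not specify the trajectory. The sketch below assumes a circular orbit at fixed elevation around an object centred at the origin, with +z up. The helper names (`orbit_poses`, `look_at`) and their parameters are hypothetical and are not part of ObjFiller-3D.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def look_at(eye: Vec3, target: Vec3 = (0.0, 0.0, 0.0),
            up: Vec3 = (0.0, 0.0, 1.0)) -> Tuple[Vec3, Vec3, Vec3]:
    """Return the (right, up, forward) unit axes of a camera at `eye` facing `target`."""
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, up))  # raises if forward is parallel to up
    true_up = _cross(right, forward)  # already unit length: right and forward are orthonormal
    return right, true_up, forward


def orbit_poses(n_frames: int, radius: float,
                elevation_deg: float) -> List[Tuple[Vec3, Tuple[Vec3, Vec3, Vec3]]]:
    """Hypothetical orbit trajectory: camera positions and axes, ordered like video frames."""
    elev = math.radians(elevation_deg)
    poses = []
    for i in range(n_frames):
        # Equal azimuth steps keep consecutive viewpoints close to each other.
        azim = 2.0 * math.pi * i / n_frames
        eye = (radius * math.cos(elev) * math.cos(azim),
               radius * math.cos(elev) * math.sin(azim),
               radius * math.sin(elev))
        poses.append((eye, look_at(eye)))
    return poses


poses = orbit_poses(n_frames=24, radius=2.0, elevation_deg=20.0)
```

Equal azimuth steps presumably give the frame-to-frame smoothness a video inpainting model is trained on. A real pipeline would also render the object and its inpainting mask at each pose before passing the sequence to the video model.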
Several excellent works were introduced around the same time as ours.
Relevant works include SPIn-NeRF, InFusion, AuraFusion360, and IMFine, but these primarily focus on removing objects from scenes. In contrast, our work addresses the more general task of completing arbitrary masked regions, which makes it more versatile.
Among existing approaches, the most closely related to ours are NeRFiller and Instant3dit.
@misc{feng2025objfiller3dconsistentmultiview3d,
title={ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models},
author={Haitang Feng and Jie Liu and Jie Tang and Gangshan Wu and Beiqi Chen and Jianhuang Lai and Guangcong Wang},
year={2025},
eprint={2508.18271},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.18271},
}