Abstract
Reconstructing a renderable 3D model from images is a useful but challenging task. Recent feed-forward 3D reconstruction methods have demonstrated remarkable success in efficiently recovering geometry, but they still cannot accurately model the complex appearance of the reconstructed 3D models. Recent diffusion-based generative models can synthesize realistic images or videos of an object from reference images without explicitly modeling its appearance, which offers a promising direction for object rendering but lacks accurate viewpoint control. In this paper, we propose GO-Renderer, a unified framework that integrates reconstructed 3D proxies to guide video generative models, achieving high-quality object rendering from arbitrary viewpoints under arbitrary lighting conditions. Our method not only enjoys accurate viewpoint control from the reconstructed 3D proxy but also enables high-quality rendering under different lighting environments using diffusion generative models, without explicitly modeling complex materials and lighting. Extensive experiments demonstrate that GO-Renderer achieves state-of-the-art performance across object rendering tasks, including synthesizing images at novel viewpoints, rendering objects under novel lighting environments, and inserting an object into an existing video.
Motivation
Feed-forward 3D reconstruction can recover geometry efficiently, but it still struggles to faithfully model complex surface appearance, materials, and relighting behavior.
Reference-based video diffusion models can synthesize realistic content, but they often hallucinate unseen regions and struggle to follow precise camera trajectories while maintaining multi-view consistency.
A coarse 3D proxy gives explicit structural guidance, while diffusion priors deliver realistic appearance and lighting without requiring full physical material recovery.
Pipeline
GO-Renderer first reconstructs a coarse 3D proxy, then renders object-centric coordinate maps for reference views and target trajectories. These maps become dense spatial conditions for the video diffusion model.
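To make this conditioning concrete, the sketch below back-projects a proxy depth map into a normalized object-space coordinate map, one plausible form of the dense spatial condition described above. This is a minimal illustration, not the paper's exact implementation: the function name, argument layout, and bounding-box normalization are all assumptions.

import numpy as np

def coordinate_map(depth, K, cam_to_obj, bbox_min, bbox_max):
    """Back-project a proxy depth map into object space, normalized to [0, 1].

    Hypothetical interface; the paper's conditioning maps may differ.
      depth:        (H, W) proxy depth in the camera frame; 0 marks background.
      K:            (3, 3) camera intrinsics.
      cam_to_obj:   (4, 4) rigid transform from camera to object coordinates.
      bbox_min/max: (3,)   object bounding box used for normalization.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))           # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], -1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T                          # camera-space rays
    cam = rays * depth[..., None]                            # camera-space points
    cam_h = np.concatenate([cam, np.ones((H, W, 1))], -1)    # homogeneous coords
    obj = cam_h @ cam_to_obj.T                               # object-space points
    coords = (obj[..., :3] - bbox_min) / (bbox_max - bbox_min)
    coords[depth == 0] = 0.0                                 # zero out background
    return coords.astype(np.float32)                         # (H, W, 3) condition map

Maps like this, rendered per frame for both the reference views and the target camera trajectory, can then be fed to the video diffusion model as dense spatial conditions alongside the reference images.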
Our Results
Applications
GO-Renderer supports practical downstream applications such as Blender-integrated offline rendering and inserting rendered objects into real-world videos with plausible reflections and shadows.
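For context, the classical insertion baseline that such a generative approach replaces is explicit alpha-over compositing, which by itself produces neither reflections nor physically plausible shadows. A minimal sketch, assuming the object has already been rendered with an alpha matte (all names and the shadow attenuation factor are hypothetical):

import numpy as np

def alpha_over(frame, obj_rgb, obj_alpha, shadow=None, shadow_strength=0.5):
    """Classical insertion baseline: alpha-over compositing with an optional
    hand-authored shadow matte. Unlike GO-Renderer, this cannot synthesize
    reflections or scene-consistent shadows on its own.

      frame:     (H, W, 3) background video frame in [0, 1].
      obj_rgb:   (H, W, 3) rendered object colors in [0, 1].
      obj_alpha: (H, W)    object matte in [0, 1]; 0 = fully transparent.
      shadow:    (H, W)    optional hand-made shadow matte in [0, 1].
    """
    if shadow is not None:
        frame = frame * (1.0 - shadow_strength * shadow[..., None])  # darken ground
    a = obj_alpha[..., None]
    return obj_rgb * a + frame * (1.0 - a)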
Citation
@misc{gu2026gorenderergenerativeobjectrendering,
  title={GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models},
  author={Zekai Gu and Shuoxuan Feng and Yansong Wang and Hanzhuo Huang and Zhongshuo Du and Chengfeng Zhao and Chengwei Ren and Peng Wang and Yuan Liu},
  year={2026},
  eprint={2603.23246},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.23246},
}