I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners

Lu Ling1, Yunhao Ge2, Yichen Sheng2, Aniket Bera1

1Purdue University   •   2NVIDIA Research

Abstract

Generalization remains the central challenge for interactive 3D scene generation. Existing learning-based approaches ground spatial understanding in limited scene datasets, restricting generalization to new layouts. We instead reprogram a pre-trained 3D instance generator to act as a scene-level learner, replacing dataset-bounded supervision with model-centric spatial supervision. This reprogramming unlocks the generator's transferable spatial knowledge, enabling generalization to unseen layouts and novel object compositions. Remarkably, spatial reasoning still emerges even when the training scenes consist of randomly composed objects. This demonstrates that the generator's transferable scene prior provides a rich learning signal for inferring proximity, support, and symmetry from purely geometric cues. Replacing the widely used canonical space, we instantiate this insight with a view-centric formulation of the scene space, yielding a fully feed-forward, generalizable scene generator that learns spatial relations directly from the instance model. Quantitative and qualitative results show that a 3D instance generator is an implicit spatial learner and reasoner, pointing toward foundation models for interactive 3D scene understanding and generation.

More Visualization Examples (Interactive Results)


Comparison with State-of-the-Art (SOTA)


Method Overview / How It Works

Interesting Findings for Non-Semantic Scenes

Table 2. Comparison on 3D-FRONT and BlendSwap & Scenethesis. CD: Chamfer Distance; F-Score threshold τ=0.1. S = scene-level, O = object-level; IoU-B = volumetric IoU of scene bounding boxes. Best numbers are bold.

Training dataset  3D-FRONT                                                    BlendSwap & Scenethesis
                  CD-S↓       F-Score-S↑  CD-O↓       F-Score-O↑  IoU-B↑      CD-S↓       F-Score-S↑  CD-O↓       F-Score-O↑  IoU-B↑
3D-FT (25K)       0.0137      93.77       0.0278      81.34       0.8792      0.0118      90.79       0.0585      68.87       0.8222
Rand-15K          0.0496      79.96       0.0932      55.01       0.7729      0.0081      92.67       0.0698      67.36       0.8445
Rand-25K          0.0406      81.39       0.0402      74.76       0.7783      0.0075      93.60       0.0580      70.18       0.8471
3D-FT+Rand-15K    0.0148      93.50       0.0207      84.28       0.8762      0.0059      94.26       0.0503      72.39       0.8568
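
For readers unfamiliar with the metrics in Table 2, the following is a minimal sketch of how Chamfer Distance, F-Score at a threshold τ, and bounding-box volumetric IoU are commonly computed on point sets. It is not the authors' evaluation code. Several choices here are assumptions, since the page does not specify them: the Chamfer Distance uses unsquared Euclidean distances summed over both directions, the F-Score is reported in percent, and the boxes are axis-aligned. All function names are hypothetical, and the nearest-neighbour search is brute force for clarity.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance (assumed convention: mean unsquared
    nearest-neighbour distance, summed over both directions)."""
    d_pred_to_gt = nearest_distances(pred, gt)
    d_gt_to_pred = nearest_distances(gt, pred)
    return (sum(d_pred_to_gt) / len(d_pred_to_gt)
            + sum(d_gt_to_pred) / len(d_gt_to_pred))


def f_score(pred, gt, tau=0.1):
    """F-Score in percent: harmonic mean of precision and recall at distance tau."""
    d_pred_to_gt = nearest_distances(pred, gt)
    d_gt_to_pred = nearest_distances(gt, pred)
    precision = sum(d <= tau for d in d_pred_to_gt) / len(d_pred_to_gt)
    recall = sum(d <= tau for d in d_gt_to_pred) / len(d_gt_to_pred)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


def aabb(points):
    """Axis-aligned bounding box of a point set as (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def box_iou_3d(box_a, box_b):
    """Volumetric IoU of two axis-aligned boxes (assumed box type)."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    intersection = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0  # boxes are disjoint (or degenerate) along this axis
        intersection *= overlap
    vol_a = math.prod(hi - lo for lo, hi in zip(a_min, a_max))
    vol_b = math.prod(hi - lo for lo, hi in zip(b_min, b_max))
    return intersection / (vol_a + vol_b - intersection)


if __name__ == "__main__":
    # Toy example: corners of a unit cube versus the same cube shifted by 0.5 in x.
    cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
    shifted = [(x + 0.5, y, z) for (x, y, z) in cube]
    print(chamfer_distance(cube, shifted))
    print(f_score(cube, shifted, tau=0.1))
    print(box_iou_3d(aabb(cube), aabb(shifted)))
```

Under these conventions, lower Chamfer Distance and higher F-Score and IoU indicate closer agreement with the ground truth, matching the arrows in the table header. Absolute values depend on how scenes are normalized and sampled, which this sketch does not attempt to reproduce.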

BibTeX

If you find this work useful, please consider citing our paper:

@article{ling2025iscene,
  title={I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners},
  author={Ling, Lu and Ge, Yunhao and Sheng, Yichen and Bera, Aniket},
  journal={arXiv preprint arXiv:2512.13683},
  year={2025}
}