NeRF and its variants require substantial computational resources, often more than constrained environments can provide. Client devices also have limited video memory, which restricts how many assets can be processed and rendered at once. Rendering vast scenes in real time therefore remains a critical challenge, because large volumes of data must be loaded and processed quickly.
To address the challenges of real-time rendering of expansive scenes, a team of researchers from the University of Science and Technology of China introduced a technique named City-on-Web. Drawing inspiration from conventional graphics methods for managing large-scale scenes, they divide the scene into manageable blocks and represent each block at multiple Levels-of-Detail (LOD).
To enable real-time rendering, the researchers use radiance field baking to precompute rendering primitives and store them in 3D atlas textures, arranged within a sparse grid in each block. Loading all atlas textures into a single shader is impractical, however, because shader resources are limited. Instead, the scene is structured as a hierarchy of segmented blocks, and each block is rendered by its own dedicated shader.
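The sparse-grid storage described above can be illustrated with a minimal sketch. It assumes a common layout for baked radiance fields, in which a coarse indirection table points occupied cells at slots in an atlas and leaves empty space unallocated; the class name `SparseBlockGrid`, its methods, and the string payloads are hypothetical and are not taken from City-on-Web.

```python
EMPTY = -1  # marker for coarse cells that hold no baked data


class SparseBlockGrid:
    """Hypothetical per-block sparse grid: coarse cells index into an atlas."""

    def __init__(self, coarse_res, cell_size):
        self.coarse_res = coarse_res  # number of coarse cells per axis
        self.cell_size = cell_size    # edge length of one coarse cell
        self.indirection = {}         # (i, j, k) -> atlas slot
        self.atlas = []               # densely packed payloads (stand-in for a 3D texture)

    def insert(self, cell, payload):
        """Store a payload for an occupied coarse cell."""
        self.indirection[cell] = len(self.atlas)
        self.atlas.append(payload)

    def lookup(self, x, y, z):
        """Return the payload covering point (x, y, z), or None for empty space."""
        cell = tuple(int(c // self.cell_size) for c in (x, y, z))
        if any(c < 0 or c >= self.coarse_res for c in cell):
            return None  # point lies outside this block
        slot = self.indirection.get(cell, EMPTY)
        return None if slot == EMPTY else self.atlas[slot]


grid = SparseBlockGrid(coarse_res=4, cell_size=2.0)
grid.insert((1, 0, 0), "brick-A")
print(grid.lookup(2.5, 0.1, 1.9))  # point inside the occupied cell
print(grid.lookup(0.5, 0.5, 0.5))  # point inside an empty cell
```

Because only occupied cells consume atlas space, each block's texture footprint stays small enough to bind to one shader, which is the constraint the block hierarchy is meant to respect.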
This “divide and conquer” approach ensures that each block has enough capacity to represent the intricate details of its part of the scene. To maintain high fidelity in the rendered output, the researchers also simulate the blending of multiple shaders during training, so that training matches the rendering pipeline.
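One way to picture the blending of per-block shader outputs is standard front-to-back alpha compositing. The sketch below assumes each block's shader returns a premultiplied RGB color and an accumulated opacity for a pixel, and that blocks are sorted from nearest to farthest; the function name `composite_blocks` is hypothetical, and the exact blending used by City-on-Web may differ.

```python
def composite_blocks(block_outputs):
    """Blend per-block results front to back.

    block_outputs: list of (premultiplied_rgb, alpha) pairs, nearest block first.
    Returns the blended RGB tuple and the total accumulated opacity.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still unblocked by nearer blocks
    for rgb, alpha in block_outputs:
        for i in range(3):
            color[i] += transmittance * rgb[i]
        transmittance *= 1.0 - alpha
    return tuple(color), 1.0 - transmittance


# A half-transparent red block in front of an opaque green block.
pixel, opacity = composite_blocks([
    ((0.5, 0.0, 0.0), 0.5),
    ((0.0, 1.0, 0.0), 1.0),
])
print(pixel, opacity)
```

Applying the same compositing rule during training as at render time is what keeps the two stages consistent: the network is optimized against the image the blended shaders will actually produce.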
Combined with LOD, the block-based representation enables dynamic resource management: blocks are loaded and unloaded in real time according to the viewer’s position and field of view. This adaptive loading strategy significantly reduces the bandwidth and memory needed to render extensive scenes, resulting in a smoother user experience, particularly on less powerful devices.
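The loading strategy described above can be sketched as a small planning function. This is a simplified 2D illustration under stated assumptions: blocks are tested by their centers only, the distance thresholds are arbitrary, and the names `Block`, `select_lod`, `in_view`, and `plan_loading` are hypothetical rather than part of City-on-Web.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical scene block, identified by the center of its footprint."""
    name: str
    cx: float
    cy: float


def select_lod(distance, thresholds=(50.0, 150.0)):
    """Return 0 (finest) for near blocks and higher levels for farther ones."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)


def in_view(block, cam_x, cam_y, heading, fov):
    """Check whether the block center falls inside the horizontal field of view."""
    dx, dy = block.cx - cam_x, block.cy - cam_y
    if dx == 0 and dy == 0:
        return True  # camera sits on the block center
    angle = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi) before comparing.
    diff = (angle - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= fov / 2


def plan_loading(blocks, cam_x, cam_y, heading, fov=math.radians(90)):
    """Map each visible block name to the LOD to load; omitted blocks can be unloaded."""
    plan = {}
    for block in blocks:
        if in_view(block, cam_x, cam_y, heading, fov):
            distance = math.hypot(block.cx - cam_x, block.cy - cam_y)
            plan[block.name] = select_lod(distance)
    return plan


blocks = [Block("near", 10, 0), Block("mid", 100, 0),
          Block("far", 200, 0), Block("behind", -10, 0)]
print(plan_loading(blocks, cam_x=0.0, cam_y=0.0, heading=0.0))
```

A production system would test block bounding volumes against the full view frustum and add hysteresis so that blocks near a threshold do not flicker between levels, but the principle is the same: fine detail is fetched only where the viewer can see it.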
Experiments show that City-on-Web renders photorealistic large-scale scenes at 32 frames per second (FPS) at 1080p resolution on an RTX 3060 GPU. Compared with existing mesh-based methods, it uses only 18% of the VRAM and 16% of the payload size.
Together, block partitioning and LOD notably reduce the payload delivered to the web platform and make resource management more efficient. The method preserves high-fidelity rendering quality by keeping the training process consistent with the rendering phase.