The Spatial Organization Orchestrator - Deriving insights using multimodal AI
PROJECT TYPE
Multimodal Object Detection, Gaussian Splatting with the Meta Quest 3, Building with Google's AI Studio
Project Duration
Solo, 1-day sprint.
Result
Built a functional prototype that detects objects in a scanned cupboard and outputs actionable insights for cleaning and organizing it
Project Context
Managing physical clutter often feels overwhelming because the 'mess' lacks structure. While traditional 3D scanning creates rigid, confusing meshes, I utilized 3D Gaussian Splatting to capture the soft, nuanced reality of a home environment, providing a more intuitive digital twin for the user to interact with.
Exploring Gaussian Splatting
I decided to start by exploring Gaussian Splatting and dived deep into its history. While traditional triangulated meshes struggle with 'fuzzy' or complex geometry, 3D Gaussian Splatting (popularized in 2023) offers a way to represent volumetric data with far greater fidelity, capturing the nuances of a cluttered reality that standard photogrammetry often misses.
Having designed 3D objects using Maya and Rhino in the past, I could see the value and use cases of Gaussian Splatting.
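To make the representation concrete, here is a minimal sketch in Python of how a single 3D Gaussian splat is typically parameterized and evaluated at a point in space; the `Splat` class and its field names are my own illustration, not code from the capture pipeline.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Splat:
    """One 3D Gaussian 'splat': a soft volumetric primitive instead of a hard triangle."""
    mean: np.ndarray      # (3,) center position in world space
    cov: np.ndarray       # (3, 3) covariance controlling the blob's size and orientation
    color: np.ndarray     # (3,) RGB
    opacity: float        # contribution strength when splats are blended

def gaussian_weight(splat: Splat, x: np.ndarray) -> float:
    """Unnormalized Gaussian falloff of a splat at point x."""
    d = x - splat.mean
    return float(np.exp(-0.5 * d @ np.linalg.inv(splat.cov) @ d))

# A captured scene is simply millions of these splats, blended together when rendered.
splat = Splat(mean=np.zeros(3), cov=np.eye(3) * 0.01,
              color=np.array([0.8, 0.6, 0.4]), opacity=0.9)
print(gaussian_weight(splat, np.array([0.05, 0.0, 0.0])))
```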
Why Hyperscape?
Most 3D scanning apps require you to move your phone in awkward circles, but Hyperscape leverages the Quest 3’s spatial sensors to let you simply 'walk around' your environment. As I moved past my messy cupboard, the headset's cameras captured the scene from dozens of angles.
Gaussian Splatting Meets Machine Learning
I have been diving deep into Machine Learning by taking courses such as the Machine Learning Specialization by DeepLearning.AI and Stanford Online.
While learning about what Gaussian Splatting could do, I connected the dots and explored a use case where a multimodal AI API (Gemini 1.5 Pro) could be used to derive insights from the captured scene.
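As a rough sketch of that idea, a single captured frame can be sent to Gemini 1.5 Pro with the google-generativeai Python SDK and an AI Studio API key; the prompt text and file name below are placeholders rather than the exact ones I used.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_AI_STUDIO_KEY")  # key generated in Google AI Studio
model = genai.GenerativeModel("gemini-1.5-pro")

frame = Image.open("cupboard_frame_01.png")  # one frame from the Quest 3 capture

prompt = (
    "You are a spatial organization assistant. List the objects you can see on "
    "each shelf and explain, in one sentence each, why the shelf feels cluttered."
)

response = model.generate_content([prompt, frame])
print(response.text)
```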
The design thinking process
While thinking through the experience, I went through three stages, feeding all the context as prompts into Google AI Studio:
First, I thought about the types of insights the system could surface, and I came up with:
a) Granular Perception: Leveraging the high-fidelity nature of splats to detect subtle issues like surface dust or fine-wire tangles.
b) Contextual Logic: Moving beyond simple object detection to provide qualitative 'Deep Reasoning' on why a space is unorganized.
I ideated on how to layer information without obscuring the physical world, landing on a glassmorphic UI to ensure high legibility while maintaining a sense of spatial depth.

I then sketched out the flow, from Gemini 1.5 Pro gathering insights from a set of frames to generating the final output.

The high-level user flow
I designed a feedback loop where the system captures the user's environment in real-time, processes visual chaos into structured advice, and maps those insights back onto the physical space via glassmorphic spatial cues.
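One way to picture the 'structured advice' in that loop is a small schema the model fills in for every detected object. The field names below are my own illustration, not the production schema.

```python
from dataclasses import dataclass

@dataclass
class SpatialInsight:
    """One piece of structured advice anchored to a detected object."""
    label: str         # e.g. "tangled charging cables"
    box_2d: list[int]  # [ymin, xmin, ymax, xmax] in the source frame
    severity: str      # "low" | "medium" | "high"
    reasoning: str     # why this area reads as unorganized
    suggestion: str    # one concrete action, e.g. "coil and clip the cables"

# The spatial UI turns each insight into a glassmorphic card placed near box_2d.
```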

Crafting Prompts for Google AI Studio
I split the prompts into two parts:
a) Prompts to build out the object detection
b) UI instructions
I then used follow-up prompts to debug and refine the results; an illustrative example of both prompt parts is sketched below.
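For illustration, the two prompt parts might look roughly like this; these are paraphrases of the structure, not the verbatim prompts pasted into AI Studio.

```python
DETECTION_PROMPT = """
You will receive frames captured while walking past a cupboard with a Meta Quest 3.
1. Detect every distinct object and return a bounding box for it.
2. For each object, note visible issues such as dust, tangles, or misplacement.
3. Return the result as JSON: a list of {label, box_2d, reasoning, suggestion}.
"""

UI_PROMPT = """
Render each suggestion as a translucent, glassmorphic card anchored next to its
bounding box. Keep each card under 12 words so the physical space stays visible.
"""
```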

The technical architecture
To create a seamless 'hands-free' experience, I developed a pipeline that captures Quest 3 screencasts and uses Gemini 1.5 Pro as a reasoning engine. This allows the system to act as a Spatial Guide, overlaying actionable JSON-driven annotations directly into the user’s field of view.
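A rough end-to-end sketch of that pipeline follows, assuming OpenCV for frame sampling and the google-generativeai SDK for the Gemini call; the file name, sampling rate, and prompt are placeholder assumptions, not the exact values from my build.

```python
import json
import cv2
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

def sample_frames(video_path: str, every_n: int = 60) -> list[dict]:
    """Pull every n-th frame out of the Quest 3 screencast as inline PNG parts."""
    frames, i = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            ok, buf = cv2.imencode(".png", frame)
            frames.append({"mime_type": "image/png", "data": buf.tobytes()})
        i += 1
    cap.release()
    return frames

prompt = (
    "Detect the objects in these frames and return JSON: a list of "
    "{label, box_2d, reasoning, suggestion} entries."
)
response = model.generate_content(
    [prompt, *sample_frames("quest3_screencast.mp4")],
    generation_config={"response_mime_type": "application/json"},
)
annotations = json.loads(response.text)  # JSON-driven annotations for the spatial overlay
```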
