Interactive, real-time, scalable video simulations: smooth rendering at 1080p and 60 fps, multimodal generation from images or text, and advanced scene understanding.