A benchmark for General AI Assistants that tests AI systems on real-world tasks requiring reasoning, multimodality, and tool-use capabilities.