Platform for building RL environments and evals
HUD (YC W25) builds agentic evals and RL environments for frontier AI labs, focused on Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs.
Right now, nobody can tell with confidence whether AI agents work reliably. Making them work in the real world takes detailed evals across a huge range of tasks.
We're backed by Y Combinator and work closely with frontier AI labs to provide agent evaluation and training infrastructure at scale.