Fulcrum - for those who are tired of being bullshitted by their agents
TL;DR:
We’re building the best software for debugging AI agents and their environments. If you’re building RL environments or deploying agents, get in touch with us and try out our product.
https://www.youtube.com/watch?v=WXa-cGlg6E4
The problem
Agents are incredibly complicated. Evals are a mess. We need to understand what our agents are capable of and whether our RL environments even work.
The solution
Our monitoring system identifies why your agents don't succeed, uncovers bugs in your environment, and detects fake solutions, reward hacking, and catastrophic agent failures. Our red-teaming agents run experiments to diagnose agent failures: they plug into your environments, agent traces, and agent source code to find the root cause. Finally, we turn all of these investigations into intuitive, explorable reports you can chat with.
Who we are
We met at MIT, where we were doing research on LLMs. In our latest paper, we built a method for scalably generating software environments of arbitrary difficulty. Building environments ourselves gave us the know-how to build a great environment debugger. We have published at top ML conferences (NeurIPS, CoLM) and built widely used open-source software (5k stars, >100k downloads). Our team has won a total of four medals at international olympiads in physics and economics.
Call to action:
Email us at team@fulcrumresearch.ai if you're building environments or deploying agents, and share this with people who are. Or chat with us!