
Simulation Engine for Benchmarking AI Products

Kashikoi is a simulation engine to benchmark AI agents. We generate CPU-friendly world models that autonomously interview agents and produce deep behavioral assessments. We built similar technology at Moveworks, where it was used to ship 250+ enterprise agents to customers daily.
Active Founders
Tim Michaud
Founder
Tim is a Founder of Kashikoi. Before Kashikoi, Tim worked at Moveworks (ServiceNow), where he was responsible for securing 250+ enterprise agents. He built automated red-teaming software that broke GPT-4, as well as guardrails and offline jobs to protect Moveworks' agents from attackers. For fun, he built his own model that found vulnerabilities in macOS and iOS, which he publicly discusses on his blog.
Aaksha Meghawat
Founder
Aaksha is a Founder of Kashikoi. Before Kashikoi, Aaksha led the Simulation & Evaluation stack at Moveworks (ServiceNow), shipping 250+ customized, enterprise-ready agents to Fortune 500 companies and federal agencies. Aaksha has done award-winning, NSF-sponsored research on Transformers at CMU (long before OpenAI made them cool). She shipped edge speech recognition models on 1bn+ iPhones (for the most esoteric dialects you can think of) at Apple. The innovation behind this work was nominated for a Best Paper award at Interspeech 2021.
Company Launches
Kashikoi - Simulation Engine for Benchmarking AI Agents
See original launch post

Hey YC!
We are Tim and Aaksha, co-founders of Kashikoi. Kashikoi is a simulation engine to benchmark GenAI agents. We generate CPU-friendly world models that autonomously interview agents and produce deep behavioral assessments.

The Problem

Building high-performing AI agents is becoming increasingly complex. Teams face many challenges:

  • Managing prompt bloat and keeping up with endless prompt-tuning cycles.
  • Evaluating their agents (or competitors') meaningfully and efficiently.
  • Understanding agent performance in ways that reflect real-world values and expectations—not just public benchmarks.

Despite growing interest and investment, most solutions rely heavily on prompt engineering, public benchmarks, or surface-level observability. These approaches often mislead more than they inform, creating a false sense of progress.

The Solution

Today’s “LLMs” are adaptive systems that run test-time adaptation loops behind that tiny blinking “Thinking…” on your screen. We are building a scalable version of test-time adaptation and inference scaling, a.k.a. world models, that brings the power of these techniques to you.

Simply put, you can simulate highly customized benchmarks, generate diverse data, and align your evaluations, all maintained for the long run, without writing a single prompt! As side effects, our world models also unlock automatic prompt optimization and detection of stale regression tests. A sketch of the core interview loop follows below.
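For intuition, here is a minimal sketch of what such an adaptive interview loop could look like. To be clear, this is not Kashikoi's actual API: WorldModel, interview, and every other name below are hypothetical stand-ins for the idea of a world model probing an agent turn by turn, drilling into weak spots, and compiling a behavioral assessment.

```python
# Hypothetical sketch only -- not Kashikoi's actual API. A tiny "world
# model" proposes probing questions, scores the agent's answers, and
# adapts its next probe toward whatever the agent struggled with.
from dataclasses import dataclass, field

@dataclass
class Turn:
    question: str
    answer: str
    score: float  # 0.0 = failed the probe, 1.0 = handled it well

@dataclass
class Assessment:
    turns: list = field(default_factory=list)

    def summary(self):
        mean = sum(t.score for t in self.turns) / max(len(self.turns), 1)
        return {"turns": len(self.turns), "mean_score": mean}

class WorldModel:
    """Stand-in for a CPU-friendly world model that picks the next
    probe based on how the agent handled the previous ones."""

    def next_question(self, history):
        if history and history[-1].score < 0.5:
            # Drill deeper into whatever the agent just fumbled.
            return "Follow-up: " + history[-1].question
        return "Fresh probe drawn from the simulated user population"

    def judge(self, question, answer):
        # Placeholder judge; a real one would score groundedness,
        # safety, tone, task completion, and so on.
        return 1.0 if answer.strip() else 0.0

def interview(agent, world, max_turns=5):
    """Multi-turn interview: the world model asks, the agent answers,
    every turn is scored, and a behavioral assessment comes out."""
    assessment = Assessment()
    for _ in range(max_turns):
        q = world.next_question(assessment.turns)
        a = agent(q)  # `agent` is any callable: question -> answer
        assessment.turns.append(Turn(q, a, world.judge(q, a)))
    return assessment

# Usage: interview a trivial echo "agent".
print(interview(lambda q: "echo: " + q, WorldModel()).summary())
```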

LLM-based systems are getting smarter, and with our world models, so can you!

Check out our simulation engine adaptively interviewing RAG agents and running multi-turn evaluations here.

Why Us

  • Tim and Aaksha used similar world-model tech at Moveworks to massively reduce dev cycles while shipping 250+ customized, enterprise-ready agents.
  • Aaksha has done cutting-edge research on Transformers at CMU (long before OpenAI made them cool). She shipped edge speech models on 1bn+ iPhones. The innovation behind these models was published at Interspeech 2021 and nominated for a Best Paper award.
  • Tim has found many high-impact security vulnerabilities throughout his career. One of his top discoveries was a bug in all Qualcomm GPS chips that led to a 50-mile, 0-click exploit with no mitigations. Tim has many public CVEs across a variety of Apple products, including Safari, macOS, iOS, tvOS, and iTunes.

Our Ask

Sign up for our waitlist at getkashikoi.com, or email us at founders@getkashikoi.com if you or someone you know wants:

  • Instant evals on your agent (or a competitor’s 😉); we’ll generate a behavioral report for you.
  • Advanced features, like automatic prompt optimization, world models, and inference scaling, working seamlessly for you.
  • Reliable, prompt-free evaluations aligned with your values and expectations (which we auto-encode into a special edge-friendly world model for you).
  • Don’t come to us if:
    • You love writing prompts
    • You trust public benchmarks
    • You think that good observability is enough to make your agent win
    • You aren’t ready to have an honest conversation about your agents’ performance
  • Jokes aside 😅, if you know enterprises building agents suffering from prompt bloat, please (pretty please 🥺) send them our way!
Kashikoi
Founded: 2025
Batch: Spring 2025
Team Size: 2
Status: Active
Location: San Francisco
Primary Partner: Nicolas Dessaigne