Statistical Testing for Non-Deterministic AI Agents

agentrial

agentrial is an open-source Python framework that runs your AI agent N times on each test case and gives you confidence intervals instead of pass/fail.

Your agent passed 10/10 runs? The 95% Wilson confidence interval says the true reliability could be as low as 72%. agentrial catches that.
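To see where the 72% figure comes from, here is a minimal sketch of the Wilson score interval in plain Python. The function name `wilson_interval` and the default `z = 1.96` (a 95% interval) are assumptions made for illustration; this is not agentrial's own code or API.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (hypothetical helper)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # Center of the interval is pulled toward 0.5 for small samples.
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


low, high = wilson_interval(10, 10)
print(f"{low:.1%} to {high:.1%}")  # prints 72.2% to 100.0%
```

With only 10 trials, a perfect score still leaves a wide interval. Raising the trial count is what narrows it: under the same formula, 100/100 gives a lower bound of roughly 96%.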

  • Multi-trial evaluation — Wilson confidence intervals on pass rates, bootstrap resampling on cost/latency
  • Failure attribution — Fisher exact test pinpoints which step in your pipeline breaks
  • Regression detection — compare versions in CI/CD, exit code 1 blocks the PR on significant drops
  • Framework-agnostic — adapters for LangGraph, CrewAI, AutoGen, Pydantic AI, OpenAI Agents SDK, smolagents, or any Python callable
  • Local-first — no accounts, no telemetry, no cloud. MIT license.
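The regression gate described above can be sketched with a one-sided Fisher exact test on pass/fail counts from two versions. This is an illustration under stated assumptions: the page names Fisher's exact test for failure attribution and does not say which test agentrial applies to version comparison, so the choice of test, the function names `fisher_exact_less` and `regression_exit_code`, and the `alpha = 0.05` threshold are all hypothetical.

```python
import math


def fisher_exact_less(pass_new: int, fail_new: int, pass_old: int, fail_old: int) -> float:
    """One-sided Fisher exact p-value (hypothetical helper).

    Probability, with the table margins fixed, of seeing pass_new or fewer
    passes in the new version if both versions share one true pass rate.
    """
    n_new = pass_new + fail_new
    n_old = pass_old + fail_old
    total = n_new + n_old
    total_pass = pass_new + pass_old
    total_fail = total - total_pass
    denom = math.comb(total, n_new)
    lowest = max(0, total_pass - n_old)  # fewest passes the new version could have
    p_value = 0.0
    for k in range(lowest, pass_new + 1):
        # Hypergeometric probability of exactly k passes in the new version.
        p_value += math.comb(total_pass, k) * math.comb(total_fail, n_new - k) / denom
    return p_value


def regression_exit_code(pass_new: int, fail_new: int, pass_old: int, fail_old: int,
                         alpha: float = 0.05) -> int:
    """Return 1 when the drop in pass rate is significant at alpha, else 0."""
    return 1 if fisher_exact_less(pass_new, fail_new, pass_old, fail_old) < alpha else 0


# Baseline passed 19 of 20 trials; the candidate passed 12 of 20.
code = regression_exit_code(pass_new=12, fail_new=8, pass_old=19, fail_old=1)
print(code)  # prints 1
```

In a CI script, the returned value would be passed to `sys.exit`, so a significant drop fails the job while ordinary run-to-run noise does not.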

pip install agentrial

    About this launch

    agentrial by Alessandro Potenza will be launched June 29th 2027.
