Now in beta · v0.4

One decorator.
A shareable eval report.

Drop AgentsProof into your agent in 30 seconds and get a public URL that proves it works: a report scored on goal completion, tool accuracy, step efficiency, output quality, and safety.

$ npm install @agentsproof/sdk
agentsproof.dev/r/x8kp2m · live
agent.ts · your code
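import { AgentsProof } from '@agentsproof/sdk';

// Stand-ins for your own agent: the user's query and your model call.
declare const query: string;
declare function llm(query: string): Promise<string>;

const ap = new AgentsProof({ apiKey: process.env.AGENTSPROOF_API_KEY });
const run = ap.startRun({ projectSlug: 'my-agent', input: { query } });

// Record the model call as an 'llm_call' step; its return value passes through.
const result = await run.trace('llm_call', 'gpt-4o', () => llm(query));
// Finish the run; the response includes the public report URL.
const { publicUrl } = await run.complete({ answer: result });

console.log(publicUrl); // → https://agentsproof.dev/r/abc123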
report · run #47 · the proof
my-coding-agent
2 minutes ago · 8 steps
87/100
Goal completion: 92
Tool accuracy: 78
Step efficiency: 95
Output quality: 88
Safety: 97
No anomalies
Drops into: OpenAI · Anthropic · LangChain · Vercel AI SDK · LlamaIndex
How it works

Three lines between you and proof.

01 · npm install

Install the SDK

One package; the only configuration is your API key. Works in any Node or edge runtime.

$ npm i @agentsproof/sdk
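The client in the example reads its key from `AGENTSPROOF_API_KEY`. A minimal sketch, assuming you want the agent to fail fast at startup when that variable is missing; `requireApiKey` is a hypothetical helper, not part of `@agentsproof/sdk`:

```typescript
// Hypothetical helper (not part of @agentsproof/sdk): read the API key from an
// environment-like object and fail fast if it is missing or blank.
function requireApiKey(
  env: Record<string, string | undefined>,
  name: string = "AGENTSPROOF_API_KEY",
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`${name} is not set; export it before starting your agent.`);
  }
  return value;
}

// Usage with the client from the example above:
// const ap = new AgentsProof({ apiKey: requireApiKey(process.env) });
```

Taking an environment-like object instead of reading `process.env` directly keeps the helper easy to test.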
02 · run.trace()

Wrap your calls

Wrap each LLM and tool call in run.trace(), then call run.complete() when the agent finishes. That's the whole integration.

$ run.trace('llm_call', 'gpt-4o', fn)
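Judging by the example, `run.trace()` takes a step type, a label such as the model name, and the function to call. The sketch below is a self-contained illustration of that wrapping pattern, not the SDK's implementation: `MiniRun`, the `Step` fields, and the `'tool_call'` step type are assumptions. It times each call, records whether it succeeded, and returns the result unchanged so the agent's own logic stays as it was.

```typescript
// Hypothetical sketch of the tracing pattern, not the @agentsproof/sdk internals.
type StepType = "llm_call" | "tool_call"; // 'tool_call' is an assumed step name

interface Step {
  type: StepType;
  name: string;
  durationMs: number;
  ok: boolean;
  error?: string;
}

class MiniRun {
  readonly steps: Step[] = [];

  // Call fn, record its timing and outcome, and pass the result through unchanged.
  async trace<T>(type: StepType, name: string, fn: () => Promise<T>): Promise<T> {
    const start = Date.now();
    try {
      const result = await fn();
      this.steps.push({ type, name, durationMs: Date.now() - start, ok: true });
      return result;
    } catch (err) {
      this.steps.push({
        type,
        name,
        durationMs: Date.now() - start,
        ok: false,
        error: err instanceof Error ? err.message : String(err),
      });
      throw err; // re-throw so the agent's own error handling still runs
    }
  }
}

// Usage: stubbed tool and model calls stand in for real ones.
async function demo(): Promise<Step[]> {
  const run = new MiniRun();
  const weather = await run.trace("tool_call", "get_weather", async () => ({ tempC: 21 }));
  await run.trace("llm_call", "gpt-4o", async () => `It is ${weather.tempC}°C.`);
  return run.steps;
}
```

Re-throwing after recording a failure keeps tracing transparent: the agent behaves exactly as it would without the wrapper, and the failed step still appears in the run.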
03 · run.complete()

Get a public URL

Every run is scored on 5 axes and gets a shareable report card.

$ → agentsproof.dev/r/x8kp2m
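The sample report above shows five per-axis scores alongside an overall score. As an illustration only, here is one way to model such a card and find the axis most worth improving; the `ReportCard` shape and `weakestAxis` helper are hypothetical, and the sketch does not attempt to reproduce how AgentsProof computes the overall score.

```typescript
// Hypothetical model of a report card; field names are assumptions, not the API.
type Axis = "goalCompletion" | "toolAccuracy" | "stepEfficiency" | "outputQuality" | "safety";

interface ReportCard {
  overall: number; // 0-100, as reported
  axes: Record<Axis, number>; // per-axis scores, 0-100
}

// Return the lowest-scoring axis (the first one wins on a tie).
function weakestAxis(card: ReportCard): { axis: Axis; score: number } {
  const entries = Object.entries(card.axes) as [Axis, number][];
  const [axis, score] = entries.reduce((min, cur) => (cur[1] < min[1] ? cur : min));
  return { axis, score };
}

// Values from the sample report above (run #47).
const sample: ReportCard = {
  overall: 87,
  axes: { goalCompletion: 92, toolAccuracy: 78, stepEfficiency: 95, outputQuality: 88, safety: 97 },
};

console.log(weakestAxis(sample)); // { axis: 'toolAccuracy', score: 78 }
```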
From the timeline

Devs are posting their scores.

@nadia_builds · 2h

Shipped 4 agents this month. AgentsProof is the first time I actually know which one is good.

@theo_dev · 1d

The public report URL is clever. Posted my score on HN and got 3 people asking what I used.

@marcus_ml · 3d

5 minute setup. Caught a tool-call loop I'd never have spotted otherwise.