langprobe

Introduction

What langprobe is, the replay + agent-first wedge, and where to go next.

langprobe is a debugger for agents: a self-hosted platform for LLM observability, eval rigor, and agent replay. It traces every run, lets you replay a captured run with edits applied, and scores outputs with a panel of LLM judges, all in one place, inside your VPC.

Most tools in this space are dashboards for humans. langprobe's wedge is two things they don't do:

Replay

Open a broken run, edit a prompt, model, or tool config, re-run it, and diff what changed span by span. A determinism verdict tells you whether the fix is real or a lucky sample. This is the debugger you reach for at 2am.

Agent-first

The same surface is built for agents, not just people. Every read is available as a token-budgeted, LLM-legible projection over REST and MCP, so an agent can debug an agent: find the failed run, read its salient slice, replay an edit, read the diff. For example, a 48k-token trace becomes a 2k-token salient slice.
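To make "token-budgeted" concrete, here is a minimal sketch of one way a salient slice could be picked under a budget. It is an illustration only, not langprobe's actual selection logic: the `Span` type, the `salient_slice` function, and the "errors first, then cheapest spans" heuristic are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span record for illustration.
    name: str
    tokens: int
    is_error: bool = False


def salient_slice(spans, budget):
    """Greedily keep spans that fit in `budget` tokens.

    Assumed heuristic: failed spans are considered first, then the
    cheapest spans. The result is returned in original trace order.
    """
    ranked = sorted(
        range(len(spans)),
        key=lambda i: (not spans[i].is_error, spans[i].tokens),
    )
    chosen, used = set(), 0
    for i in ranked:
        if used + spans[i].tokens <= budget:
            chosen.add(i)
            used += spans[i].tokens
    return [spans[i] for i in sorted(chosen)]


trace = [
    Span("plan", 1200),
    Span("tool_call", 40000),
    Span("tool_error", 500, is_error=True),
    Span("final", 300),
]
kept = salient_slice(trace, budget=2000)
print([s.name for s in kept])
```

With a 2,000-token budget, the oversized `tool_call` span is dropped while the failing span and its neighbors survive, which is the kind of reduction an agent needs before it can read a trace at all.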

No proprietary SDK

langprobe ingests plain OTLP/HTTP at POST /v1/traces, so no proprietary SDK is required. Stock OpenInference instrumentors for CrewAI, DSPy, Pydantic AI, OpenAI Agents, LlamaIndex, and bare providers work out of the box. Alternatively, use the Python SDK directly.
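Because the ingest path is standard OTLP/HTTP, a trace can in principle be posted with nothing but the Python standard library. The sketch below builds a one-span payload in the OTLP JSON encoding and prepares the POST request. Several things here are assumptions rather than documented langprobe behavior: the base URL `http://localhost:8000` is a placeholder, the helper names are made up for this example, the endpoint is assumed to accept the JSON encoding (OTLP/HTTP also defines a protobuf encoding), and any authentication your deployment needs is omitted.

```python
import json
import secrets
import time
import urllib.request


def build_otlp_payload(service_name, span_name):
    """Build a minimal OTLP/JSON trace payload containing one span."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [{
                    "key": "service.name",
                    "value": {"stringValue": service_name},
                }],
            },
            "scopeSpans": [{
                "scope": {"name": "manual-example"},
                "spans": [{
                    # OTLP/JSON uses hex-encoded IDs: 16 and 8 bytes.
                    "traceId": secrets.token_hex(16),
                    "spanId": secrets.token_hex(8),
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now - 1_000_000),
                    "endTimeUnixNano": str(now),
                }],
            }],
        }],
    }


def build_request(base_url, payload):
    """Prepare (but do not send) the POST to the /v1/traces path."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder base URL; substitute your own deployment's address.
request = build_request(
    "http://localhost:8000",
    build_otlp_payload("demo-agent", "plan_step"),
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

In practice an OpenInference instrumentor or an OpenTelemetry exporter pointed at the same path does this for you; the raw request is shown only to make clear that nothing langprobe-specific is involved on the wire.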

Next steps
