Introduction
What langprobe is, the replay + agent-first wedge, and where to go next.
langprobe is the real debugger for agents. It's a self-hosted LLM observability, eval-rigor, and agent-replay platform that traces every run, lets you replay a captured run with edits applied, and scores outputs with a panel of LLM judges — all in one place, in your VPC.
Most tools in this space are dashboards for humans. langprobe's wedge is two things they don't do:
Replay
Open a broken run, edit a prompt, model, or tool config, re-run it, and diff what changed — span by span, with a determinism verdict so you know whether the fix is real or a lucky sample. This is the debugger you reach for at 2am.
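To make "diff span by span" and "determinism verdict" concrete, here is a minimal Python sketch. It is not langprobe's implementation. `Span`, `diff_spans`, and `determinism_verdict` are hypothetical names. Spans are paired by position, and "deterministic" here simply means that several replays of the same edit produced identical spans.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional


@dataclass(frozen=True)
class Span:
    name: str    # hypothetical span name, e.g. "plan" or "tool.search"
    output: str  # the span's recorded output


def diff_spans(
    before: list[Span], after: list[Span]
) -> list[tuple[int, Optional[Span], Optional[Span]]]:
    """Pair spans by position and return (index, before, after) wherever they differ.

    None on one side means the span exists in only one run,
    e.g. the edit added or removed a tool call.
    """
    return [
        (i, b, a)
        for i, (b, a) in enumerate(zip_longest(before, after))
        if b != a
    ]


def determinism_verdict(replays: list[list[Span]]) -> str:
    """Replay the same edit several times; identical spans every time => deterministic."""
    if len(replays) < 2:
        raise ValueError("need at least two replays to judge determinism")
    first = replays[0]
    if all(r == first for r in replays[1:]):
        return "deterministic"
    return "nondeterministic"


if __name__ == "__main__":
    before = [Span("plan", "search docs"), Span("answer", "I don't know")]
    after = [Span("plan", "search docs"), Span("answer", "See /v1/traces")]
    print(diff_spans(before, after))             # only index 1 differs
    print(determinism_verdict([after, after]))   # deterministic
```

A single "fixed" replay can be a lucky sample. Requiring agreement across several replays is the simplest way to tell a real fix from noise.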
Agent-first
The same surface is built for agents, not just people. Every read is available as a token-budgeted, LLM-legible projection over REST and MCP, so an agent can debug an agent: find the failed run, read its salient slice, replay an edit, and read the diff. For example, a 48k-token trace can shrink to a 2k-token salient slice.
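One way to picture a token-budgeted projection is sketched below in Python. It is a hypothetical illustration, not langprobe's salience ranking. `SpanView`, `estimate_tokens`, and `salient_slice` are made-up names, "salient" is assumed to mean "error spans first, then everything else in trace order", and token counts use a rough 4-characters-per-token heuristic instead of a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class SpanView:
    name: str
    status: str  # hypothetical: "ok" or "error"
    text: str    # LLM-legible rendering of the span


def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token for English); not a real tokenizer.
    return max(1, len(text) // 4)


def salient_slice(spans: list[SpanView], budget: int) -> list[SpanView]:
    """Greedily keep error spans first, then the rest in trace order, within the budget.

    Spans that do not fit are skipped so smaller ones later can still be kept.
    The result is returned in original trace order so it still reads as a timeline.
    """
    order = sorted(range(len(spans)), key=lambda i: (spans[i].status != "error", i))
    kept: set[int] = set()
    used = 0
    for i in order:
        cost = estimate_tokens(spans[i].text)
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return [spans[i] for i in sorted(kept)]
```

The point of the budget is that the agent reading the slice gets a bounded, predictable context cost no matter how large the original trace was.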
No proprietary SDK
langprobe ingests plain OTLP/HTTP at POST /v1/traces, so no proprietary SDK is required. Stock OpenInference instrumentors for CrewAI, DSPy, Pydantic AI, OpenAI Agents, LlamaIndex, and bare provider SDKs work out of the box, or you can use the Python SDK directly.
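Because ingestion is plain OTLP/HTTP, any OTLP exporter pointed at /v1/traces should work. The stdlib-only Python sketch below shows what such a request contains: one LLM span with OpenInference attributes. It assumes your deployment accepts the JSON encoding of OTLP/HTTP (the OTLP spec defines both binary protobuf and JSON, so check which your deployment accepts). `LANGPROBE_URL`, its host and port, and the helper names are hypothetical. In practice an OpenTelemetry exporter or OpenInference instrumentor builds this payload for you.

```python
import json
import os
import time
import urllib.request

# Hypothetical host and port: replace with wherever your langprobe deployment listens.
LANGPROBE_URL = "http://localhost:8000/v1/traces"


def attr(key: str, value: str) -> dict:
    # OTLP/JSON attribute: a key plus a typed AnyValue.
    return {"key": key, "value": {"stringValue": value}}


def build_otlp_payload(model: str, prompt: str, completion: str) -> dict:
    """One LLM span in OTLP/JSON form, tagged with OpenInference attributes."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [attr("service.name", "my-agent")]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},
                "spans": [{
                    # OTLP/JSON encodes IDs as hex: 16-byte trace ID, 8-byte span ID.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": "llm.completion",
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now - 1_000_000),
                    "endTimeUnixNano": str(now),
                    "attributes": [
                        attr("openinference.span.kind", "LLM"),
                        attr("llm.model_name", model),
                        attr("input.value", prompt),
                        attr("output.value", completion),
                    ],
                }],
            }],
        }]
    }


def export(payload: dict, url: str = LANGPROBE_URL) -> int:
    """POST the payload as OTLP/JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Usage, against a running deployment: `export(build_otlp_payload("example-model", "Hi", "Hello!"))`.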