Introduction

langprobe is a debugger for agents. It is a self-hosted platform for LLM observability, rigorous evals, and agent replay: it traces every run, lets you replay a captured run with edits applied, and scores outputs with a panel of LLM judges, all in one place, in your VPC.

Most tools in this space are dashboards for humans. langprobe sets itself apart with two capabilities they lack:

Replay

Open a broken run, edit a prompt, model, or tool config, re-run it, and diff what changed — span by span, with a determinism verdict so you know whether the fix is real or a lucky sample. This is the debugger you reach for at 2am.

Agent-first

The same surface is built for agents, not just people. Every read is available as a token-budgeted, LLM-legible projection over REST and MCP, so an agent can debug an agent: find the failed run, read its salient slice, replay an edit, read the diff. A 48k-token trace becomes a 2k-token salient slice.
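To make the idea of a token-budgeted projection concrete, here is a minimal sketch of one way a long trace could be trimmed to a salient slice. It is an illustration only, not langprobe's actual selection logic: the `Span` type, the `salient_slice` function, the "errors first" ranking, and the 4-characters-per-token estimate are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span record used only in this sketch.
    name: str
    status: str  # "ok" or "error"
    text: str    # captured input/output for the span


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def salient_slice(spans: list[Span], budget: int) -> list[Span]:
    """Greedily keep spans under a token budget, preferring failed spans."""
    # sorted() is stable, so spans with equal priority keep trace order.
    ranked = sorted(
        enumerate(spans),
        key=lambda pair: 0 if pair[1].status == "error" else 1,
    )
    chosen: set[int] = set()
    used = 0
    for index, span in ranked:
        cost = estimate_tokens(span.text)
        if used + cost <= budget:
            chosen.add(index)
            used += cost
    # Return the kept spans in their original trace order.
    return [span for index, span in enumerate(spans) if index in chosen]


trace = [
    Span("plan", "ok", "a" * 400),
    Span("tool_call", "error", "b" * 200),
    Span("summarize", "ok", "c" * 4000),
]
kept = salient_slice(trace, budget=200)
print([span.name for span in kept])
```

With this toy trace and a 200-token budget, the failed `tool_call` span is admitted first, `plan` still fits, and the large `summarize` span is dropped. A real projection would presumably weigh more signals than status alone.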

No proprietary SDK

langprobe ingests plain OTLP/HTTP at POST /v1/traces — no proprietary SDK required. Stock OpenInference instrumentors for CrewAI, DSPy, Pydantic AI, OpenAI Agents, LlamaIndex, and bare providers work out of the box, or use the Python SDK directly.
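As a rough sketch of what "plain OTLP/HTTP" means in practice, the example below builds a single-span OTLP payload and a `POST /v1/traces` request using only the Python standard library. Several details are assumptions rather than documented langprobe behavior: the base URL `http://localhost:4318`, the use of OTLP's JSON encoding (the server may expect protobuf instead), the absence of authentication headers, and the service and span names. In most setups an OpenTelemetry exporter would do this for you.

```python
import json
import secrets
import time
import urllib.request

# Hypothetical address of a self-hosted langprobe instance (assumption).
LANGPROBE_URL = "http://localhost:4318"


def build_trace_request(base_url: str, span_name: str) -> urllib.request.Request:
    """Build an OTLP/HTTP request carrying one span in OTLP JSON encoding."""
    now = time.time_ns()
    payload = {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": "demo-agent"}},
                ],
            },
            "scopeSpans": [{
                "scope": {"name": "manual-example"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now),
                    "endTimeUnixNano": str(now + 1_000_000),
                }],
            }],
        }],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_trace_request(LANGPROBE_URL, "llm.call")
print(request.get_method(), request.full_url)
# To actually send it to a running instance:
# urllib.request.urlopen(request)
```

The request is only constructed here, not sent, so the snippet runs without a server. Because the wire format is standard OTLP, any instrumentor that can export OTLP/HTTP should be able to target the same path.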

Next steps

Getting Started

Guides

API Reference

Python SDK
