Prompt Intelligence

Prompt Intelligence provides prompt version management and A/B experiments for LibreFang agents. Track prompt changes over time, compare variants side-by-side, and let the system automatically activate the best-performing prompt.


Quick Start

1. Enable the feature

Add to ~/.librefang/config.toml:

[prompt_intelligence]
enabled = true

2. Create prompt versions

Open Dashboard → Agents → select an agent → Prompts button → Versions tab → New Version.

Enter a system prompt and an optional description. The server automatically generates a unique ID and content hash.

3. Run an A/B experiment

Switch to the Experiments tab → New Experiment → select 2 or more prompt versions → Create.

Click Start to begin routing traffic to the selected variants.

4. Review results

Click Metrics on a running experiment to see per-variant success rate, latency, and cost. When satisfied, click Complete: the prompt version behind the variant with the highest success rate is activated automatically.


Prompt Versioning

Automatic Tracking

When enabled = true, LibreFang computes a hash of each agent's system prompt at the start of every request. If the prompt content has changed since the last tracked version, a new version is created automatically and marked as active.

This means any change to an agent's system_prompt in its manifest or via the API is captured without manual intervention.
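The change detection described above can be sketched in a few lines. This is only an illustration, not LibreFang's implementation: the hash algorithm (SHA-256 here) is an assumption, and `VersionTracker` and `observe` are hypothetical names. The sketch compares only against the most recently tracked prompt, as the text describes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def content_hash(system_prompt: str) -> str:
    """Hash the prompt text. SHA-256 is an assumption; the docs only say "a hash"."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()


@dataclass
class VersionTracker:
    """Hypothetical in-memory stand-in for one agent's version history."""
    versions: List[Tuple[int, str]] = field(default_factory=list)  # (version number, hash)
    last_hash: Optional[str] = None

    def observe(self, system_prompt: str) -> bool:
        """Call once per request; returns True when a new version was recorded."""
        current = content_hash(system_prompt)
        if current == self.last_hash:
            return False  # prompt unchanged since the last tracked version
        self.versions.append((len(self.versions) + 1, current))
        self.last_hash = current  # the new version becomes the active one
        return True
```

Calling `observe` twice with the same prompt records one version; calling it again with edited text records a second.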

Manual Versions

Create versions explicitly through the Dashboard or API when you want to prepare variants for an experiment. Each version stores:

  • System prompt — the full prompt text
  • Content hash — computed server-side for deduplication
  • Version number — auto-incremented per agent
  • Description — optional human-readable label
  • Active flag — only one version is active per agent at a time

Version Pruning

When a new version is created and the total exceeds max_versions_per_agent (default: 50), the oldest inactive versions are automatically deleted. The active version is never pruned.
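As a rough sketch of the pruning rule, assuming that "oldest" means the lowest version number (the `Version` class and `prune` function are hypothetical, not LibreFang code):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Version:
    number: int           # auto-incremented per agent, so lower means older
    active: bool = False


def prune(versions: List[Version], max_versions: int = 50) -> List[Version]:
    """Drop the oldest inactive versions until at most max_versions remain."""
    excess = len(versions) - max_versions
    if excess <= 0:
        return list(versions)
    inactive_oldest_first = sorted(
        (v for v in versions if not v.active), key=lambda v: v.number
    )
    doomed = {v.number for v in inactive_oldest_first[:excess]}
    # Active versions are never in `doomed`, so they always survive.
    return [v for v in versions if v.number not in doomed]
```

With 52 versions where version 1 is active, this keeps version 1 and removes versions 2 and 3.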

Activating a Version

Click Activate on any version in the Dashboard, or call the API:

curl -X POST http://localhost:4545/api/prompts/versions/{version_id}/activate \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "your-agent-id"}'

The agent will use this version's system prompt on the next request.


A/B Experiments

Concepts

An experiment compares two or more prompt variants on a single agent. Each variant is backed by a prompt version. When the experiment is running, incoming requests are routed to variants based on a configurable traffic split.

Term              Description
----------------  ----------------------------------------------------------------------
Variant           A named entry in the experiment, linked to a specific prompt version.
Traffic split     Percentage of traffic each variant receives (e.g. [70, 30] for 70/30).
Success criteria  Rules that determine whether a request was successful.

Experiment Lifecycle

Draft → Running → Paused → Running → Completed
        Running → Completed

Status     Behavior
---------  ------------------------------------------------------------------------------------
Draft      Experiment is configured but not active. No traffic is routed.
Running    Requests are split among variants. Metrics are recorded.
Paused     Traffic routing stops. Metrics are preserved.
Completed  Experiment ends. The variant with the highest success rate is automatically activated.
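Read as a state machine, the lifecycle can be sketched as below. The transition set is inferred from the lifecycle diagram and is an assumption: in particular, the document does not say whether a paused experiment can be completed directly, so this sketch conservatively disallows it. The lowercase status strings follow the "draft" value used in the Create Experiment payload; the other spellings are assumed.

```python
# Allowed status changes, inferred from the lifecycle diagram (assumption).
ALLOWED_TRANSITIONS = {
    "draft": {"running"},
    "running": {"paused", "completed"},
    "paused": {"running"},
    "completed": set(),  # terminal state
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move experiment from {current!r} to {target!r}")
    return target
```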

Traffic Split

Traffic is distributed using weighted selection. The traffic_split array defines the percentage for each variant. For example, with two variants and [70, 30]:

  • 70% of sessions are routed to Variant A (Control)
  • 30% of sessions are routed to Variant B

Routing is session-consistent: the variant is chosen by hashing the session ID, so the same session always hits the same variant.
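One common way to get weighted, session-consistent routing is to hash the session ID into a bucket and walk the cumulative weights. The sketch below illustrates that idea under stated assumptions: the hash function (SHA-256) and the bucketing scheme are not taken from LibreFang, and `pick_variant` is a hypothetical name.

```python
import hashlib
from typing import List


def pick_variant(session_id: str, traffic_split: List[int]) -> int:
    """Map a session ID to a variant index, weighted by traffic_split."""
    total = sum(traffic_split)
    if total <= 0:
        raise ValueError("traffic_split must contain a positive weight")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to [0, total).
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for index, weight in enumerate(traffic_split):
        cumulative += weight
        if bucket < cumulative:
            return index
    raise AssertionError("unreachable for non-negative weights")
```

Because the bucket depends only on the session ID, repeated calls for the same session return the same index, while many distinct sessions spread across variants in roughly the configured proportions.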

Success Evaluation

A request is considered successful when:

  1. The agent produced a non-empty response
  2. The agent completed at least one iteration (executed the loop)
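Expressed as a predicate, the two conditions above might look like this. The function and parameter names are hypothetical, and how whitespace-only responses are treated is not specified in this document; the sketch counts any non-empty string as a response.

```python
def is_successful(response_text: str, iterations: int) -> bool:
    """True when the agent produced a response and ran at least one loop iteration."""
    has_response = bool(response_text)   # whitespace handling is unspecified (assumption)
    ran_loop = iterations >= 1
    return has_response and ran_loop
```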

Metrics

Both streaming and non-streaming request paths record metrics per variant:

Metric               Description
-------------------  ----------------------------------------
Total requests       Number of requests routed to this variant
Successful / Failed  Breakdown by success criteria
Success rate         Percentage of successful requests
Average latency      Mean response time in milliseconds
Average cost         Mean cost per request in USD
Total cost           Cumulative cost for all requests
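To make the relationship between these metrics concrete, here is a minimal accumulator that derives the rates and averages from running totals. It is a sketch, not the stored schema: the class and the way totals are kept are assumptions, while the derived field names mirror the metrics API response.

```python
from dataclasses import dataclass


@dataclass
class VariantMetrics:
    total_requests: int = 0
    successful_requests: int = 0
    total_latency_ms: float = 0.0
    total_cost_usd: float = 0.0

    def record(self, success: bool, latency_ms: float, cost_usd: float) -> None:
        """Fold one finished request into the running totals."""
        self.total_requests += 1
        self.successful_requests += 1 if success else 0
        self.total_latency_ms += latency_ms
        self.total_cost_usd += cost_usd

    @property
    def failed_requests(self) -> int:
        return self.total_requests - self.successful_requests

    @property
    def success_rate(self) -> float:
        """Percentage of successful requests; 0.0 before any traffic."""
        if self.total_requests == 0:
            return 0.0
        return 100.0 * self.successful_requests / self.total_requests

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.total_requests if self.total_requests else 0.0

    @property
    def avg_cost_usd(self) -> float:
        return self.total_cost_usd / self.total_requests if self.total_requests else 0.0
```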

Auto-Activation

When an experiment is marked as Completed, the system:

  1. Compares success rates across all variants
  2. Selects the variant with the highest success rate
  3. Activates the corresponding prompt version for the agent

This happens automatically — no manual intervention required.
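The selection step amounts to taking the maximum by success rate. A minimal sketch, using the field names from the metrics API response; how ties are broken is not documented, so the first-listed variant winning a tie here is an assumption:

```python
from typing import Dict, List


def pick_winner(metrics: List[Dict]) -> Dict:
    """Return the metrics entry with the highest success_rate."""
    if not metrics:
        raise ValueError("experiment has no variants")
    # max() returns the first maximal element, so ties go to the earlier variant.
    return max(metrics, key=lambda entry: entry["success_rate"])
```

Note that this rule looks only at success rate; latency, cost, and the number of requests behind each rate play no part in the choice described above.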


Dashboard

Prompts Button

Each agent card on the Agents page has a Prompts button that opens the Prompt Intelligence modal with two tabs:

Versions tab:

  • List all prompt versions with version number, active badge, description, and creation date
  • Create new versions with a system prompt and description
  • Activate any version with one click
  • Delete non-active versions

Experiments tab:

  • List all experiments with status badges and variant count
  • Create experiments by selecting 2+ prompt versions as variants
  • Start / Pause / Complete experiments
  • View detailed metrics with success rate badges, latency, and cost

Analytics Page

The Analytics page includes a Model Performance section with:

  • KPI cards for average latency, fastest model, cheapest per-call, and total calls
  • Latency comparison chart (avg/min/max by model)
  • Cost per call chart
  • Detailed performance table

API Reference

Prompt Versions

Endpoint                                 Method  Description
---------------------------------------  ------  ----------------------------------------------
/api/agents/{agent_id}/prompts/versions  GET     List all prompt versions for an agent
/api/agents/{agent_id}/prompts/versions  POST    Create a new prompt version
/api/prompts/versions/{id}               GET     Get a specific version
/api/prompts/versions/{id}               DELETE  Delete a version
/api/prompts/versions/{id}/activate      POST    Activate a version (body: {"agent_id": "..."})

Create Version

curl -X POST http://localhost:4545/api/agents/{agent_id}/prompts/versions \
  -H "Content-Type: application/json" \
  -d '{
    "system_prompt": "You are a helpful coding assistant.",
    "description": "Focus on code quality",
    "version": 1,
    "tools": [],
    "variables": [],
    "created_by": "dashboard"
  }'

The server generates id, created_at, and content_hash automatically.
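For scripted use, the same request can be assembled with Python's standard library. The sketch below only builds the request object and mirrors the body fields of the curl example; which of those fields are actually required is not stated here, and `build_create_version_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:4545"  # host and port as in the curl example


def build_create_version_request(
    agent_id: str, system_prompt: str, description: str = ""
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a prompt version."""
    body = {
        "system_prompt": system_prompt,
        "description": description,
        "version": 1,
        "tools": [],
        "variables": [],
        "created_by": "dashboard",
    }
    agent = urllib.parse.quote(agent_id, safe="")  # escape IDs with special characters
    return urllib.request.Request(
        f"{BASE_URL}/api/agents/{agent}/prompts/versions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it to a running LibreFang instance.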

Experiments

Endpoint                                    Method  Description
------------------------------------------  ------  ------------------------------------------
/api/agents/{agent_id}/prompts/experiments  GET     List experiments for an agent
/api/agents/{agent_id}/prompts/experiments  POST    Create an experiment
/api/prompts/experiments/{id}               GET     Get experiment details
/api/prompts/experiments/{id}/start         POST    Start experiment
/api/prompts/experiments/{id}/pause         POST    Pause experiment
/api/prompts/experiments/{id}/complete      POST    Complete experiment (auto-activates winner)
/api/prompts/experiments/{id}/metrics       GET     Get per-variant metrics

Create Experiment

curl -X POST http://localhost:4545/api/agents/{agent_id}/prompts/experiments \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Tone comparison",
    "status": "draft",
    "traffic_split": [50, 50],
    "success_criteria": {
      "require_user_helpful": true,
      "require_no_tool_errors": true,
      "require_non_empty": true
    },
    "variants": [
      {
        "name": "Control",
        "prompt_version_id": "version-uuid-1",
        "description": "Current prompt"
      },
      {
        "name": "Variant B",
        "prompt_version_id": "version-uuid-2",
        "description": "More concise"
      }
    ]
  }'
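Before sending a payload like the one above, a script can run a few client-side sanity checks. The rules below are inferred from this document (two or more variants, one percentage per variant) plus one assumption, that the percentages add up to 100; the server's own validation is not documented here and may differ.

```python
from typing import Dict, List


def validate_experiment(payload: Dict) -> List[str]:
    """Return a list of problems found in an experiment payload (empty if none)."""
    problems = []
    variants = payload.get("variants", [])
    split = payload.get("traffic_split", [])
    if len(variants) < 2:
        problems.append("an experiment needs at least two variants")
    if len(split) != len(variants):
        problems.append("traffic_split needs one entry per variant")
    if sum(split) != 100:  # assumption: percentages cover all traffic
        problems.append("traffic_split should add up to 100")
    if any(not variant.get("prompt_version_id") for variant in variants):
        problems.append("every variant needs a prompt_version_id")
    return problems
```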

Get Metrics

curl http://localhost:4545/api/prompts/experiments/{id}/metrics

Response:

[
  {
    "variant_id": "...",
    "variant_name": "Control",
    "total_requests": 142,
    "successful_requests": 128,
    "failed_requests": 14,
    "success_rate": 90.1,
    "avg_latency_ms": 1250.5,
    "avg_cost_usd": 0.0023,
    "total_cost_usd": 0.3266
  },
  {
    "variant_id": "...",
    "variant_name": "Variant B",
    "total_requests": 58,
    "successful_requests": 55,
    "failed_requests": 3,
    "success_rate": 94.8,
    "avg_latency_ms": 980.2,
    "avg_cost_usd": 0.0019,
    "total_cost_usd": 0.1102
  }
]
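The fields in each entry are related: successful and failed requests add up to the total, and the success rate is the successful share expressed as a percentage. A small check along those lines can catch a mis-parsed response. The rounding tolerance is an assumption based on the one-decimal values in the sample, and the function name is hypothetical.

```python
from typing import Dict


def is_consistent(entry: Dict, tolerance: float = 0.05) -> bool:
    """Check that one per-variant metrics entry agrees with itself."""
    total = entry["total_requests"]
    if entry["successful_requests"] + entry["failed_requests"] != total:
        return False
    if total == 0:
        return True  # no traffic yet, nothing further to compare
    expected_rate = 100.0 * entry["successful_requests"] / total
    return abs(expected_rate - entry["success_rate"]) <= tolerance
```

Both entries in the sample response pass: 128 of 142 is about 90.1%, and 55 of 58 is about 94.8%.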

Model Performance

Endpoint                         Method  Description
-------------------------------  ------  --------------------------------------------
/api/usage/by-model/performance  GET     Model performance metrics with latency stats

Configuration Reference

[prompt_intelligence]
enabled = false              # Master toggle (default: false)
hash_prompts = true          # Compute content hashes for dedup (default: true)
max_versions_per_agent = 50  # Max versions before pruning (default: 50)

See the Configuration Reference for the full config schema.


Database

Prompt Intelligence uses SQLite (same database as the rest of LibreFang). Migration v13 creates four tables:

Table                Purpose
-------------------  -------------------------------------------------------
prompt_versions      Stores prompt version history per agent
prompt_experiments   Experiment definitions and status
experiment_variants  Variants within experiments, linking to prompt versions
experiment_metrics   Aggregated metrics per variant (requests, latency, cost)

Migration v14 adds a latency_ms column to the existing usage_events table for model performance tracking.

Migrations run automatically on startup. No manual intervention required.
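If you want to confirm that both migrations have been applied, the table and column names above can be checked directly with Python's sqlite3 module. This is a sketch under assumptions: the helper names are hypothetical, and the location of the database file is not given in this document, so the caller supplies an open connection.

```python
import sqlite3
from typing import Set

EXPECTED_TABLES = {
    "prompt_versions",
    "prompt_experiments",
    "experiment_variants",
    "experiment_metrics",
}


def missing_tables(conn: sqlite3.Connection) -> Set[str]:
    """Return the Prompt Intelligence tables that do not exist yet."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return EXPECTED_TABLES - {name for (name,) in rows}


def has_latency_column(conn: sqlite3.Connection) -> bool:
    """True when usage_events has the latency_ms column added by migration v14."""
    columns = conn.execute("PRAGMA table_info(usage_events)").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); index 1 is the name.
    return any(column[1] == "latency_ms" for column in columns)
```

An empty result from `missing_tables` together with `True` from `has_latency_column` indicates that both migrations ran.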