Prompt Intelligence
Prompt Intelligence provides prompt version management and A/B experiments for LibreFang agents. Track prompt changes over time, compare variants side-by-side, and let the system automatically activate the best-performing prompt.
Quick Start
1. Enable the feature
Add to ~/.librefang/config.toml:
[prompt_intelligence]
enabled = true
2. Create prompt versions
Open Dashboard → Agents → select an agent → Prompts button → Versions tab → New Version.
Enter a system prompt and optional description. The server automatically generates a unique ID and content hash.
3. Run an A/B experiment
Switch to the Experiments tab → New Experiment → select 2 or more prompt versions → Create.
Click Start to begin routing traffic to the selected variants.
4. Review results
Click Metrics on a running experiment to see per-variant success rate, latency, and cost. When satisfied, click Complete — the winning variant's prompt is automatically activated.
Prompt Versioning
Automatic Tracking
When enabled = true, LibreFang computes a hash of each agent's system prompt at the start of every request. If the prompt content has changed since the last tracked version, a new version is created automatically and marked as active.
This means any change to an agent's system_prompt in its manifest or via the API is captured without manual intervention.
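The change-detection step can be sketched roughly as follows. This is an illustrative in-memory model, not LibreFang's implementation: the `PromptVersion` and `VersionTracker` names are hypothetical, and SHA-256 is an assumption, since the text above only says that "a hash" is computed.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptVersion:
    version: int
    system_prompt: str
    content_hash: str
    is_active: bool = False


@dataclass
class VersionTracker:
    """Hypothetical in-memory stand-in for one agent's version history."""
    versions: list = field(default_factory=list)

    def track(self, system_prompt: str) -> PromptVersion:
        # SHA-256 is an assumption; the docs do not name the hash function.
        digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
        # Unchanged since the last tracked version: nothing to record.
        if self.versions and self.versions[-1].content_hash == digest:
            return self.versions[-1]
        # Only one version is active per agent, so deactivate the rest.
        for existing in self.versions:
            existing.is_active = False
        new = PromptVersion(
            version=len(self.versions) + 1,
            system_prompt=system_prompt,
            content_hash=digest,
            is_active=True,
        )
        self.versions.append(new)
        return new
```

Calling `track()` twice with the same prompt returns the same version object; a changed prompt yields version 2 and marks it active.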
Manual Versions
Create versions explicitly through the Dashboard or API when you want to prepare variants for an experiment. Each version stores:
- System prompt — the full prompt text
- Content hash — computed server-side for deduplication
- Version number — auto-incremented per agent
- Description — optional human-readable label
- Active flag — only one version is active per agent at a time
Version Pruning
When a new version is created and the total exceeds max_versions_per_agent (default: 50), the oldest inactive versions are automatically deleted. Active versions are never pruned.
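A minimal sketch of this pruning rule, assuming versions are represented as dictionaries with hypothetical `version` and `is_active` keys and that "oldest" means lowest version number:

```python
def prune_versions(versions, max_versions=50):
    """Return the versions that survive pruning.

    `versions` is a list of dicts with hypothetical keys
    "version" (int) and "is_active" (bool).
    """
    excess = len(versions) - max_versions
    if excess <= 0:
        return list(versions)
    # Oldest inactive versions are removed first; active ones are never candidates.
    inactive_oldest_first = sorted(
        (v for v in versions if not v["is_active"]),
        key=lambda v: v["version"],
    )
    doomed = {v["version"] for v in inactive_oldest_first[:excess]}
    return [v for v in versions if v["version"] not in doomed]
```

Because active versions are excluded from the candidate list, the result can still exceed the limit if there are not enough inactive versions to remove.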
Activating a Version
Click Activate on any version in the Dashboard, or call the API:
curl -X POST http://localhost:4545/api/prompts/versions/{version_id}/activate \
-H "Content-Type: application/json" \
-d '{"agent_id": "your-agent-id"}'
The agent will use this version's system prompt on the next request.
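The same call can be made from a script. The sketch below builds the request with Python's standard library; the helper name `build_activate_request` is hypothetical, and the host and port are simply copied from the curl example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4545"  # host and port taken from the curl example


def build_activate_request(version_id: str, agent_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that activates a version."""
    body = json.dumps({"agent_id": agent_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/prompts/versions/{version_id}/activate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running server:
#   with urllib.request.urlopen(build_activate_request(vid, aid)) as resp:
#       print(resp.status)
```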
A/B Experiments
Concepts
An experiment compares two or more prompt variants on a single agent. Each variant is backed by a prompt version. When the experiment is running, incoming requests are routed to variants based on a configurable traffic split.
| Term | Description |
|---|---|
| Variant | A named entry in the experiment, linked to a specific prompt version. |
| Traffic split | Percentage of traffic each variant receives (e.g. [70, 30] for 70/30). |
| Success criteria | Rules that determine whether a request was successful. |
Experiment Lifecycle
Draft → Running → Completed
Running ⇄ Paused (optional; a paused experiment can be resumed)
| Status | Behavior |
|---|---|
| Draft | Experiment is configured but not active. No traffic is routed. |
| Running | Requests are split among variants. Metrics are recorded. |
| Paused | Traffic routing stops. Metrics are preserved. |
| Completed | Experiment ends. The variant with the highest success rate is automatically activated. |
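One way to picture the lifecycle is as a small state machine. The sketch below is an assumption-laden illustration: only the `draft` status string appears in this document (in the Create Experiment payload), so the other lowercase names are guesses, and the transition table covers only the moves described above. Whether a paused experiment can be completed directly is not stated, so that transition is left out.

```python
# Hypothetical transition table; lowercase status names other than "draft"
# are assumptions.
ALLOWED_TRANSITIONS = {
    "draft": {"running"},
    "running": {"paused", "completed"},
    "paused": {"running"},
    "completed": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current} to {target}")
    return target
```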
Traffic Split
Traffic is distributed using weighted selection. The traffic_split array defines the percentage for each variant. For example, with two variants and [70, 30]:
- 70% of sessions are routed to Variant A (Control)
- 30% of sessions are routed to Variant B
Routing is session-consistent — the same session always hits the same variant, determined by hashing the session ID.
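Session-consistent weighted routing can be sketched as below. This is one plausible way to do it rather than LibreFang's actual routing code: the function name `pick_variant` is hypothetical, and SHA-256 is an assumption because the hash function is not named here.

```python
import hashlib


def pick_variant(session_id: str, traffic_split: list) -> int:
    """Return the index of the variant that serves this session."""
    total = sum(traffic_split)
    if total <= 0:
        raise ValueError("traffic_split must contain a positive weight")
    # Hash the session ID so the same session always lands in the same bucket.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Walk the cumulative weights until the bucket falls inside one of them.
    cumulative = 0
    for index, weight in enumerate(traffic_split):
        cumulative += weight
        if bucket < cumulative:
            return index
    raise ValueError("traffic_split weights must be non-negative")
```

With `[70, 30]`, buckets 0 to 69 map to the first variant and 70 to 99 to the second, so the long-run share approaches the configured split while any single session stays pinned to one variant.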
Success Evaluation
A request is considered successful when:
- The agent produced a non-empty response
- The agent completed at least one iteration (executed the loop)
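Expressed as code, the check might look like this. The sketch assumes both conditions must hold together, which the list above implies but does not state outright; the function and parameter names are hypothetical.

```python
def is_successful(response_text: str, iterations: int) -> bool:
    """Assumed rule: non-empty response AND at least one loop iteration."""
    return bool(response_text) and iterations >= 1
```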
Metrics
Both streaming and non-streaming request paths record metrics per variant:
| Metric | Description |
|---|---|
| Total requests | Number of requests routed to this variant |
| Successful / Failed | Breakdown by success criteria |
| Success rate | Percentage of successful requests |
| Average latency | Mean response time in milliseconds |
| Average cost | Mean cost per request in USD |
| Total cost | Cumulative cost for all requests |
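To make the relationship between these metrics concrete, here is a small accumulator sketch. The `VariantMetrics` class is hypothetical; its property names mirror the fields in the metrics response shown in the API Reference, but how LibreFang stores and aggregates them internally is not described here.

```python
from dataclasses import dataclass


@dataclass
class VariantMetrics:
    total_requests: int = 0
    successful_requests: int = 0
    total_latency_ms: float = 0.0
    total_cost_usd: float = 0.0

    def record(self, success: bool, latency_ms: float, cost_usd: float) -> None:
        """Fold one request into the running totals."""
        self.total_requests += 1
        if success:
            self.successful_requests += 1
        self.total_latency_ms += latency_ms
        self.total_cost_usd += cost_usd

    @property
    def failed_requests(self) -> int:
        return self.total_requests - self.successful_requests

    @property
    def success_rate(self) -> float:
        # Percentage of successful requests; 0.0 when nothing was recorded.
        if self.total_requests == 0:
            return 0.0
        return 100.0 * self.successful_requests / self.total_requests

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.total_requests if self.total_requests else 0.0

    @property
    def avg_cost_usd(self) -> float:
        return self.total_cost_usd / self.total_requests if self.total_requests else 0.0
```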
Auto-Activation
When an experiment is marked as Completed, the system:
- Compares success rates across all variants
- Selects the variant with the highest success rate
- Activates the corresponding prompt version for the agent
This happens automatically — no manual intervention required.
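The selection step amounts to taking the maximum over per-variant success rates. The sketch below works on dictionaries shaped like the metrics response in the API Reference; `pick_winner` is a hypothetical name, and the tie-breaking behaviour (first variant wins) is an assumption, since ties are not addressed here.

```python
def pick_winner(metrics: list) -> dict:
    """Return the metrics entry with the highest success_rate.

    On a tie, max() keeps the earliest entry; the real tie-break is unspecified.
    """
    if not metrics:
        raise ValueError("no variant metrics to compare")
    return max(metrics, key=lambda m: m["success_rate"])


sample = [
    {"variant_name": "Control", "success_rate": 90.1},
    {"variant_name": "Variant B", "success_rate": 94.8},
]
winner = pick_winner(sample)
```

With the sample values, which are copied from the metrics response example, `winner["variant_name"]` is `"Variant B"`.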
Dashboard
Prompts Button
Each agent card on the Agents page has a Prompts button that opens the Prompt Intelligence modal with two tabs:
Versions tab:
- List all prompt versions with version number, active badge, description, and creation date
- Create new versions with a system prompt and description
- Activate any version with one click
- Delete non-active versions
Experiments tab:
- List all experiments with status badges and variant count
- Create experiments by selecting 2+ prompt versions as variants
- Start / Pause / Complete experiments
- View detailed metrics with success rate badges, latency, and cost
Analytics Page
The Analytics page includes a Model Performance section with:
- KPI cards for average latency, fastest model, cheapest per-call, and total calls
- Latency comparison chart (avg/min/max by model)
- Cost per call chart
- Detailed performance table
API Reference
Prompt Versions
| Endpoint | Method | Description |
|---|---|---|
| /api/agents/{agent_id}/prompts/versions | GET | List all prompt versions for an agent |
| /api/agents/{agent_id}/prompts/versions | POST | Create a new prompt version |
| /api/prompts/versions/{id} | GET | Get a specific version |
| /api/prompts/versions/{id} | DELETE | Delete a version |
| /api/prompts/versions/{id}/activate | POST | Activate a version (body: {"agent_id": "..."}) |
Create Version
curl -X POST http://localhost:4545/api/agents/{agent_id}/prompts/versions \
-H "Content-Type: application/json" \
-d '{
"system_prompt": "You are a helpful coding assistant.",
"description": "Focus on code quality",
"version": 1,
"tools": [],
"variables": [],
"created_by": "dashboard"
}'
The server generates id, created_at, and content_hash automatically.
Experiments
| Endpoint | Method | Description |
|---|---|---|
| /api/agents/{agent_id}/prompts/experiments | GET | List experiments for an agent |
| /api/agents/{agent_id}/prompts/experiments | POST | Create an experiment |
| /api/prompts/experiments/{id} | GET | Get experiment details |
| /api/prompts/experiments/{id}/start | POST | Start experiment |
| /api/prompts/experiments/{id}/pause | POST | Pause experiment |
| /api/prompts/experiments/{id}/complete | POST | Complete experiment (auto-activates winner) |
| /api/prompts/experiments/{id}/metrics | GET | Get per-variant metrics |
Create Experiment
curl -X POST http://localhost:4545/api/agents/{agent_id}/prompts/experiments \
-H "Content-Type: application/json" \
-d '{
"name": "Tone comparison",
"status": "draft",
"traffic_split": [50, 50],
"success_criteria": {
"require_user_helpful": true,
"require_no_tool_errors": true,
"require_non_empty": true
},
"variants": [
{
"name": "Control",
"prompt_version_id": "version-uuid-1",
"description": "Current prompt"
},
{
"name": "Variant B",
"prompt_version_id": "version-uuid-2",
"description": "More concise"
}
]
}'
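Before sending a payload like this, a client-side sanity check can catch obvious mistakes. The sketch below is not the server's validation logic: the rules that `traffic_split` must have one entry per variant and must sum to 100 are assumptions inferred from the Traffic Split section, and `validate_experiment` is a hypothetical helper.

```python
def validate_experiment(payload: dict) -> list:
    """Return a list of human-readable problems; empty means the payload looks sane."""
    errors = []
    variants = payload.get("variants", [])
    split = payload.get("traffic_split", [])
    if len(variants) < 2:
        errors.append("an experiment needs at least two variants")
    if len(split) != len(variants):
        errors.append("traffic_split needs one entry per variant")
    if sum(split) != 100:  # assumption: the percentages add up to 100
        errors.append("traffic_split should sum to 100")
    return errors


payload = {
    "name": "Tone comparison",
    "traffic_split": [50, 50],
    "variants": [
        {"name": "Control", "prompt_version_id": "version-uuid-1"},
        {"name": "Variant B", "prompt_version_id": "version-uuid-2"},
    ],
}
problems = validate_experiment(payload)
```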
Get Metrics
curl http://localhost:4545/api/prompts/experiments/{id}/metrics
Response:
[
{
"variant_id": "...",
"variant_name": "Control",
"total_requests": 142,
"successful_requests": 128,
"failed_requests": 14,
"success_rate": 90.1,
"avg_latency_ms": 1250.5,
"avg_cost_usd": 0.0023,
"total_cost_usd": 0.3266
},
{
"variant_id": "...",
"variant_name": "Variant B",
"total_requests": 58,
"successful_requests": 55,
"failed_requests": 3,
"success_rate": 94.8,
"avg_latency_ms": 980.2,
"avg_cost_usd": 0.0019,
"total_cost_usd": 0.1102
}
]
Model Performance
| Endpoint | Method | Description |
|---|---|---|
| /api/usage/by-model/performance | GET | Model performance metrics with latency stats |
Configuration Reference
[prompt_intelligence]
enabled = false # Master toggle (default: false)
hash_prompts = true # Compute content hashes for dedup (default: true)
max_versions_per_agent = 50 # Max versions before pruning (default: 50)
See the Configuration Reference for the full config schema.
Database
Prompt Intelligence uses SQLite (same database as the rest of LibreFang). Migration v13 creates four tables:
| Table | Purpose |
|---|---|
| prompt_versions | Stores prompt version history per agent |
| prompt_experiments | Experiment definitions and status |
| experiment_variants | Variants within experiments, linking to prompt versions |
| experiment_metrics | Aggregated metrics per variant (requests, latency, cost) |
Migration v14 adds a latency_ms column to the existing usage_events table for model performance tracking.
Migrations run automatically on startup. No manual intervention required.