Auto-Evolution Mode
Auto-Evolution Mode is an opt-in capability of the DevOps Hand that lets the daemon watch GitHub repositories on a schedule and act on what it finds without human prompting: review open pull requests, triage open issues, and produce draft pull requests that implement triaged bug fixes or feature requests via a Brainstorm → Architect → PRD → Implement (BMAD) pipeline.
Auto-evolution is not a separate Hand. It is a Phase-7 extension of the existing DevOps Hand, sharing the same agents (coordinator, engineer, ops, code-reviewer) plus one new sub-agent (implementer). Activate it by flipping the DevOps Hand's auto_evolve setting to true — no extra install required once librefang hand install devops is done.
Table of Contents
- What it does
- Where it fits in the platform
- Setup
- Required GitHub token scopes
- Configuration reference
- Safety model
- BMAD pipeline phases
- Observability
- Troubleshooting
- What it does NOT do
What it does
Every evolution_check_interval (default 15 min) the Hand wakes up and, for each owner/repo in evolution_repos:
- Reviews open PRs: pulls each PR's diff, asks the existing `code-reviewer` sub-agent for a structured verdict, and posts a single `COMMENT` review (or `REQUEST_CHANGES` on blocking findings; it never auto-`APPROVE`s). PRs already reviewed at their current `head_sha` are skipped.
- Triages open issues: labels first (`bug` / `feature` / `question` / `wontfix`), with a single-prompt LLM fallback when labels are absent. Outcomes: `bug-fix` | `feature` | `needs-info` | `skip`.
- Implements actionable issues: dispatches `bug-fix` and `feature` issues to the `implementer` sub-agent, which runs the BMAD pipeline (scaled by `bmad_strictness`) and opens a draft PR.
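The label-first step of triage can be sketched as a small lookup. This is an illustration, not the Hand's code: the function name `triage_from_labels`, the comma-separated label input, the precedence between labels (`wontfix` wins) and the mapping of `question` to `skip` are all assumptions. An empty result stands for "no usable label; run the single-prompt LLM fallback", which is also where `needs-info` would come from in this sketch.

```bash
# Hypothetical sketch of label-first issue triage.
# Assumptions (not taken from the Hand): labels arrive as one comma-separated
# string, wontfix takes precedence, and "question" maps to skip.
triage_from_labels() {
  # Pad with commas so "bug" matches ",bug," but not ",debug,".
  local labels=",$1,"
  case "$labels" in
    *,wontfix,*)  echo "skip" ;;
    *,bug,*)      echo "bug-fix" ;;
    *,feature,*)  echo "feature" ;;
    *,question,*) echo "skip" ;;
    *)            echo "" ;;  # no recognised label: caller runs the LLM fallback
  esac
}
```

For example, `triage_from_labels "bug,good first issue"` prints `bug-fix`, while an issue labelled only `enhancement` yields an empty string and would go to the LLM prompt.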
Bot-authored PRs (dependabot, renovate, etc.) get a token-cheap pass: recorded but not deeply reviewed. PRs with > 200 changed files are surfaced for human review rather than spending tokens on a diff the reviewer can't usefully ground.
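The skip and routing rules above amount to one small decision per PR. A sketch under stated assumptions: `pr_route` is a hypothetical name, bots are recognised by the `[bot]` suffix GitHub gives app accounts such as `dependabot[bot]`, and the order of the checks is a guess rather than the Hand's documented order.

```bash
# Hypothetical per-PR routing decision.
# $1 author login, $2 changed-file count, $3 current head_sha,
# $4 head_sha recorded at the last review (empty if never reviewed).
pr_route() {
  local author="$1" changed_files="$2" head_sha="$3" last_reviewed_sha="$4"

  if [[ -n "$last_reviewed_sha" && "$head_sha" == "$last_reviewed_sha" ]]; then
    echo "skip"            # already reviewed at this exact commit
  elif [[ "$author" == *"[bot]" ]]; then
    echo "record-only"     # dependabot[bot], renovate[bot], ...: cheap pass
  elif (( changed_files > 200 )); then
    echo "human-review"    # diff too large for the reviewer to ground usefully
  else
    echo "review"
  fi
}
```

Keying the skip on `head_sha` rather than on the PR number means a force-push or new commit makes the PR eligible for review again.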
Where it fits in the platform
LibreFang ships three autonomous self-improvement subsystems. They are designed to complement each other, not duplicate:
| Subsystem | Scope | Trigger | Output |
|---|---|---|---|
| `auto_dream` | Agent's own memory | Time + session-count gate, per-agent opt-in | Consolidated memory entries |
| `skill_workshop` | Reusable workflows captured from agent turns | Post-turn hook, opt-in per agent | Candidate skill drafts in `~/.librefang/skills/pending/` |
| `auto_evolve` (this page) | Source code in upstream repos | Cron gate inside DevOps Hand, opt-in per Hand instance | PR review comments + draft PRs |
Together they form the platform's "code evolves" / "memory consolidates" / "workflows distill" triad. Each is independently gated and independently safe to disable. See docs/architecture/skill-workshop.md in the repo for the same pattern applied to skill capture.
Setup
The Hand definition lives in librefang/librefang-registry under hands/devops/. Install, configure, and activate:
```bash
# 1. Install or update the DevOps Hand
librefang hand install devops
# or, if already installed:
librefang hand update devops

# 2. Provide a GitHub Personal Access Token (see scopes below)
export GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
# (Or write GITHUB_TOKEN=... to ~/.librefang/.env)

# 3. Configure the four evolution settings
librefang hand config devops auto_evolve=true
librefang hand config devops evolution_repos=librefang/librefang,librefang/librefang-registry
librefang hand config devops evolution_check_interval=15min
librefang hand config devops bmad_strictness=standard
# Optional: tighten or relax the per-PR file-touch budget
librefang hand config devops max_changed_files=30

# 4. Activate (or restart if already active)
librefang hand activate devops
```
The full Hand-side config matrix (with each option's effect and default) is in the hands/devops/README.md of the registry. This page focuses on platform behaviour, not the Hand itself.
Required GitHub token scopes
For public-repo evolution, a fine-grained token needs:
- Pull requests: read & write (posting reviews, opening draft PRs)
- Issues: read & write (triage comments, cross-link comments on the source issue)
- Contents: read & write (pushing the implementation branch)
- Metadata: read (resolving `default_branch` for PR creation)
For private-repo evolution, add the classic repo scope and ensure each target repo is listed in evolution_repos. Forks of public repos still count as private for token purposes if the fork is private.
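For a classic token, GitHub reports the granted scopes in the `X-OAuth-Scopes` response header, so the `repo` scope can be checked before enabling private-repo evolution. The sketch below only parses that header; `has_classic_scope` is a hypothetical helper. Fine-grained tokens grant per-repository permissions instead of OAuth scopes, so this header is not a useful check for them; verify those in the token's settings page.

```bash
# Hypothetical helper: succeed if the raw HTTP response headers in $1 grant
# the classic OAuth scope named in $2 (e.g. "repo").
has_classic_scope() {
  local headers="$1" scope="$2" line
  # Header names are case-insensitive; strip CRs from HTTP line endings.
  line=$(printf '%s\n' "$headers" | tr -d '\r' | grep -i '^x-oauth-scopes:') || return 1
  line=${line#*:}     # keep only the value, e.g. " repo, workflow"
  line=${line// /}    # drop spaces: "repo,workflow"
  # Comma-pad so "repo" does not match "public_repo".
  [[ ",$line," == *",$scope,"* ]]
}
```

Usage: `has_classic_scope "$(curl -sI -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/user)" repo && echo "repo scope present"`.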
Never store the PAT in librefang.toml or in a Hand setting. Keep it in the GITHUB_TOKEN environment variable read at daemon start, or in ~/.librefang/.env. The Hand's shell_exec calls reference $GITHUB_TOKEN directly so the token never lands in agent message history or memory.
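If you keep the token in `~/.librefang/.env`, make that file readable only by you. A minimal sketch, assuming the file holds plain `KEY=value` lines (as the `GITHUB_TOKEN=...` form in Setup suggests); `write_token_env` is a hypothetical helper, not part of the `librefang` CLI.

```bash
# Hypothetical helper: set GITHUB_TOKEN in an env file without duplicating it.
# The body runs in a subshell so the umask change does not leak to the caller.
write_token_env() (
  umask 077                          # new files and dirs are owner-only
  env_file="$1"; token="$2"
  mkdir -p "$(dirname "$env_file")"
  touch "$env_file"
  # Keep every line except an old GITHUB_TOKEN entry, then append the new one.
  grep -v '^GITHUB_TOKEN=' "$env_file" > "$env_file.tmp" || true
  printf 'GITHUB_TOKEN=%s\n' "$token" >> "$env_file.tmp"
  mv "$env_file.tmp" "$env_file"     # replaced file keeps the 600 mode
)
```

Call it as `write_token_env "$HOME/.librefang/.env" "$GITHUB_TOKEN"`, passing the token from a variable rather than typing it literally so it stays out of shell history, then restart the daemon so it re-reads the file.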
Configuration reference
| Setting | Type | Default | Effect |
|---|---|---|---|
| `auto_evolve` | toggle | `false` | Master switch. When `false`, Phase 7 is skipped entirely on every tick. |
| `evolution_repos` | text | `""` | Comma-separated `owner/repo` pairs. Empty disables the loop without needing to flip `auto_evolve`. |
| `evolution_check_interval` | select | `15min` | Per-repo cadence. Values: `5min` / `15min` / `1hour` / `6hour` / `1day`. |
| `bmad_strictness` | select | `standard` | Depth of the BMAD pipeline (see below). Values: `light` / `standard` / `strict`. |
| `max_changed_files` | select | `30` | Hard cap on files touched by a single auto-generated draft PR. Larger work is split into multiple PRs. |
| `approval_mode` | toggle (inherited) | `true` | When `true`, deployment-style and destructive actions go through `devops_queue.json` for human approval; this applies to evolution work too. |
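The `evolution_check_interval` values map onto a simple per-repo gate: a repo is due once the interval has elapsed since its last tick. The sketch assumes `last_tick_at` is stored as Unix epoch seconds (its stored format is not documented on this page); `interval_seconds` and `tick_due` are hypothetical names.

```bash
# Hypothetical: seconds for each documented evolution_check_interval value.
interval_seconds() {
  case "$1" in
    5min)  echo 300 ;;
    15min) echo 900 ;;
    1hour) echo 3600 ;;
    6hour) echo 21600 ;;
    1day)  echo 86400 ;;
    *)     return 1 ;;   # not one of the documented values
  esac
}

# Succeeds when a repo is due for a tick.
# $1 = last_tick_at (epoch seconds, assumed), $2 = interval value,
# $3 = "now" in epoch seconds (defaults to the system clock).
tick_due() {
  local last_tick_at="$1" interval="$2" now="${3:-$(date +%s)}" secs
  secs=$(interval_seconds "$interval") || return 2   # unknown interval
  (( now - last_tick_at >= secs ))
}
```

Because the gate is per repo, repos listed in `evolution_repos` can fall due on different ticks even though they share one interval setting.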
Safety model
Auto-evolution operates under three layered guarantees that the operator cannot accidentally turn off:
1. Draft-only PRs. Every PR the implementer creates is draft: true. The Hand never marks a PR ready-for-review and never merges. A human is always the one to flip the readiness flag and click merge.
2. No protected-branch writes. The implementer never pushes to main, master, trunk, or any branch protected by a GitHub ruleset. It never uses --force, --no-verify, --no-gpg-sign, or --amend against a remote branch. Upstream pre-commit / pre-push / commit-msg hooks are discovered via git config core.hooksPath and honored — hook failure aborts the task rather than triggering a retry.
3. Path safety floor. The implementer stops and writes to devops_queue.json (for human triage) rather than committing if the change would touch:
   - `Cargo.toml` workspace `members = [...]` entries
   - migration files (any `*/migrations/*`, `*/migrate/*`, or `*.sql`)
   - secrets (`.env*`, `*.pem`, `*.p12`, `id_rsa`, `id_ed25519`, `credentials*`, `secrets*`, `vault_*.key`)
   - more files than the `max_changed_files` setting allows (default 30)
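The path-based parts of the floor can be expressed with shell glob patterns. A sketch only: `path_allowed` and `change_set_allowed` are hypothetical names, the patterns come from the list above with bare top-level `migrations/` and `migrate/` added as an assumption, and the `Cargo.toml` workspace-members rule is not covered because it depends on the diff content rather than the file path.

```bash
# Hypothetical path safety floor. Returns 0 if a path may be committed;
# otherwise prints why and returns 1.
path_allowed() {
  local path="$1" base="${1##*/}"
  # Migration patterns from the docs, plus top-level migrations/ and migrate/ (assumption).
  case "$path" in
    */migrations/*|migrations/*|*/migrate/*|migrate/*|*.sql)
      echo "migration file: $path"; return 1 ;;
  esac
  # Secret-like names are matched against the file's base name.
  case "$base" in
    .env*|*.pem|*.p12|id_rsa|id_ed25519|credentials*|secrets*|vault_*.key)
      echo "secret-like file: $path"; return 1 ;;
  esac
  return 0
}

# Check a whole change set (one path per line on stdin) against the floor,
# including the max_changed_files budget passed as $1.
change_set_allowed() {
  local max="$1" count=0 path
  while IFS= read -r path; do
    [[ -z "$path" ]] && continue
    count=$((count + 1))
    path_allowed "$path" || return 1
  done
  if (( count > max )); then
    echo "too many files: $count > $max"
    return 1
  fi
  return 0
}
```

For instance, piping `git diff --name-only` output for the branch into `change_set_allowed 30` would flag the same kinds of change that the implementer routes to `devops_queue.json`.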
In addition, each evolution tick self-paces against ~70% of the per-turn token budget so subsequent ticks have headroom and a runaway pipeline can't starve the rest of the Hand.
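The self-pacing rule boils down to an allowance check the tick makes as it spends tokens. The source gives the share only as "~70%", and both function names are hypothetical; this shows the shape of the check, not the Hand's accounting.

```bash
# Hypothetical allowance for one evolution tick (integer token counts).
tick_token_allowance() {
  local per_turn_budget="$1"
  echo $(( per_turn_budget * 70 / 100 ))
}

# Succeeds once a tick has used its share and should wind down.
should_end_tick() {
  local used="$1" per_turn_budget="$2"
  (( used >= $(tick_token_allowance "$per_turn_budget") ))
}
```

Integer division rounds the allowance down, which errs on the side of leaving headroom for the rest of the Hand.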
BMAD pipeline phases
When the implementer is dispatched to an actionable issue (bug-fix or feature), it runs a four-phase pipeline whose depth scales with bmad_strictness:
| Phase | light | standard | strict |
|---|---|---|---|
| Brainstorm | skip | inline (≤200 words) | inline + queue gate |
| Architect | always | always | always + queue gate |
| PRD | skip | required | required + queue gate |
| Implement | always | always | always |
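If you script around the Hand, the table reduces to a lookup. A hypothetical encoding follows: `phase_plan` and `print_plan` are illustrative names, and the `+gate` suffix marks the strict-mode queue gate described below.

```bash
# Hypothetical encoding of the BMAD table: how a phase runs at a given strictness.
phase_plan() {
  case "$1:$2" in
    light:brainstorm|light:prd)  echo "skip" ;;
    light:architect|light:implement|standard:architect|standard:implement|strict:implement)
                                 echo "always" ;;
    standard:brainstorm)         echo "inline" ;;
    standard:prd)                echo "required" ;;
    strict:brainstorm)           echo "inline+gate" ;;
    strict:architect)            echo "always+gate" ;;
    strict:prd)                  echo "required+gate" ;;
    *)                           return 1 ;;   # unknown strictness or phase
  esac
}

# Print the full plan for one strictness level, in pipeline order.
print_plan() {
  local phase
  for phase in brainstorm architect prd implement; do
    printf '%-10s %s\n' "$phase" "$(phase_plan "$1" "$phase")"
  done
}
```

Note that Implement never carries a gate: in strict mode the gate sits between phases, and the draft PR itself is the final human checkpoint.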
Each phase's output is appended to a BMAD.md file committed alongside the implementation so reviewers can see the reasoning that led to the diff.
For bug fixes, the implement phase is strictly test-first: the failing reproduction test is committed before the fix, in the same PR. For features, tests land alongside the code.
Strict mode queue gate. Between every phase, the implementer writes the produced artifact to devops_queue.json with status: "pending" and ends the current turn. The next continuous tick re-reads the queue; if the user (out-of-band) has flipped status to approved, the implementer resumes from the next phase. There is no in-turn polling or sleep — the queue persists across turns by design.
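A strict-mode tick reads the queue once and then either resumes or ends the turn; it never waits. The sketch assumes, purely for illustration, that `devops_queue.json` is a JSON array of objects with `id` and `status` fields (the real schema is defined by the Hand), that `jq` is installed, and that any status other than `pending` or `approved` is left for a human. `queue_status` and `gate_action` are hypothetical names.

```bash
# Hypothetical: read one entry's status from the queue file.
# Assumed schema: [{"id": "...", "status": "pending" | "approved" | ...}, ...]
queue_status() {
  local queue_file="$1" entry_id="$2"
  jq -r --arg id "$entry_id" \
    'map(select(.id == $id)) | first | .status // "missing"' "$queue_file"
}

# Decide what this tick does; there is deliberately no sleep or retry loop.
gate_action() {
  case "$1" in
    approved) echo "resume-next-phase" ;;
    pending)  echo "end-turn" ;;          # look again on the next scheduled tick
    *)        echo "leave-for-human" ;;   # missing, rejected, or unknown status
  esac
}
```

Keeping the decision a pure function of the persisted status is what lets the pipeline survive daemon restarts between phases.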
Observability
Three new metrics surface on the agent dashboard at http://127.0.0.1:4545:
- PRs Reviewed: total successful review postings (`devops_hand_prs_reviewed`)
- Issues Processed: total triaged issues, regardless of outcome (`devops_hand_issues_processed`)
- Draft PRs Opened: total auto-generated draft PRs (`devops_hand_draft_prs_opened`)
The Hand also publishes four advisory events that subscribers (dashboard, audit log, downstream Hands) can consume:
| Event | Payload | When |
|---|---|---|
| `devops_evolution_pr_reviewed` | `{ pr_url, verdict, head_sha }` | After a PR review is posted |
| `devops_evolution_pr_opened` | `{ pr_url, issue_url, classification }` | After a draft PR is created from an issue |
| `devops_evolution_blocked` | `{ reason, pr_or_issue_url, retry_after }` | When a tick aborts (safety floor / API / hook) |
| `devops_evolution_skipped` | `{ pr_or_issue_url, reason }` | When the cadence gate or filters skip an item |
Per-PR / per-issue state lives in memory under keys like `devops_pr_review_<owner>_<repo>_<num>` and `devops_issue_state_<owner>_<repo>_<num>`, so progress survives daemon restarts.
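To inspect state for a specific PR or issue, the key can be rebuilt from the `owner/repo` pair. In this sketch `memory_key` is hypothetical and assumes owner and repo are inserted verbatim; if the Hand normalises characters such as `-` or `.`, the real keys will differ. The same helper covers the `devops_evolution_cursor_<owner>_<repo>` key used in Troubleshooting.

```bash
# Hypothetical: build a memory key from a kind, "owner/repo", and an optional
# PR or issue number. Owner and repo are assumed to be used verbatim.
memory_key() {
  local kind="$1" slug="$2" num="${3:-}"
  local owner="${slug%%/*}" repo="${slug#*/}"
  if [[ -n "$num" ]]; then
    printf 'devops_%s_%s_%s_%s\n' "$kind" "$owner" "$repo" "$num"
  else
    printf 'devops_%s_%s_%s\n' "$kind" "$owner" "$repo"
  fi
}
```

Since repo names may themselves contain underscores, these keys are not uniquely reversible; treat them as lookup strings rather than something to parse back into owner and repo.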
Troubleshooting
"Nothing seems to be happening"
Check, in order:
- `auto_evolve` is actually on: run `librefang hand config devops` and confirm the setting reads `true`.
- `evolution_repos` is non-empty and the pairs are `owner/repo` (no leading `https://`, no trailing slash).
- `GITHUB_TOKEN` is set in the daemon's environment, not just your interactive shell. If you started the daemon before exporting the token, restart it.
- The cadence gate hasn't fired yet: check `devops_evolution_cursor_<owner>_<repo>` in memory; if `last_tick_at` is less than `evolution_check_interval` ago, the next tick is still in the future.
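The second check can be automated before you set the value. A hypothetical pre-flight helper, `validate_repos`, follows; the accepted character sets follow GitHub's naming rules as we understand them (owners: letters, digits, hyphens; repos may also use `.` and `_`), and rejecting surrounding spaces is a conservative assumption, since this page does not say whether the Hand trims them.

```bash
# Hypothetical pre-flight check for an evolution_repos value.
# Returns 0 if every comma-separated entry looks like owner/repo.
validate_repos() {
  local entries entry status=0
  local re='^[A-Za-z0-9-]+/[A-Za-z0-9_.-]+$'   # owner/repo, nothing else
  IFS=',' read -ra entries <<< "$1"
  for entry in "${entries[@]}"; do
    if [[ ! $entry =~ $re ]]; then
      echo "invalid evolution_repos entry: '$entry'" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `validate_repos "$value" && librefang hand config devops evolution_repos="$value"` applies the setting only when every entry passes.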
"Reviews are posted but with weird verdicts"
The reviewer sub-agent returns one of approve | request_changes | block | comment_only. The mapping to GitHub review events is:
- `approve` → posted as `COMMENT` (the Hand never auto-`APPROVE`s; a human still has to)
- `request_changes` → `REQUEST_CHANGES`
- `block` → `REQUEST_CHANGES`, with a `**Reviewer flagged as BLOCKING — escalate to a maintainer**` prefix in the body
- `comment_only` / anything unexpected → `COMMENT`
If you're seeing `REQUEST_CHANGES` more often than expected, inspect the reviewer's summary in the GitHub review body or in memory under `devops_pr_review_<owner>_<repo>_<num>`.
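The verdict mapping, together with the review it produces, can be sketched against GitHub's REST endpoint for creating a pull request review (`POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews`, with `body`, `event` and `commit_id` fields). The function names are hypothetical and the Hand issues its own `shell_exec` calls; `post_review` additionally assumes `curl` and `jq` are installed and reads `$GITHUB_TOKEN` from the environment, in line with the token-handling rule above.

```bash
# Hypothetical sketch of the verdict -> GitHub review event mapping.
map_verdict() {
  case "$1" in
    approve)         echo "COMMENT" ;;          # never auto-APPROVE
    request_changes) echo "REQUEST_CHANGES" ;;
    block)           echo "REQUEST_CHANGES" ;;
    *)               echo "COMMENT" ;;          # comment_only or anything unexpected
  esac
}

# Review body: blocking verdicts get the escalation prefix.
review_body() {
  local verdict="$1" summary="$2"
  if [[ "$verdict" == "block" ]]; then
    printf '%s\n\n%s\n' '**Reviewer flagged as BLOCKING — escalate to a maintainer**' "$summary"
  else
    printf '%s\n' "$summary"
  fi
}

# Post one review, pinned to the reviewed head commit.
post_review() {
  local slug="$1" pr="$2" head_sha="$3" verdict="$4" summary="$5"
  jq -n \
    --arg body "$(review_body "$verdict" "$summary")" \
    --arg event "$(map_verdict "$verdict")" \
    --arg sha "$head_sha" \
    '{body: $body, event: $event, commit_id: $sha}' |
  curl -sS -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    --data @- \
    "https://api.github.com/repos/$slug/pulls/$pr/reviews"
}
```

Pinning `commit_id` to the reviewed `head_sha` keeps each review attached to the commit it actually assessed, which matches the skip-at-same-`head_sha` behaviour.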
"Draft PRs land but cargo checks fail in CI"
The implementer runs the project's own lint/test gate locally before pushing (typically `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test -p <crate>`). A CI-only failure usually means one of the following:
- The project gates CI on commands the implementer didn't run (custom integration tests, `xtask` jobs). Add them to the implementer's lint/test invocations via the project's `justfile` or a `CONTRIBUTING.md` runbook the agent can pick up.
- The implementer's local cache differs from CI's clean build. This usually surfaces as a `Cargo.lock` regeneration the implementer didn't commit; the BMAD pipeline should always re-stage `Cargo.lock` after a dependency-affecting change.
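One way to catch the `Cargo.lock` case before pushing is to run the gate commands and then check whether the lock file changed without being staged. A minimal sketch: `local_gate` and `lockfile_dirty` are hypothetical names, and the command list should be replaced with whatever your CI actually runs.

```bash
# Succeeds if Cargo.lock has unstaged changes. `git diff --quiet` exits 1 when
# the working tree differs from the index (and non-zero outside a repo, which
# this treats as dirty to stay conservative).
lockfile_dirty() {
  ! git diff --quiet -- Cargo.lock
}

# Hypothetical pre-push gate: the usual checks, then the lock-file check.
local_gate() {
  local crate="$1"
  cargo clippy --workspace --all-targets -- -D warnings || return 1
  cargo test -p "$crate" || return 1
  if lockfile_dirty; then
    echo "Cargo.lock changed during the gate; stage it with: git add Cargo.lock" >&2
    return 1
  fi
}
```

Checking after the build, rather than before, is the point: the lock file is typically rewritten by the same commands that run the checks.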
"The Hand wants to touch Cargo.toml / migrations / secrets and stops"
That's the path safety floor doing its job. Inspect `devops_queue.json`; if the change is legitimately needed, take the item off the queue and make the edit yourself out-of-band. The implementer picks up again on the next tick against a clean tree.
"Hooks reject the implementer's push"
Most often the upstream repo rejects `Co-Authored-By: Claude` trailers or enforces a similar ban on AI attribution. The implementer's prompt forbids LLM-vendor attribution in commit messages, but process attribution (`Generated by DevOps Hand → implementer`) is fine and encouraged. If the upstream rejects that too, customize the implementer's commit-message template in the Hand definition. Do not add `--no-verify`; the Hand's safety floor blocks it anyway.
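The branch and flag rules from the Safety model can be expressed as a guard that rejects a git invocation before it runs. A sketch only: `push_allowed` is hypothetical, it checks the branch names and flags listed under Safety model plus `+refspec` force pushes (an addition made here as an assumption), and it cannot see branches protected by GitHub rulesets, which need an API lookup.

```bash
# Hypothetical guard: $1 is the target branch, the remaining arguments are the
# git arguments about to be used. Returns 1 (with a reason) if anything is forbidden.
push_allowed() {
  local branch="$1" arg
  shift
  case "$branch" in
    main|master|trunk)
      echo "refusing to push to protected branch: $branch"; return 1 ;;
  esac
  for arg in "$@"; do
    case "$arg" in
      --force|--force-with-lease|--force-with-lease=*|-f|--no-verify|--no-gpg-sign|--amend|+*)
        echo "refusing forbidden argument: $arg"; return 1 ;;
    esac
  done
  return 0
}
```

A hook rejection then surfaces as an ordinary failed push that aborts the task, instead of being bypassed.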
What it does NOT do
To set realistic expectations:
- It does not merge PRs. A human always merges.
- It does not mark draft PRs as ready-for-review.
- It does not push to `main` / `master` / protected branches.
- It does not operate on private repos unless the configured PAT has `repo` scope and the repo is listed in `evolution_repos`.
- It does not modify `Cargo.toml` workspace members, migration files, or secrets, or touch more than `max_changed_files` files in a single PR.
- It does not consume more than ~70% of the per-turn token budget in a single tick.
For the full Hand-level specification and the prompts each sub-agent runs, see the source: hands/devops/HAND.toml and hands/devops/SKILL.md in the registry.