Integrity & Content Safety

This page covers tamper evidence, signed content, taint propagation, secret handling, and skill-content scanning.

Included Topics

  • Merkle Hash Chain Audit Trail
  • Information Flow Taint Tracking
  • Ed25519 Manifest Signing
  • Secret Zeroization
  • Prompt Injection Scanner
  • Prompt Injection Guard

Merkle Hash Chain Audit Trail

Source: librefang-runtime/src/audit.rs

Every security-critical action is appended to a tamper-evident Merkle hash chain, similar to a blockchain. Each entry contains the SHA-256 hash of its own contents concatenated with the hash of the previous entry.

Auditable Actions

pub enum AuditAction {
    ToolInvoke,
    CapabilityCheck,
    AgentSpawn,
    AgentKill,
    AgentMessage,
    MemoryAccess,
    FileAccess,
    NetworkAccess,
    ShellExec,
    AuthAttempt,
    WireConnect,
    ConfigChange,
}

Entry Structure

pub struct AuditEntry {
    pub seq: u64,          // Monotonically increasing sequence number
    pub timestamp: String, // ISO-8601
    pub agent_id: String,
    pub action: AuditAction,
    pub detail: String,    // e.g. tool name, file path
    pub outcome: String,   // "ok", "denied", error message
    pub prev_hash: String, // SHA-256 of previous entry (or 64 zeros)
    pub hash: String,      // SHA-256 of this entry + prev_hash
}

Hash Computation

Each entry's hash is computed from all of its fields concatenated with the previous entry's hash:

fn compute_entry_hash(
    seq: u64, timestamp: &str, agent_id: &str,
    action: &AuditAction, detail: &str,
    outcome: &str, prev_hash: &str,
) -> String {
    let mut hasher = Sha256::new();
    hasher.update(seq.to_string().as_bytes());
    hasher.update(timestamp.as_bytes());
    hasher.update(agent_id.as_bytes());
    hasher.update(action.to_string().as_bytes());
    hasher.update(detail.as_bytes());
    hasher.update(outcome.as_bytes());
    hasher.update(prev_hash.as_bytes());
    hex::encode(hasher.finalize())
}

Chain Integrity Verification

AuditLog::verify_integrity() walks the entire chain and recomputes every hash. If any entry has been tampered with, the recomputed hash will not match the stored hash, or the prev_hash linkage will be broken:

pub fn verify_integrity(&self) -> Result<(), String> {
    let entries = self.entries.lock().unwrap_or_else(|e| e.into_inner());
    let mut expected_prev = "0".repeat(64);  // Genesis sentinel

    for entry in entries.iter() {
        if entry.prev_hash != expected_prev {
            return Err(format!(
                "chain break at seq {}: expected prev_hash {} but found {}",
                entry.seq, expected_prev, entry.prev_hash
            ));
        }
        let recomputed = compute_entry_hash(/* ... */);
        if recomputed != entry.hash {
            return Err(format!(
                "hash mismatch at seq {}: expected {} but found {}",
                entry.seq, recomputed, entry.hash
            ));
        }
        expected_prev = entry.hash.clone();
    }
    Ok(())
}

Thread Safety

AuditLog uses Mutex<Vec<AuditEntry>> and Mutex<String> for the tip hash. Both locks use unwrap_or_else(|e| e.into_inner()) to recover from poisoned mutexes, ensuring the audit log remains available even after a panic.

API

MethodDescription
AuditLog::new()Creates an empty log with genesis sentinel ("0" * 64)
record(agent_id, action, detail, outcome)Appends an entry, returns its hash
verify_integrity()Validates the entire chain
tip_hash()Returns the hash of the most recent entry
len() / is_empty()Entry count
recent(n)Returns the most recent n entries (cloned)

Information Flow Taint Tracking

Source: librefang-types/src/taint.rs

LibreFang implements a lattice-based taint propagation model that prevents tainted values from flowing into sensitive sinks without explicit declassification. This guards against prompt injection, data exfiltration, and confused-deputy attacks.

Taint Labels

pub enum TaintLabel {
    ExternalNetwork,  // Data from external network requests
    UserInput,        // Direct user input
    Pii,              // Personally identifiable information
    Secret,           // API keys, tokens, passwords
    UntrustedAgent,   // Data from sandboxed/untrusted agents
}

Tainted Values

pub struct TaintedValue {
    pub value: String,              // The payload
    pub labels: HashSet<TaintLabel>, // Attached taint labels
    pub source: String,             // Human-readable origin
}

Key methods:

MethodDescription
TaintedValue::new(value, labels, source)Create with labels
TaintedValue::clean(value, source)Create with no labels (untainted)
merge_taint(&mut self, other)Union of labels (for concatenation)
check_sink(&self, sink)Check if value can flow to sink
declassify(&mut self, label)Remove a specific label (explicit security decision)
is_tainted(&self) -> boolTrue if any labels present

Taint Sinks

A TaintSink defines which labels are blocked from reaching it:

SinkBlocked LabelsRationale
TaintSink::shell_exec()ExternalNetwork, UntrustedAgent, UserInputPrevents command injection
TaintSink::net_fetch()Secret, PiiPrevents data exfiltration
TaintSink::agent_message()SecretPrevents secret leakage to other agents

Violation Handling

When check_sink() finds a blocked label, it returns a TaintViolation:

pub struct TaintViolation {
    pub label: TaintLabel,    // The offending label
    pub sink_name: String,    // "shell_exec", "net_fetch", etc.
    pub source: String,       // Where the tainted value came from
}

Display: taint violation: label 'Secret' from source 'env_var' is not allowed to reach sink 'net_fetch'

Declassification

Declassification is an explicit security decision. The caller asserts that the value has been sanitized:

tainted.declassify(&TaintLabel::ExternalNetwork);
tainted.declassify(&TaintLabel::UserInput);
// After declassification, value can flow to shell_exec
assert!(tainted.check_sink(&TaintSink::shell_exec()).is_ok());

Taint Propagation

When two values are combined (concatenation, interpolation), the result must carry the union of both label sets:

let mut combined = TaintedValue::new(/* ... */);
combined.merge_taint(&other_value);
// combined.labels is now the union of both

Ed25519 Manifest Signing

Source: librefang-types/src/manifest_signing.rs

Agent manifests define an agent's capabilities, tools, and configuration. A compromised manifest can grant elevated privileges. This module provides Ed25519-based cryptographic signing.

Signing Scheme

  1. Compute SHA-256 of the manifest content (raw TOML text).
  2. Sign the hash with Ed25519 (via ed25519-dalek).
  3. Bundle the signature, public key, and content hash into a SignedManifest envelope.

SignedManifest Structure

pub struct SignedManifest {
    pub manifest: String,           // Raw TOML content
    pub content_hash: String,       // Hex SHA-256 of manifest
    pub signature: Vec<u8>,         // Ed25519 signature (64 bytes)
    pub signer_public_key: Vec<u8>, // Ed25519 public key (32 bytes)
    pub signer_id: String,          // Human-readable signer ID
}

Signing

let signing_key = SigningKey::generate(&mut OsRng);
let signed = SignedManifest::sign(manifest_toml, &signing_key, "admin@org.com");

Internally:

pub fn sign(manifest: impl Into<String>, signing_key: &SigningKey, signer_id: impl Into<String>) -> Self {
    let manifest = manifest.into();
    let content_hash = hash_manifest(&manifest);  // SHA-256
    let signature = signing_key.sign(content_hash.as_bytes());
    let verifying_key = signing_key.verifying_key();
    Self {
        manifest,
        content_hash,
        signature: signature.to_bytes().to_vec(),
        signer_public_key: verifying_key.to_bytes().to_vec(),
        signer_id: signer_id.into(),
    }
}

Verification

Two-phase verification:

  1. Hash check: Recompute SHA-256 of manifest and compare to content_hash.
  2. Signature check: Verify the Ed25519 signature over content_hash using signer_public_key.
pub fn verify(&self) -> Result<(), String> {
    let recomputed = hash_manifest(&self.manifest);
    if recomputed != self.content_hash {
        return Err("content hash mismatch: ...");
    }
    let verifying_key = VerifyingKey::from_bytes(&pk_bytes)?;
    let signature = Signature::from_bytes(&sig_bytes);
    verifying_key.verify(self.content_hash.as_bytes(), &signature)
        .map_err(|e| format!("signature verification failed: {}", e))
}

Tamper Detection

  • Modifying the manifest content after signing causes a content hash mismatch.
  • Replacing the public key with a different key causes a signature verification failure.
  • Both attacks are caught by verify().

Secret Zeroization

Source: All LLM driver modules, channel adapters, and web search modules.

LibreFang uses Zeroizing<String> from the zeroize crate on every field that holds secret material. When the value is dropped, its memory is overwritten with zeros, preventing secrets from lingering in memory.

How It Works

Zeroizing<T> is a smart-pointer wrapper from the zeroize crate. It implements Deref<Target=T> for transparent usage and Drop for automatic zeroization:

// On Drop, the inner String's buffer is overwritten with zeros
let key = Zeroizing::new("sk-secret-key".to_string());
// Use key transparently via Deref
client.post(url).header("authorization", format!("Bearer {}", &*key));
// When key goes out of scope, memory is zeroed

Fields Using Zeroization

LLM Drivers (librefang-runtime/src/drivers/):

DriverField
AnthropicDriverapi_key: Zeroizing<String>
GeminiDriverapi_key: Zeroizing<String>
OpenAiCompatDriverapi_key: Zeroizing<String>

Channel Adapters (librefang-channels/src/):

AdapterField(s)
DiscordAdaptertoken: Zeroizing<String>
EmailAdapterpassword: Zeroizing<String>
BlueskyAdapterapp_password: Zeroizing<String>
DingTalkAdapteraccess_token: Zeroizing<String>, secret: Zeroizing<String>, client_id: Zeroizing<String>, client_secret: Zeroizing<String>
FeishuAdapterapp_secret: Zeroizing<String>
FlockAdapterbot_token: Zeroizing<String>
GitterAdaptertoken: Zeroizing<String>
GotifyAdapterapp_token: Zeroizing<String>, client_token: Zeroizing<String>

Web Search (librefang-runtime/src/web_search.rs):

fn resolve_api_key(env_var: &str) -> Option<Zeroizing<String>> {
    std::env::var(env_var).ok().filter(|k| !k.is_empty()).map(Zeroizing::new)
}

Embedding (librefang-runtime/src/embedding.rs):

StructField
EmbeddingClientapi_key: Zeroizing<String>

Why It Matters

Without zeroization, secrets remain in memory after use until the OS reclaims the page. An attacker with access to a core dump, swap file, or memory forensics tool can recover API keys. Zeroizing<String> ensures the secret is overwritten as soon as it is no longer needed.


Prompt Injection Scanner

Source: librefang-skills/src/verify.rs

The SkillVerifier provides two scanning functions: security_scan() for skill manifests and scan_prompt_content() for skill prompt text (SKILL.md body).

Manifest Security Scan

SkillVerifier::security_scan(manifest) inspects a skill's declared requirements:

CheckSeverityTrigger
Node.js runtimeWarningruntime_type == SkillRuntime::Node
Shell execution capabilityCriticalCapability contains shellexec or shell_exec
Unrestricted networkWarningCapability contains netconnect(*)
Shell toolCriticalTool is shell_exec or bash
Filesystem write toolWarningTool is file_write or file_delete
Too many toolsInfoMore than 10 tools required

Prompt Injection Scan

SkillVerifier::scan_prompt_content(content) detects common attack patterns in skill prompt text:

Critical -- Prompt override attempts:

"ignore previous instructions", "ignore all previous",
"disregard previous", "forget your instructions",
"you are now", "new instructions:", "system prompt override",
"ignore the above", "do not follow", "override system"

Warning -- Data exfiltration patterns:

"send to http", "send to https", "post to http", "post to https",
"exfiltrate", "forward all", "send all data",
"base64 encode and send", "upload to"

Warning -- Shell command references:

"rm -rf", "chmod ", "sudo "

Info -- Excessive length:

Content over 50,000 bytes triggers an info-level warning about potential LLM performance degradation.

SHA256 Checksum Verification

pub fn verify_checksum(data: &[u8], expected_sha256: &str) -> bool {
    let actual = Self::sha256_hex(data);
    actual == expected_sha256.to_lowercase()
}

Skills installed from ClawHub have their content verified against a known SHA256 hash to detect tampering during download.

Warning Structure

pub struct SkillWarning {
    pub severity: WarningSeverity,  // Info, Warning, Critical
    pub message: String,
}

Prompt Injection Guard

Source: librefang-kernel/src/injection_guard.rs

Before user input is forwarded to an LLM, the kernel scans the message for known prompt-injection indicators. Unlike skill scanning, this guard operates on runtime user input rather than static skill content, and it does not reject the message — it prepends a warning prefix and emits a structured log entry so that the model (and any downstream auditing) can reason about the potentially adversarial content with full situational awareness.

Detection Approach

The guard checks for two distinct threat classes:

1. Text Patterns (15 rules)

Common English-language phrases used to hijack instruction following:

PatternThreat class
ignore previous instructionsOverride attempt
ignore all previousOverride attempt
disregard your instructionsOverride attempt
disregard previousOverride attempt
forget your instructionsOverride attempt
you are nowPersona hijack
new instructions:Override injection
system:System-prompt injection
system prompt overrideOverride attempt
override systemOverride attempt
ignore the aboveOverride attempt
do not followOverride attempt
act as if you have no restrictionsJailbreak attempt
[system]Fake system-turn marker
<system>Fake system-turn marker

Matching is case-insensitive to catch IGNORE PREVIOUS INSTRUCTIONS and similar variants.

2. Invisible Unicode (10 code points)

Adversarial content is sometimes hidden using characters that are visually absent but semantically processed by the tokenizer:

Code pointNameCategory
U+200BZERO WIDTH SPACEZero-width
U+200CZERO WIDTH NON-JOINERZero-width
U+200DZERO WIDTH JOINERZero-width
U+2060WORD JOINERZero-width
U+FEFFZERO WIDTH NO-BREAK SPACE (BOM)Zero-width
U+202ALEFT-TO-RIGHT EMBEDDINGBidi control
U+202BRIGHT-TO-LEFT EMBEDDINGBidi control
U+202CPOP DIRECTIONAL FORMATTINGBidi control
U+202DLEFT-TO-RIGHT OVERRIDEBidi control
U+202ERIGHT-TO-LEFT OVERRIDEBidi control

Bidi override characters (U+202D, U+202E) are particularly dangerous because they can visually reverse text in a UI while the tokenizer processes the original byte order.

Detection Response

When either check fires, the kernel:

  1. Prepends a warning prefix to the user message before it is placed in the LLM context window:

    [Warning: potential prompt injection detected — proceeding with caution]
    
  2. Emits a structured warning log via tracing::warn!:

    prompt injection indicator detected: "ignore previous instructions" in message from user <id>
    

The modified message is sent to the LLM unchanged except for the prefix. The LLM therefore sees both the warning and the potentially hostile content and can apply its own policy.

Why Not Reject?

Legitimate messages can contain any of the detected phrases in innocuous contexts — for example:

  • A developer asking the agent to explain the phrase "ignore previous instructions".
  • A security researcher asking the agent to write a test case that includes injection strings.
  • Copy-pasted documentation that happens to contain system: as a YAML key.

Outright rejection in these cases would silently break valid workflows. The warning-prefix approach preserves usability while giving the model (and the audit trail) a clear signal to act on.

Limitations

Rule-based matching cannot cover the full injection surface:

  • Paraphrased or multi-lingual override attempts are not detected.
  • Injections split across multiple messages may evade single-message scanning.
  • Encoding tricks (e.g. base64 payloads decoded at runtime by the model) are out of scope.

The prompt injection guard is one defense layer within a broader strategy that also includes taint tracking, capability enforcement, and human-in-the-loop approvals. It is not a complete solution in isolation.