Integrity & Content Safety
This page covers tamper evidence, signed content, taint propagation, secret handling, and skill-content scanning.
Included Topics
- Merkle Hash Chain Audit Trail
- Information Flow Taint Tracking
- Ed25519 Manifest Signing
- Secret Zeroization
- Prompt Injection Scanner
- Prompt Injection Guard
Merkle Hash Chain Audit Trail
Source: librefang-runtime/src/audit.rs
Every security-critical action is appended to a tamper-evident Merkle hash chain, similar to a blockchain. Each entry stores a SHA-256 hash computed over its own fields together with the hash of the previous entry, so modifying or removing an earlier entry breaks the chain from that point on.
Auditable Actions
pub enum AuditAction {
ToolInvoke,
CapabilityCheck,
AgentSpawn,
AgentKill,
AgentMessage,
MemoryAccess,
FileAccess,
NetworkAccess,
ShellExec,
AuthAttempt,
WireConnect,
ConfigChange,
}
Entry Structure
pub struct AuditEntry {
pub seq: u64, // Monotonically increasing sequence number
pub timestamp: String, // ISO-8601
pub agent_id: String,
pub action: AuditAction,
pub detail: String, // e.g. tool name, file path
pub outcome: String, // "ok", "denied", error message
pub prev_hash: String, // SHA-256 of previous entry (or 64 zeros)
pub hash: String, // SHA-256 of this entry + prev_hash
}
Hash Computation
Each entry's hash is computed from all of its fields concatenated with the previous entry's hash:
fn compute_entry_hash(
seq: u64, timestamp: &str, agent_id: &str,
action: &AuditAction, detail: &str,
outcome: &str, prev_hash: &str,
) -> String {
let mut hasher = Sha256::new();
hasher.update(seq.to_string().as_bytes());
hasher.update(timestamp.as_bytes());
hasher.update(agent_id.as_bytes());
hasher.update(action.to_string().as_bytes());
hasher.update(detail.as_bytes());
hasher.update(outcome.as_bytes());
hasher.update(prev_hash.as_bytes());
hex::encode(hasher.finalize())
}
Chain Integrity Verification
AuditLog::verify_integrity() walks the entire chain and recomputes every
hash. If any entry has been tampered with, the recomputed hash will not match
the stored hash, or the prev_hash linkage will be broken:
pub fn verify_integrity(&self) -> Result<(), String> {
let entries = self.entries.lock().unwrap_or_else(|e| e.into_inner());
let mut expected_prev = "0".repeat(64); // Genesis sentinel
for entry in entries.iter() {
if entry.prev_hash != expected_prev {
return Err(format!(
"chain break at seq {}: expected prev_hash {} but found {}",
entry.seq, expected_prev, entry.prev_hash
));
}
let recomputed = compute_entry_hash(/* ... */);
if recomputed != entry.hash {
return Err(format!(
"hash mismatch at seq {}: expected {} but found {}",
entry.seq, recomputed, entry.hash
));
}
expected_prev = entry.hash.clone();
}
Ok(())
}
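The linkage and verification logic above can be tried out with a small, dependency-free sketch. It is not the LibreFang implementation: `toy_hash` is a hypothetical stand-in built on the standard library's `DefaultHasher` (64-bit, not cryptographically secure) in place of SHA-256, and `Entry`, `append`, and `verify` are simplified illustrative names with fewer fields than `AuditEntry`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Assumption: stand-in for SHA-256 so the sketch needs no external crates.
/// NOT cryptographically secure; a real audit log must use SHA-256.
fn toy_hash(fields: &[&str]) -> String {
    let mut hasher = DefaultHasher::new();
    for field in fields {
        hasher.write(field.as_bytes());
    }
    format!("{:016x}", hasher.finish())
}

#[derive(Clone)]
struct Entry {
    seq: u64,
    detail: String,
    prev_hash: String,
    hash: String,
}

/// Genesis sentinel: all zeros, sized to match the toy hash width.
fn genesis() -> String {
    "0".repeat(16)
}

fn entry_hash(seq: u64, detail: &str, prev_hash: &str) -> String {
    toy_hash(&[&seq.to_string(), detail, prev_hash])
}

/// Appends an entry that links to the current tip of the chain.
fn append(chain: &mut Vec<Entry>, detail: &str) {
    let prev_hash = chain.last().map(|e| e.hash.clone()).unwrap_or_else(genesis);
    let seq = chain.len() as u64;
    let hash = entry_hash(seq, detail, &prev_hash);
    chain.push(Entry { seq, detail: detail.to_string(), prev_hash, hash });
}

/// Walks the chain, checking the linkage first and then each entry's own hash.
fn verify(chain: &[Entry]) -> Result<(), String> {
    let mut expected_prev = genesis();
    for e in chain {
        if e.prev_hash != expected_prev {
            return Err(format!("chain break at seq {}", e.seq));
        }
        if entry_hash(e.seq, &e.detail, &e.prev_hash) != e.hash {
            return Err(format!("hash mismatch at seq {}", e.seq));
        }
        expected_prev = e.hash.clone();
    }
    Ok(())
}
```

In this sketch, editing the `detail` of a stored entry surfaces as a hash mismatch at that entry, while deleting an entry surfaces as a chain break at the entry that followed it.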
Thread Safety
AuditLog guards its entries with a Mutex<Vec<AuditEntry>> and its tip hash with a Mutex<String>.
Both locks use unwrap_or_else(|e| e.into_inner()) to recover from poisoned
mutexes, so the audit log remains available even after a thread panics while holding a lock.
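The poison-recovery idiom can be seen in isolation with the standard library alone. The sketch below uses a plain `Vec<String>` as a stand-in for the entry list; `push_entry` and `poison` are hypothetical helper names, not part of the AuditLog API.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Appends to a shared log, taking the guard even if the mutex is poisoned.
/// Returns the number of entries after the push.
fn push_entry(log: &Mutex<Vec<String>>, entry: &str) -> usize {
    let mut entries = log.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    entries.push(entry.to_string());
    entries.len()
}

/// Poisons the mutex by panicking in another thread while the lock is held.
fn poison(log: &Arc<Mutex<Vec<String>>>) {
    let log = Arc::clone(log);
    let result = thread::spawn(move || {
        let _guard = log.lock().unwrap();
        panic!("simulated panic while holding the audit lock");
    })
    .join();
    // The spawned thread panicked, so join() reports an error.
    assert!(result.is_err());
}
```

After `poison` runs, `log.is_poisoned()` is true and a plain `lock().unwrap()` would panic, whereas `push_entry` keeps working. The trade-off is that the recovered data may reflect a half-finished update from the panicking thread, which is one reason to keep the critical section small.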
API
| Method | Description |
|---|---|
| AuditLog::new() | Creates an empty log with genesis sentinel ("0" * 64) |
| record(agent_id, action, detail, outcome) | Appends an entry, returns its hash |
| verify_integrity() | Validates the entire chain |
| tip_hash() | Returns the hash of the most recent entry |
| len() / is_empty() | Entry count |
| recent(n) | Returns the most recent n entries (cloned) |
Information Flow Taint Tracking
Source: librefang-types/src/taint.rs
LibreFang implements a lattice-based taint propagation model that prevents tainted values from flowing into sensitive sinks without explicit declassification. This guards against prompt injection, data exfiltration, and confused-deputy attacks.
Taint Labels
pub enum TaintLabel {
ExternalNetwork, // Data from external network requests
UserInput, // Direct user input
Pii, // Personally identifiable information
Secret, // API keys, tokens, passwords
UntrustedAgent, // Data from sandboxed/untrusted agents
}
Tainted Values
pub struct TaintedValue {
pub value: String, // The payload
pub labels: HashSet<TaintLabel>, // Attached taint labels
pub source: String, // Human-readable origin
}
Key methods:
| Method | Description |
|---|---|
| TaintedValue::new(value, labels, source) | Create with labels |
| TaintedValue::clean(value, source) | Create with no labels (untainted) |
| merge_taint(&mut self, other) | Union of labels (for concatenation) |
| check_sink(&self, sink) | Check if value can flow to sink |
| declassify(&mut self, label) | Remove a specific label (explicit security decision) |
| is_tainted(&self) -> bool | True if any labels present |
Taint Sinks
A TaintSink defines which labels are blocked from reaching it:
| Sink | Blocked Labels | Rationale |
|---|---|---|
| TaintSink::shell_exec() | ExternalNetwork, UntrustedAgent, UserInput | Prevents command injection |
| TaintSink::net_fetch() | Secret, Pii | Prevents data exfiltration |
| TaintSink::agent_message() | Secret | Prevents secret leakage to other agents |
Violation Handling
When check_sink() finds a blocked label, it returns a TaintViolation:
pub struct TaintViolation {
pub label: TaintLabel, // The offending label
pub sink_name: String, // "shell_exec", "net_fetch", etc.
pub source: String, // Where the tainted value came from
}
Display: taint violation: label 'Secret' from source 'env_var' is not allowed to reach sink 'net_fetch'
Declassification
Declassification is an explicit security decision. The caller asserts that the value has been sanitized:
tainted.declassify(&TaintLabel::ExternalNetwork);
tainted.declassify(&TaintLabel::UserInput);
// After declassification, value can flow to shell_exec
assert!(tainted.check_sink(&TaintSink::shell_exec()).is_ok());
Taint Propagation
When two values are combined (concatenation, interpolation), the result must carry the union of both label sets:
let mut combined = TaintedValue::new(/* ... */);
combined.merge_taint(&other_value);
// combined.labels is now the union of both
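To make the interaction between propagation, sink checks, and declassification concrete, here is a minimal self-contained model. It is a sketch under assumptions rather than the code in taint.rs: `Label`, `Tainted`, `Sink`, and the two sink constants are simplified, hypothetical counterparts of `TaintLabel`, `TaintedValue`, and `TaintSink`, and the error is a plain `String` instead of a `TaintViolation`.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Label {
    ExternalNetwork,
    UserInput,
    Pii,
    Secret,
    UntrustedAgent,
}

struct Tainted {
    value: String,
    labels: HashSet<Label>,
}

/// A sink is modeled as a name plus the labels it refuses to accept.
struct Sink {
    name: &'static str,
    blocked: &'static [Label],
}

const SHELL_EXEC: Sink = Sink {
    name: "shell_exec",
    blocked: &[Label::ExternalNetwork, Label::UntrustedAgent, Label::UserInput],
};
const NET_FETCH: Sink = Sink {
    name: "net_fetch",
    blocked: &[Label::Secret, Label::Pii],
};

impl Tainted {
    fn new(value: &str, labels: &[Label]) -> Self {
        Tainted { value: value.to_string(), labels: labels.iter().copied().collect() }
    }

    /// Propagation: the combined value carries the union of both label sets.
    fn merge_taint(&mut self, other: &Tainted) {
        self.labels.extend(other.labels.iter().copied());
    }

    /// Explicit decision by the caller that `label` no longer applies.
    fn declassify(&mut self, label: Label) {
        self.labels.remove(&label);
    }

    /// Fails on the first blocked label that the value still carries.
    fn check_sink(&self, sink: &Sink) -> Result<(), String> {
        match sink.blocked.iter().find(|l| self.labels.contains(*l)) {
            Some(label) => Err(format!(
                "label '{:?}' is not allowed to reach sink '{}'",
                label, sink.name
            )),
            None => Ok(()),
        }
    }
}
```

Merging a clean command prefix with a `UserInput` argument leaves the result blocked from `SHELL_EXEC` but still allowed into `NET_FETCH`; only an explicit `declassify(Label::UserInput)` unblocks the shell sink. Keeping declassification as a separate call makes every such decision visible at the call site.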
Ed25519 Manifest Signing
Source: librefang-types/src/manifest_signing.rs
Agent manifests define an agent's capabilities, tools, and configuration. A compromised manifest can grant elevated privileges. This module provides Ed25519-based cryptographic signing.
Signing Scheme
- Compute SHA-256 of the manifest content (raw TOML text).
- Sign the hash with Ed25519 (via ed25519-dalek).
- Bundle the signature, public key, and content hash into a SignedManifest envelope.
SignedManifest Structure
pub struct SignedManifest {
pub manifest: String, // Raw TOML content
pub content_hash: String, // Hex SHA-256 of manifest
pub signature: Vec<u8>, // Ed25519 signature (64 bytes)
pub signer_public_key: Vec<u8>, // Ed25519 public key (32 bytes)
pub signer_id: String, // Human-readable signer ID
}
Signing
let signing_key = SigningKey::generate(&mut OsRng);
let signed = SignedManifest::sign(manifest_toml, &signing_key, "admin@org.com");
Internally:
pub fn sign(manifest: impl Into<String>, signing_key: &SigningKey, signer_id: impl Into<String>) -> Self {
let manifest = manifest.into();
let content_hash = hash_manifest(&manifest); // SHA-256
let signature = signing_key.sign(content_hash.as_bytes());
let verifying_key = signing_key.verifying_key();
Self {
manifest,
content_hash,
signature: signature.to_bytes().to_vec(),
signer_public_key: verifying_key.to_bytes().to_vec(),
signer_id: signer_id.into(),
}
}
Verification
Two-phase verification:
- Hash check: Recompute SHA-256 of manifest and compare to content_hash.
- Signature check: Verify the Ed25519 signature over content_hash using signer_public_key.
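pub fn verify(&self) -> Result<(), String> {
// Phase 1: the stored hash must match the manifest content.
let recomputed = hash_manifest(&self.manifest);
if recomputed != self.content_hash {
return Err("content hash mismatch: ...".to_string());
}
// Phase 2: the signature must verify under the stored public key.
// pk_bytes and sig_bytes are the fixed-size byte arrays taken from
// signer_public_key and signature (conversion elided in this excerpt).
let verifying_key = VerifyingKey::from_bytes(&pk_bytes).map_err(|e| e.to_string())?;
let signature = Signature::from_bytes(&sig_bytes);
verifying_key.verify(self.content_hash.as_bytes(), &signature)
.map_err(|e| format!("signature verification failed: {}", e))
}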
Tamper Detection
- Modifying the manifest content after signing causes a content hash mismatch.
- Replacing the public key with a different key causes a signature verification failure.
- Both attacks are caught by verify().
Secret Zeroization
Source: All LLM driver modules, channel adapters, and web search modules.
LibreFang uses Zeroizing<String> from the zeroize crate on every field
that holds secret material. When the value is dropped, its memory is
overwritten with zeros, preventing secrets from lingering in memory.
How It Works
Zeroizing<T> is a smart-pointer wrapper from the zeroize crate. It
implements Deref<Target=T> for transparent usage and Drop for automatic
zeroization:
// On Drop, the inner String's buffer is overwritten with zeros
let key = Zeroizing::new("sk-secret-key".to_string());
// Use key transparently via Deref
client.post(url).header("authorization", format!("Bearer {}", &*key));
// When key goes out of scope, memory is zeroed
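The wipe-on-drop behaviour can be illustrated without the crate. The following is a simplified, hypothetical stand-in (`WipeOnDrop` and `wipe` are illustrative names) and not how `zeroize` is implemented internally; it only shows the general shape: transparent access via `Deref`, and volatile zero writes in `Drop` so the compiler does not optimize the stores away.

```rust
use std::ops::Deref;
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrites every byte with zero using volatile writes, then adds a
/// compiler fence so the stores are not reordered past this point.
fn wipe(bytes: &mut [u8]) {
    for byte in bytes.iter_mut() {
        // SAFETY: `byte` is a valid, exclusive reference into the slice.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical stand-in for `Zeroizing<String>`: owns secret bytes and
/// wipes them when dropped.
struct WipeOnDrop(Vec<u8>);

impl WipeOnDrop {
    fn new(secret: &str) -> Self {
        WipeOnDrop(secret.as_bytes().to_vec())
    }
}

impl Deref for WipeOnDrop {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.0
    }
}

impl Drop for WipeOnDrop {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}
```

A wrapper of this kind only protects the buffer it owns. Any copy made from the secret, such as the `String` produced by `format!` when building an authorization header, is an ordinary allocation and is not wiped by the wrapper.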
Fields Using Zeroization
LLM Drivers (librefang-runtime/src/drivers/):
| Driver | Field |
|---|---|
| AnthropicDriver | api_key: Zeroizing<String> |
| GeminiDriver | api_key: Zeroizing<String> |
| OpenAiCompatDriver | api_key: Zeroizing<String> |
Channel Adapters (librefang-channels/src/):
| Adapter | Field(s) |
|---|---|
| DiscordAdapter | token: Zeroizing<String> |
| EmailAdapter | password: Zeroizing<String> |
| BlueskyAdapter | app_password: Zeroizing<String> |
| DingTalkAdapter | access_token: Zeroizing<String>, secret: Zeroizing<String>, client_id: Zeroizing<String>, client_secret: Zeroizing<String> |
| FeishuAdapter | app_secret: Zeroizing<String> |
| FlockAdapter | bot_token: Zeroizing<String> |
| GitterAdapter | token: Zeroizing<String> |
| GotifyAdapter | app_token: Zeroizing<String>, client_token: Zeroizing<String> |
Web Search (librefang-runtime/src/web_search.rs):
fn resolve_api_key(env_var: &str) -> Option<Zeroizing<String>> {
std::env::var(env_var).ok().filter(|k| !k.is_empty()).map(Zeroizing::new)
}
Embedding (librefang-runtime/src/embedding.rs):
| Struct | Field |
|---|---|
| EmbeddingClient | api_key: Zeroizing<String> |
Why It Matters
Without zeroization, the bytes of a dropped secret stay in freed memory until
that memory happens to be reused and overwritten. An attacker with access to a
core dump, swap file, or memory forensics tool can recover API keys in the
meantime. Zeroizing<String> overwrites the secret as soon as the wrapper is dropped.
Prompt Injection Scanner
Source: librefang-skills/src/verify.rs
The SkillVerifier provides two scanning functions: security_scan() for
skill manifests and scan_prompt_content() for skill prompt text (SKILL.md
body).
Manifest Security Scan
SkillVerifier::security_scan(manifest) inspects a skill's declared
requirements:
| Check | Severity | Trigger |
|---|---|---|
| Node.js runtime | Warning | runtime_type == SkillRuntime::Node |
| Shell execution capability | Critical | Capability contains shellexec or shell_exec |
| Unrestricted network | Warning | Capability contains netconnect(*) |
| Shell tool | Critical | Tool is shell_exec or bash |
| Filesystem write tool | Warning | Tool is file_write or file_delete |
| Too many tools | Info | More than 10 tools required |
Prompt Injection Scan
SkillVerifier::scan_prompt_content(content) detects common attack patterns
in skill prompt text:
Critical -- Prompt override attempts:
"ignore previous instructions", "ignore all previous",
"disregard previous", "forget your instructions",
"you are now", "new instructions:", "system prompt override",
"ignore the above", "do not follow", "override system"
Warning -- Data exfiltration patterns:
"send to http", "send to https", "post to http", "post to https",
"exfiltrate", "forward all", "send all data",
"base64 encode and send", "upload to"
Warning -- Shell command references:
"rm -rf", "chmod ", "sudo "
Info -- Excessive length:
Content over 50,000 bytes triggers an info-level warning about potential LLM performance degradation.
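The pattern lists above lend themselves to a simple substring scan. The sketch below is an approximation of that idea and not the code in verify.rs: `Severity` and `scan` are illustrative names, the warning messages are invented for the example, and lowercasing the content before matching is an assumption (the source text does not say whether the skill scanner is case-insensitive).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Info,
    Warning,
    Critical,
}

const OVERRIDE_PATTERNS: &[&str] = &[
    "ignore previous instructions", "ignore all previous", "disregard previous",
    "forget your instructions", "you are now", "new instructions:",
    "system prompt override", "ignore the above", "do not follow", "override system",
];
const EXFIL_PATTERNS: &[&str] = &[
    "send to http", "send to https", "post to http", "post to https", "exfiltrate",
    "forward all", "send all data", "base64 encode and send", "upload to",
];
const SHELL_PATTERNS: &[&str] = &["rm -rf", "chmod ", "sudo "];
const MAX_BYTES: usize = 50_000;

/// Returns one (severity, message) pair per matched pattern, plus an
/// Info entry when the content exceeds the length threshold.
fn scan(content: &str) -> Vec<(Severity, String)> {
    // Assumption: match case-insensitively by lowercasing once up front.
    let lower = content.to_lowercase();
    let mut warnings = Vec::new();
    let groups = [
        (Severity::Critical, OVERRIDE_PATTERNS),
        (Severity::Warning, EXFIL_PATTERNS),
        (Severity::Warning, SHELL_PATTERNS),
    ];
    for (severity, patterns) in groups {
        for pattern in patterns.iter().filter(|p| lower.contains(**p)) {
            warnings.push((severity, format!("matched pattern: {:?}", pattern)));
        }
    }
    if content.len() > MAX_BYTES {
        warnings.push((Severity::Info, format!("content exceeds {} bytes", MAX_BYTES)));
    }
    warnings
}
```

Because the lists overlap (for example "send to http" is a prefix of "send to https"), a single phrase can produce more than one warning in this sketch; a production scanner might deduplicate.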
SHA256 Checksum Verification
pub fn verify_checksum(data: &[u8], expected_sha256: &str) -> bool {
let actual = Self::sha256_hex(data);
actual == expected_sha256.to_lowercase()
}
Skills installed from ClawHub have their content verified against a known SHA256 hash to detect tampering during download.
Warning Structure
pub struct SkillWarning {
pub severity: WarningSeverity, // Info, Warning, Critical
pub message: String,
}
Prompt Injection Guard
Source: librefang-kernel/src/injection_guard.rs
Before user input is forwarded to an LLM, the kernel scans the message for known prompt-injection indicators. Unlike skill scanning, this guard operates on runtime user input rather than static skill content, and it does not reject the message. Instead, it prepends a warning prefix and emits a structured log entry, so that the model and any downstream auditing can treat the potentially adversarial content with that signal in hand.
Detection Approach
The guard checks for two distinct threat classes:
1. Text Patterns (15 rules)
Common English-language phrases used to hijack instruction following:
| Pattern | Threat class |
|---|---|
| ignore previous instructions | Override attempt |
| ignore all previous | Override attempt |
| disregard your instructions | Override attempt |
| disregard previous | Override attempt |
| forget your instructions | Override attempt |
| you are now | Persona hijack |
| new instructions: | Override injection |
| system: | System-prompt injection |
| system prompt override | Override attempt |
| override system | Override attempt |
| ignore the above | Override attempt |
| do not follow | Override attempt |
| act as if you have no restrictions | Jailbreak attempt |
| [system] | Fake system-turn marker |
| <system> | Fake system-turn marker |
Matching is case-insensitive to catch IGNORE PREVIOUS INSTRUCTIONS and
similar variants.
2. Invisible Unicode (10 code points)
Adversarial content is sometimes hidden using characters that are visually absent but semantically processed by the tokenizer:
| Code point | Name | Category |
|---|---|---|
| U+200B | ZERO WIDTH SPACE | Zero-width |
| U+200C | ZERO WIDTH NON-JOINER | Zero-width |
| U+200D | ZERO WIDTH JOINER | Zero-width |
| U+2060 | WORD JOINER | Zero-width |
| U+FEFF | ZERO WIDTH NO-BREAK SPACE (BOM) | Zero-width |
| U+202A | LEFT-TO-RIGHT EMBEDDING | Bidi control |
| U+202B | RIGHT-TO-LEFT EMBEDDING | Bidi control |
| U+202C | POP DIRECTIONAL FORMATTING | Bidi control |
| U+202D | LEFT-TO-RIGHT OVERRIDE | Bidi control |
| U+202E | RIGHT-TO-LEFT OVERRIDE | Bidi control |
Bidi override characters (U+202D, U+202E) are particularly dangerous because they can visually reverse text in a UI while the tokenizer still processes the characters in their stored (logical) order.
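Checking for these code points is a straightforward character scan. The sketch below is illustrative only; `find_invisible` and `describe` are hypothetical names, and the real guard may report hits differently.

```rust
/// The ten code points from the table above.
const INVISIBLE: &[char] = &[
    '\u{200B}', '\u{200C}', '\u{200D}', '\u{2060}', '\u{FEFF}',
    '\u{202A}', '\u{202B}', '\u{202C}', '\u{202D}', '\u{202E}',
];

/// Returns (byte offset, code point) for every invisible character found.
fn find_invisible(text: &str) -> Vec<(usize, char)> {
    text.char_indices()
        .filter(|(_, c)| INVISIBLE.contains(c))
        .collect()
}

/// Renders hits in a log-friendly form such as "U+200B at byte 3".
fn describe(hits: &[(usize, char)]) -> Vec<String> {
    hits.iter()
        .map(|(offset, c)| format!("U+{:04X} at byte {}", *c as u32, offset))
        .collect()
}
```

For the input `"pay\u{200B}load \u{202E}txet"` this reports U+200B at byte 3 and U+202E at byte 11; the second offset reflects that the zero-width space occupies three bytes in UTF-8.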
Detection Response
When either check fires, the kernel:
- Prepends a warning prefix to the user message before it is placed in the LLM context window:
  [Warning: potential prompt injection detected — proceeding with caution]
- Emits a structured warning log via tracing::warn!:
  prompt injection indicator detected: "ignore previous instructions" in message from user <id>
Apart from the prefix, the message is sent to the LLM unchanged. The LLM therefore sees both the warning and the potentially hostile content and can apply its own policy.
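The text-pattern half of this flow can be sketched as follows. This is a hedged illustration, not the code in injection_guard.rs: `first_indicator` and `guard` are hypothetical names, joining the prefix and the message with a single space is an assumption, and the `tracing::warn!` call is omitted so the example depends only on the standard library.

```rust
/// Warning prefix, copied verbatim from the documentation above.
const WARNING_PREFIX: &str =
    "[Warning: potential prompt injection detected — proceeding with caution]";

/// The 15 text patterns from the table, stored in lowercase.
const TEXT_PATTERNS: &[&str] = &[
    "ignore previous instructions", "ignore all previous", "disregard your instructions",
    "disregard previous", "forget your instructions", "you are now", "new instructions:",
    "system:", "system prompt override", "override system", "ignore the above",
    "do not follow", "act as if you have no restrictions", "[system]", "<system>",
];

/// Returns the first pattern found in the message, matching case-insensitively.
fn first_indicator(message: &str) -> Option<&'static str> {
    let lower = message.to_lowercase();
    TEXT_PATTERNS.iter().copied().find(|p| lower.contains(*p))
}

/// Prepends the warning prefix when an indicator is found; never rejects.
/// Assumption: prefix and message are separated by one space.
fn guard(message: &str) -> String {
    match first_indicator(message) {
        Some(_) => format!("{} {}", WARNING_PREFIX, message),
        None => message.to_string(),
    }
}
```

A message such as "config has system: true" is flagged and prefixed even though it is harmless, while a French paraphrase of an override attempt passes through untouched. Both outcomes are consistent with treating the guard as a signal for the model rather than a filter.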
Why Not Reject?
Legitimate messages can contain any of the detected phrases in innocuous contexts — for example:
- A developer asking the agent to explain the phrase "ignore previous instructions".
- A security researcher asking the agent to write a test case that includes injection strings.
- Copy-pasted documentation that happens to contain system: as a YAML key.
Outright rejection in these cases would silently break valid workflows. The warning-prefix approach preserves usability while giving the model (and the audit trail) a clear signal to act on.
Limitations
Rule-based matching cannot cover the full injection surface:
- Paraphrased or multi-lingual override attempts are not detected.
- Injections split across multiple messages may evade single-message scanning.
- Encoding tricks (e.g. base64 payloads decoded at runtime by the model) are out of scope.
The prompt injection guard is one defense layer within a broader strategy that also includes taint tracking, capability enforcement, and human-in-the-loop approvals. It is not a complete solution in isolation.