Safe Failure Modes: Understanding ALLOW, DOWNGRADE, and BLOCK Decisions
One of the most critical design decisions in a production LLM system is how to handle failures. Unlike traditional software, where errors are binary (success or failure), LLM outputs fall on a spectrum from perfectly correct to catastrophically wrong.
The Three-Tier Decision Framework
Verdic implements a three-tier decision framework:
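Throughout this post, the result returned by verdic.guard() is treated as a small discriminated record. The sketch below is only a reading aid: the type name GuardResult and the exact field shapes are assumptions, but the fields themselves (decision, confidence, sanitizedOutput, violations) mirror the properties used in the examples that follow.
interface GuardResult {
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK"
  confidence: number        // 0..1 score used by the confidence-based patterns below
  sanitizedOutput?: string  // present when the output was downgraded
  violations?: string[]     // reasons behind a DOWNGRADE or BLOCK
}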
ALLOW
Definition: The LLM output passes all validation checks and is safe to show to users without modification.
When to use:
- Output matches expected intent
- No hallucinations detected
- Proper modality (format) enforced
- No policy violations
- Appropriate tone and content
Example:
const validation = await verdic.guard({
  output: "Your order #12345 will arrive on January 15th.",
  policy: {
    expectedIntent: "order_status",
    groundTruth: orderDatabase,
    modality: "text"
  }
})
// validation.decision === "ALLOW"
// Return original output to user
DOWNGRADE
Definition: The output has issues but can be salvaged through sanitization, redaction, or modification.
When to use:
- Minor hallucinations that can be removed
- PII that needs redaction
- Inappropriate tone that can be neutralized
- Excessive length that can be truncated
- Ambiguous statements that can be clarified
Example:
const validation = await verdic.guard({
  output: "Your order will arrive soon, probably around January 15th. By the way, your email is john@example.com which we have on file.",
  policy: {
    noPII: true,
    confidenceThreshold: 0.9
  }
})
// validation.decision === "DOWNGRADE"
// validation.sanitizedOutput === "Your order will arrive soon."
// PII removed, uncertain date removed
BLOCK
Definition: The output is unsafe and cannot be salvaged. It must be completely replaced with a fallback response.
When to use:
- Severe hallucinations
- Dangerous or harmful content
- Complete intent mismatch
- Sensitive data exposure
- Compliance violations
Example:
const validation = await verdic.guard({
  output: "Your credit card number ending in 4532 has been charged $299.99.",
  policy: {
    noPII: true,
    noFinancialData: true
  }
})
// validation.decision === "BLOCK"
// Return fallback instead of original output
return "I apologize, but I cannot provide that information. Please contact support."
Implementation Patterns
Pattern 1: Progressive Fallback
async function handleUserQuery(query: string): Promise<string> {
  // Try primary LLM
  const primaryResponse = await openai.generate(query)
  const primaryValidation = await verdic.guard(primaryResponse)

  if (primaryValidation.decision === "ALLOW") {
    return primaryResponse
  }

  if (primaryValidation.decision === "DOWNGRADE") {
    return primaryValidation.sanitizedOutput
  }

  // BLOCK: Try smaller, safer model
  const fallbackResponse = await openai.generate(query, {
    model: "gpt-3.5-turbo",
    temperature: 0.3 // More deterministic
  })
  const fallbackValidation = await verdic.guard(fallbackResponse)

  if (fallbackValidation.decision !== "BLOCK") {
    return fallbackValidation.sanitizedOutput || fallbackResponse
  }

  // Ultimate fallback: template response
  return getTemplateResponse(query)
}
Pattern 2: Confidence-Based Downgrade
async function confidenceBasedGuard(output: string) {
  const validation = await verdic.guard({
    output,
    policy: {
      minConfidence: 0.8,
      groundTruth: knowledgeBase
    }
  })

  if (validation.confidence >= 0.9) {
    // High confidence: ALLOW
    return { decision: "ALLOW", output }
  }

  if (validation.confidence >= 0.7) {
    // Medium confidence: DOWNGRADE with disclaimer
    return {
      decision: "DOWNGRADE",
      output: `${output}\n*Note: This information should be verified.*`
    }
  }

  // Low confidence: BLOCK
  return {
    decision: "BLOCK",
    output: "I don't have enough information to answer that accurately."
  }
}
Pattern 3: Context-Aware Decisions
async function contextAwareGuard(
  output: string,
  context: {
    userRole: 'admin' | 'user' | 'guest'
    criticality: 'low' | 'medium' | 'high'
  }
) {
  const validation = await verdic.guard(output)

  // Admins see more information, including downgraded content
  if (context.userRole === 'admin') {
    return {
      decision: validation.decision,
      output: validation.decision === "BLOCK"
        ? `[BLOCKED] Original: ${output}`
        : validation.sanitizedOutput || output,
      metadata: validation.violations
    }
  }

  // High criticality: be more strict
  if (context.criticality === 'high') {
    if (validation.decision !== "ALLOW") {
      return {
        decision: "BLOCK",
        output: "This operation requires manual verification."
      }
    }
  }

  // Standard user flow
  switch (validation.decision) {
    case "ALLOW":
      return { decision: "ALLOW", output }
    case "DOWNGRADE":
      return { decision: "DOWNGRADE", output: validation.sanitizedOutput }
    case "BLOCK":
      return { decision: "BLOCK", output: getFallback(context) }
  }
}
Measuring Decision Quality
Track these metrics to optimize your decision thresholds:
1. Decision Distribution
interface DecisionMetrics {
  allow: number     // % of ALLOW decisions
  downgrade: number // % of DOWNGRADE decisions
  block: number     // % of BLOCK decisions
}
// Healthy production system example:
// { allow: 85%, downgrade: 12%, block: 3% }
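One simple way to populate this interface is to aggregate over a recent window of logged decisions. This is a minimal sketch; the decisions array is assumed to come from your own logs.
function computeDistribution(decisions: Array<"ALLOW" | "DOWNGRADE" | "BLOCK">): DecisionMetrics {
  const total = decisions.length || 1 // avoid division by zero on an empty window
  const share = (d: string) => (decisions.filter(x => x === d).length / total) * 100
  return {
    allow: share("ALLOW"),
    downgrade: share("DOWNGRADE"),
    block: share("BLOCK")
  }
}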
2. User Satisfaction by Decision Type
interface SatisfactionMetrics {
  allowSatisfaction: number     // User rating for ALLOW responses
  downgradeSatisfaction: number // User rating for DOWNGRADE responses
  blockSatisfaction: number     // User rating for fallback responses
}
// Goal: High satisfaction even for DOWNGRADE/BLOCK
// { allow: 4.5/5, downgrade: 4.2/5, block: 3.8/5 }
3. False Positive Rate
How often are good outputs incorrectly blocked or downgraded?
const falsePositiveRate =
  (incorrectlyBlocked + unnecessarilyDowngraded) / totalOutputs
// Target: < 1%
4. False Negative Rate
How often are bad outputs incorrectly allowed?
const falseNegativeRate =
  allowedButShouldBeBlocked / totalOutputs
// Target: < 0.1% (very strict)
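Both rates need a ground-truth label for each sampled output, which in practice usually comes from periodic human review. The sketch below is illustrative only: the ReviewedOutput shape and its field names are assumptions, not part of any SDK.
interface ReviewedOutput {
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK" // what the guard decided
  humanVerdict: "good" | "bad"              // what a reviewer decided
}

function errorRates(reviewed: ReviewedOutput[]) {
  const total = reviewed.length || 1
  const falsePositives = reviewed.filter(
    r => r.humanVerdict === "good" && r.decision !== "ALLOW"
  ).length
  const falseNegatives = reviewed.filter(
    r => r.humanVerdict === "bad" && r.decision === "ALLOW"
  ).length
  return {
    falsePositiveRate: falsePositives / total, // target: < 1%
    falseNegativeRate: falseNegatives / total  // target: < 0.1%
  }
}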
Tuning Your Policies
Start strict, then relax based on data:
Week 1: Strict Mode
const strictPolicy = {
  minConfidence: 0.95,
  allowHallucinations: false,
  allowAmbiguity: false,
  maxUncertainty: 0.05
}
// Expect high BLOCK rate (20-30%)
// Gather data on false positives
Week 2-4: Adjust Thresholds
const adjustedPolicy = {
  minConfidence: 0.85,        // Reduced based on data
  allowHallucinations: false, // Keep strict
  allowAmbiguity: true,       // Relax (low risk)
  maxUncertainty: 0.15        // Increased based on feedback
}
// Target: 10-15% BLOCK rate
Ongoing: Dynamic Policies
async function getDynamicPolicy(context: Context) {
  const historicalPerformance = await getMetrics(context)

  return {
    minConfidence: calculateOptimalThreshold(historicalPerformance),
    allowHallucinations: false,
    allowAmbiguity: historicalPerformance.ambiguityHarmRate < 0.01,
    maxUncertainty: historicalPerformance.optimalUncertainty
  }
}
Production Best Practices
- Always have fallbacks: Never let a BLOCK decision break the user experience
- Log decision reasons: Understand why outputs were blocked or downgraded (a minimal logging sketch follows this list)
- A/B test thresholds: Find the right balance for your use case
- User feedback loops: Let users report incorrect decisions
- Gradual rollout: Start with conservative policies, relax slowly
- Monitor drift: Decision distributions change as models evolve
- Context-appropriate severity: High-stakes scenarios need stricter policies
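As noted above, logged decision reasons are what make threshold tuning and drift monitoring possible later. Below is a minimal sketch of a decision-audit record; the field names and the console sink are illustrative assumptions, and the validation parameter reuses the GuardResult sketch from earlier in this post.
interface DecisionLogEntry {
  timestamp: string
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK"
  confidence: number
  violations: string[]  // why the output was downgraded or blocked
  policyVersion: string // which policy produced the decision
}

function logDecision(validation: GuardResult, policyVersion: string): DecisionLogEntry {
  const entry: DecisionLogEntry = {
    timestamp: new Date().toISOString(),
    decision: validation.decision,
    confidence: validation.confidence,
    violations: validation.violations ?? [],
    policyVersion
  }
  // Ship to whatever audit sink you already use (stdout, a queue, a warehouse table).
  console.log(JSON.stringify(entry))
  return entry
}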
Conclusion
Safe failure modes are essential for production LLM systems. The three-tier ALLOW/DOWNGRADE/BLOCK framework provides flexibility while ensuring safety.
The key is treating AI outputs as untrusted and having a clear, tested policy for every possible failure mode. With proper guardrails, you can deploy LLMs confidently knowing that even when they fail, they fail safely.

