Technical Deep Dive

Safe Failure Modes: Understanding ALLOW, DOWNGRADE, and BLOCK Decisions

Design resilient LLM systems with graceful degradation. Learn how to implement safe failure modes that protect users without breaking experiences.

Kundan Singh Rathore

Founder & CEO

January 2, 2024
10 min read
System Design
Resilience
Error Handling
Production AI
Safe Failure Modes: Understanding ALLOW, DOWNGRADE, and BLOCK Decisions

One of the most critical design decisions in production LLM systems is how to handle failures. Unlike traditional software where errors are binary (success or failure), LLM outputs exist on a spectrum from perfect to catastrophic.

The Three-Tier Decision Framework

Verdic implements a three-tier decision framework:

ALLOW

Definition: The LLM output passes all validation checks and is safe to show to users without modification.

When to use:

  • Output matches expected intent
  • No hallucinations detected
  • Proper modality (format) enforced
  • No policy violations
  • Appropriate tone and content

Example:

const validation = await verdic.guard({
  output: "Your order #12345 will arrive on January 15th.",
  policy: {
    expectedIntent: "order_status",
    groundTruth: orderDatabase,
    modality: "text"
  }
})

// validation.decision === "ALLOW"
// Return original output to user

DOWNGRADE

Definition: The output has issues but can be salvaged through sanitization, redaction, or modification.

When to use:

  • Minor hallucinations that can be removed
  • PII that needs redaction
  • Inappropriate tone that can be neutralized
  • Excessive length that can be truncated
  • Ambiguous statements that can be clarified

Example:

const validation = await verdic.guard({
  output: "Your order will arrive soon, probably around January 15th. By the way, your email is john@example.com which we have on file.",
  policy: {
    noPII: true,
    confidenceThreshold: 0.9
  }
})

// validation.decision === "DOWNGRADE"
// validation.sanitizedOutput === "Your order will arrive soon."
// PII removed, uncertain date removed

BLOCK

Definition: The output is unsafe and cannot be salvaged. It must be completely replaced with a fallback response.

When to use:

  • Severe hallucinations
  • Dangerous or harmful content
  • Complete intent mismatch
  • Sensitive data exposure
  • Compliance violations

Example:

const validation = await verdic.guard({
  output: "Your credit card number ending in 4532 has been charged $299.99.",
  policy: {
    noPII: true,
    noFinancialData: true
  }
})

// validation.decision === "BLOCK"
// Replace the original output with a safe fallback
const fallback = "I apologize, but I cannot provide that information. Please contact support."

Implementation Patterns

Pattern 1: Progressive Fallback

async function handleUserQuery(query: string): Promise<string> {
  // Try primary LLM
  const primaryResponse = await openai.generate(query)
  const primaryValidation = await verdic.guard(primaryResponse)

  if (primaryValidation.decision === "ALLOW") {
    return primaryResponse
  }

  if (primaryValidation.decision === "DOWNGRADE") {
    return primaryValidation.sanitizedOutput
  }

  // BLOCK: Try smaller, safer model
  const fallbackResponse = await openai.generate(query, { 
    model: "gpt-3.5-turbo",
    temperature: 0.3 // More deterministic
  })
  
  const fallbackValidation = await verdic.guard(fallbackResponse)

  if (fallbackValidation.decision !== "BLOCK") {
    return fallbackValidation.sanitizedOutput || fallbackResponse
  }

  // Ultimate fallback: template response
  return getTemplateResponse(query)
}
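
The final fallback above calls getTemplateResponse, which isn't shown. Here is a minimal sketch of what it could look like, assuming user queries can be matched against a few coarse intents (the patterns and canned responses below are illustrative, not part of Verdic):

// Hypothetical helper: map a query to a canned, pre-approved response.
// The intent patterns and messages are assumptions for illustration.
function getTemplateResponse(query: string): string {
  const templates: Array<{ pattern: RegExp; response: string }> = [
    {
      pattern: /order|delivery|shipping/i,
      response: "You can check your order status on the Orders page, or contact support for details."
    },
    {
      pattern: /refund|cancel/i,
      response: "Refund and cancellation requests are handled by our support team. Please reach out and we'll help."
    }
  ]

  const match = templates.find(t => t.pattern.test(query))
  // Default template: safe, generic, and honest about the limitation
  return match?.response ?? "I'm not able to answer that right now. Please contact support and we'll follow up."
}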

Pattern 2: Confidence-Based Downgrade

async function confidenceBasedGuard(output: string) {
  const validation = await verdic.guard({
    output,
    policy: {
      minConfidence: 0.8,
      groundTruth: knowledgeBase
    }
  })

  if (validation.confidence >= 0.9) {
    // High confidence: ALLOW
    return { decision: "ALLOW", output }
  }

  if (validation.confidence >= 0.7) {
    // Medium confidence: DOWNGRADE with disclaimer
    return {
      decision: "DOWNGRADE",
      output: `${output}

*Note: This information should be verified.*`
    }
  }

  // Low confidence: BLOCK
  return {
    decision: "BLOCK",
    output: "I don't have enough information to answer that accurately."
  }
}

Pattern 3: Context-Aware Decisions

async function contextAwareGuard(
  output: string,
  context: {
    userRole: 'admin' | 'user' | 'guest'
    criticality: 'low' | 'medium' | 'high'
  }
) {
  const validation = await verdic.guard(output)

  // Admins see more information, including downgraded content
  if (context.userRole === 'admin') {
    return {
      decision: validation.decision,
      output: validation.decision === "BLOCK" 
        ? `[BLOCKED] Original: ${output}` 
        : validation.sanitizedOutput || output,
      metadata: validation.violations
    }
  }

  // High criticality: be more strict
  if (context.criticality === 'high') {
    if (validation.decision !== "ALLOW") {
      return {
        decision: "BLOCK",
        output: "This operation requires manual verification."
      }
    }
  }

  // Standard user flow
  switch (validation.decision) {
    case "ALLOW":
      return { decision: "ALLOW", output }
    case "DOWNGRADE":
      return { decision: "DOWNGRADE", output: validation.sanitizedOutput }
    case "BLOCK":
      return { decision: "BLOCK", output: getFallback(context) }
  }
}
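
Pattern 3 hands BLOCK decisions to a getFallback(context) helper that isn't defined above. One possible sketch, assuming the fallback message only varies with criticality (the wording is a placeholder):

// Hypothetical helper: choose a fallback message based on request criticality.
// The branching and messages are assumptions for illustration.
function getFallback(context: { criticality: 'low' | 'medium' | 'high' }): string {
  switch (context.criticality) {
    case 'high':
      return "This request needs manual review. A member of our team will follow up."
    case 'medium':
      return "I can't answer that reliably right now. Please try rephrasing or contact support."
    default:
      return "I'm not able to help with that at the moment."
  }
}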

Measuring Decision Quality

Track these metrics to optimize your decision thresholds:

1. Decision Distribution

interface DecisionMetrics {
  allow: number // % of ALLOW decisions
  downgrade: number // % of DOWNGRADE decisions
  block: number // % of BLOCK decisions
}

// Healthy production system example:
// { allow: 85%, downgrade: 12%, block: 3% }
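
Computing the distribution is straightforward once decisions are logged. A sketch, assuming each validation is recorded with its decision (the DecisionLog shape is an assumption; adapt it to your logging schema):

// Sketch: derive the decision distribution from logged validations
interface DecisionLog {
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK"
}

function decisionDistribution(logs: DecisionLog[]): DecisionMetrics {
  const total = logs.length || 1 // guard against empty windows
  const pct = (d: DecisionLog["decision"]) =>
    (logs.filter(log => log.decision === d).length / total) * 100

  return {
    allow: pct("ALLOW"),
    downgrade: pct("DOWNGRADE"),
    block: pct("BLOCK")
  }
}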

2. User Satisfaction by Decision Type

interface SatisfactionMetrics {
  allowSatisfaction: number // User rating for ALLOW responses
  downgradeSatisfaction: number // User rating for DOWNGRADE responses
  blockSatisfaction: number // User rating for fallback responses
}

// Goal: High satisfaction even for DOWNGRADE/BLOCK
// { allow: 4.5/5, downgrade: 4.2/5, block: 3.8/5 }

3. False Positive Rate

How often are good outputs incorrectly blocked or downgraded?

const falsePositiveRate = 
  (incorrectlyBlocked + unnecessarilyDowngraded) / totalOutputs

// Target: < 1%

4. False Negative Rate

How often are bad outputs incorrectly allowed?

const falseNegativeRate = 
  allowedButShouldBeBlocked / totalOutputs

// Target: < 0.1% (very strict)
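
Both rates require ground-truth labels, which usually means a human-reviewed sample of logged decisions. A sketch of how the counts could be derived (the ReviewedDecision shape and verdict labels are assumptions):

// Sketch: compute false positive / false negative rates from a reviewed sample
interface ReviewedDecision {
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK"
  reviewerVerdict: "output_was_fine" | "output_was_unsafe"
}

function errorRates(sample: ReviewedDecision[]) {
  const total = sample.length || 1
  // Good output, but we downgraded or blocked it
  const falsePositives = sample.filter(
    r => r.decision !== "ALLOW" && r.reviewerVerdict === "output_was_fine"
  ).length
  // Unsafe output that slipped through as ALLOW
  const falseNegatives = sample.filter(
    r => r.decision === "ALLOW" && r.reviewerVerdict === "output_was_unsafe"
  ).length

  return {
    falsePositiveRate: falsePositives / total, // target: < 1%
    falseNegativeRate: falseNegatives / total  // target: < 0.1%
  }
}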

Tuning Your Policies

Start strict, then relax based on data:

Week 1: Strict Mode

const strictPolicy = {
  minConfidence: 0.95,
  allowHallucinations: false,
  allowAmbiguity: false,
  maxUncertainty: 0.05
}

// Expect high BLOCK rate (20-30%)
// Gather data on false positives

Week 2-4: Adjust Thresholds

const adjustedPolicy = {
  minConfidence: 0.85, // Reduced based on data
  allowHallucinations: false, // Keep strict
  allowAmbiguity: true, // Relax (low risk)
  maxUncertainty: 0.15 // Increased based on feedback
}

// Target: 10-15% BLOCK rate

Ongoing: Dynamic Policies

async function getDynamicPolicy(context: Context) {
  const historicalPerformance = await getMetrics(context)

  return {
    minConfidence: calculateOptimalThreshold(historicalPerformance),
    allowHallucinations: false,
    allowAmbiguity: historicalPerformance.ambiguityHarmRate < 0.01,
    maxUncertainty: historicalPerformance.optimalUncertainty
  }
}
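
getMetrics and calculateOptimalThreshold are not defined above. As one illustration, calculateOptimalThreshold could be a simple rule of thumb that nudges the current threshold based on observed error rates (the HistoricalPerformance shape and the adjustment steps are assumptions, not part of Verdic's API):

// Hypothetical helper: nudge the confidence threshold based on observed errors
interface HistoricalPerformance {
  currentThreshold: number   // confidence threshold currently in use
  falsePositiveRate: number  // good outputs incorrectly blocked or downgraded
  falseNegativeRate: number  // bad outputs incorrectly allowed
}

function calculateOptimalThreshold(perf: HistoricalPerformance): number {
  let threshold = perf.currentThreshold
  // Too many good outputs rejected: relax slightly
  if (perf.falsePositiveRate > 0.01) threshold -= 0.02
  // Too many bad outputs slipping through: tighten more aggressively
  if (perf.falseNegativeRate > 0.001) threshold += 0.05
  // Keep within a sane operating range
  return Math.min(0.98, Math.max(0.7, threshold))
}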

Production Best Practices

  1. Always have fallbacks: Never let a BLOCK decision break the user experience
  2. Log decision reasons: Understand why outputs were blocked or downgraded (see the logging sketch after this list)
  3. A/B test thresholds: Find the right balance for your use case
  4. User feedback loops: Let users report incorrect decisions
  5. Gradual rollout: Start with conservative policies, relax slowly
  6. Monitor drift: Decision distributions change as models evolve
  7. Context-appropriate severity: High-stakes scenarios need stricter policies
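
For point 2, structured logging is what makes the metrics above possible. A minimal sketch (the field names and the console sink are placeholders; avoid logging raw output, since it may contain PII):

// Sketch: log every guard decision with the reasons behind it
interface GuardDecisionLog {
  timestamp: string
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK"
  violations: string[]   // why the output was downgraded or blocked
  confidence?: number
  outputLength: number   // log metadata, not the raw output
}

function logDecision(
  validation: { decision: GuardDecisionLog["decision"]; violations?: string[]; confidence?: number },
  output: string
) {
  const entry: GuardDecisionLog = {
    timestamp: new Date().toISOString(),
    decision: validation.decision,
    violations: validation.violations ?? [],
    confidence: validation.confidence,
    outputLength: output.length
  }
  console.log(JSON.stringify(entry)) // swap for your logging pipeline
}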

Conclusion

Safe failure modes are essential for production LLM systems. The three-tier ALLOW/DOWNGRADE/BLOCK framework provides flexibility while ensuring safety.

The key is treating AI outputs as untrusted and having a clear, tested policy for every possible failure mode. With proper guardrails, you can deploy LLMs confidently knowing that even when they fail, they fail safely.

Ready to Build Safer AI?

Get your API key and start implementing enterprise-grade guardrails in minutes.