Safe Failure Modes: Understanding ALLOW, DOWNGRADE, and BLOCK Decisions
One of the most critical design decisions in a production LLM system is how to handle failures. Unlike traditional software, where errors are binary (success or failure), LLM outputs fall on a spectrum from perfectly correct to catastrophically wrong.
The Three-Tier Decision Framework
Verdic implements a three-tier decision framework:
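Throughout this post, the result returned by verdic.guard() is treated as a small discriminated record. The sketch below is only a reading aid: the type name GuardResult and the exact field shapes are assumptions, but the fields themselves (decision, confidence, sanitizedOutput, violations) mirror the properties used in the examples that follow.
interface GuardResult {
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK"
  confidence: number        // 0..1 score used by the confidence-based patterns below
  sanitizedOutput?: string  // present when the output was downgraded
  violations?: string[]     // reasons behind a DOWNGRADE or BLOCK
}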
ALLOW
Definition: The LLM output passes all validation checks and is safe to show to users without modification.
When to use:
- Output matches expected intent
- No hallucinations detected
- Proper modality (format) enforced
- No policy violations
- Appropriate tone and content
Example:
const validation = await verdic.guard({
  output: "Your order #12345 will arrive on January 15th.",
  policy: {
    expectedIntent: "order_status",
    groundTruth: orderDatabase,
    modality: "text"
  }
})
// validation.decision === "ALLOW"
// Return original output to user
DOWNGRADE
Definition: The output has issues but can be salvaged through sanitization, redaction, or modification.
When to use:
- Minor hallucinations that can be removed
- PII that needs redaction
- Inappropriate tone that can be neutralized
- Excessive length that can be truncated
- Ambiguous statements that can be clarified
Example:
const validation = await verdic.guard({
  output: "Your order will arrive soon, probably around January 15th. By the way, your email is john@example.com which we have on file.",
  policy: {
    noPII: true,
    confidenceThreshold: 0.9
  }
})
// validation.decision === "DOWNGRADE"
// validation.sanitizedOutput === "Your order will arrive soon."
// PII removed, uncertain date removed
BLOCK
Definition: The output is unsafe and cannot be salvaged. It must be completely replaced with a fallback response.
When to use:
- Severe hallucinations
- Dangerous or harmful content
- Complete intent mismatch
- Sensitive data exposure
- Compliance violations
Example:
const validation = await verdic.guard({
  output: "Your credit card number ending in 4532 has been charged $299.99.",
  policy: {
    noPII: true,
    noFinancialData: true
  }
})
// validation.decision === "BLOCK"
// Return fallback instead of original output
return "I apologize, but I cannot provide that information. Please contact support."
Implementation Patterns
Pattern 1: Progressive Fallback
async function handleUserQuery(query: string): Promise<string> {
  // Try primary LLM
  const primaryResponse = await openai.generate(query)
  const primaryValidation = await verdic.guard(primaryResponse)

  if (primaryValidation.decision === "ALLOW") {
    return primaryResponse
  }

  if (primaryValidation.decision === "DOWNGRADE") {
    return primaryValidation.sanitizedOutput
  }

  // BLOCK: Try smaller, safer model
  const fallbackResponse = await openai.generate(query, {
    model: "gpt-3.5-turbo",
    temperature: 0.3 // More deterministic
  })
  const fallbackValidation = await verdic.guard(fallbackResponse)

  if (fallbackValidation.decision !== "BLOCK") {
    return fallbackValidation.sanitizedOutput || fallbackResponse
  }

  // Ultimate fallback: template response
  return getTemplateResponse(query)
}
Pattern 2: Confidence-Based Downgrade
async function confidenceBasedGuard(output: string) {
  const validation = await verdic.guard({
    output,
    policy: {
      minConfidence: 0.8,
      groundTruth: knowledgeBase
    }
  })

  if (validation.confidence >= 0.9) {
    // High confidence: ALLOW
    return { decision: "ALLOW", output }
  }

  if (validation.confidence >= 0.7) {
    // Medium confidence: DOWNGRADE with disclaimer
    return {
      decision: "DOWNGRADE",
      output: `${output}\n*Note: This information should be verified.*`
    }
  }

  // Low confidence: BLOCK
  return {
    decision: "BLOCK",
    output: "I don't have enough information to answer that accurately."
  }
}
Pattern 3: Context-Aware Decisions
async function contextAwareGuard(
  output: string,
  context: {
    userRole: 'admin' | 'user' | 'guest'
    criticality: 'low' | 'medium' | 'high'
  }
) {
  const validation = await verdic.guard(output)

  // Admins see more information, including downgraded content
  if (context.userRole === 'admin') {
    return {
      decision: validation.decision,
      output: validation.decision === "BLOCK"
        ? `[BLOCKED] Original: ${output}`
        : validation.sanitizedOutput || output,
      metadata: validation.violations
    }
  }

  // High criticality: be more strict
  if (context.criticality === 'high') {
    if (validation.decision !== "ALLOW") {
      return {
        decision: "BLOCK",
        output: "This operation requires manual verification."
      }
    }
  }

  // Standard user flow
  switch (validation.decision) {
    case "ALLOW":
      return { decision: "ALLOW", output }
    case "DOWNGRADE":
      return { decision: "DOWNGRADE", output: validation.sanitizedOutput }
    case "BLOCK":
      return { decision: "BLOCK", output: getFallback(context) }
  }
}
Measuring Decision Quality
Track these metrics to optimize your decision thresholds:
1. Decision Distribution
interface DecisionMetrics {
  allow: number     // % of ALLOW decisions
  downgrade: number // % of DOWNGRADE decisions
  block: number     // % of BLOCK decisions
}
// Healthy production system example:
// { allow: 85%, downgrade: 12%, block: 3% }
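One simple way to populate this interface is to aggregate over a recent window of logged decisions. This is a minimal sketch; the decisions array is assumed to come from your own logs.
function computeDistribution(decisions: Array<"ALLOW" | "DOWNGRADE" | "BLOCK">): DecisionMetrics {
  const total = decisions.length || 1 // avoid division by zero on an empty window
  const share = (d: string) => (decisions.filter(x => x === d).length / total) * 100
  return {
    allow: share("ALLOW"),
    downgrade: share("DOWNGRADE"),
    block: share("BLOCK")
  }
}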
2. User Satisfaction by Decision Type
interface SatisfactionMetrics {
  allowSatisfaction: number     // User rating for ALLOW responses
  downgradeSatisfaction: number // User rating for DOWNGRADE responses
  blockSatisfaction: number     // User rating for fallback responses
}
// Goal: High satisfaction even for DOWNGRADE/BLOCK
// { allow: 4.5/5, downgrade: 4.2/5, block: 3.8/5 }
3. False Positive Rate
How often are good outputs incorrectly blocked or downgraded?
const falsePositiveRate =
  (incorrectlyBlocked + unnecessarilyDowngraded) / totalOutputs
// Target: < 1%
4. False Negative Rate
How often are bad outputs incorrectly allowed?
const falseNegativeRate =
  allowedButShouldBeBlocked / totalOutputs
// Target: < 0.1% (very strict)
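Both rates need a ground-truth label for each sampled output, which in practice usually comes from periodic human review. The sketch below is illustrative only: the ReviewedOutput shape and its field names are assumptions, not part of any SDK.
interface ReviewedOutput {
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK" // what the guard decided
  humanVerdict: "good" | "bad"              // what a reviewer decided
}

function errorRates(reviewed: ReviewedOutput[]) {
  const total = reviewed.length || 1
  const falsePositives = reviewed.filter(
    r => r.humanVerdict === "good" && r.decision !== "ALLOW"
  ).length
  const falseNegatives = reviewed.filter(
    r => r.humanVerdict === "bad" && r.decision === "ALLOW"
  ).length
  return {
    falsePositiveRate: falsePositives / total, // target: < 1%
    falseNegativeRate: falseNegatives / total  // target: < 0.1%
  }
}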
Tuning Your Policies
Start strict, then relax based on data:
Week 1: Strict Mode
const strictPolicy = {
  minConfidence: 0.95,
  allowHallucinations: false,
  allowAmbiguity: false,
  maxUncertainty: 0.05
}
// Expect high BLOCK rate (20-30%)
// Gather data on false positives
Week 2-4: Adjust Thresholds
const adjustedPolicy = {
  minConfidence: 0.85,        // Reduced based on data
  allowHallucinations: false, // Keep strict
  allowAmbiguity: true,       // Relax (low risk)
  maxUncertainty: 0.15        // Increased based on feedback
}
// Target: 10-15% BLOCK rate
Ongoing: Dynamic Policies
async function getDynamicPolicy(context: Context) {
  const historicalPerformance = await getMetrics(context)

  return {
    minConfidence: calculateOptimalThreshold(historicalPerformance),
    allowHallucinations: false,
    allowAmbiguity: historicalPerformance.ambiguityHarmRate < 0.01,
    maxUncertainty: historicalPerformance.optimalUncertainty
  }
}
Production Best Practices
- Always have fallbacks: Never let a BLOCK decision break the user experience
- Log decision reasons: Understand why outputs were blocked or downgraded (a minimal logging sketch follows this list)
- A/B test thresholds: Find the right balance for your use case
- User feedback loops: Let users report incorrect decisions
- Gradual rollout: Start with conservative policies, relax slowly
- Monitor drift: Decision distributions change as models evolve
- Context-appropriate severity: High-stakes scenarios need stricter policies
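As noted above, logged decision reasons are what make threshold tuning and drift monitoring possible later. Below is a minimal sketch of a decision-audit record; the field names and the console sink are illustrative assumptions, and the validation parameter reuses the GuardResult sketch from earlier in this post.
interface DecisionLogEntry {
  timestamp: string
  decision: "ALLOW" | "DOWNGRADE" | "BLOCK"
  confidence: number
  violations: string[]  // why the output was downgraded or blocked
  policyVersion: string // which policy produced the decision
}

function logDecision(validation: GuardResult, policyVersion: string): DecisionLogEntry {
  const entry: DecisionLogEntry = {
    timestamp: new Date().toISOString(),
    decision: validation.decision,
    confidence: validation.confidence,
    violations: validation.violations ?? [],
    policyVersion
  }
  // Ship to whatever audit sink you already use (stdout, a queue, a warehouse table).
  console.log(JSON.stringify(entry))
  return entry
}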
Conclusion
Safe failure modes are essential for production LLM systems. The three-tier ALLOW/DOWNGRADE/BLOCK framework provides flexibility while ensuring safety.
The key is treating AI outputs as untrusted and having a clear, tested policy for every possible failure mode. With proper guardrails, you can deploy LLMs confidently knowing that even when they fail, they fail safely.

