The confidence distribution tells us what percentage of documents can be safely automated. Industry benchmarks suggest that for production deployment, you want:

 

  • >90% confidence: Straight-through processing (target: 70% of volume)
  • 70-90% confidence: Rapid review queue (target: 20% of volume)
  • <70% confidence: Full manual classification (target: <10% of volume)

 

The exact thresholds vary by risk tolerance, worker’s comp claims have higher stakes than marketing correspondence, but the pattern holds. Most documents are unambiguous. A small fraction needs review. A tiny fraction is genuinely hard.

 

What makes modern vision transformers particularly effective is that their confidence scores are well-calibrated: when the model says 95% confident, it’s right 95% of the time. This wasn’t true for earlier neural networks, which often expressed high confidence even when wrong.