Your AI Just Misread a Liability Cap. Here's Why.

A recent study reported a striking result: in clause-extraction tasks, the hallucination rate for LLM-identified clauses was around 65% to 74% in some legally significant clause categories, particularly clauses involving numbers and obligations. The finding is an important reminder of the risks of relying on LLMs to extract and characterise key contract clauses.

The study, on US commercial contracts, breaks the errors down by clause category. The reported hallucination rates were materially higher for obligation clauses and for clauses involving numbers, such as monetary thresholds and liability caps.

Why LLMs Misinterpret Critical Contract Clauses

The reasons the study gives for these mischaracterisations are instructive. Obligation clauses are harder for an LLM because their meaning often depends on conditions, exceptions and scope qualifiers, any of which the model can miss or misapply.

A more interesting cause lies in how LLMs work. The authors suggest that, where a type of clause usually contains a familiar figure (for example a liability cap of $5M or $10M), an LLM may sometimes substitute that familiar figure for the number actually used in the contract. That is a particularly dangerous kind of error because it can create false comfort about contractual risk.

The study suggests prompt engineering alone should not be treated as a sufficient control, because it does not address the underlying causes of the mistakes. One mitigating approach described in the study was a calibrated multi-agent debate pipeline, which reduced fabricated detections, although it did not eliminate content errors.


Why Human Review Still Matters

At ContractProbe, we do not rely only on an LLM answer. We also use non-LLM-based analysis to cross-check outputs and flag inconsistencies for a human reviewer to consider.
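To make the idea of a non-LLM cross-check concrete, here is a minimal sketch of one such check for the number-substitution error described above: it flags any monetary figure in an LLM's answer that does not appear anywhere in the contract text. This is an illustration only, not ContractProbe's actual method. The function names, the regular expression and the sample wording are all assumptions, and the pattern handles only simple dollar figures (not amounts written out in words).

```python
import re

# Illustrative pattern (an assumption, not exhaustive): matches figures such as
# "$5M", "$2,750,000" or "USD 10 million". Amounts in words are not handled.
AMOUNT_RE = re.compile(
    r"(?:\$|USD\s*)\s*(\d[\d,]*(?:\.\d+)?)\s*(million|m|thousand|k)?\b",
    re.IGNORECASE,
)
MULTIPLIERS = {"million": 1_000_000, "m": 1_000_000, "thousand": 1_000, "k": 1_000}


def amounts_in(text):
    """Return the set of monetary amounts found in text, as whole dollars."""
    found = set()
    for number, suffix in AMOUNT_RE.findall(text):
        value = float(number.replace(",", ""))
        value *= MULTIPLIERS.get(suffix.lower(), 1)
        found.add(round(value))
    return found


def unsupported_amounts(llm_answer, contract_text):
    """Return amounts the LLM stated that never appear in the contract."""
    return amounts_in(llm_answer) - amounts_in(contract_text)


if __name__ == "__main__":
    # Hypothetical clause and answer, invented for this example.
    contract = "The aggregate liability of the Supplier shall not exceed $2,750,000."
    answer = "Liability is capped at $5M."
    flagged = unsupported_amounts(answer, contract)
    if flagged:
        print("Send to human reviewer; figures not found in contract:", sorted(flagged))
```

A check like this cannot confirm that the LLM characterised the clause correctly. It only catches the case where a stated figure has no support in the source text, which is why a flagged result goes to a human reviewer rather than being corrected automatically.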

Hallucinations in the form of non-existent case citations have been widely reported. This study is a useful reminder that LLMs hallucinate outside case-law research as well, including in contract-analysis tasks.

Users of LLMs need to manage the risks that arise from their use, including by keeping a "human in the loop" at the points where errors would matter most.

Source

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI, Lalit Yadav, Akshaj Gurugubelli. Workshop paper, Second Workshop on Agents in the Wild, ICML 2026. Retrieved on 22 June 2026 from https://lnkd.in/g-XU6ZUZ

Tags:

#LegalTech #LegalAI #ContractReview #InHouseLegal #AIGovernance