Provenance and Witness Typing: Witnessed assertions and witness classes
An unapparent connection is stronger than an apparent one.
This chapter formalizes provenance judgements (A2), witnessed assertions (A2b), and a witness-class composition algebra (A2c) that types evidence by verification regime -- decidable, probabilistic, or attested. We show that retrieval-augmented generation relocates rather than solves the commitment problem: grounded claims still require witnesses if downstream reasoning is to detect conflicts and respect scope. The reader who wants the historical argument for why evidence requires custody -- from the hearsay rule to retrieval-augmented generation -- should read Vol I, Chapter 4 (The Empire of Strings).
The Confident Synthesis
The legal research system had access to everything it needed. Its corpus of appellate opinions (a toy jurisdiction used here for illustration) contained two holdings from the same court, both correctly retrieved, both authoritative within the corpus. The user asked a simple question: "Are liquidated damages clauses enforceable?"
The system synthesized a confident answer. The prose was measured, the citations real. It stated that liquidated damages clauses are enforceable when the stipulated amount represents a "reasonable forecast" of anticipated harm. This was the standard from a 2019 holding. The answer did not mention that the same court, in a 2023 opinion, had held such clauses unenforceable when the underlying contract was "procedurally unconscionable." The later holding did not contradict the earlier one; it refined it, carving out an exception that might or might not apply to the user's situation. The answer sounded like settled law, but it was contested terrain.
This is not hallucination. The system did not fabricate a case. Both holdings are real within the corpus; both are authoritative; both were retrieved correctly. The failure is subtler and more structural: the system committed to a synthesis without surfacing the scope relationship between its sources. It had two attestations and no mechanism to notice that one qualified the other. The problem was not inconsistency of sources; it was loss of scope—a refinement was flattened into a slogan.
When evidence loses its address, contradiction becomes invisible.
The system's output was grounded in the technical sense, derived from external documents rather than generated from statistical patterns alone. But grounding did not prevent the failure. It relocated it. The problem is no longer "the system invented a fact." The problem is "the system committed to a claim without acknowledging that its evidential basis is contested."
The Pattern Repeats
The failure is not limited to law. Wherever systems aggregate from multiple authoritative sources, the same structure appears.
Consider the fashion catalog aggregating inventory from three suppliers. Supplier A's feed lists a dress as "silk blend (70% silk, 30% polyester)." Supplier B's feed lists the same SKU (same product, same barcode) as "100% polyester." Both suppliers are in good standing; both feeds are canonical as far as the system knows. A third supplier lists the fabric as "satin," which is a weave rather than a fiber composition, orthogonal to the question entirely.
A user asks: "What is the fabric composition of this dress?"
The system might confidently state one value, choosing a supplier without disclosure. It might attempt a synthesis: "silk-polyester satin blend," incoherent but plausible-sounding. It might refuse to answer without explaining why. None of these is correct. The correct response acknowledges the conflict: Supplier A claims silk blend; Supplier B claims polyester; resolution requires a business rule or human arbiter.
The legal system and the catalog share a failure mode. Both retrieve authoritative sources. Both commit to claims without tracking that the sources disagree. Both lack the machinery to surface conflict and defer resolution. The domain is different; the structure is identical.
Retrieval as Operating System
Retrieval was once a read operation. You queried a search engine; the engine returned a ranked list; the list was the answer. The user's job was to evaluate the results. The system's job was to find them.
Retrieval-augmented generation changed the verb (Patrick Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," Advances in Neural Information Processing Systems 33 (2020): 9459–9474). Now retrieval is a write operation. The system searches, retrieves, and injects the results into a context window. The language model then generates a response that treats the injected text as part of its available knowledge. The boundary between "what the system knows" and "what the system just retrieved" dissolves. The retrieved text becomes raw material for the next token prediction.
Tool use generalized the pattern. Retrieval became one action among many: search the web, query a database, call an API, execute code, check a calendar. Each action returns a result. Each result enters the context. Each context shapes the generation that follows. From the model's perspective, a tool call is a syscall: dispatch a request, receive a response, incorporate the response into the ongoing computation.
Agent architectures orchestrated these syscalls into plans. The system reasons about which tools to use, in what order, with what parameters. It dispatches multiple calls, aggregates results, handles failures, and synthesizes responses. The agent is a scheduler. Scheduling determines ordering: which evidence enters first, which commitments get made, which conflicts become visible.
Each transition expanded the commitment surface area.
| Paradigm | Operation | Systems Analogy | Commitment Surface |
|---|---|---|---|
| Search | Read | I/O | Results are the answer |
| RAG | Write | Memory injection | Claims enter context |
| Tool use | Action | Syscall | Results become premises |
| Agents | Orchestration | Scheduler | Multiple sources multiply conflicts |
The more sources consulted, the more potential for disagreement. The more actions taken, the more assertions implied. The more complex the orchestration, the harder it becomes to trace which source contributed which claim.
One invariant should hold across all of these approaches: when retrieval results enter generation, they must remain addressable. "Addressable" means each claim carries a stable handle—minimally, a tuple of the form (entity, predicate, source, time, logic-regime).
Addressability is what lets downstream reasoning distinguish a claim from a statute and a claim from a blog post, a measurement from a calibrated instrument and a guess from an uncalibrated one.
The context window does not preserve addresses. Once text is injected, its provenance disappears. A claim from Supplier A and a claim from Supplier B look identical. A holding from 2019 and a holding from 2023 are just strings. The context window is shared memory with no page tables; the operating system has no memory protection.
The Provenance Gap
The claim "Paris is in France" looks the same whether it comes from an encyclopedia, a user's personal note, or a hallucinated interpolation. In the context window, all strings are equal. But they are not equal in the commitment lattice. A claim from a verified database carries different obligations than a claim from an unverified source. A claim attested by a laboratory carries different weight than a claim inferred by pattern matching.
This is the provenance gap: the space between what a claim asserts and what would license believing it (Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan, "Why and Where: A Characterization of Data Provenance," in Database Theory — ICDT 2001 (Springer, 2001), 316–330; James Cheney, Laura Chiticariu, and Wang-Chiew Tan, "Provenance in Databases: Why, How, and Where," Foundations and Trends in Databases 1, no. 4 (2009): 379–474). String-based systems collapse this gap by design. They process tokens, not typed propositions. When retrieval delivers a fact into the context, the fact arrives naked—without its source, its confidence, or its verification regime.
The consequence is structural:
Lemma (Address Loss): If a downstream stage receives claims as bare strings without stable identifiers for (referent, attribute, source, time, logic-regime), then it cannot, in general, decide whether two claims are in conflict—because conflict detection reduces to a join problem over those identifiers, and dropping provenance drops the keys.
A module that receives the string "fabric: silk blend" cannot know whether another module received "fabric: 100% polyester" for the same item. A module that receives "liquidated damages clauses are enforceable" cannot know whether another module received a holding that carves out exceptions. The conflict exists in the source layer; it vanishes in the processing layer.
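The lemma's join framing can be made concrete. The following is a minimal sketch, assuming illustrative names of my own choosing (`Claim`, `conflicts`) rather than any API from this book: with the provenance keys present, conflict detection is a group-by over (entity, predicate); strip the keys and the grouping is impossible.

```python
# Illustrative sketch of the Address Loss Lemma: conflict detection
# as a join over provenance keys. All names here are assumptions.
from collections import defaultdict
from typing import NamedTuple

class Claim(NamedTuple):
    entity: str      # referent
    predicate: str   # attribute
    value: str
    source: str
    time: str
    logic: str       # logic regime, e.g. "CWA" or "OWA"

def conflicts(claims):
    """Group claims by (entity, predicate); keep groups with differing values."""
    groups = defaultdict(list)
    for c in claims:
        groups[(c.entity, c.predicate)].append(c)
    return {k: v for k, v in groups.items()
            if len({c.value for c in v}) > 1}

claims = [
    Claim("DRESS-2847", "fiber", "silk blend (70/30)", "SupplierA", "2024", "CWA"),
    Claim("DRESS-2847", "fiber", "100% polyester", "SupplierB", "2024", "OWA"),
    Claim("DRESS-2847", "weave", "satin", "SupplierC", "2024", "OWA"),
]
# Only the two fiber claims collide; the weave claim is a different predicate.
print(sorted(conflicts(claims)))
```

If the claims had arrived as bare strings ("fabric: silk blend", "fabric: 100% polyester", "fabric: satin"), the group-by keys would not exist and all three would be indistinguishable noise.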
Human institutions solved this problem long ago. Legal documents cite precedent with pinpoint specificity: case name, reporter, volume, page, year. Scientific papers cite sources with enough information to retrieve and verify. Financial audits trace every number to a supporting document. The citation is not decoration. It is the minimal information required to check the claim.
But a citation is not the same as a witness.
A citation locates: it tells you where a claim came from, providing enough information to retrieve the source. A witness licenses: it tells you how to apply the claim by specifying verification regime, scope, and consumption contract.
When a system appends "Source: Supplier A" to a claim, it has added a citation. When a system passes the pair (p, π), where π includes the attestation type, the verification method, and the conditions under which the claim may be used, it has provided a witness.
The difference is not cosmetic. Citations decorate; witnesses compose. Grounding that composes is proof-carrying: assertions travel with the object that permits relying on them. A downstream module that receives witnessed assertions can propagate conflicts, merge compatible sources, and refuse to draw conclusions when the evidence is contested. A downstream module that receives strings with footnotes cannot.
The Provenance Judgement
Given the failure (systems that retrieve correct information and still produce incoherent outputs), what is the minimal object that would prevent it?
The answer is a typing judgement that keeps claims tethered to their evidence.
Define a typing judgement for provenance:

Γ ⊢ π : W(p)

where:
- Γ is a context: background assumptions, available sources, trust anchors
- p is a proposition: the claim being made
- W(p) is the witness type (read: "the type of witnesses for p"): the type of evidence that would support p
- π is a witness: a term of type W(p), constituting the evidence itself

The judgement asserts: under context Γ, π is evidence that p holds.

Consumption rule: Downstream steps must receive (p, π), the claim paired with its witness, not bare p.
The judgement is the minimal typing that prevents the opening failures. Given the court-record case, the system would distinguish:
- p₁ : "Liquidated damages clauses are enforceable if reasonable forecast," witnessed by the 2019 holding
- p₂ : "Such clauses are unenforceable if procedurally unconscionable," witnessed by the 2023 holding
With both pairs in hand, downstream reasoning can see the relationship. The 2023 holding is not a contradiction of the 2019 holding; it is a refinement that carves out an exception—a scope constraint on the earlier rule. But without the witnesses, the scope relationship is invisible. The system sees two strings and must synthesize; it cannot see that one holding qualifies the other. The synthesis failed by collapsing a rule lattice into a single sentence.
The judgement tells us that a witness exists. It does not yet tell us what kind of witness: what verification regime governs it, what operations are valid on it, what level of trust downstream modules may assume.
A2b — Witnessed Assertion:
A witnessed assertion is a pair (p, π) together with a verification function verify, where:
- p is a proposition
- π is a witness
- verify(p, π) is the verification function
The verification function verify is typed according to the witness class.
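A witnessed assertion can be sketched as a small data structure. This is a minimal illustration under my own naming assumptions (`Witness`, `WitnessedAssertion`, `WitnessClass` are not the book's API); the class enumeration anticipates the three verification regimes defined in A2c below.

```python
# Hedged sketch of A2b: a proposition paired with its witness and a
# verification function typed by witness class. Names are illustrative.
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional

class WitnessClass(Enum):
    DECIDABLE = "decidable"
    PROBABILISTIC = "probabilistic"
    ATTESTED = "attested"

@dataclass(frozen=True)
class Witness:
    cls: WitnessClass
    payload: Any                   # derivation tree, score, or provenance chain
    authority: Optional[str] = None

@dataclass(frozen=True)
class WitnessedAssertion:
    p: str                                    # the proposition
    pi: Witness                               # the witness
    verify: Callable[[str, Witness], Any]     # typed by the witness class

# A decidable example: the witness is the arithmetic itself, and
# verification replays it deterministically.
wa = WitnessedAssertion(
    p="2 + 2 = 4",
    pi=Witness(WitnessClass.DECIDABLE, payload=(2, 2, 4)),
    verify=lambda p, pi: pi.payload[0] + pi.payload[1] == pi.payload[2],
)
print(wa.verify(wa.p, wa.pi))  # True
```

Downstream code receives `wa`, never `wa.p` alone: the consumption rule from A2 becomes a type signature.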
A2c — Witness Classes:
Witnesses are typed by verification regime:
| Class | Verification | Output | Examples |
|---|---|---|---|
| Decidable | Total, deterministic | Valid / invalid | Type check, arithmetic, schema validation |
| Probabilistic | Terminates with confidence | Confidence in [0, 1] | ML classifier, embedding similarity, statistical test |
| Attested | Checks provenance chain | Valid relative to a trusted authority | Supplier contract, certificate signature, human attestation |
The class determines what "verification" means. A decidable witness can be checked algorithmically with a definite answer. A probabilistic witness returns a confidence interval. An attested witness returns "ok if you trust this authority"—and the system must know whether it does.
Algebraic properties of the composition algebra. The witness classes form a total order under strength: Decidable ≻ Probabilistic ≻ Attested. When witnesses compose, the composite inherits the class of the weakest component — formally, composition takes the meet (minimum) in this order. This operation is associative: composing three witnesses as (w₁ ∘ w₂) ∘ w₃ yields the same class as w₁ ∘ (w₂ ∘ w₃), namely min(class(w₁), class(w₂), class(w₃)). The unit is Decidable: composing any witness with a decidable witness preserves the original class. These properties ensure that witness-class tracking is well-defined under arbitrary composition sequences — the order of combination does not affect the resulting verification regime.
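The class-level algebra is small enough to state in executable form. A minimal sketch, assuming an `IntEnum` encoding of my own choosing in which a larger value means a stronger regime:

```python
# Sketch of the witness-class meet: composition takes the minimum
# in the strength order Decidable > Probabilistic > Attested.
from enum import IntEnum

class WClass(IntEnum):        # larger value = stronger verification regime
    ATTESTED = 1
    PROBABILISTIC = 2
    DECIDABLE = 3

def compose_class(a: WClass, b: WClass) -> WClass:
    """The composite inherits the class of the weakest component."""
    return min(a, b)

# Associativity: min is insensitive to grouping.
w1, w2, w3 = WClass.DECIDABLE, WClass.PROBABILISTIC, WClass.ATTESTED
assert compose_class(compose_class(w1, w2), w3) == \
       compose_class(w1, compose_class(w2, w3)) == WClass.ATTESTED

# Decidable is the unit: composing with it preserves the other class.
assert compose_class(WClass.PROBABILISTIC, WClass.DECIDABLE) == WClass.PROBABILISTIC
assert compose_class(WClass.ATTESTED, WClass.DECIDABLE) == WClass.ATTESTED
print("class algebra holds")
```

The assertions are the lemma: associativity and the Decidable unit, checked by exhaustion over the three classes.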
Operational Consequences of Witness Class
The witness class determines not only how verification works but which downstream operations the witness can support. The full composition rules appear later; this table summarizes what each class permits and what artifact it produces on conflict.
| Class | Compose | Transport | Glue | Conflict Artifact |
|---|---|---|---|---|
| Decidable | Yes | Yes | Yes | UnsatCore with derivation |
| Probabilistic | Yes (propagate bounds) | Yes (preserve confidence) | Under declared approximation policy | DisagreementWitness with bounds and policy |
| Attested | If same authority | Within authority scope | Requires authority agreement | AuthorityConflict; escalate per governance policy |
Decidable witnesses admit all operations. If two decidable witnesses conflict, the conflict is definite and the system produces a structured failure artifact—an unsat core or derivation showing the contradiction. There is no middle ground; the arithmetic either checks or it does not.
Probabilistic witnesses compose by propagating uncertainty. When you transport a probabilistic claim, the confidence bounds travel with it. When you attempt to glue probabilistic claims, the system must either produce an obstruction witness or apply an explicit tolerance policy and record it. On conflict, the system may defer judgment pending more evidence or widen bounds to encompass both claims at the cost of precision.
Attested witnesses are scoped to their authority. Two attested claims can compose only if they share an authority or if a meta-authority relationship exists. Transport requires the target context to recognize the originating authority. Gluing attested claims from different authorities requires either a reconciliation protocol or escalation per governance policy. On conflict, the system does not guess; it escalates.
The table is not a menu of choices. It is a consequence of what each witness class represents. A decidable witness is a proof; proofs compose tightly. A probabilistic witness is a bet; bets compose with bookkeeping. An attested witness is a delegation; delegations compose only within chains of trust.
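The per-class rows of the operations table can be sketched as a single dispatch. This is an illustration under stated assumptions, not the book's algorithm: all names (`W`, `compose`) are mine, same-class composition only, and the conflict artifacts are plain dictionaries standing in for the structured objects the table names.

```python
# Hedged sketch of the operations table: same-class composition with
# class-specific conflict artifacts. All names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class W:
    cls: str                        # "decidable" | "probabilistic" | "attested"
    value: object
    confidence: float = 1.0
    authority: Optional[str] = None

def compose(a: W, b: W):
    if a.cls != b.cls:
        raise NotImplementedError("cross-class composition takes the meet")
    if a.cls == "decidable":
        if a.value != b.value:
            # Definite conflict: produce an unsat-core-style artifact.
            return {"conflict": "UnsatCore", "values": (a.value, b.value)}
        return W("decidable", a.value)
    if a.cls == "probabilistic":
        # Propagate bounds: the composite is no more confident than either part.
        return W("probabilistic", (a.value, b.value),
                 confidence=min(a.confidence, b.confidence))
    if a.cls == "attested":
        if a.authority != b.authority:
            # Do not guess: escalate per governance policy.
            return {"conflict": "AuthorityConflict",
                    "authorities": (a.authority, b.authority)}
        return W("attested", (a.value, b.value), authority=a.authority)

out = compose(W("attested", "silk blend", authority="SupplierA"),
              W("attested", "polyester", authority="SupplierB"))
print(out)  # an AuthorityConflict artifact, not a silent merge
```

The point of the sketch is the return types: a successful composition returns another witness, while a conflict returns a structured artifact whose shape depends on the class.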
The fabric composition claim from Supplier A has an attested witness: the supplier's feed, governed by a contractual relationship, checkable by "do we trust Supplier A for fabric data?" The holding from the 2019 case has an attested witness: the court reporter, checkable by "is this a valid citation to the official record?"
An embedding model's claim that two items are "similar" has a probabilistic witness: the cosine similarity score with confidence bounds. A type checker's claim that a program is well-typed has a decidable witness: the derivation tree, checkable by replaying the algorithm.
The three classes are not exhaustive (finer distinctions exist), but they capture the major verification regimes that matter for system design. The key insight is that not all witnesses are alike. A system that treats an attested claim and a probabilistic claim identically will make mistakes that a typed system would catch.
The class-level algebraic properties (associativity of min, identity of Decidable) are straightforward, as established above. The remaining limitation is scope partiality: composition is defined only when scopes overlap, and fails — producing an obstruction witness — when they are disjoint or when authority chains conflict. Across arbitrary scope combinations, the composition behaves as a partial monoid rather than a total one. The interaction between scope intersection and authority agreement under composition is not fully formalized in this volume. Extending the algebra to characterize when partial composition chains terminate in obstruction versus amalgamation is a natural direction for future work.
Attested witnesses defer to authority: the claim is valid if the authority is trusted. This raises a natural question: what witnesses the authority's trustworthiness? The regress is real but not infinite. It terminates in root trust anchors (certificates, contracts, institutional roles) that a system must assume or verify by out-of-band means. The regress is analogous to certificate chains in public-key infrastructure: each certificate is signed by an authority, whose certificate is signed by another authority, terminating in root certificates that the system trusts by fiat. We will formalize trust anchors when we need them; for now, it suffices to note that attested witnesses are scoped by their authority.
The Fashion Catalog Revisited
Return to the catalog with the new machinery in hand.
For SKU DRESS-2847, the system now maintains:
- p₁ : "fabric = silk blend (70/30)" witnessed by Supplier A feed, class: attested, authority: Supplier A
- p₂ : "fabric = 100% polyester" witnessed by Supplier B feed, class: attested, authority: Supplier B
- p₃ : "fabric = satin" witnessed by Supplier C feed, class: attested, authority: Supplier C
When queried, the system sees that p₁ and p₂ are in conflict: they assert different values for the same predicate (fiber composition) on the same entity. It sees that p₃ is orthogonal: "satin" is a weave, not a fiber, so Supplier C is making a claim about a different predicate that the string pipeline had merged into "fabric." The conflict is not only value-level; it is predicate-level. The system can now respond correctly:
"Conflicting provenance for fabric composition. Supplier A (attested) claims silk blend. Supplier B (attested) claims 100% polyester. Supplier C's claim (satin) addresses weave, not fiber. Resolution requires business rule or human arbiter."
This is not a refusal to answer. It is an answer that respects the evidential situation. The user learns that the data is contested and can choose how to proceed. The system has not committed to a claim it cannot support.
Same Referent, Different Sources
A harder case: the system retrieves a record from Catalog A that calls an item a "cocktail dress." It retrieves a record from Catalog B that calls what appears to be the same item an "evening dress." Are these the same dress? If so, which label is correct?
Embedding similarity might be high; the images look alike, the descriptions overlap. But similarity is not identity. The items could be the same dress sold under different names. They could be different dresses that happen to look similar. They could be the same physical garment at different price points.
The commitment-discipline question is: can the system assert "these are the same item" without a witness? The answer is no. Similarity can propose identity; only a witness can certify it.
What would constitute a witness for identity? The strongest would be a shared key: same SKU, same UPC, same manufacturer part number. Failing that, a confluence of matching attributes: identical measurements, same supplier origin, same production date. Failing that, human attestation: a merchandiser who examined both records and declared them equivalent.
Each of these witnesses has a different class. A shared key is decidable: either the keys match or they don't. Attribute confluence is probabilistic: how many attributes must match, with what tolerance? Human attestation is attested: valid if you trust the merchandiser.
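The three identity witnesses can be sketched as three functions, one per class. All names, field choices, and the attribute list are illustrative assumptions, not a record-linkage implementation:

```python
# Sketch: three witnesses for record identity, one per witness class.
# Record fields and function names are assumptions for illustration.

def key_match(rec_a: dict, rec_b: dict, key: str = "sku") -> bool:
    """Decidable: the keys either match or they don't."""
    return key in rec_a and key in rec_b and rec_a[key] == rec_b[key]

def attribute_confluence(rec_a: dict, rec_b: dict,
                         attrs=("length_cm", "supplier", "prod_date")) -> float:
    """Probabilistic: fraction of matching attributes.

    A merge policy must still choose a threshold and tolerance."""
    matches = sum(rec_a.get(k) == rec_b.get(k) for k in attrs)
    return matches / len(attrs)

def human_attestation(record_pair_id: str, signoffs: dict) -> bool:
    """Attested: valid iff a trusted merchandiser signed off on this pair."""
    return signoffs.get(record_pair_id) == "equivalent"

a = {"sku": "D-100", "length_cm": 90, "supplier": "X", "prod_date": "2024-01"}
b = {"sku": "D-100", "length_cm": 90, "supplier": "X", "prod_date": "2024-02"}
print(key_match(a, b))             # True: decidable identity via shared key
print(attribute_confluence(a, b))  # two of three compared attributes match
```

Each function returns a different kind of answer (boolean, score, delegated boolean), which is exactly the class distinction: a merge justified by `key_match` composes differently downstream than one justified by `attribute_confluence`.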
Without witnesses, the system cannot safely merge the records (Ivan P. Fellegi and Alan B. Sunter, "A Theory for Record Linkage," Journal of the American Statistical Association 64, no. 328 (1969): 1183–1210). With witnesses, it can merge and record the justification. The provenance travels with the claim.
This is the reference problem, what philosophers call the "morning star / evening star" puzzle (Gottlob Frege, "Über Sinn und Bedeutung," Zeitschrift für Philosophie und philosophische Kritik 100 (1892): 25–50). Two names can refer to the same object (Venus) without the system knowing it. The solution is not to assume identity; it is to require witnessed equivalence before asserting it.
Negation Depends on Source
One more complication. Supplier A's catalog operates under the closed-world assumption (Raymond Reiter, "On Closed World Data Bases," in Logic and Data Bases, ed. Hervé Gallaire and Jack Minker (New York: Plenum Press, 1978), 55–76): if an attribute is not listed, the item does not have that attribute. For Supplier A, if sustainable = null, the item is not sustainable.
Supplier C's catalog operates under the open-world assumption: if an attribute is not listed, the item's status is unknown. For Supplier C, if sustainable = null, the item's sustainability is undetermined.
A user asks: "Show me sustainable dresses."
Without logic annotations, the system faces an impossible merge. Should it exclude Supplier C items (applying CWA globally, treating unknown as false)? Should it include them (applying OWA globally, treating unknown as possible)? Either choice is a commitment the system doesn't realize it's making.
The solution is to include logic in the provenance. The witness for Supplier A's sustainability claims carries a CWA tag: absence means negation. The witness for Supplier C's claims carries an OWA tag: absence means unknown.
When the query arrives, the system can now respond appropriately:
"Results from Supplier A reflect closed-world semantics (missing = not sustainable). Results from Supplier C reflect open-world semantics (missing = unknown). Do you want to see: (a) only confirmed sustainable items, (b) confirmed + unknown, or (c) all items with sustainability status displayed?"
This is not pedantry. It is the difference between a system that silently makes epistemic commitments and a system that surfaces them for user decision. Logic is part of provenance.
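Logic-as-provenance is mechanically simple once the tag exists. A minimal sketch with illustrative field names (the `logic` tag is the witness metadata this section argues for):

```python
# Sketch: interpreting a missing attribute through the source's logic tag.
# Field names ("sustainable", "logic") are illustrative assumptions.
CWA, OWA = "closed-world", "open-world"

items = [
    {"id": 1, "sustainable": True, "source": "A", "logic": CWA},
    {"id": 2, "sustainable": None, "source": "A", "logic": CWA},  # CWA: absence = not sustainable
    {"id": 3, "sustainable": None, "source": "C", "logic": OWA},  # OWA: absence = unknown
]

def status(item):
    if item["sustainable"] is True:
        return "confirmed"
    if item["sustainable"] is False:
        return "not sustainable"
    # Missing value: its meaning depends on the source's logic regime.
    return "not sustainable" if item["logic"] == CWA else "unknown"

confirmed = [i["id"] for i in items if status(i) == "confirmed"]
confirmed_or_unknown = [i["id"] for i in items
                        if status(i) in ("confirmed", "unknown")]
print(confirmed)             # [1]
print(confirmed_or_unknown)  # [1, 3]
```

The two result sets correspond to options (a) and (b) in the system's clarifying question; without the `logic` tag, either would have been a silent global commitment.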
The Typed Pipeline
The full picture is now clear. Retrieval produces candidate claims. Each claim should be wrapped with a witness before passing downstream. The witness specifies the source, the verification regime, and the logic under which the claim was made. Downstream reasoning consumes the pair (p, π), not the bare string p.
The pipeline, schematically: sources → candidate claims (raw strings) → provenance wrapper → (p, π) with verify(p, π) obligations → downstream reasoning.
The failure case is when the wrapper is dropped. Claims become "free"—untethered from their evidence, indistinguishable from each other, impossible to reconcile when they conflict. Downstream modules see strings; they cannot see sources. Conflicts that exist in the retrieval layer become invisible in the reasoning layer.
The structural consequence, restated:
Address Loss Lemma: Conflict detection reduces to a join over (entity, predicate, source, time, logic-regime). If claims arrive as bare strings, the join keys are missing. Without keys, no join. Without join, no conflict detection except by accident or re-fetching.
This is not a contingent fact about current systems. It is a structural consequence of the architecture. If claims do not carry their witnesses, then downstream modules have no basis on which to detect or resolve conflicts. The information is simply not there.
Consequence
Grounding was supposed to solve the commitment problem. If a system retrieved its claims from external sources, surely the sources would guarantee consistency. The system would not hallucinate because it would not generate; it would look up.
The promise was half-kept. Retrieval-augmented systems do hallucinate less on well-documented topics. They can cite sources, providing at least a rhetorical gesture toward accountability. But the commitment problem did not disappear. It relocated.
The system must now track not only what it has asserted but where each assertion came from. It must recognize when sources disagree. It must know what kind of verification each source supports. It must understand that different sources may operate under different logical regimes: some closed-world, some open-world, some probabilistic. Without this machinery, the system commits to claims it cannot support, synthesizes conflicts it cannot see, and presents contested terrain as settled ground.
Provenance typing is the first object the Third Mode requires: a way to distinguish proposal from certification. A claim without its witness is not grounded; it is merely repeated.
But provenance alone does not tell us what vocabulary to use. A system that tracks sources perfectly can still be asked for a "flowy dress" and have no column in which to answer. The string empire fails for lack of commitment discipline. The retrieval layer fails for lack of provenance discipline. Perhaps the schema empire, the world of tables, constraints, and referential integrity, offers a solution.
When evidence loses its address, contradiction becomes invisible. The next chapter turns to the second empire, where evidence has addresses but vocabulary does not grow.
Litmus Cases
This chapter advanced two tests from the touchstone battery.
| Case | Name | Chapter 1 Status | Chapter 2 Status |
|---|---|---|---|
| T2 | Reference | Foreshadowed as "identity without witnesses" | Advanced: same referent, different sources; provenance distinguishes |
| T5 | Negation | Foreshadowed as "negation without witnesses" | Advanced: logic is part of provenance; CWA/OWA as source metadata |
T2 (Reference): Two expressions may refer to the same entity ("cocktail dress" in Catalog A, "evening dress" in Catalog B), but the system cannot assert identity without a witness. Embedding similarity proposes; witnessed equivalence certifies. Resolution requires the machinery of Part II.
T5 (Negation/Absence): The meaning of a missing attribute depends on the source's logical regime. Closed-world sources treat absence as negation; open-world sources treat absence as unknown. Merging sources without logic annotations produces silent commitments. Full resolution requires the epistemic-status machinery of Chapter 4.
The touchstones are not resolved here. They are sharpened: the failures now have names, and the names point toward the objects we must build.