The Coherence Topos and Vocabulary Evolution
Appendix L
This appendix addresses a specific problem: how autonomous computational agents might invent new concepts, certify them against existing commitments, transport them across institutional boundaries, and account for the cost — within a mathematically rigorous structure.
Existing approaches address fragments of this problem. Retrieval-augmented generation retrieves without coherence guarantees. Multi-agent frameworks compose outputs without gluing conditions. Knowledge graphs structure without vocabulary invention. Schema systems enforce without evolution. The composition of these fragments under formal guarantees remains open.
Parts I–VI of The Proofs assembled the components: commitment sets (A1), witnessed equivalence (A10), context sites (A12b), the sheaf condition (A13), fibrations (A14), transport discipline (A16), predicate invention (A17), conservative extension (A17b), and the coherence cost model (A21). This appendix states the structural consequence that these components jointly entail, develops several results that are (to our knowledge) novel, positions the work explicitly against the existing landscape, and identifies concrete research programs for the mathematical community.
The topos theorem (L.2) is a consequence, not a contribution — it follows from Giraud's theorem applied to the context site. We state it because its corollaries are the contribution: the internal logic subsumes the logic-selection machinery of A15, the subobject classifier provides the multi-valued truth that A4 reached for, and the monadic structure of predicate invention (L.6) gives a formal theory of vocabulary evolution that has not, to our knowledge, been developed elsewhere.
L.1 What This Appendix Claims
We distinguish three levels of novelty:
Standard results applied to a new domain (L.2, L.3): The coherence topos theorem and its internal-logic corollary are instances of known mathematics (Giraud, Mac Lane–Moerdijk). We claim only that the instantiation is well-formed and that the corollaries are operationally significant for distributed systems.
Novel results (L.4–L.6): The Obstruction Cohomology computation, the acyclicity theorem for hierarchical sites, the meta-obstruction for federated sites, the non-composability of overlap agreement, and the characterization of the predicate invention monad as a quotient of a free monad are (to our knowledge) new. They are provable within standard sheaf theory and model theory but have not appeared in the literature because the combination — sheaf-theoretic coherence applied to vocabulary evolution under conservative extension constraints — has not been studied.
Landscape comparison (L.7): We position this work explicitly against Spivak's functorial data migration, Goguen's sheaf semantics, Abramsky's sheaf-theoretic contextuality, and Caramello's bridge program. The comparison identifies what is shared, what is new, and where the framework extends existing work.
Open problems (L.8): Seven precisely stated problems for the mathematical community, including two new problems motivated by the results of this appendix (the Eilenberg-Moore category of the predicate invention monad and persistent cohomology of evolving sites).
Notation
Symbols from Parts I–VI (commitment sets, anchors, etc.) follow the conventions in Appendix H. The following notation is specific to this appendix or used here with specialized meaning. Standard category-theoretic and sheaf-theoretic notation follows Mac Lane & Moerdijk, Sheaves in Geometry and Logic: A First Introduction to Topos Theory (New York: Springer-Verlag, 1992).
| Symbol | Meaning | Introduced |
|---|---|---|
| | Context site: category with Grothendieck topology | A12b |
| | Presheaf category | L.2 |
| | Sheaf category (the coherence topos) | L.2 |
| | Subobject classifier; = -closed sieves on | L.2 |
| | Sheafification: left exact left adjoint to inclusion | L.2 |
| | Exponential sheaf: certification space from proposals to witnesses | L.3 |
| | Čech -cochains of presheaf with respect to cover | L.4 |
| | Čech -th cohomology group | L.4 |
| | Čech coboundary map | L.4 |
| | Overlap (fiber product) of and over | L.4.1 |
| | Second page of the Čech-to-derived-functor spectral sequence | L.4.1 |
| | Presheaf of local cohomology groups | L.4.1 |
| | Category of signatures with inclusion morphisms | L.5 |
| | Signatures (finite sets of typed predicate/function symbols) | A17 |
| | Proposal endofunctor: = single-predicate extension proposals | L.6.1 |
| | Free monad on : finite sequences of proposals | L.6.1 |
| | Predicate invention monad: admissible extensions of | L.6.2 |
| | Quotient monad morphism (surjective) | L.6.2 |
| | Monad unit and multiplication | L.6.2 |
| | Kleisli category: vocabulary evolution paths | L.6.2 |
| | Kernel of the quotient: inadmissible proposal combinations | L.6.3 |
L.2 The Coherence Topos
Throughout this appendix, is the context site from A12b and is assumed essentially small.
The sheaf category on the context site is a Grothendieck topos (Artin, Grothendieck & Verdier 1972–1973; Mac Lane & Moerdijk 1992, ch. III, §4). It has all finite limits, all small colimits, exponentials, a subobject classifier, and the inclusion into the presheaf category has a left exact left adjoint (sheafification).
By Giraud's theorem (Artin, Grothendieck & Verdier 1972–1973; Mac Lane & Moerdijk 1992, ch. III, Theorem 1). The topology determines a Lawvere-Tierney operator on the subobject classifier of the presheaf category (sending each sieve to its closure), which is idempotent, preserves top, and preserves meets. The sheaves for the topology are exactly the sheaves for this operator, and the category of such sheaves in a topos is a topos (Mac Lane & Moerdijk, Ch. V, Theorem 1).
Corollary: The Subobject Classifier and Epistemic Status
The subobject classifier . Truth values are not but -closed sieves: families of contexts in which a claim holds, closed under the covering relation.
| A4 Epistemic Status | Topos Interpretation |
|---|---|
| True in | The maximal sieve (all refinements) |
| False in | The empty sieve |
| Undetermined in | A proper non-empty -closed sieve |
| Conflict at | Both and are non-empty, proper sieves |
The internal logic is intuitionistic. Excluded middle holds at iff the restricted topology is discrete (every sieve covers). This is the closed-world assumption. Non-discrete topologies yield open-world reasoning natively — no adapter required.
The indexed logic selection of A15 is a special case of relativizing to sub-topologies. Specifically: for all iff restricted to the sieve below is the discrete topology. The CWA/OWA distinction is not an engineering parameter but a structural property of the topology over each context.
A Grothendieck topos is Boolean iff its subobject classifier is isomorphic to $1 + 1$, which holds iff every closed sieve is maximal or empty, that is, the discrete topology (Mac Lane & Moerdijk 1992, ch. VI, §6). Restricting to the slice over a context yields a sub-topos whose Booleanness depends on the induced topology on the corresponding under-category.
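The truth-value table above can be made concrete on a very small poset. The following is a minimal sketch, assuming a hypothetical four-context poset (`G`, `A`, `B`, `AB`) and the trivial topology, under which every sieve is closed; with a non-trivial topology some of these sieves would collapse to the maximal one. The names `below`, `sieves_on`, and `status` are illustrative and not part of the framework.

```python
from itertools import combinations

# Hypothetical finite poset of contexts: below[c] lists every context
# refining c, including c itself. AB is the overlap of A and B.
below = {
    "G": {"G", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}

def sieves_on(c):
    """All sieves on c: downward-closed subsets of the contexts below c.

    Assumption: trivial topology, so every sieve counts as closed.
    """
    elems = sorted(below[c])
    result = []
    for k in range(len(elems) + 1):
        for subset in combinations(elems, k):
            s = frozenset(subset)
            if all(below[x] <= s for x in s):
                result.append(s)
    return result

def status(sieve, c):
    """Map a sieve on c to the epistemic status used in the table."""
    if sieve == frozenset(below[c]):
        return "true"          # maximal sieve
    if not sieve:
        return "false"         # empty sieve
    return "undetermined"      # proper non-empty sieve

statuses = [status(s, "G") for s in sieves_on("G")]
```

In this toy case the context `G` carries six truth values rather than two, four of which are "undetermined": the claim holds on some refinements of `G` but not on `G` itself.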
L.3 Exponentials and Certification
The topos has exponentials. For sheaves (proposals, per A19) and (witnesses, per A2c):
A certification contract (A19b) is a global section : a natural transformation that commutes with all restriction maps. The topos guarantees the space of certifications is a well-defined sheaf. Coherence of certification across contexts is naturality. Whether a particular certification exists is the engineering problem; the topos provides the space in which to search.
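The naturality requirement can be checked mechanically on finite data. The sketch below assumes a hypothetical site with two contexts, where `V` refines `U`, and represents the proposal and witness sheaves by their restriction maps only. All element names (`p1`, `w1`, and so on) and the helper `is_natural` are invented for illustration.

```python
# Hypothetical two-context site: V refines U, so sections restrict from U to V.
restrict_P = {"p1": "p", "p2": "q"}   # proposals over U -> proposals over V
restrict_W = {"w1": "w", "w2": "x"}   # witnesses over U -> witnesses over V

def is_natural(cert_U, cert_V):
    """True iff certifying then restricting equals restricting then certifying."""
    return all(
        restrict_W[cert_U[p]] == cert_V[restrict_P[p]]
        for p in cert_U
    )

# A certification over U, plus two candidate certifications over V.
cert_U = {"p1": "w1", "p2": "w2"}
coherent_V = {"p": "w", "q": "x"}     # commutes with restriction
incoherent_V = {"p": "x", "q": "w"}   # swaps witnesses, breaks the square
```

Only the pair `(cert_U, coherent_V)` defines a global section of the exponential in this toy setting; the second pair certifies every proposal locally but is not coherent across contexts.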
L.4 Obstruction Cohomology: A Worked Computation
This section contains what we believe to be novel: an explicit computation of the first sheaf cohomology group for a concrete context site arising in data integration, and its interpretation as classifying ambiguous identity resolution. The companion paper Predicate Invention Under Sheaf Constraints (SCPI) proves that the same classifies obstructions to predicate invention across heterogeneous agent contexts, formalizing the descent problem that A17's three obligations address. The SHEAF Protocol extends this diagnostic to a distributed setting with mechanism-design enforcement.
The Setup: Three-Merchant Catalog
Let be the poset category with objects where are merchant contexts covering the catalog context , and the -objects are pairwise overlaps. Morphisms are inclusions (each overlap refines both parents).
The topology declares as a cover.
Let be the presheaf of product identifiers:
- (merchant A's products)
- (merchant B's products)
- (merchant C's products)
On overlaps, restriction identifies shared products:
- : product and are "the same item" — but the identification is ambiguous (two possible matchings exist)
- : product and are unambiguously identified
- : no shared products
The Čech Complex
The Čech cohomology of with respect to the cover is computed from the cochain complex:
where:
- — local sections (one per merchant)
- — comparison on overlaps
- — triple overlaps (empty here)
The coboundary sends a local section to the tuple of restrictions: .
Computing and
— the global sections. These are the tuples of local products that agree on all overlaps: the coherent global catalog. If the identifications on overlaps are consistent, is the glued catalog.
— the ambiguity group.
For the three-merchant site above, whenever the overlap admits multiple consistent identifications of shared products. Concretely: if could match either or (both matchings are consistent with the restriction maps), then has order , and its elements correspond bijectively to the distinct global catalogs that could be assembled from the same local data.
A 1-cocycle assigns to each overlap an identification that satisfies the cocycle condition on triple overlaps (vacuously here, since ). Two 1-cocycles are cohomologous if they differ by a coboundary — a relabeling of local products that induces the identification difference.
When admits two matchings and , these define distinct 1-cocycles. They are cohomologous iff there exists a relabeling of or that transforms one matching into the other. If and neither is in the image of any other identification, no such relabeling exists, and the cocycles represent distinct cohomology classes.
Each class corresponds to a distinct global catalog: the "same" local data assembled into different global pictures depending on which identification is chosen.
This is the formal version of a problem every data integration practitioner knows: two sources share some entities, the matching is ambiguous, and different matchings produce different downstream results. is the mathematical name for this ambiguity. The group structure tells you how many distinct resolutions exist and how they relate. This is not metaphor — it is computable for finite context sites.
For the agentic substrate specifically: when two AI agents operating in different contexts propose identity claims about shared entities, measures the irreducible ambiguity in reconciling those claims. No amount of embedding similarity resolves it; only an explicit choice of cocycle representative (a witnessed identification) does.
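The correspondence between identification choices and distinct global catalogs can be illustrated by brute-force enumeration. This is a set-level sketch, not a cohomology computation: it assumes hypothetical product names, two candidate matchings on the first overlap, one on the second, and none on the third, and it glues local catalogs with a small union-find.

```python
from itertools import product

# Hypothetical local catalogs for three merchants.
catalogs = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1"]}

# Candidate identifications per overlap; each candidate is a list of pairs.
candidates = {
    ("A", "B"): [[("a1", "b1")], [("a1", "b2")]],  # ambiguous: two matchings
    ("B", "C"): [[("b3", "c1")]],                   # unambiguous
    ("A", "C"): [[]],                               # no shared products
}

def glue(choice):
    """Glue local products into global items under one choice of matchings."""
    parent = {p: p for items in catalogs.values() for p in items}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for matching in choice:
        for x, y in matching:
            parent[find(x)] = find(y)

    classes = {}
    for p in parent:
        classes.setdefault(find(p), set()).add(p)
    return frozenset(frozenset(c) for c in classes.values())

def global_catalogs():
    """Every distinct global catalog obtainable from the same local data."""
    return {glue(choice) for choice in product(*candidates.values())}

resolutions = global_catalogs()
```

Under these assumptions the same local data yields two different global catalogs, one per choice of matching on the ambiguous overlap. Picking one of them is the explicit, witnessed identification the text refers to.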
L.4.1 Acyclicity of Hierarchical Sites
The three-merchant example has non-trivial because the overlap structure admits ambiguity. A natural question: for which site structures does ambiguity vanish? The answer connects organizational topology to coherence cost.
A context site is hierarchical if:
- is a finite rooted tree (poset where every element except the root has exactly one immediate predecessor)
- The topology is generated by parent-children families: for each non-leaf node with children , the family is a cover
- For distinct siblings (children of the same parent), the overlap is the initial object (no shared sub-context between different branches)
Let be a hierarchical context site. For any abelian presheaf on and any cover in :
In particular, : there is no ambiguity in identity resolution for hierarchical organizations.
We prove this by analyzing the Čech complex directly.
Step 1: Structure of overlaps in a tree.
Let be a node with children forming a cover. For , the overlap by the tree condition (distinct branches share no sub-context). Therefore for any presheaf :
Step 2: Collapse of the Čech complex.
The Čech complex for cover of is:
Since for all , every term for . The complex is:
Therefore for all .
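For reference, the collapse in Step 2 can be written out as follows. The symbols are assumptions made for this sketch: $F$ is the abelian presheaf, $c$ is the covered node with children $c_1, \dots, c_k$, and $F(\emptyset) = 0$ is the value on the initial object.

```latex
\begin{aligned}
\check{C}^n &= \prod_{i_0 < \cdots < i_n} F\bigl(c_{i_0} \times_c \cdots \times_c c_{i_n}\bigr)
             = \prod F(\emptyset) = 0 \qquad (n \ge 1), \\
\check{C}^\bullet &: \quad 0 \to \prod_{i=1}^{k} F(c_i) \to 0 \to 0 \to \cdots, \\
\check{H}^0 &= \prod_{i=1}^{k} F(c_i), \qquad \check{H}^n = 0 \quad (n \ge 1).
\end{aligned}
```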
Step 3: Extension to composite covers.
For a cover of a non-root node, the same argument applies locally: each non-leaf is covered by its children, which are pairwise disjoint. By the Čech-to-derived-functor spectral sequence (or directly by Leray's theorem applied to the refinement of any cover by the canonical parent-children covers), the vanishing extends to all covers in , not just the generating ones.
Step 4: Recursive argument for depth .
For a tree of depth , consider the cover of the root by its children, then each child by its children, etc. The Čech-to-sheaf cohomology spectral sequence for this iterated cover has:
where is the presheaf of local cohomology. By induction on depth: for (each sub-tree is acyclic by the inductive hypothesis), so for . And for by Step 2. Therefore the spectral sequence degenerates and for all .
This theorem has a precise operational meaning: hierarchical organizations have no identity ambiguity. When contexts are organized as a tree — a corporate hierarchy, a taxonomic classification, a file system — the hierarchy itself resolves all identity questions. Two items in different branches are either identified by a common ancestor's decree or they are not. There is no room for multiple consistent identifications because distinct branches share no sub-context on which to disagree.
This explains a familiar phenomenon: hierarchical organizations are easy to integrate. Corporate mergers between divisions that shared no operations succeed trivially. Taxonomies with strict inclusion are unambiguous. File systems never have merge conflicts within a single tree.
The price of acyclicity is rigidity. A tree cannot express "A and B share some context but neither subsumes the other." Peer-to-peer and federated structures can, and they pay for it with non-trivial cohomology.
L.4.2 Higher Obstructions in Federated Sites
Federated structures are the opposite extreme from hierarchies: multiple overlapping authorities, no single root, non-trivial shared contexts. We show that federated sites can have non-trivial , which classifies meta-conflicts — disagreements not about identity itself but about how to resolve identity disagreements.
A context site is federated if:
- contains a set of federation nodes and member nodes
- Each member belongs to at least one federation: for each , there exists with a morphism (membership)
- The topology includes the cover for each federation
- Members of distinct federations may share non-trivial overlaps: need not be initial
- There exists a global context covered by
There exists a federated context site and presheaf such that for a cover of the global context. Elements of classify meta-obstructions: situations where pairwise identity resolutions exist but no globally consistent resolution strategy exists.
Construction. Let have:
- Global context
- Three federation nodes covering
- Six member nodes for , where belongs to both and (the shared member between federations and )
- Triple overlap belonging to all three federations
The cover of is . The pairwise overlaps are . The triple overlap is .
Let be a presheaf of identification protocols (an abelian group, for concreteness -valued):
- for each federation (two possible identity conventions: "match by name" vs "match by code")
- (the agreed convention on the shared member)
The restriction maps are the identity (each federation imposes its convention on its shared members).
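The Čech complex is

$$0 \to \check{C}^0 \xrightarrow{\delta^0} \check{C}^1 \xrightarrow{\delta^1} \check{C}^2 \to 0,$$

with $\check{C}^0 \cong (\mathbb{Z}/2)^3$ (one value $a_i$ per federation), $\check{C}^1 \cong (\mathbb{Z}/2)^3$ (one value $b_{ij}$ per shared member $M_{ij}$), and $\check{C}^2 = F(M_{123}) \cong \mathbb{Z}/2$.
The coboundary $\delta^0(a_1, a_2, a_3) = (a_2 - a_1,\ a_3 - a_1,\ a_3 - a_2)$.
The coboundary $\delta^1(b_{12}, b_{13}, b_{23}) = b_{23} - b_{13} + b_{12}$ (the alternating sum on the triple overlap).
Computing $\check{H}^1 = \ker \delta^1 / \operatorname{im} \delta^0$:
$\ker \delta^1$: we need $b_{12} + b_{13} + b_{23} = 0$ in $\mathbb{Z}/2$, i.e., $b_{23} = b_{12} + b_{13}$. This kernel has order $4$ (any two of the three values determine the third).
$\operatorname{im} \delta^0$: the image consists of $(a_2 - a_1,\ a_3 - a_1,\ a_3 - a_2)$. Over $\mathbb{Z}/2$, this gives vectors $(a_1 + a_2,\ a_1 + a_3,\ a_2 + a_3)$. When $(a_1, a_2, a_3)$ ranges over $(\mathbb{Z}/2)^3$, the image has order $4$ (the kernel of $\delta^0$ is $\{(0,0,0), (1,1,1)\}$, so the image has $8/2 = 4$ elements).
Listing both sets explicitly:
$\operatorname{im} \delta^0$, with $(a_1, a_2, a_3) \in (\mathbb{Z}/2)^3$: $\{(0,0,0), (1,1,0), (1,0,1), (0,1,1)\}$, which has order 4.
$\ker \delta^1$: we need $b_{12} + b_{13} + b_{23} = 0$: $\{(0,0,0), (1,1,0), (1,0,1), (0,1,1)\}$, also of order 4.
So $\check{H}^1 = \ker \delta^1 / \operatorname{im} \delta^0 = 0$ in this case.
Now $\check{H}^2 = \check{C}^2 / \operatorname{im} \delta^1$.
$\operatorname{im} \delta^1$: $\delta^1(b_{12}, b_{13}, b_{23}) = b_{12} + b_{13} + b_{23}$ over $\mathbb{Z}/2$. Since $\delta^1(1, 0, 0) = 1$, the image is all of $\mathbb{Z}/2$.
So $\check{H}^2 = 0$ here as well. This is because the nerve of this cover is the 2-simplex $\Delta^2$, which is contractible.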
The non-trivial case requires a cover whose nerve has non-trivial . We modify the construction: let be covered by four federations with pairwise overlaps for all , triple overlaps for all , but no quadruple overlap (). The nerve is the boundary of a 3-simplex , which has .
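Concretely, the Čech complex becomes:

$$0 \to \check{C}^0 \xrightarrow{\delta^0} \check{C}^1 \xrightarrow{\delta^1} \check{C}^2 \xrightarrow{\delta^2} \check{C}^3 = 0$$

(The last term is $0$ because $\check{C}^3 = F(M_{1234}) = F(\emptyset) = 0$.)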
The standard computation gives . A non-trivial 2-cocycle assigns values to each triple overlap such that the alternating sum condition is satisfied, but these values cannot be decomposed as coboundaries from pairwise overlaps. This is a meta-obstruction: each pair of federations can resolve their identity disagreements, and each triple of federations can find a consistent resolution, but there is no single global resolution strategy compatible with all four federations simultaneously.
The meta-obstruction has a vivid operational interpretation. Consider four regulatory bodies () each overseeing a set of financial institutions. Any two regulators can agree on how to identify shared entities. Any three can find a consistent protocol. But when all four try to federate, a global obstruction emerges: the pairwise agreements, though locally consistent in triples, cannot be simultaneously satisfied. This is a higher-order coordination failure — not a conflict about data but a conflict about conflict-resolution strategies.
For the agentic substrate: means that even if every pair of AI agents can resolve their identity disputes, and every triple can coordinate, the system as a whole may still lack a globally consistent identity protocol. The obstruction is structural, residing in the topology of the federation, not in any particular data disagreement.
The nerve of the cover is the key invariant: when it has non-trivial higher homotopy, higher cohomology obstructions emerge. This connects the formal theory to classical algebraic topology in a precise and computable way.
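Because the nerve is a finite simplicial complex, the dimensions of these cohomology groups can be computed directly. The following is a minimal sketch, assuming constant $\mathbb{Z}/2$ coefficients with identity restriction maps (as in the construction above) and a nerve given by its maximal simplices. The function names are illustrative, and federations are labelled by the integers 1 to 4.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def nerve_simplices(maximal):
    """All simplices of the nerve, grouped by dimension."""
    by_dim = {}
    for s in maximal:
        for k in range(1, len(s) + 1):
            for face in combinations(sorted(s), k):
                by_dim.setdefault(k - 1, set()).add(face)
    return {d: sorted(fs) for d, fs in by_dim.items()}

def coboundary_rank(simplices, n):
    """Rank of the coboundary from n-cochains to (n+1)-cochains over GF(2)."""
    index = {s: i for i, s in enumerate(simplices.get(n, []))}
    rows = []
    for t in simplices.get(n + 1, []):
        mask = 0
        for face in combinations(t, n + 1):
            mask |= 1 << index[face]
        rows.append(mask)
    return rank_gf2(rows)

def betti_gf2(maximal):
    """Dimensions of H^0, H^1, ... of the nerve with Z/2 coefficients."""
    simplices = nerve_simplices(maximal)
    betti, prev_rank = [], 0
    for n in range(max(simplices) + 1):
        r = coboundary_rank(simplices, n)
        betti.append(len(simplices[n]) - r - prev_rank)
        prev_rank = r
    return betti

# Three federations with a common triple overlap: nerve is a full 2-simplex.
three_federations = betti_gf2([(1, 2, 3)])
# Four federations, all triple overlaps, no quadruple overlap: boundary of a 3-simplex.
four_federations = betti_gf2(list(combinations((1, 2, 3, 4), 3)))
```

Under these assumptions the three-federation cover has vanishing first and second cohomology, while the four-federation cover has a one-dimensional second cohomology group over $\mathbb{Z}/2$, matching the two cases discussed in this section.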
L.4.3 The Cohomological Hierarchy: A Classification
The results of L.4, L.4.1, and L.4.2 fit into a single classification:
| Site Structure | Nerve Topology | $\check{H}^0$ | $\check{H}^1$ | $\check{H}^2$ | Operational Meaning |
|---|---|---|---|---|---|
| Hierarchical (tree) | Contractible | Global sections | $0$ | $0$ | No ambiguity; hierarchy resolves all |
| Flat peer-to-peer | $\bigvee S^1$ (wedge of circles) | Partial globals | Non-trivial | $0$ | Identity ambiguity; finitely many resolutions |
| Federated (overlapping authorities) | $S^2$ or higher | Partial globals | May be non-trivial | Non-trivial | Meta-obstruction; coordination strategy conflict |
| Fully connected | Contractible ($\Delta^n$) | Global sections | $0$ | $0$ | Total overlap; everyone sees everything |
The fully connected case is as acyclic as the hierarchical case, but for the opposite reason: in a tree, siblings share nothing; in a complete graph, everyone shares everything. Both extremes are cohomologically trivial. The interesting (and realistic) cases lie between these extremes — partial overlap, partial authority, partial sharing. These are exactly the structures that arise in multi-agent AI systems, federated databases, and inter-organizational data sharing.
The cohomological hierarchy provides a quantitative topology of organizational coherence cost. An architect choosing between a hierarchical and federated design is choosing a point in this hierarchy, with precise consequences for the complexity of identity resolution.
L.5 Vocabulary Evolution: Composability and Its Limits
is the category of signatures (finite sets of typed predicate/function symbols) with morphisms the signature inclusions .
If and are both conservative extensions (A17b), then is conservative.
Let be a -sentence with . Since is also a -sentence, conservativity of yields . Conservativity of then yields . The converse is monotonicity.
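The argument can be laid out step by step. The notation here is an assumption made for this sketch: signatures $\Sigma \subseteq \Sigma' \subseteq \Sigma''$ carrying theories $T \subseteq T' \subseteq T''$, and $\varphi$ an arbitrary $\Sigma$-sentence.

```latex
\begin{aligned}
T'' \vdash \varphi
  &\;\Longrightarrow\; T' \vdash \varphi
  && \text{($\varphi$ is a $\Sigma'$-sentence; $\Sigma' \hookrightarrow \Sigma''$ is conservative)} \\
  &\;\Longrightarrow\; T \vdash \varphi
  && \text{($\varphi$ is a $\Sigma$-sentence; $\Sigma \hookrightarrow \Sigma'$ is conservative)} \\[4pt]
T \vdash \varphi
  &\;\Longrightarrow\; T'' \vdash \varphi
  && \text{(monotonicity, since $T \subseteq T''$)}
\end{aligned}
```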
This composability is what makes incremental vocabulary evolution safe. A chain of conservative extensions is conservative. You verify each step; the chain is automatic.
But overlap agreement does not compose. This is the central tension in the theory of vocabulary evolution, and we state it as a theorem:
There exist admissible extensions and , each satisfying all three obligations of A17, such that fails Obligation 2 (overlap agreement).
Construction. Let have three objects: , , and . Let contain a sort (dresses).
Define (a scoring predicate) with:
- In :
- In :
- On overlap : agreement holds (same definition).
Define with:
- In : (thresholded from )
- In : (independent of )
- On overlap : agreement holds — both views happen to agree on the items in the overlap.
Individually, passes Obligation 2 (same definition in both views), and passes Obligation 2 (agreement on overlap for the current items).
Now add both. The compound predicate is derivable in . In , this means , which simplifies to . In , this means . On the overlap, these may disagree: an item with and satisfies the -version but not the -version.
The interaction between and creates a derived predicate that fails overlap agreement, even though each individually passed.
This non-composability theorem is the formal reason why the coherence cost model (A21) exhibits quadratic scaling. Each new predicate must be checked against all existing predicates on all overlaps, not just in isolation. The monad multiplication — composing two rounds of predicate invention — requires a full re-verification of Obligation 2 for the composite.
For the agentic substrate: this means that autonomous agents cannot safely invent vocabulary in parallel and then merge the results. Vocabulary invention is inherently sequential at the overlap-checking stage. An agentic system that invents predicates concurrently must synchronize at the point of overlap verification. This is a structural limit, not an engineering deficiency.
L.6 The Predicate Invention Monad
Despite the non-composability of Obligation 2, predicate invention has a well-defined algebraic structure when the full A17 pipeline (including re-verification) is included. We develop this structure in three stages: the free monad of unconstrained proposals, the quotient that enforces admissibility, and the resulting algebraic characterization.
L.6.1 The Proposal Endofunctor
Define the proposal endofunctor by:
where specifies the sort, arity, and local definition of in each context. sends a signature to the set of all single-predicate extension proposals (without checking admissibility). On morphisms: an inclusion maps a -proposal to the -proposal when , and discards it otherwise (the proposed predicate already exists).
The free monad on the endofunctor is defined by:
An element of is a finite sequence of extension proposals applied to . The monadic structure:
- Unit embeds as the empty sequence of proposals.
- Multiplication flattens a sequence-of-sequences into a single sequence by concatenation.
is the free monad on in the sense of the universal property: for any monad and natural transformation , there exists a unique monad morphism extending .
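The unit and multiplication described above can be sketched with plain tuples. This is an illustrative encoding only: an element of the free monad is modelled as a pair of a base signature and a tuple of proposals, proposals are opaque strings, and the names `unit` and `multiply` are chosen for this sketch rather than taken from the framework.

```python
def unit(base):
    """Embed a signature as itself with the empty sequence of proposals."""
    return (base, ())

def multiply(nested):
    """Flatten a sequence applied to a sequence into one concatenated sequence.

    `nested` has the shape ((base, inner_proposals), outer_proposals).
    """
    (base, inner), outer = nested
    return (base, inner + outer)

# Hypothetical base signature and proposals.
sigma = frozenset({"Dress"})
t = (sigma, ("addScore", "addRecommended"))

# Unit laws: wrapping with an empty outer or inner sequence changes nothing.
left_unit = multiply((t, ()))
right_unit = multiply((unit(sigma), t[1]))

# Associativity: flattening three layers in either order agrees.
triple = (((sigma, ("a",)), ("b",)), ("c",))
outer_first = multiply(multiply(triple))
inner_first = multiply((multiply(triple[0]), triple[1]))
```

No admissibility check appears anywhere in this encoding, which is the sense in which the proposals are unconstrained: any sequence is a valid element.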
L.6.2 The Admissibility Quotient
The free monad allows any sequence of proposals. The predicate invention monad is the quotient that enforces the three obligations of A17.
Define the admissibility relation on : two proposal sequences are equivalent if they yield the same final signature and both pass (or both fail) the A17 admissibility check. Define:
ordered by inclusion. There is a surjective monad morphism that sends each proposal sequence to its composite extension (if admissible) or discards it (if not). The monadic structure:
- Unit — the identity extension (always admissible).
- Multiplication — compose extensions and re-verify Obligation 2 for the composite. is well-defined because conservative extension composes (L.5) and Obligations 1 and 3 are monotone in signature; only Obligation 2 requires re-checking.
The Kleisli category has:
- Objects: signatures
- Morphisms : admissible extensions
- Composition: extension-then-re-verify
This is the category of vocabulary evolution paths. A morphism in is a certified route from one vocabulary to another.
is a quotient monad of . Specifically, there is a surjective monad morphism whose kernel is the congruence generated by two relations:
- Path independence: when both orderings yield the same composite extension
- Admissibility filtering: when the composite fails any obligation of A17
Consequently, the category of -algebras is a reflective subcategory of -algebras, consisting of those -algebras where the Obligation 2 equations hold.
That is a monad morphism: We must show commutes with unit and multiplication. For the unit: . For multiplication: let be a sequence in , consisting of a sequence of sequences of proposals. Then = the composite of the flattened sequence, and = the composite of the composites. Since extension composition is associative (signature union is associative), these agree when both are admissible. When either is inadmissible, both map to .
Surjectivity: Every admissible extension with is the image of the proposal sequence under .
Kernel characterization: Two proposal sequences have the same image under iff they yield the same composite signature (path independence) or both are inadmissible (admissibility filtering). These generate a congruence on because both relations are compatible with the monad multiplication (re-verification depends only on the composite, not the path).
Reflective subcategory: An -algebra is a signature equipped with an action — a way to "absorb" admissible extensions. This is a -algebra that additionally satisfies: whenever two proposal sequences yield extensions that individually pass A17 but whose composite fails Obligation 2, the algebra's action must reject the composite. The reflector is the functor that takes a -algebra and quotients by the Obligation 2 relations.
The monad laws hold:
- Left unit: (extending by nothing, then composing, is identity).
- Right unit: (composing with the identity extension is identity).
- Associativity: — this holds because re-verification of Obligation 2 for the composite is independent of the order in which we compose three extensions. The overlap structure depends only on the final signature, not on the path taken to reach it.
The last point is significant: the cost of re-verification may depend on the path (some orderings may allow caching), but the result does not. The monad captures what is invariant (the admissibility condition); the cost model (A21) captures what varies (the verification effort).
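The extend-then-re-verify composition can be sketched as a partial operation on signatures. The admissibility check below is a deliberately crude stand-in, assumed for illustration: it treats Obligation 2 failures as a fixed set of predicate pairs that must not coexist, which is not how the A17 pipeline is specified. The predicate names and the `CONFLICTS` table are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional

Signature = FrozenSet[str]

# Hypothetical stand-in for Obligation 2 failures: pairs of predicates whose
# combination disagrees on some overlap.
CONFLICTS = {frozenset({"highScore", "recommended"})}

def admissible(signature: Signature) -> bool:
    """Toy admissibility check that depends only on the final signature."""
    return not any(pair <= signature for pair in CONFLICTS)

def extend(signature: Optional[Signature],
           new_predicates: Iterable[str]) -> Optional[Signature]:
    """Compose an extension and re-verify; None marks an inadmissible result."""
    if signature is None:
        return None
    candidate = signature | frozenset(new_predicates)
    return candidate if admissible(candidate) else None

base = frozenset({"Dress"})

# Each extension is admissible on its own.
only_score = extend(base, {"highScore"})
only_rec = extend(base, {"recommended"})

# Composing both fails re-verification, in either order.
score_then_rec = extend(only_score, {"recommended"})
rec_then_score = extend(only_rec, {"highScore"})

# Non-conflicting extensions give the same result along either path.
path_one = extend(extend(base, {"size"}), {"colour"})
path_two = extend(extend(base, {"colour"}), {"size"})
```

Because the toy check inspects only the final signature, the outcome is independent of the order of composition, while the amount of checking performed along the way is not modelled at all. The composition is also partial: two individually admissible extensions need not compose.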
L.6.3 The Algebraic Content of Non-Composability
The quotient structure makes the non-composability theorem (L.5) algebraically precise.
is not a free monad on any endofunctor. Equivalently: the kernel of is non-trivial; it contains proposal sequences that are admissible individually but inadmissible in combination.
If were free on some endofunctor , then every -algebra would be determined by a -action, with no additional equations. But the Obligation 2 constraint imposes equations that depend on pairs of proposals and their interaction on overlaps — equations that cannot be captured by the structure map of a single endofunctor. Specifically: the non-composability theorem (L.5) exhibits two proposals such that and , but .
In a free monad on endofunctor , if and , then (the monad multiplication is total). The predicate invention monad's multiplication is partial on the underlying set: not every pair of admissible extensions composes to an admissible extension. This partiality, formalized as a non-trivial kernel in , is the algebraic signature of non-freeness.
The precise obstruction: is presented by the generators and the relations (Obligation 2 failures), making it a quotient rather than a free monad. This is analogous to how a group presented by generators and relations is not a free group unless the relations are trivial.
This characterization resolves a question implicit in the earlier formulation: why can't we "just" invent predicates in parallel? The answer is algebraic: is not free, and the non-freeness comes precisely from the inter-predicate constraints of Obligation 2. A free monad would allow unrestricted parallel composition. The quotient structure forces sequential verification at the overlap boundary.
This also explains the cost model: the coherence budget (A21) is computing the size of the kernel of , restricted to a given overlap structure. A larger kernel means more inadmissible combinations, hence more verification work per predicate added. The quadratic scaling of Obligation 2 checking is a consequence of the kernel growing quadratically with signature size.
L.7 Relation to Existing Frameworks
The coherence topos framework occupies a specific position in the landscape of categorical approaches to data integration and distributed systems. We make the comparisons explicit to identify precisely what is shared, what is new, and what remains open.
L.7.1 Spivak's Functorial Data Migration
Spivak's program (Spivak 2012) models databases as functors from a schema category (encoding tables, columns, and foreign keys) to $\mathbf{Set}$ (the actual data). Data migration between two schemas is a functor between them, inducing three adjoint operations:
where is pullback (direct image), is left Kan extension (existential migration), and is right Kan extension (universal migration).
What the coherence topos shares with Spivak: Both use category theory to formalize data integration. Both treat schemas as categories and data as functors. The restriction maps of our presheaves correspond to Spivak's pullback functors .
What the coherence topos adds that Spivak does not:
- Vocabulary invention. Spivak's framework migrates data between fixed schemas. The migration functor exists before migration begins. In our framework, the signature itself evolves: agents invent new predicates, and the admissibility of the invention is the central question. Spivak has no analog of Obligation 2 (overlap agreement for invented predicates) because his schemas do not grow during operation.
- Scoped truth and non-Boolean logic. Spivak's instances are $\mathbf{Set}$-valued functors: a row either exists or does not. Our sheaves carry epistemic status (A4): claims can be true, false, undetermined, or in conflict, with the logic varying by context (A15). The subobject classifier of the coherence topos (L.2) subsumes this; Spivak's $\mathbf{Set}$-valued model does not.
- Cohomological obstruction theory. Spivak does not develop obstruction theory for migration. When migration fails (the pullback does not exist or is trivial), the failure is unstructured. Our cohomology computation (L.4) provides a classification of the distinct ways migration can fail, with a group structure on the failure modes. The acyclicity theorem (L.4.1) and the meta-obstruction (L.4.2) have no analogs in Spivak's work.
- Cost accounting. Spivak's adjunctions are "free": there is no cost model for migration. Our coherence budget (A21) makes the cost of maintaining sheaf conditions explicit, and the quadratic scaling of Obligation 2 checking (a consequence of L.5's non-composability) quantifies the engineering tradeoff.
Spivak's framework is the right foundation for structural data migration: moving data between known schemas with known relationships. The coherence topos is designed for the harder problem: semantic data integration where the schemas themselves are evolving, the relationships are being discovered (not given), and the correctness of the discovery must be certified against formal obligations.
A precise connection: the Kleisli category of the predicate invention monad (L.6) can be viewed as a category of schemas with certified evolution paths. Spivak's functors $F$ correspond to morphisms in this Kleisli category where the evolution is a single-step conservative extension. The framework developed here extends Spivak's to the setting where schemas evolve under formal governance.
L.7.2 Goguen's Sheaf Semantics
Goguen (Goguen and Burstall 1992) [Joseph A. Goguen and Rod M. Burstall, "Institutions: Abstract Model Theory for Specification and Programming," Journal of the ACM 39, no. 1 (1992): 95–146] proposed sheaves as a semantics for concurrent interacting objects, where each object has a local state and objects interact by sharing state on overlaps. This is the closest ancestor to our use of sheaves.
What we share with Goguen: The core insight — sheaves formalize when local information composes into global information — is Goguen's. Our site structure is a descendant of his interaction sites.
What we add: Goguen's sheaves are on fixed interaction structures. He does not develop: predicate invention (the site's presheaf growing during operation), the non-composability of overlap agreement (L.5), obstruction cohomology as a classification of integration failures (L.4), or the monad structure of vocabulary evolution (L.6). Goguen also does not develop the connection to model-theoretic conservativity (A17b), which is essential for the safety guarantees of predicate invention.
L.7.3 Abramsky's Sheaf-Theoretic Contextuality
Abramsky and Brandenburger (Abramsky 2011) use sheaf theory to formalize contextuality in quantum mechanics: a family of local measurements is contextual if it has no global section (a presheaf that fails the sheaf condition). Their Čech cohomology detects contextuality, with a non-vanishing obstruction in $H^1$ implying strong contextuality.
What we share with Abramsky: The Čech cohomology machinery and the interpretation of $H^1$ as measuring obstruction to global consistency. Our $H^1$ computation (L.4) follows the same pattern.
What differs: Abramsky's presheaves are empirical models: probability distributions on measurement outcomes. Ours are data claims: assertions by computational agents about shared entities. The obstruction in Abramsky is physical (no hidden-variable model exists); ours is semantic (no consistent global identity assignment exists). The mathematics is the same; the domain and operational consequences are different. Critically, we develop the higher cohomology ($H^2$, L.4.2) and the structural classification (L.4.3), which Abramsky does not pursue in the same setting.
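The semantic obstruction ("no consistent global identity assignment exists") can be seen in miniature. The sketch below checks by brute force whether a family of identity claims, each satisfiable on its own context, admits a single global assignment. It does not compute cohomology; it only exhibits the kind of failure the cohomology is meant to classify. The entity names, the two-label alphabet, and the helper `global_sections` are assumptions made for illustration.

```python
from itertools import product

def global_sections(entities, constraints, labels=(0, 1)):
    """Enumerate label assignments that satisfy every local constraint."""
    found = []
    for values in product(labels, repeat=len(entities)):
        g = dict(zip(entities, values))
        if all(check(g[x], g[y]) for (x, y, check) in constraints):
            found.append(g)
    return found

def same(u, v):
    return u == v

def diff(u, v):
    return u != v

entities = ["a", "b", "c"]
# Three contexts overlapping in a cycle: each claim is locally satisfiable.
cyclic = [("a", "b", same), ("b", "c", same), ("a", "c", diff)]
# Dropping one overlap leaves a tree-shaped cover.
tree = [("a", "b", same), ("b", "c", same)]

print(len(global_sections(entities, cyclic)))  # no global assignment exists
print(len(global_sections(entities, tree)))    # the tree-shaped cover glues
```

The cyclic family fails only when its three claims are taken together, which is why the failure is invisible to any single context.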
L.7.4 Caramello's Toposes as Bridges
Caramello's program (Caramello 2018) [Olivia Caramello, Theories, Sites, Toposes: Relating and Studying Mathematical Theories through Topos-Theoretic 'Bridges' (Oxford: Oxford University Press, 2018)] uses Morita equivalence of toposes as a tool for transferring results between mathematical theories. Two theories are "Morita equivalent" if they classify the same topos, and the topos serves as a "bridge" for transferring invariants.
Connection to our work: Problem 1 in L.8 asks for the geometric theory classified by the coherence topos. If this theory can be identified, Caramello's bridge technique would immediately transfer invariants from other Morita-equivalent theories, potentially connecting coherent vocabulary evolution to problems in algebraic geometry, logic, or topology that have been studied independently.
What we add: Caramello's program is a meta-mathematical tool — it relates theories via their classifying toposes. We provide a specific instantiation: the coherence topos, with its specific site, specific presheaves, and specific theorems (acyclicity, non-composability, the monad characterization). Our work provides a concrete object for Caramello's program to analyze.
L.7.5 Capabilities Comparison
The landscape can be summarized in a table of capabilities:
| Capability | Spivak | Goguen | Abramsky | Caramello | This Work |
|---|---|---|---|---|---|
| Sheaf-theoretic coherence | Implicit | Yes | Yes | Meta-level | Yes |
| Vocabulary invention | No | No | No | No | Yes (A17, L.6) |
| Obstruction cohomology | No | No | $H^1$ only | No | Through $H^2$ (L.4) |
| Non-composability theorem | No | No | No | No | Yes (L.5) |
| Free monad characterization | No | No | No | No | Yes (L.6) |
| Cost accounting | No | No | No | No | Yes (A21) |
| Scoped non-Boolean logic | No | No | Implicit | Yes | Yes (A15, L.2) |
| Hierarchical acyclicity | N/A | No | No | No | Yes (L.4.1) |
| Multi-agent operational semantics | No | Partial | No | No | Yes (L.9) |
The gap is not that sheaf theory is unapplied to data integration — Goguen applied it in 1992. The gap is that no existing framework addresses the full lifecycle of vocabulary in a distributed system: invention, certification, transport, versioning, cost, and the algebraic structure of the evolution process. Each prior framework addresses a fragment. This work addresses the composition of these fragments under formal guarantees.
L.8 Open Problems for the Mathematical Community
The following problems are precisely stated and, we believe, tractable for researchers in topos theory, HoTT, and categorical logic. They are not speculative — each connects to concrete phenomena in distributed data systems and multi-agent AI.
Problem 1: Classify the geometric theory of the coherence topos. Every Grothendieck topos $\mathcal{E}$ classifies a geometric theory $\mathbb{T}$ such that models of $\mathbb{T}$ in any Grothendieck topos $\mathcal{F}$ correspond to geometric morphisms $\mathcal{F} \to \mathcal{E}$. What is $\mathbb{T}$ for the coherence topos? This theory would axiomatize exactly those structures admitting coherent vocabulary evolution. Connection: Caramello's "bridge" program (Caramello 2018).
Problem 2 (Partially resolved): Cohomology of structured context sites. Section L.4.1 proved acyclicity for hierarchical sites, confirming the tree-acyclicity conjecture. Section L.4.2 constructed a federated site with $H^2 \neq 0$, confirming the meta-obstruction conjecture. Remaining open: (a) Compute the cohomology of random context sites (Erdős–Rényi overlap graphs) and determine the threshold for its vanishing. (b) For sites arising from real organizational structures, characterize the relationship between the Betti numbers of the nerve and the operational cost of coherence maintenance. (c) Determine whether the Čech cohomology equals the derived-functor cohomology for context sites with non-Hausdorff nerve (this holds for paracompact nerves by Leray's theorem but may fail in general).
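A first numerical experiment for part (a) might look like the sketch below. It samples an Erdős–Rényi overlap graph and computes the cycle rank $E - V + c$ of the graph, where $c$ is the number of connected components. This is only the first Betti number of the 1-skeleton with constant coefficients: it ignores triple overlaps and the actual coefficient presheaf, both of which the full problem requires. The function names and the parameters (50 contexts, the three edge probabilities, the seed) are assumptions chosen for illustration.

```python
import random

def cycle_rank(n, edges):
    """First Betti number of a graph: E - V + (number of components)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - n + components

def erdos_renyi(n, p, rng):
    """Sample G(n, p): each possible overlap is present with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

rng = random.Random(0)  # fixed seed so the run is repeatable
for p in (0.01, 0.05, 0.2):
    edges = erdos_renyi(50, p, rng)
    print(p, cycle_rank(50, edges))
```

A tree-shaped overlap graph has cycle rank zero, consistent with the acyclicity of hierarchical sites. How the vanishing threshold depends on $p$ once higher overlaps are included is the open question.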
Problem 3: Extend to $\infty$-toposes. Witnesses (A10) carry structure: kinds, composition, coherence conditions. The correct categorical home may be an $\infty$-topos where witnesses are 1-morphisms and witness-equivalences are 2-morphisms. Does the coherence topos extend to an $\infty$-topos? Does the resulting type theory validate a scoped univalence axiom? Connection: Lurie (Lurie 2009), Shulman.
Problem 4: Morita equivalence of context sites. When do two context sites $(\mathcal{C}, J)$ and $(\mathcal{C}', J')$ produce equivalent sheaf categories? This would formalize when two institutional arrangements (different organizations, different view decompositions) provide the same coherence guarantees. A Morita equivalence theorem for context sites would be a formal version of "organizational isomorphism from the coherence perspective."
Problem 5: Decidability frontier for the predicate invention monad. For which fragments of the ambient logic is the admissibility check for predicate invention (A17) decidable? The conservativity check is decidable for propositional and equality fragments, semi-decidable for first-order, undecidable for higher-order (see Appendix K, §K.1.1). What is the precise decidability frontier when overlap agreement (Obligation 2) is included? This connects to classical questions in mathematical logic but in a new setting where the signature itself is evolving.
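For the propositional fragment, decidability can be made concrete. Over a finite set of atoms, an extension is conservative exactly when every model of the old theory expands to a model of the extended theory, and that can be checked by enumerating truth assignments. The sketch below does so, assuming the conservativity condition of A17b reduces to this standard notion and leaving Obligation 2 out entirely. Axioms are represented as Python predicates on a valuation; the atom names and the two example extensions are hypothetical.

```python
from itertools import product

def models(atoms, axioms):
    """Yield every truth assignment over `atoms` that satisfies all axioms."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(ax(v) for ax in axioms):
            yield v

def is_conservative(old_atoms, old_axioms, new_atoms, new_axioms):
    """True iff every model of the old theory expands to a model of the extension."""
    all_atoms = old_atoms + new_atoms
    extended = old_axioms + new_axioms
    expandable = {tuple(m[a] for a in old_atoms) for m in models(all_atoms, extended)}
    return all(tuple(m[a] for a in old_atoms) in expandable
               for m in models(old_atoms, old_axioms))

old_atoms = ["sustainable", "recyclable"]
old_axioms = [lambda v: (not v["sustainable"]) or v["recyclable"]]

# A definitional extension: the new predicate is an alias for an old one.
definitional = [lambda v: v["eco_friendly"] == v["sustainable"]]
# An extension that smuggles in a new fact about the old vocabulary.
smuggling = [lambda v: v["eco_friendly"],
             lambda v: (not v["eco_friendly"]) or v["sustainable"]]

print(is_conservative(old_atoms, old_axioms, ["eco_friendly"], definitional))
print(is_conservative(old_atoms, old_axioms, ["eco_friendly"], smuggling))
```

The enumeration is exponential in the number of atoms, so this settles decidability but says nothing about practical cost.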
Problem 6 (New): Eilenberg–Moore category of the predicate invention monad. Characterize the category of algebras for the predicate invention monad (L.6.2). An algebra is a signature equipped with a "vocabulary absorption" operation compatible with the monad's unit and multiplication. What are the free algebras? Can the category of algebras be described as a variety of algebras (in the sense of universal algebra) with explicit equational axioms? The non-freeness theorem (L.6.3) implies the equational theory is non-trivial; its explicit description would connect to Birkhoff's HSP theorem and the theory of algebraic theories (Mac Lane 1971, ch. VI) [Saunders Mac Lane, Categories for the Working Mathematician (New York: Springer-Verlag, 1971), ch. VI].
Problem 7 (New): Persistent cohomology of evolving context sites. As vocabulary evolves (new predicates are added, overlaps change), the context site changes and its cohomology groups evolve. Does the sequence of cohomology groups over time form a persistence module in the sense of topological data analysis? If so, the persistence diagram would classify the lifetime of obstructions: some ambiguities are transient (resolved by adding a predicate that disambiguates), others are persistent (structural, arising from the federation topology). The barcode of this persistence module would be a novel invariant of vocabulary evolution paths.
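To make the barcode concrete, the sketch below runs the standard column-reduction algorithm for persistence over GF(2) on a toy filtration of a nerve. It assumes the nerve only grows over time (a genuine filtration), which the evolving context sites of this problem need not satisfy, and it computes homology barcodes, which over a field coincide with cohomology barcodes. The three contexts, the timestamps, and the function name `barcode` are hypothetical.

```python
def barcode(filtration):
    """Persistence bars over GF(2).

    `filtration` is a list of (time, simplex), each simplex a tuple of
    vertices, ordered so that every face appears before its cofaces.
    Returns (dimension, birth, death) triples; death is None if the
    class never dies.
    """
    index = {frozenset(s): i for i, (_, s) in enumerate(filtration)}
    columns = []
    for _, s in filtration:
        faces = [frozenset(s) - {v} for v in s] if len(s) > 1 else []
        columns.append({index[f] for f in faces})

    pivot_of = {}   # lowest row index -> column that owns it
    paired = set()
    bars = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the lowest entry is unclaimed.
        while col and max(col) in pivot_of:
            col ^= columns[pivot_of[max(col)]]
        if col:
            i = max(col)
            pivot_of[i] = j
            paired.update((i, j))
            bars.append((len(filtration[i][1]) - 1, filtration[i][0], filtration[j][0]))
    for i, (t, s) in enumerate(filtration):
        if i not in paired:
            bars.append((len(s) - 1, t, None))
    return bars

filtration = [
    (0, ("a",)), (0, ("b",)), (0, ("c",)),              # three contexts
    (1, ("a", "b")), (2, ("b", "c")), (3, ("a", "c")),  # pairwise overlaps
    (5, ("a", "b", "c")),                               # a shared predicate fills the cycle
]
bars = barcode(filtration)
print(bars)
```

In this toy run the cycle created at time 3 is filled at time 5, giving a finite bar in dimension 1: a transient obstruction in the sense described above. Removing the final triple overlap leaves that bar open, which is the persistent case.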
L.9 Connection to Agentic Systems
This section connects the mathematical framework to a concrete open problem in multi-agent AI: coherent vocabulary evolution in distributed computational agents.
Current multi-agent AI systems compose outputs (text, actions, tool calls) without composing meaning. Agent A proposes "this product is sustainable." Agent B proposes "this product is eco-friendly." Are these the same predicate? Are they consistent? If Agent C needs to act on both claims, what guarantees does it have?
The coherence topos provides the mathematical infrastructure for answering these questions:
- Predicate invention (A17, L.6): An agent can propose a new concept. The proposal carries obligations. The monad structure (L.6.1–L.6.3) ensures that sequential inventions compose safely (conservative extension), while the non-composability theorem (L.5) and the non-freeness theorem (L.6.3) identify exactly where synchronization is required. The algebraic content is precise: the inadmissible combinations form a kernel, and its growth rate determines the cost of coordination.
- Obstruction cohomology (L.4): When agents in different contexts make identity claims about shared entities, $H^1$ measures the irreducible ambiguity. The acyclicity theorem (L.4.1) says hierarchical agent organizations are free of this ambiguity. The meta-obstruction (L.4.2) says federated agent systems face a qualitatively harder problem: not just conflicts, but conflicts about conflict-resolution strategies. The cohomological hierarchy (L.4.3) gives a system architect a precise menu of tradeoffs.
- Scoped transport (A16, Theorem K.3): An agent's claim is valid within its scope. Transporting that claim to another agent's scope requires a certificate. The topos provides the space of possible certifications (L.3); the engineering problem is constructing one.
- Cost accounting (A21, Theorem K.4): Coherence has a price. The cost model makes the price explicit. The scope boundary in the coherence budget is the system's declaration of how far it will pay for meaning to compose.
- Coordination without consensus. The framework does not require agents to share objectives, adopt common logics, or trust one another. It requires only overlap discipline: where two agents' domains intersect, their assertions on the intersection must agree (A13). This is a weaker assumption than shared values, and it is computationally verifiable, which is the only kind of constraint agents can enforce on each other. Conservative extension (A17b) serves each agent's self-interest: it protects prior commitments. The coherence budget (A21) prices coordination without moralizing it. The sheaf condition is a structural consequence of wanting local outputs to compose globally, not a norm imposed from outside.
The enforcement layer lies outside The Proofs. The conservative extension condition (A17b) is a mathematical specification; Factor Prime (Vol II, Ch 17) provides an enforcement mechanism — collateralized bonds whose thermodynamic cost makes defection expensive without requiring trust; The Sovereign Syntax (Vol III, Epilogue) provides the verification artifact — the receipt that gives affected parties standing to contest.
- Landscape position (L.7): This is not a reimagining of Spivak's functorial data migration or Goguen's sheaf semantics. It is their extension to the setting where schemas evolve, agents invent vocabulary, and the cost of coherence is a first-class citizen. The comparison table (L.7.5) makes the precise contribution explicit.
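The overlap discipline in the list above (assertions must agree where agents' domains intersect) is the most directly checkable of these properties. The following sketch glues per-agent assertions into a global section when they agree on every shared entity, and otherwise reports the conflicts. It treats a local section as a plain dictionary of claims, which is a simplifying assumption; the agent names, entity identifiers, and the helper `glue` are hypothetical.

```python
def glue(local_sections):
    """Glue per-agent assertions if they agree on every overlap.

    `local_sections` maps an agent name to a dict {entity: claim}.
    Returns (global_section, conflicts); global_section is None when
    any overlap disagrees.
    """
    merged, owner, conflicts = {}, {}, []
    for agent, section in local_sections.items():
        for entity, claim in section.items():
            if entity in merged and merged[entity] != claim:
                # Record the disagreement against the first claimant.
                conflicts.append((entity, owner[entity], agent))
            else:
                merged.setdefault(entity, claim)
                owner.setdefault(entity, agent)
    return (merged if not conflicts else None), conflicts

agreeing = {
    "A": {"sku-1": "sustainable", "sku-2": "sustainable"},
    "B": {"sku-2": "sustainable", "sku-3": "not-sustainable"},
}
clashing = {
    "A": {"sku-1": "sustainable", "sku-2": "sustainable"},
    "B": {"sku-2": "not-sustainable", "sku-3": "not-sustainable"},
}

print(glue(agreeing))
print(glue(clashing))
```

Because each conflict is reported against the first claimant, the check decides gluability in one pass but does not enumerate every disagreeing pair of agents.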
The target is a substrate where computational agents can invent, certify, transport, and version predicates under formal guarantees — where inter-agent coherence is a checkable property with a computable cost. The mathematics developed here provides a specification. The engineering required to realize it at scale remains substantial.