Vocabulary Operators

Controlled expansion of what can be said

A11 (prose proof): How a formal system extends its expressive power without breaking coherence.

This interlude serves as a worked example within The Proofs, applying the Part II machinery -- invariants (A7), isomorphisms (A8), adjunctions (A9), and witnessed sameness (A10) -- to the problem of vocabulary evolution. It defines A11 (Vocabulary Operator), which formalizes the feedback loop through which predicates are invented, tested, and refined, subject to the constraint that extensions must be conservative where possible and explicitly witnessed where not. The motivating scenario of evolving product taxonomies is drawn from Vol I, Chapter 5 (Empire of Tables); here it receives formal treatment as a controlled strange loop stabilized by the invariants and equivalences of the preceding chapters.

The Vocabulary Loop

A merchandiser at the fashion catalog company coins a new term: "cottagecore aesthetic." The term enters the product database as a tag. Buyers start searching for it. The search logs reveal which products customers expect to match. Some matches surprise the merchandiser. Others are missing. The definition gets refined. New products get tagged. The refined definition changes what customers find, which changes what they search for, which changes what the merchandiser observes about customer intent.

This is not a bug; it is the structure of any system where vocabulary is not fixed in advance. Hofstadter called such self-modifying feedback a "strange loop" (Hofstadter 1979).

The loop has a shape familiar from cognitive science, where categorization and vocabulary co-evolve (Sander 2013):

Observation: The system sees patterns in data or behavior.
Predicate invention: Someone (human or automated) introduces a new distinction.
Query: The new predicate enables new questions.
New observation: Answers to those questions reveal something previously invisible.
Refinement: The predicate's definition changes in response.
Repeat.

In a static system, vocabulary is given. In a living system, vocabulary evolves. The Third Mode must handle both, which means it must handle the transition from one vocabulary to another without losing coherence.

The danger is obvious. If predicates can change, and predicates determine what counts as true, then "truth" becomes a moving target. A query that returned 47 items yesterday might return 52 today, not because the data changed, but because the definition of "cottagecore" drifted.

The solution is not to freeze vocabulary—that would kill the system's ability to learn—but to control the loop: to specify what changes are permitted, what invariants must hold across changes, and what witnesses must accompany any extension.

The Vocabulary Operator

A11

A vocabulary V consists of:

A signature Σ (sorts, function symbols, relation symbols)
A set of defined predicates (built from Σ using some definitional mechanism)
A logic L that determines consequence

A vocabulary extension V → V′ adds new symbols or predicates to V.

The extension is conservative over invariant set I iff:

For every statement φ expressible in V: if V′, I ⊢ φ, then V, I ⊢ φ.

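In plain terms: adding new vocabulary does not change what was already true in the pinned vocabulary version. Because V′ contains V, everything provable before is still provable; conservativity adds the converse, that nothing statable in the old vocabulary becomes newly provable. Old truths remain true; old falsehoods remain false. The extension adds expressive power without rewriting history (Shoenfield 1967).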
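The definition can be made concrete with a finite approximation: treat the old vocabulary as a set of named predicates over a catalog and check that each one returns the same answers after the extension. This is a sketch under stated assumptions, not a proof procedure. The catalog, the attribute names, and `is_conservative` are all hypothetical, and agreement on a finite catalog is only evidence for conservativity in the sense defined above.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]
Predicate = Callable[[Item], bool]
Vocabulary = Dict[str, Predicate]

# Hypothetical catalog; the attribute names are illustrative assumptions.
CATALOG: List[Item] = [
    {"id": 1, "print": "floral", "fiber": "linen"},
    {"id": 2, "print": "solid", "fiber": "polyester"},
    {"id": 3, "print": "floral", "fiber": "cotton"},
]

def answers(vocab: Vocabulary, name: str) -> set:
    """Ids of catalog items satisfying the named predicate."""
    return {item["id"] for item in CATALOG if vocab[name](item)}

def is_conservative(old: Vocabulary, new: Vocabulary) -> bool:
    """True if every predicate of `old` exists in `new` with the same answers."""
    return all(
        name in new and answers(old, name) == answers(new, name)
        for name in old
    )

v = {"floral": lambda it: it["print"] == "floral"}

# Adding a symbol leaves old answers untouched: conservative on this catalog.
v_ext = dict(v, natural_fiber=lambda it: it["fiber"] in {"linen", "cotton"})

# Rewriting an existing symbol in place changes an old answer: not conservative.
v_rewrite = dict(
    v, floral=lambda it: it["print"] == "floral" and it["fiber"] == "linen"
)
```

Here `is_conservative(v, v_ext)` holds while `is_conservative(v, v_rewrite)` does not, which is the distinction the definition draws between adding a symbol and rewriting one.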

Conservativity is evaluated relative to a pinned vocabulary version. Redefining a predicate is not an extension but the introduction of a new symbol $P_{v+1}$ plus witnesses relating it to $P_v$.
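One possible way to enforce the pinned-version rule is a registry in which a (name, version) pair can never be overwritten, so any redefinition has to arrive as a new symbol. The class `PredicateRegistry`, its methods, and the example definitions are hypothetical names chosen for illustration; the witnesses relating versions are out of scope for this sketch.

```python
from typing import Callable, Dict, Tuple

Predicate = Callable[[dict], bool]

class PredicateRegistry:
    """Hypothetical registry: a (name, version) pair is immutable once registered."""

    def __init__(self) -> None:
        self._defs: Dict[Tuple[str, int], Predicate] = {}

    def register(self, name: str, version: int, definition: Predicate) -> None:
        key = (name, version)
        if key in self._defs:
            # Redefinition in place would silently rewrite old answers.
            raise ValueError(f"{name}_v{version} is pinned; register a new version")
        self._defs[key] = definition

    def get(self, name: str, version: int) -> Predicate:
        return self._defs[(name, version)]

    def latest(self, name: str) -> int:
        """Highest registered version number for `name`."""
        return max(v for (n, v) in self._defs if n == name)

registry = PredicateRegistry()
registry.register("cottagecore", 1, lambda item: item["print"] == "floral")
registry.register(
    "cottagecore", 2,
    lambda item: item["print"] == "floral" and item["fiber"] in {"linen", "cotton"},
)
```

Queries that pinned `cottagecore` at version 1 keep calling `registry.get("cottagecore", 1)` and keep their old answers; a second `register("cottagecore", 1, ...)` raises instead of drifting.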

The feedback loop, formally:

$$\text{observations} \xrightarrow{\text{pattern}} \text{candidate predicate} \xrightarrow{\text{test}} \text{queries} \xrightarrow{\text{evidence}} \text{refinement}$$

Stabilization requirement: Each step in the loop must either:

Be conservative relative to a pinned $V_t$, or
Declare a breaking change with explicit migration semantics
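The stabilization requirement reads naturally as an admission gate on each step of the loop. The following is a minimal sketch, assuming a change arrives already labeled as conservative or not; `VocabularyChange` and `admit` are hypothetical names, and deciding the `conservative` flag is a separate problem this sketch does not address.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VocabularyChange:
    """Hypothetical record of one loop step, relative to a pinned version."""
    pinned_version: str
    description: str
    conservative: bool
    migration: Optional[str] = None  # e.g. "re-tag", "human review", "grandfather"

def admit(change: VocabularyChange) -> str:
    """Accept a change only if it is conservative or declares its migration."""
    if change.conservative:
        return "accepted: conservative"
    if change.migration:
        return f"accepted: breaking, migration={change.migration}"
    raise ValueError(
        f"rejected: breaking change without migration ({change.description})"
    )
```

A breaking change with no declared migration is the only case the gate refuses outright; everything else is either harmless or explicitly accounted for.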

The vocabulary operator V ↦ V′ is not a function you call once but a discipline you maintain. Every time you add a predicate, you must answer: does this preserve the invariants? Does it respect existing equivalences? If a witness w says A ∼ B, and you add predicate P, does P(A) = P(B)?

That last question connects A11 back to A10. Witnessed equivalences constrain vocabulary extension. If the system has committed to "navy" ∼ "dark blue" in some scope, then a new predicate that distinguishes them in that scope is a breaking change. It must be flagged, scoped, or rejected.
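The check "does P(A) = P(B) for every witnessed A ∼ B?" can be run mechanically before a predicate is admitted. This sketch assumes witnesses are stored as (left, right, scope) triples; the scope names, the `violations` helper, and the two example predicates are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical witnessed equivalences: (left, right, scope).
WITNESSES: List[Tuple[str, str, str]] = [
    ("navy", "dark blue", "catalog-search"),
]

def violations(predicate: Callable[[str], bool], scope: str) -> List[Tuple[str, str]]:
    """Pairs witnessed as equivalent in `scope` that the predicate tells apart."""
    return [
        (a, b)
        for (a, b, s) in WITNESSES
        if s == scope and predicate(a) != predicate(b)
    ]

def is_blue_family(color: str) -> bool:
    # Agrees on navy and dark blue, so it respects the witness.
    return color in {"navy", "dark blue", "sky blue"}

def is_nautical(color: str) -> bool:
    # Distinguishes navy from dark blue: a breaking change in that scope.
    return color == "navy"
```

`is_nautical` produces a violation in the `catalog-search` scope and none in a scope where no equivalence was committed, which matches the options named above: flag it, scope it, or reject it.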

The "Puffy" Loop

Consider the touchstone case: T6 (Predicate Invention).

A buyer searches for "puffy dresses." The system has no predicate for puffiness. Someone proposes one: a dress is puffy if its silhouette volume exceeds a threshold relative to body fit.

Iteration 1: The predicate is deployed. Queries run. Results include ball gowns, puffer-jacket dresses, and some ruffly items. Users provide feedback: the ball gowns are right, the puffer jackets are wrong (that's "quilted," not "puffy"), the ruffles are ambiguous.

Iteration 2: The predicate is refined: puffy requires volume from fabric structure (tulle, organza, layering), not from insulation or hardware. The quilted items drop out. The ruffles are tagged for human review.

Iteration 3: Edge cases surface. A dress with dramatic sleeves but a fitted bodice. Is that puffy? The merchandiser decides: puffiness is a property of the overall silhouette, not individual elements. The definition tightens.

Each iteration is a vocabulary change. The predicate "puffy_v1" is not the same as "puffy_v2." If the system treats them as identical, it will produce inconsistent query results across versions. If it treats them as completely unrelated, it loses continuity.

The solution is versioned predicates with explicit compatibility witnesses. The system records a witness $w: P_{v2} \preceq P_{v1}$, where $\preceq$ is the refinement preorder on predicates: $P' \preceq P$ iff $\forall x.\, P'(x) \Rightarrow P(x)$. Every item satisfying puffy_v2 satisfies puffy_v1, but not conversely. That witness is adjunction-shaped in the sense of A9: transport from v2 to v1 is conservative, but the reverse direction is approximate and requires review. Items that were "puffy" under v1 may not be under v2. A breaking change is a witness that flips some transports from Valid to Unknown (in the sense of A10) until a migration produces a new witness. The migration semantics are explicit: re-tag, human review, or grandfather clause.
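The refinement witness and the asymmetric transport can be sketched over a small catalog. The items, the `volume` and `source` attributes, the 0.5 threshold, and the function names are assumptions for illustration; only the statuses Valid and Unknown come from the text. The finite check is evidence for $P_{v2} \preceq P_{v1}$ on this catalog, not a proof over all items.

```python
from typing import Callable, Dict, Iterable, List

Item = Dict[str, object]

# Hypothetical items; "volume" and "source" are illustrative attributes.
CATALOG: List[Item] = [
    {"id": "ball-gown", "volume": 0.9, "source": "tulle"},
    {"id": "puffer-dress", "volume": 0.8, "source": "insulation"},
    {"id": "sheath", "volume": 0.1, "source": "none"},
]

def puffy_v1(item: Item) -> bool:
    return item["volume"] > 0.5

def puffy_v2(item: Item) -> bool:
    # v2 keeps the volume test and adds the fabric-structure requirement.
    return puffy_v1(item) and item["source"] in {"tulle", "organza", "layering"}

def refines(p_new: Callable[[Item], bool],
            p_old: Callable[[Item], bool],
            items: Iterable[Item]) -> bool:
    """Finite check of P' ⪯ P: every item satisfying p_new satisfies p_old."""
    return all(p_old(x) for x in items if p_new(x))

def transport_v1_to_v2(item: Item) -> str:
    """Status of carrying a v1 answer over to v2, given v2 refines v1."""
    if not puffy_v1(item):
        return "Valid"    # not-v1 implies not-v2 under the refinement
    return "Unknown"      # v1-positive: needs re-tag, review, or grandfathering
```

The puffer dress is the interesting case: it was puffy under v1, so its v2 status stays Unknown until one of the declared migrations resolves it.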

Why the Loop Needs Brakes

Without constraints, the vocabulary loop produces drift. Drift is not random; it follows the path of least resistance, which is usually the path of maximum convenience for whoever is coining terms this week. Over time, the vocabulary becomes a palimpsest of local decisions, none of which respect global coherence.

The brakes are:

Invariants (from A7). Some properties must survive any vocabulary change. If your system declares (for merchandising purposes) that "minimalist" and "maximalist" are mutually exclusive, then no predicate extension may leave any item satisfying both. The invariant is a hard brake on the extension.

Conservative extension (from A3b). Where possible, new predicates should not change old truths. This is not always achievable, but it is the default expectation. Non-conservative extensions require justification.

Witnessed equivalences (from A10). If the system has committed to an equivalence with a witness, new predicates must respect the transport rules. A predicate that breaks transport is a breaking change.

Scope (from A10). A vocabulary extension can be scoped. "Puffy_v2" applies in the 2026 catalog; "puffy_v1" applies in the 2025 archive. The versions coexist in different scopes. Transport between scopes requires an explicit witness.
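Two of these brakes, the invariant and the scope rule, lend themselves to a short sketch. It assumes invariants are encoded as mutually exclusive tag pairs and that each scope pins one version per predicate; the scope identifiers and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Invariant (hypothetical encoding): tag pairs that may never co-occur.
MUTUALLY_EXCLUSIVE: List[Tuple[str, str]] = [("minimalist", "maximalist")]

def invariant_violations(tags_by_item: Dict[str, Set[str]]) -> List[str]:
    """Ids of items carrying both tags of any mutually exclusive pair."""
    return sorted(
        item_id
        for item_id, tags in tags_by_item.items()
        if any(a in tags and b in tags for a, b in MUTUALLY_EXCLUSIVE)
    )

# Scope: each scope pins one version of each predicate.
SCOPE_PINS: Dict[str, Dict[str, int]] = {
    "catalog-2026": {"puffy": 2},
    "archive-2025": {"puffy": 1},
}

def pinned_version(scope: str, predicate: str) -> int:
    """Resolve which version of a predicate applies in a scope."""
    return SCOPE_PINS[scope][predicate]

def can_transport(src: str, dst: str, predicate: str,
                  witnesses: Set[Tuple[str, str, str]]) -> bool:
    """Answers move between scopes only if versions match or a witness exists."""
    same_version = pinned_version(src, predicate) == pinned_version(dst, predicate)
    return same_version or (src, dst, predicate) in witnesses
```

An extension is held back if `invariant_violations` is non-empty, and an answer from the 2025 archive is not reused in the 2026 catalog unless someone has recorded the corresponding witness.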

The loop is not stopped. It is governed.

Cottagecore Drift

The merchandiser's "cottagecore aesthetic" term drifts over three seasons.

Spring 2025: Cottagecore = floral prints + natural fibers + relaxed silhouettes. The definition is ostensive: here are 50 exemplars.

Fall 2025: A new designer collection uses "cottagecore" to describe structured linen blazers. The term stretches. Some buyers complain: "That's not cottagecore." The merchandiser adds a soft constraint: cottagecore prefers relaxed over structured. The blazers remain tagged but rank lower in cottagecore searches.

Spring 2026: "Dark cottagecore" emerges as a subgenre: same silhouettes, darker palette, gothic motifs. Is it still cottagecore? The merchandiser creates a hierarchy: cottagecore_classic, cottagecore_dark, with cottagecore as the parent. The parent keeps only the constraints both children share, such as silhouette; palette and motif constraints live on the children. The extension is conservative: anything that was cottagecore_classic is still cottagecore.
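The Spring 2026 hierarchy can be sketched by treating the parent as satisfied whenever any child is. The item names and tag assignments are invented for illustration, and reading the parent as the union of its children is an assumption of this sketch rather than something the scenario states.

```python
from typing import Dict, List, Set

# Hypothetical tagging after the Spring 2026 split.
TAGS: Dict[str, Set[str]] = {
    "floral-sundress": {"cottagecore_classic"},
    "black-lace-smock": {"cottagecore_dark"},
    "linen-blazer": {"cottagecore_classic"},
    "sheath-dress": set(),
}

# Parent predicate mapped to its child predicates.
CHILDREN: Dict[str, List[str]] = {
    "cottagecore": ["cottagecore_classic", "cottagecore_dark"],
}

def satisfies(item_id: str, predicate: str) -> bool:
    """An item satisfies a parent if it satisfies any of its children."""
    if predicate in CHILDREN:
        return any(satisfies(item_id, child) for child in CHILDREN[predicate])
    return predicate in TAGS[item_id]

def classic_still_cottagecore() -> bool:
    """Conservativity check for this step: classic items remain cottagecore."""
    return all(
        satisfies(item_id, "cottagecore")
        for item_id in TAGS
        if satisfies(item_id, "cottagecore_classic")
    )
```

Under this reading the parent grows to admit the dark subgenre while every classic item keeps its parent membership, which is what makes the step conservative.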

At each step, the vocabulary operator fires. The question each time: what invariants must hold? What equivalences are preserved? What migrations are required?

Without the discipline, "cottagecore" becomes a useless term: everything is cottagecore if someone tags it that way. With the discipline, "cottagecore" becomes a structured predicate with versions, scope, and transport rules. The drift is not prevented; it is documented and governed.

Touchstones in Motion

T6 (Predicate Invention): The vocabulary loop is T6 instantiated. Every new predicate is a vocabulary extension. The loop is the lifecycle: propose, deploy, observe, refine, version.

T9 (Schema Evolution): Predicate versioning is schema evolution at the semantic level. puffy_v1 → puffy_v2 is a schema migration. The question "employees vs contractors → workers" is the same structure: old predicates, new predicates, compatibility witnesses, migration paths.
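The "employees vs contractors → workers" case can be sketched the same way. This assumes the new predicate is meant to cover both old ones and that migration adds a field rather than overwriting one; the records, the `worker` field, and the function names are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical records under the old vocabulary.
PEOPLE: List[Record] = [
    {"name": "Ada", "kind": "employee"},
    {"name": "Grace", "kind": "contractor"},
]

def is_employee(p: Record) -> bool:
    return p["kind"] == "employee"

def is_contractor(p: Record) -> bool:
    return p["kind"] == "contractor"

def migrate(p: Record) -> Record:
    """Migration path: add the new field, leave the old one untouched."""
    return {**p, "worker": is_employee(p) or is_contractor(p)}

def is_worker(p: Record) -> bool:
    # New predicate; unmigrated records simply do not satisfy it.
    return bool(p.get("worker", False))

MIGRATED = [migrate(p) for p in PEOPLE]
```

Because `migrate` only adds a field, the old predicates return the same answers on migrated records, so this particular migration is conservative; overwriting `kind` instead would be a breaking change needing its own witness.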

The touchstones are not static failure cases but dynamics. The vocabulary loop is where those dynamics live.

Replace "cottagecore" with "critical vulnerability" or "SEV-1 incident": the loop is identical, and the cost of unmanaged drift is higher.

Consequence

Part II established a calculus of sameness: invariants, isomorphisms, adjunctions, witnessed equivalences. But that calculus assumed a fixed vocabulary. Real systems do not have fixed vocabularies. They learn, adapt, and drift.

The vocabulary operator A11 extends the calculus to handle change. A vocabulary extension is a structured event with constraints (conservative where possible), witnesses (respect existing equivalences), and scope (version if necessary). The feedback loop is not chaos; it is a controlled strange loop, stabilized by invariants.

Part III will ask: what is a scope? What is a context? How do local vocabularies compose? The vocabulary operator prepares that question. A vocabulary is not global but indexed by context. Different views may have different predicates. The coherence requirement is not that they share vocabulary, but that their overlaps are compatible.

The loop spins; the brakes hold. The vocabulary grows without making yesterday's answers un-auditable.