Scoped Equivalence

NYC is not always NYC


A30 · Prose proof · When two things are the same in one context and different in another.

The reference of "evening star" would be the same as that of "morning star," but not the sense.
— Gottlob Frege, 'On Sense and Reference' (1892)

This chapter formalizes scoped equivalence as a typed, witnessed coercion with proof obligations, defining Anchor A30 (Scoped Equivalence). The central construction treats contexts as a poset under refinement and scopes as downward-closed regions within it; an equivalence between entities is valid only when the current context lies within the declared scope. A30 closes the loop opened by A10 (Witnessed Equivalence) and A16 (Transport Certificates), adding enforcement discipline via composition rules, scope violation artifacts, and an equivalence admission procedure. The narrative motivation appears in Volume I, Chapters 6 and 8, where context-free substitution is diagnosed as a systematic failure mode of both the string and schema paradigms.

The Substitution That Seemed Safe

A real estate search system merges two data sources. One contains Manhattan apartment listings; the other contains five-borough property records. Both use location fields. The integration team writes:

if location.city == "NYC" or location.city == "New York City":
    location.city = "New York City"  # treat as same

Plausible. Defensible. Wrong.

In the postal database, "NYC" and "New York City" denote the same entity: the five-borough municipality, incorporated in 1898, with ZIP codes spanning Manhattan through Staten Island. In the Manhattan listings, "NYC" means something narrower: the island, the market, the colloquial center. The string "NYC" is identical in both sources. Its denotations diverge.

When the system substitutes one for the other unconditionally, it conflates a Manhattan-only search with a five-borough search. Users looking for downtown apartments see listings in the Bronx. Users looking for Brooklyn properties see nothing, because the Manhattan source never uses "New York City" at all. Both failures are silent. Both trace to the same cause: a context-free substitution where context determines meaning.

This is T7, the seventh touchstone. We named it in Chapter 5: "Equivalence requires manual normalization." The pathology is not missing data. It is missing structure. The system has no way to say: this substitution is valid here but not there.

A30 makes that structure explicit.

Terms and Entities

The first mistake is conflating terms with entities.

A term is a string: "NYC," "New York City," "Big Apple." Terms appear in documents, listings, query inputs, and user interfaces. They are syntax.

An entity is a referent: the municipality, the island, the colloquial region. Entities are what terms denote. They are semantics.

The relationship between terms and entities is mediated by context. In one context, "NYC" denotes the five-borough city. In another, it denotes Manhattan. The term is the same. The denotation differs. The two-step model makes this precise:

Two-Step Model

Step 1: Denotation Map

Given term $t$ and context $C$, the denotation map returns an entity:

\llbracket t \rrbracket_C \in \text{Entity}

Step 2: Entity Equivalence

Equivalence is defined between entities, not terms:

e_1 \sim_S e_2

where $S$ is the scope in which the equivalence holds.

The NYC problem becomes precise:

Context	⟦"NYC"⟧	⟦"New York City"⟧	Same entity ID?
Postal	nyc@USPS	new_york_city@geocoder	NO (different IDs, same referent)
Real estate	manhattan@MLS	city_of_new_york@geocoder	NO (different referents)
Legal	nyc@registry	new_york_city@registry	NO (different IDs, same referent)

Postal and legal contexts require A30: the terms denote different entity IDs (drawn from different sources in the postal case, from the same registry in the legal case) for the same underlying referent, so scoped equivalences bridge them. Real estate fails earlier: the terms denote genuinely different referents (manhattan vs city_of_new_york), and no equivalence should exist.
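A minimal Python sketch of the two-step model, assuming entity IDs are plain strings: step 1 is a lookup keyed by (term, context), and comparisons happen on the resulting IDs rather than on the terms. The `DENOTATIONS` table and the helpers `denote` and `same_entity_id` are hypothetical; the entries simply transcribe the table above, with context names shortened.

```python
from typing import Dict, Tuple

# Hypothetical denotation map: (term, context) -> entity ID, transcribed from the table.
DENOTATIONS: Dict[Tuple[str, str], str] = {
    ("NYC", "postal"): "nyc@USPS",
    ("New York City", "postal"): "new_york_city@geocoder",
    ("NYC", "real_estate"): "manhattan@MLS",
    ("New York City", "real_estate"): "city_of_new_york@geocoder",
    ("NYC", "legal"): "nyc@registry",
    ("New York City", "legal"): "new_york_city@registry",
}


def denote(term: str, context: str) -> str:
    """Step 1: resolve a term to an entity ID in a given context."""
    try:
        return DENOTATIONS[(term, context)]
    except KeyError:
        raise LookupError(f"no denotation for {term!r} in {context!r}") from None


def same_entity_id(t1: str, t2: str, context: str) -> bool:
    """Compare denotations, never raw strings. Different IDs still need step 2:
    a scoped equivalence may or may not bridge them."""
    return denote(t1, context) == denote(t2, context)


for ctx in ("postal", "real_estate", "legal"):
    print(ctx, denote("NYC", ctx), denote("New York City", ctx),
          same_entity_id("NYC", "New York City", ctx))
```

Every row prints `False`: no context makes the two terms share an entity ID, which is exactly why step 2 (or its refusal) has to carry the decision.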

The postal case is not trivial. "NYC" from USPS data and "New York City" from the OpenStreetMap-derived geocoder are different entity IDs. Substitution requires a scoped equivalence:

\text{nyc@USPS} \sim_{\text{postal}} \text{new\_york\_city@geocoder}

with a witness and a transport certificate. A30 does real work.

In the real estate context, by contrast, no equivalence exists or should exist. A system that ignores this distinction will conflate manhattan with city_of_new_york. That is not imprecision. That is a category error.

Scope as Proof Obligation

A10 introduced witnessed equivalence with a scope field. A16 established that transport along equivalences requires certificates. A30 closes the loop: scope is not annotation. It is enforcement.


A30: Scoped Equivalence

Context Structure: Contexts form a poset $(\mathcal{C}, \leq)$ under refinement. $C_1 \leq C_2$ means $C_1$ is at least as specific as $C_2$. A scope $S \subseteq \mathcal{C}$ is a downward-closed region in this poset: if $C \in S$ and $C' \leq C$, then $C' \in S$.
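The poset and the downward-closure condition can be sketched as follows, assuming contexts are named strings and refinement is recorded as links to coarser parent contexts. Every name here is illustrative: `PARENTS`, `leq`, `is_downward_closed`, `down_closure`, and the contexts `global_context` and `postal_bulk_mail_context` are hypothetical additions, not part of the chapter.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical refinement poset: each context lists its immediate coarser parents.
# "global_context" is the top element; more specific contexts sit below it.
PARENTS: Dict[str, Set[str]] = {
    "global_context": set(),
    "postal_context": {"global_context"},
    "legal_context": {"global_context"},
    "real_estate_context": {"global_context"},
    "postal_bulk_mail_context": {"postal_context"},  # hypothetical refinement
}


def leq(c1: str, c2: str) -> bool:
    """C1 <= C2: c1 is at least as specific as c2 (c2 is c1 or an ancestor of c1)."""
    stack, seen = [c1], set()
    while stack:
        c = stack.pop()
        if c == c2:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return False


def is_downward_closed(scope: FrozenSet[str]) -> bool:
    """A valid scope contains every context below each of its members."""
    return all(
        c_prime in scope
        for c in scope
        for c_prime in PARENTS
        if leq(c_prime, c)
    )


def down_closure(contexts: Set[str]) -> FrozenSet[str]:
    """Smallest downward-closed scope containing the given contexts."""
    return frozenset(c for c in PARENTS for top in contexts if leq(c, top))
```

Under this encoding, `frozenset({"postal_context"})` on its own is not a valid scope once a finer postal context exists; `down_closure` repairs it by adding everything below.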

Entity Equivalence:

e_1 \sim_S e_2 := \text{witnessed equivalence between } e_1 \text{ and } e_2 \text{, valid in scope } S

Components:

ScopedEquivalence = {
  lhs: Entity,
  rhs: Entity,
  scope: Scope,                      // downward-closed region in context poset
  kind: identity | isomorphism | equivalence | approximation,
  witness: EquivalenceWitness,       // evidence
  transport_certificate: {
    properties_preserved: [Property...],
    properties_not_preserved: [Property...]
  }
}
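One way to render the Components record as Python types is sketched below, assuming entity IDs, contexts, witnesses, and properties are plain strings and a scope is a frozenset of context names. The class names mirror the record; the disjointness check in `TransportCertificate` and the `kind` chosen for the postal example are assumptions, not rules stated in the chapter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Kind(Enum):
    IDENTITY = "identity"
    ISOMORPHISM = "isomorphism"
    EQUIVALENCE = "equivalence"
    APPROXIMATION = "approximation"


@dataclass(frozen=True)
class TransportCertificate:
    properties_preserved: FrozenSet[str]
    properties_not_preserved: FrozenSet[str]

    def __post_init__(self) -> None:
        # Assumption: a property cannot be both preserved and not preserved.
        overlap = self.properties_preserved & self.properties_not_preserved
        if overlap:
            raise ValueError(f"contradictory certificate: {sorted(overlap)}")


@dataclass(frozen=True)
class ScopedEquivalence:
    lhs: str                      # entity ID, e.g. "nyc@USPS"
    rhs: str                      # entity ID
    scope: FrozenSet[str]         # downward-closed set of context names
    kind: Kind
    witness: str                  # reference to the evidence
    transport_certificate: TransportCertificate


POSTAL_NYC = ScopedEquivalence(
    lhs="nyc@USPS",
    rhs="new_york_city@geocoder",
    scope=frozenset({"postal_context"}),
    kind=Kind.IDENTITY,           # hypothetical: the chapter does not state the kind
    witness="postal_authority_ruleset",
    transport_certificate=TransportCertificate(
        properties_preserved=frozenset({"delivery_zone", "mailing_address"}),
        properties_not_preserved=frozenset({"source_id"}),
    ),
)
```

Freezing the dataclasses keeps an admitted equivalence immutable; changing its scope should go through admission or widening, not field assignment.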

Coercion Formalization:

An equivalence witness $w$ for $e_1 \sim_S e_2$ is a coercion:

\text{coerce}_{w,S} : e_1 \to e_2

with proof obligation: $\text{current\_context} \in S$.

A scope violation is a failed proof obligation. It is not a warning. It is a hard error.
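A sketch of the coercion as a Python function that discharges the proof obligation before handing back the substitution, assuming scopes are frozensets of context names. `Witness`, `coerce`, and `ScopeViolationError` are hypothetical names; raising an exception is one way to make the violation a hard error rather than a warning.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


class ScopeViolationError(Exception):
    """A failed proof obligation: raised, never merely logged."""


@dataclass(frozen=True)
class Witness:
    lhs: str
    rhs: str
    scope: FrozenSet[str]


def coerce(w: Witness, current_context: str) -> Callable[[str], str]:
    """Return coerce_{w,S}: lhs -> rhs, but only after checking current_context in S."""
    if current_context not in w.scope:  # the proof obligation
        raise ScopeViolationError(
            f"{current_context} not in scope of {w.lhs} ~ {w.rhs}"
        )

    def apply(entity: str) -> str:
        if entity != w.lhs:
            raise ValueError(f"witness covers {w.lhs}, not {entity}")
        return w.rhs

    return apply


w = Witness("nyc@USPS", "new_york_city@geocoder", frozenset({"postal_context"}))
print(coerce(w, "postal_context")("nyc@USPS"))
try:
    coerce(w, "real_estate_context")
except ScopeViolationError as err:
    print("refused:", err)
```

The check runs when the coercion is built, not when it is applied, so no caller can obtain a usable coercion outside the scope.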

Transport Rule:

For any substitution $e_1 \to e_2$ in context $C$:

1. Retrieve witness $w$ for $e_1 \sim_S e_2$.
2. CHECK: $C \in S$ (scope membership).
3. CHECK: required_properties $\subseteq$ transport_certificate.properties_preserved.
4. If both checks pass: coercion permitted; produce a TransportReceipt.
5. If either check fails: produce a ScopeViolation artifact.
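The transport rule reads naturally as a function that returns either a receipt or a violation artifact, never both and never nothing. This is a minimal sketch under the same string-and-frozenset assumptions; `Equivalence`, `transport`, and the field layout of the two result classes are hypothetical simplifications of the records shown in this chapter, and witness retrieval (step 1) is left to the caller.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Equivalence:
    lhs: str
    rhs: str
    scope: FrozenSet[str]
    preserved: FrozenSet[str]
    not_preserved: FrozenSet[str]
    witness: str


@dataclass
class TransportReceipt:
    lhs: str
    rhs: str
    in_context: str
    witness: str
    properties_preserved: FrozenSet[str]
    properties_not_preserved: FrozenSet[str]


@dataclass
class ScopeViolation:
    lhs: str
    rhs: str
    in_context: str
    reasons: List[str] = field(default_factory=list)


def transport(eq: Equivalence, context: str,
              required: FrozenSet[str]) -> Union[TransportReceipt, ScopeViolation]:
    """Run both CHECKs; produce a receipt only if both pass."""
    reasons: List[str] = []
    if context not in eq.scope:                  # CHECK: C in S
        reasons.append(f"{context} not in scope")
    missing = required - eq.preserved            # CHECK: required subset of preserved
    if missing:
        reasons.append(f"not preserved: {sorted(missing)}")
    if reasons:
        return ScopeViolation(eq.lhs, eq.rhs, context, reasons)
    return TransportReceipt(eq.lhs, eq.rhs, context, eq.witness,
                            eq.preserved, eq.not_preserved)


postal = Equivalence(
    "nyc@USPS", "new_york_city@geocoder", frozenset({"postal_context"}),
    frozenset({"delivery_zone", "mailing_address"}), frozenset({"source_id"}),
    "postal_authority_ruleset",
)
print(transport(postal, "postal_context", frozenset({"delivery_zone"})))
print(transport(postal, "real_estate_context", frozenset({"delivery_zone"})))
```

Collecting every failed check, rather than stopping at the first, keeps the violation artifact explanatory in the way the chapter's examples are.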

Naming convention: A context (e.g., postal_context) is a specific point in the poset where a query executes. A scope (e.g., postal_scope) is a downward-closed region containing one or more contexts. The check postal_context ∈ postal_scope asks whether the query's context lies within the equivalence's valid region.

Calling a scope violation a type error is not metaphor. The witness is a coercion between types. Using the coercion requires discharging a proof obligation. If the current context is not in scope, the obligation fails. The system refuses the substitution.

What Violations Look Like

Example(Scope Violation Artifact)

Operation: Substitute "New York City" for "NYC"
Context: real_estate_context

ScopeViolation = {
  attempted_substitution: ("NYC", "New York City"),
  in_context: real_estate_context,
  denotations_in_context: {
    lhs: entity:manhattan@MLS,
    rhs: entity:city_of_new_york@geocoder
  },
  reason: "Terms denote different entities in this context",
  
  candidate_witnesses: [
    { 
      witness_id: "postal-nyc-equiv",
      equivalence: "nyc@USPS ~_{postal} new_york_city@geocoder",
      applicable: false,
      reasons: [
        "real_estate_context ∉ postal_scope",
        "denotations differ: manhattan@MLS ≠ nyc@USPS"
      ]
    }
  ],
  
  remediation_options: [
    "Narrow query to postal context (where A30 equivalence applies)",
    "Query without substitution, using 'NYC' as written",
    "Provide evidence that manhattan ~= city_of_new_york in real estate context"
  ]
}

The artifact is auditable. It explains why the substitution failed. It shows which witnesses exist and why they do not apply. It offers paths forward. The user or downstream system can make an informed decision.

Example(Valid Substitution via A30)

Operation: Substitute "New York City" for "NYC"
Context: postal_context

System:
  1. Compute denotation: ⟦"NYC"⟧_{postal} = entity:nyc@USPS
  2. Compute denotation: ⟦"New York City"⟧_{postal} = entity:new_york_city@geocoder
  3. Different entity IDs; search for equivalence
  4. Found: nyc@USPS ~_{postal_scope} new_york_city@geocoder
  5. Check proof obligation: postal_context ∈ postal_scope? YES
  6. Coercion valid

TransportReceipt = {
  substitution: ("NYC", "New York City"),
  in_context: postal_context,
  denotations: (entity:nyc@USPS, entity:new_york_city@geocoder),
  equivalence_used: "nyc@USPS ~_{postal} new_york_city@geocoder",
  witness: postal_authority_ruleset,
  properties_preserved: [delivery_zone, mailing_address],
  properties_not_preserved: [source_id]
}

Even in the postal context, the terms denote different entity IDs from different sources. A30 bridges them with a scoped equivalence, produces a receipt, and documents which properties survive transport. The system earns the substitution; it does not assume it.

Compare to the alternatives:

String matching would substitute unconditionally. Silent conflation.
Equivalence table without scope would find the postal equivalence and apply it everywhere. Same failure.
No equivalence recorded would treat the terms as different everywhere. Silent separation, missing valid matches in postal context.

A30 navigates between conflation and separation by making scope the arbiter.

Composition Rules

Identity graphs compose. If $e_1 \sim_S e_2$ and $e_2 \sim_T e_3$, what is the relationship between $e_1$ and $e_3$?

Without explicit rules, transitivity leaks scope. The system might conclude $e_1 \sim e_3$ with unbounded scope, reintroducing the very conflation A30 was designed to prevent.

Composition Rules

Transitive Closure:

If $e_1 \sim_S e_2$ and $e_2 \sim_T e_3$, then:

e_1 \sim_{S \cap T} e_3

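The composed equivalence holds only where both original equivalences hold. Scope is intersection, not union. Because the intersection of two downward-closed regions is itself downward-closed, $S \cap T$ is again a valid scope.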

Conditions:

Transport certificates must compose (preserved properties intersect)
Witnesses must be compatible (not contradictory)

If conditions fail: no composed equivalence. The system stores the path but does not claim the composition.
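Composition can be sketched as a function that intersects scopes and preserved properties and otherwise declines to compose. The chapter does not say how witness compatibility is tested, so this hypothetical `compose` takes it as a caller-supplied flag; treating an empty scope intersection as "no composed equivalence" is also an assumption. The entity IDs in the usage lines are placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Equiv:
    lhs: str
    rhs: str
    scope: FrozenSet[str]        # downward-closed set of context names
    preserved: FrozenSet[str]    # transport certificate: preserved properties


def compose(a: Equiv, b: Equiv, compatible: bool = True) -> Optional[Equiv]:
    """Compose e1 ~_S e2 with e2 ~_T e3 into e1 ~_{S∩T} e3, or return None."""
    if a.rhs != b.lhs:
        raise ValueError("equivalences do not share a middle entity")
    scope = a.scope & b.scope              # intersection, never union
    preserved = a.preserved & b.preserved  # certificates compose by intersection
    if not compatible or not scope:
        return None  # keep the path on record, but claim no composition
    return Equiv(a.lhs, b.rhs, scope, preserved)


a = Equiv("x@A", "y@B", frozenset({"postal_context", "legal_context"}), frozenset({"zip", "name"}))
b = Equiv("y@B", "z@C", frozenset({"postal_context"}), frozenset({"zip"}))
print(compose(a, b))
```

Because `compose` can only shrink scope and preserved properties, chaining it along any path never yields a claim broader than its weakest link.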

Kind Meet:

When composing equivalences of different kinds, the result is the weaker kind:

Kind 1	Kind 2	Composed Kind
identity	identity	identity
identity	isomorphism	isomorphism
identity	approximation	approximation
isomorphism	approximation	approximation

Transport becomes more restrictive: the intersection of preserved properties.
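The kind meet picks the weaker kind under a strength order. The sketch below assumes that the order in which the Components record lists the kinds (identity, isomorphism, equivalence, approximation) runs from strongest to weakest; the table above covers only some pairs, so any result involving `equivalence` follows from that assumption rather than from the chapter.

```python
from enum import IntEnum


class Kind(IntEnum):
    # Assumed strength order: lower value = stronger kind.
    IDENTITY = 0
    ISOMORPHISM = 1
    EQUIVALENCE = 2
    APPROXIMATION = 3


def kind_meet(k1: Kind, k2: Kind) -> Kind:
    """Composed kind: the weaker of the two, i.e. the larger value."""
    return max(k1, k2)


def kind_meet_path(*kinds: Kind) -> Kind:
    """Meet along a whole chain; identity is the neutral element for an empty path."""
    return max(kinds, default=Kind.IDENTITY)


# The four rows of the table above, recomputed with the sketch.
for k1, k2 in [(Kind.IDENTITY, Kind.IDENTITY), (Kind.IDENTITY, Kind.ISOMORPHISM),
               (Kind.IDENTITY, Kind.APPROXIMATION), (Kind.ISOMORPHISM, Kind.APPROXIMATION)]:
    print(k1.name, k2.name, "->", kind_meet(k1, k2).name)
```

Since `max` is associative and commutative, the composed kind of a path does not depend on the order in which its links are combined.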

Multiple Equivalences:

If $e_1 \sim_S e_2$ and $e_1 \sim_T e_2$ with $S \neq T$ (same pair, different scopes), the system selects the tightest scope containing the current context $C$. Scopes are ordered by set inclusion; "tightest" means minimal-by-inclusion among scopes that contain $C$.

If $C \in S$ and $C \notin T$: use the S-equivalence
If $C \in S$ and $C \in T$ and $S \subset T$: use the S-equivalence (tighter)
If $C \notin S$ and $C \notin T$: ScopeViolation with both as candidate_witnesses
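Selecting the tightest applicable scope is a search for minimal elements under strict set inclusion. In this hypothetical `select_equivalence`, candidates are (witness_id, scope) pairs for the same entity pair, and the witness IDs `eq_S` and `eq_T` are placeholders. The chapter does not cover two applicable scopes that are incomparable; reporting that case as ambiguous instead of picking one is an assumption.

```python
from typing import FrozenSet, List, Sequence, Tuple


def select_equivalence(
    candidates: Sequence[Tuple[str, FrozenSet[str]]], context: str
) -> Tuple[str, List[str]]:
    """Return ("use", [id]), ("violation", all_ids), or ("ambiguous", ids)."""
    applicable = [(wid, s) for wid, s in candidates if context in s]
    if not applicable:
        # No scope contains C: every candidate becomes a candidate_witness.
        return "violation", [wid for wid, _ in candidates]
    # Minimal by inclusion: no other applicable scope is a strict subset.
    minimal = [
        wid for wid, s in applicable
        if not any(other < s for _, other in applicable)
    ]
    if len(minimal) == 1:
        return "use", minimal
    return "ambiguous", minimal  # incomparable scopes: an assumption, not from the chapter


S = frozenset({"postal_context"})
T = frozenset({"postal_context", "legal_context"})
candidates = [("eq_S", S), ("eq_T", T)]
print(select_equivalence(candidates, "postal_context"))
print(select_equivalence(candidates, "real_estate_context"))
```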

These rules prevent the "just compose everything" failure. Scope flows correctly through chains. The system never claims more than the evidence supports.

Admitting Equivalences

Scoped equivalences are not declared by fiat. They are admitted with evidence, following the same discipline as predicate admission (A29).

Equivalence Admission

EquivalenceAdmission($e_1 \sim_S e_2$) requires:

Positive Evidence: Witness supporting the equivalence (authority ruling, ontology, measurement, declaration)

Counterexamples: Known contexts where the equivalence does NOT hold. These define the scope boundary.

CounterexampleWitness = {
  context: C,
  reason: "different denotations" | "different properties" | "authority conflict",
  evidence: Reference
}

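Scope Declaration: An explicit scope $S$, defined as a downward-closed region in the context poset. The scope must exclude every counterexample context; because scopes are downward-closed, it then also excludes every coarser context above a counterexample.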
Transport Certificate: Properties preserved and not preserved under substitution.
Authority Tier: Who can admit this equivalence (local, organizational, global).

Result:

Admitted($e_1 \sim_S e_2$, EquivalenceReceipt)
Rejected(reason, conflicting_evidence)
Provisional(conditions)
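The admission procedure can be sketched as a sequence of checks that returns one of the three results. `AdmissionRequest`, `AdmissionResult`, `admit`, and the tier assigned in the usage lines are hypothetical, and which gaps lead to Provisional rather than Rejected is our assumption: here only a missing transport certificate is treated as recoverable.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

AUTHORITY_TIERS = {"local", "organizational", "global"}


@dataclass
class AdmissionRequest:
    lhs: str
    rhs: str
    scope: FrozenSet[str]                  # assumed already downward-closed
    positive_evidence: List[str]
    counterexample_contexts: FrozenSet[str]
    preserved: FrozenSet[str]
    not_preserved: FrozenSet[str]
    authority_tier: str


@dataclass
class AdmissionResult:
    status: str                            # "Admitted" | "Rejected" | "Provisional"
    details: List[str]


def admit(req: AdmissionRequest) -> AdmissionResult:
    if req.authority_tier not in AUTHORITY_TIERS:
        return AdmissionResult("Rejected", [f"unknown authority tier: {req.authority_tier}"])
    if not req.positive_evidence:
        return AdmissionResult("Rejected", ["no positive evidence"])
    # Checking the counterexample contexts themselves suffices: in a downward-closed
    # scope, any coarser member would force the counterexample into the scope too.
    conflicts = sorted(req.scope & req.counterexample_contexts)
    if conflicts:
        return AdmissionResult("Rejected", [f"scope contains counterexample: {c}" for c in conflicts])
    if not (req.preserved or req.not_preserved):
        # Assumption: a missing transport certificate is recoverable.
        return AdmissionResult("Provisional", ["attach a transport certificate"])
    return AdmissionResult("Admitted", [f"EquivalenceReceipt: {req.lhs} ~ {req.rhs} ({req.authority_tier})"])


request = AdmissionRequest(
    "nyc@USPS", "new_york_city@geocoder", frozenset({"postal_context"}),
    ["postal_authority_ruleset"], frozenset({"real_estate_context"}),
    frozenset({"delivery_zone", "mailing_address"}), frozenset({"source_id"}),
    "organizational",
)
print(admit(request).status)
```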

Scope widening follows the same pattern as predicate promotion. To extend $e_1 \sim_S e_2$ to $e_1 \sim_{S'} e_2$ where $S \subset S'$:

New evidence is required for $S' \setminus S$.
If a counterexample exists in $S' \setminus S$, widening is rejected.
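The widening rule can be sketched as a check on the difference $S' \setminus S$, assuming scopes are frozensets of context names and that the caller supplies the set of contexts covered by new evidence. `check_widening` and its parameters are hypothetical, and modeling all_contexts as the three contexts used in this chapter is a simplification.

```python
from typing import FrozenSet, List, Tuple


def check_widening(
    current: FrozenSet[str],
    requested: FrozenSet[str],
    counterexamples: FrozenSet[str],
    newly_evidenced: FrozenSet[str],
) -> Tuple[bool, List[str]]:
    """Decide whether e1 ~_S e2 may become e1 ~_S' e2; returns (accepted, reasons)."""
    if not current < requested:
        return False, ["requested scope must strictly contain the current scope"]
    delta = requested - current                       # S' \ S
    reasons: List[str] = []
    blocked = sorted(delta & counterexamples)
    if blocked:
        reasons.append(f"counterexample in S' minus S: {blocked}")
    unsupported = sorted(delta - newly_evidenced)
    if unsupported:
        reasons.append(f"no new evidence for: {unsupported}")
    return not reasons, reasons


# all_contexts modeled as the three contexts used in this chapter (an assumption).
accepted, reasons = check_widening(
    current=frozenset({"postal_context"}),
    requested=frozenset({"postal_context", "legal_context", "real_estate_context"}),
    counterexamples=frozenset({"real_estate_context"}),
    newly_evidenced=frozenset({"legal_context"}),
)
print(accepted, reasons)
```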

Example(Scope Widening Rejection)

ScopeWideningRequest = {
  equivalence: "nyc@USPS ~_{postal_scope} new_york_city@geocoder",
  current_scope: postal_scope,
  requested_scope: all_contexts
}

Result: REJECTED

CounterexampleWitness = {
  context: real_estate_context,
  reason: "In real estate, 'NYC' denotes manhattan@MLS; 
           'New York City' denotes city_of_new_york@geocoder. 
           Different entities.",
  evidence: [real_estate_listing_analysis_ref]
}

Remediation: "Widening to all_contexts is impossible while a counterexample exists.
              You may admit a separate witness for legal_scope, giving two 
              equivalences whose combined coverage is (postal ∪ legal)."

Counterexamples are first-class artifacts. They define scope boundaries. Extending coverage to postal ∪ legal means admitting two witnesses with evidence for each scope, not unioning a single witness. The system stores multiple scoped equivalences; union is a property of their combined coverage, not an operation on witnesses.

Three Failure Modes

T7 named the pathology: context-free equivalence handling. A30 prevents three specific failure modes:

Silent Conflation: The system treats $x = y$ unconditionally. In some contexts, this is wrong. The real estate integration conflates manhattan and city_of_new_york because it lacks scope. Users see wrong results. No artifact records the error.

A30 prevents conflation by requiring scope checks. The coercion fails if the context is not in scope. The failure is explicit.

Silent Separation: The system treats $x \neq y$ unconditionally. In some contexts, this is wrong. A postal search fails to find "New York City" records when querying "NYC" because no equivalence is recorded. Users miss valid results. No artifact records the gap.

A30 prevents separation by recording scoped equivalences. Where the equivalence holds (postal context), the system expands queries and merges results. The expansion is witnessed.

Scope Leak: The system uses $x \sim_S y$ transport in a context $C$ where $C \notin S$. The equivalence is recorded but applied too broadly: scope leak is reusing a valid equivalence outside its scope. This is the integration team's error: they found a postal equivalence and applied it in the real estate context. The system does not catch the mismatch.

A30 prevents scope leak by making scope a proof obligation. The coercion $e_1 \to e_2$ requires $C \in S$. If the check fails, the system refuses. No silent leak.

The NYC example generalizes: "blue" and "navy" are equivalent in a search context (queries expand, recall improves) but distinct in an inventory context (counts must not conflate). "Dr. Smith" in a hospital and "Dr. Smith" in a university share a string but denote different entities; the system resolves this at the denotation step before equivalence is ever considered. A28 handles sense disambiguation; A30 handles scope enforcement. Together, they give a complete discipline for contextual sameness.

What the User Sees

Example(Cross-Context Query)

Query: "Properties in NYC"
Candidate contexts: [real_estate_context, postal_context]  // inferred from data sources
Status: Ambiguous; user must select declared context

System response:

ContextDisambiguation = {
  term: "NYC",
  denotation_by_context: {
    real_estate_context: entity:manhattan@MLS,
    postal_context: entity:nyc@USPS
  },
  message: "Query spans multiple contexts where 'NYC' denotes different entities.",
  options: [
    { context: real_estate_context, meaning: "Manhattan", result_preview: "847 listings" },
    { context: postal_context, meaning: "All five boroughs", result_preview: "4,231 listings" }
  ],
  default_action: "none (context declaration required)"
}
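The disambiguation step can be sketched as a function that computes the denotation in each candidate context and refuses to pick a default when they differ. `disambiguate` and its dictionary return shape are hypothetical simplifications of the ContextDisambiguation record, and resolving without asking when every candidate context yields the same entity is an assumption the chapter does not state.

```python
from typing import Dict, List, Optional, Tuple


def disambiguate(
    term: str,
    candidate_contexts: List[str],
    denotations: Dict[Tuple[str, str], str],
    declared_context: Optional[str] = None,
) -> dict:
    if not candidate_contexts:
        raise ValueError("no candidate contexts")
    by_context = {c: denotations[(term, c)] for c in candidate_contexts}
    if declared_context is not None:
        if declared_context not in by_context:
            raise ValueError(f"{declared_context!r} is not a candidate context")
        return {"status": "resolved", "contexts": [declared_context],
                "entity": by_context[declared_context]}
    if len(set(by_context.values())) == 1:
        # Assumption: identical denotations everywhere leave nothing to ask.
        return {"status": "resolved", "contexts": list(candidate_contexts),
                "entity": next(iter(by_context.values()))}
    return {"status": "ambiguous", "term": term,
            "denotation_by_context": by_context,
            "default_action": None}  # never guess; a context declaration is required


DENOTATIONS = {
    ("NYC", "real_estate_context"): "manhattan@MLS",
    ("NYC", "postal_context"): "nyc@USPS",
}
print(disambiguate("NYC", ["real_estate_context", "postal_context"], DENOTATIONS))
```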

The user sees choices, not conflation. The system does not guess. It presents the divergence and requests a decision. The decision is recorded. The result is scoped. The entire chain is auditable.

Consequence

Equivalence is not a fact about strings but a relation between entities, indexed by context, constrained by scope, and enforced by proof obligation.

A system that treats "NYC" and "New York City" as unconditionally equivalent will conflate where it should separate. A system that treats them as unconditionally different will separate where it should conflate. A system that tracks scope will do neither. It will substitute where substitution is valid, refuse where it is not, and produce artifacts that explain why.

Manual normalization is an unscoped equivalence with no proof obligation. A30 replaces it with witnessed, scoped coercions. The arc from T7 ("equivalence requires manual normalization") to A30 ("equivalence is a typed, scoped, witnessed artifact") is complete.

T7 is resolved. Contextual equivalence is no longer a pathology but a discipline.

Chapter 29 asks the final touchstone question: what happens when named entities participate in events? "Alice introduced Bob to Carol" is not three strings plus a verb. It is a structured relation with roles. The same Bob, in the same event, may be introducer or introducee depending on how the sentence is parsed. T10 awaits.