Scoped Equivalence: NYC is not always NYC
The reference of "evening star" would be the same as that of "morning star," but not the sense.
This chapter formalizes scoped equivalence as a typed, witnessed coercion with proof obligations, defining Anchor A30 (Scoped Equivalence). The central construction treats contexts as a poset under refinement and scopes as downward-closed regions within it; an equivalence between entities is valid only when the current context lies within the declared scope. A30 closes the loop opened by A10 (Witnessed Equivalence) and A16 (Transport Certificates), adding enforcement discipline via composition rules, scope violation artifacts, and an equivalence admission procedure. The narrative motivation appears in Volume I, Chapters 6 and 8, where context-free substitution is diagnosed as a systematic failure mode of both the string and schema paradigms.
The Substitution That Seemed Safe
A real estate search system merges two data sources. One contains Manhattan apartment listings; the other contains five-borough property records. Both use location fields. The integration team writes:
if location.city == "NYC" or location.city == "New York City":
    # treat as same
Plausible. Defensible. Wrong.
In the postal database, "NYC" and "New York City" denote the same entity: the five-borough municipality, incorporated in 1898, with ZIP codes spanning Manhattan through Staten Island. In the Manhattan listings, "NYC" means something narrower: the island, the market, the colloquial center. The terms look identical. Their denotations diverge.
When the system substitutes one for the other unconditionally, it conflates a Manhattan-only search with a five-borough search. Users looking for downtown apartments see listings in the Bronx. Users looking for Brooklyn properties see nothing, because the Manhattan source never uses "New York City" at all. Both failures are silent. Both trace to the same cause: a context-free substitution where context determines meaning.
This is T7, the seventh touchstone. We named it in Chapter 5: "Equivalence requires manual normalization." The pathology is not missing data. It is missing structure. The system has no way to say: this substitution is valid here but not there.
A30 makes that structure explicit.
Terms and Entities
The first mistake is conflating terms with entities.
A term is a string: "NYC," "New York City," "Big Apple." Terms appear in documents, listings, query inputs, and user interfaces. They are syntax.
An entity is a referent: the municipality, the island, the colloquial region. Entities are what terms denote. They are semantics.
The relationship between terms and entities is mediated by context. In one context, "NYC" denotes the five-borough city. In another, it denotes Manhattan. The term is the same. The denotation differs. The two-step model makes this precise:
Step 1: Denotation Map
Given term $t$ and context $C$, the denotation map returns an entity:
$$\llbracket t \rrbracket_C = e$$
Step 2: Entity Equivalence
Equivalence is defined between entities, not terms:
$$e_1 \sim_S e_2$$
where $S$ is the scope in which the equivalence holds.
The NYC problem becomes precise:
| Context | ⟦"NYC"⟧ | ⟦"New York City"⟧ | Same entity ID? |
|---|---|---|---|
| Postal | nyc@USPS | new_york_city@geocoder | No (different IDs, same referent) |
| Real estate | manhattan@MLS | city_of_new_york@geocoder | No (different IDs, different referents) |
| Legal | nyc@registry | new_york_city@registry | No (different IDs, same referent) |
Postal and legal contexts require A30: the terms denote different entity IDs from different sources for the same underlying referent, so scoped equivalences bridge them. Real estate fails earlier: the terms denote genuinely different referents (manhattan vs city_of_new_york), and no equivalence should exist.
The postal case is not trivial. "NYC" from USPS data and "New York City" from OpenStreetMap are different entity IDs. Substitution requires a scoped equivalence:
$$\text{nyc@USPS} \sim_{\text{postal\_scope}} \text{new\_york\_city@geocoder}$$
with witness and transport certificate. A30 does real work.
In real estate context, the terms denote genuinely different entities (manhattan vs city_of_new_york). No equivalence exists or should exist. The system that ignores this distinction will conflate manhattan with city_of_new_york. That is not imprecision. That is a category error.
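The two-step model can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation: the dictionary layout, the short context names, and the helper names `denote` and `same_entity_id` are assumptions; the entity IDs are the ones in the table above.

```python
from typing import Optional

# Hypothetical denotation table: (term, context) -> entity ID.
# The entries mirror the table above; the storage layout is an assumption.
DENOTATIONS = {
    ("NYC", "postal"): "nyc@USPS",
    ("New York City", "postal"): "new_york_city@geocoder",
    ("NYC", "real_estate"): "manhattan@MLS",
    ("New York City", "real_estate"): "city_of_new_york@geocoder",
    ("NYC", "legal"): "nyc@registry",
    ("New York City", "legal"): "new_york_city@registry",
}


def denote(term: str, context: str) -> Optional[str]:
    """Step 1: resolve a term to an entity ID in the given context."""
    return DENOTATIONS.get((term, context))


def same_entity_id(t1: str, t2: str, context: str) -> bool:
    """True only when both terms resolve to the very same entity ID."""
    e1, e2 = denote(t1, context), denote(t2, context)
    return e1 is not None and e1 == e2


for ctx in ("postal", "real_estate", "legal"):
    print(ctx, denote("NYC", ctx), denote("New York City", ctx),
          same_entity_id("NYC", "New York City", ctx))
```

In every context the IDs differ, so string-level or ID-level comparison alone cannot decide substitution. Whether a bridge exists is a separate question, answered at Step 2 by a scoped equivalence.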
Scope as Proof Obligation
A10 introduced witnessed equivalence with a scope field. A16 established that transport along equivalences requires certificates. A30 closes the loop: scope is not annotation. It is enforcement.
Context Structure: Contexts form a poset under refinement. $C' \le C$ means $C'$ is more specific than $C$. A scope $S$ is a downward-closed region in this poset: if $C \in S$ and $C' \le C$, then $C' \in S$.
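A small sketch may make the poset and the closure condition concrete. Everything named here is hypothetical: the contexts `zip_lookup` and `us_addressing`, the `COARSER` edge table, and the three helper functions are illustrative assumptions, not part of A30.

```python
# Hypothetical refinement edges: context -> set of immediately coarser contexts.
# The graph is assumed acyclic, so the recursion below terminates.
COARSER = {
    "postal_context": {"us_addressing"},
    "zip_lookup": {"postal_context"},
    "real_estate_context": {"us_addressing"},
    "us_addressing": set(),
}


def refines(specific: str, general: str) -> bool:
    """True when specific <= general (reflexive, transitive refinement)."""
    if specific == general:
        return True
    return any(refines(parent, general) for parent in COARSER.get(specific, ()))


def down_closure(generators: set) -> frozenset:
    """Smallest downward-closed set of contexts containing the generators."""
    return frozenset(c for c in COARSER if any(refines(c, g) for g in generators))


def is_downward_closed(scope) -> bool:
    """Check the scope condition: C in S and C' <= C implies C' in S."""
    return all(c2 in scope for c in scope for c2 in COARSER if refines(c2, c))


postal_scope = down_closure({"postal_context"})
print(sorted(postal_scope), is_downward_closed(postal_scope))
```

Under these assumptions a scope generated by `postal_context` automatically includes every refinement of it, which is why an equivalence admitted for a context also holds in its more specific sub-contexts.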
Entity Equivalence: $e_1 \sim_S e_2$, where $S$ is the scope.
Components:
ScopedEquivalence = {
  lhs: Entity,
  rhs: Entity,
  scope: Scope,  // downward-closed region in context poset
  kind: identity | isomorphism | equivalence | approximation,
  witness: EquivalenceWitness,  // evidence
  transport_certificate: {
    properties_preserved: [Property...],
    properties_not_preserved: [Property...]
  }
}
Coercion Formalization:
An equivalence witness $w$ for $e_1 \sim_S e_2$ is a coercion:
$$w : e_1 \to e_2$$
with proof obligation: $C \in S$, where $C$ is the current context.
A scope violation is a failed proof obligation. It is not a warning. It is a hard error.
Transport Rule:
For any substitution of $e_1$ by $e_2$ in context $C$:
- Retrieve witness $w$ for $e_1 \sim_S e_2$
- CHECK: $C \in S$ (scope membership)
- CHECK: required_properties $\subseteq$ transport_certificate.properties_preserved
- If pass: coercion permitted, produce TransportReceipt
- If fail: produce ScopeViolation artifact
Naming convention: A context (e.g., postal_context) is a specific point in the poset where a query executes. A scope (e.g., postal_scope) is a downward-closed region containing one or more contexts. The check postal_context ∈ postal_scope asks whether the query's context lies within the equivalence's valid region.
The phrase "type error" is not metaphor. The witness is a coercion between types. Using the coercion requires discharging a proof obligation. If the current context is not in scope, the obligation fails. The system refuses the substitution.
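The transport rule can be sketched as a function that either returns a receipt or a violation artifact. This is a simplified illustration under stated assumptions: scopes are modeled as plain sets of context names, the record types are flattened versions of the ones above, and the `kind` value chosen for the postal equivalence is a guess made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class ScopedEquivalence:
    lhs: str
    rhs: str
    scope: FrozenSet[str]                 # context names, assumed downward-closed
    kind: str
    witness: str
    properties_preserved: FrozenSet[str]
    properties_not_preserved: FrozenSet[str]


@dataclass(frozen=True)
class TransportReceipt:
    equivalence: ScopedEquivalence
    in_context: str


@dataclass(frozen=True)
class ScopeViolation:
    equivalence: ScopedEquivalence
    in_context: str
    reasons: Tuple[str, ...]


def transport(eq: ScopedEquivalence, context: str,
              required_properties) -> Union[TransportReceipt, ScopeViolation]:
    """Run both checks; any failure yields a ScopeViolation, never a silent pass."""
    reasons = []
    if context not in eq.scope:                        # CHECK: C in S
        reasons.append(f"{context} not in scope")
    missing = set(required_properties) - eq.properties_preserved
    if missing:                                        # CHECK: required subset of preserved
        reasons.append(f"properties not preserved: {sorted(missing)}")
    if reasons:
        return ScopeViolation(eq, context, tuple(reasons))
    return TransportReceipt(eq, context)


postal_equiv = ScopedEquivalence(
    lhs="nyc@USPS", rhs="new_york_city@geocoder",
    scope=frozenset({"postal_context"}),
    kind="equivalence",                                # assumed for illustration
    witness="postal_authority_ruleset",
    properties_preserved=frozenset({"delivery_zone", "mailing_address"}),
    properties_not_preserved=frozenset({"source_id"}),
)
print(type(transport(postal_equiv, "postal_context", {"delivery_zone"})).__name__)
print(type(transport(postal_equiv, "real_estate_context", {"delivery_zone"})).__name__)
```

Returning the violation as a value rather than raising keeps it available as an auditable artifact; a caller that wants hard-error behavior can raise on any `ScopeViolation` it receives.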
What Violations Look Like
Operation: Substitute "New York City" for "NYC"
Context: real_estate_context
ScopeViolation = {
  attempted_substitution: ("NYC", "New York City"),
  in_context: real_estate_context,
  denotations_in_context: {
    lhs: entity:manhattan@MLS,
    rhs: entity:city_of_new_york@geocoder
  },
  reason: "Terms denote different entities in this context",
  candidate_witnesses: [
    {
      witness_id: "postal-nyc-equiv",
      equivalence: "nyc@USPS ~_{postal_scope} new_york_city@geocoder",
      applicable: false,
      reasons: [
        "real_estate_context ∉ postal_scope",
        "denotations differ: manhattan@MLS ≠ nyc@USPS"
      ]
    }
  ],
  remediation_options: [
    "Narrow query to postal context (where A30 equivalence applies)",
    "Query without substitution, using 'NYC' as written",
    "Provide evidence that manhattan ~= city_of_new_york in real estate context"
  ]
}
The artifact is auditable. It explains why the substitution failed. It shows which witnesses exist and why they do not apply. It offers paths forward. The user or downstream system can make an informed decision.
Operation: Substitute "New York City" for "NYC"
Context: postal_context
System:
1. Compute denotation: ⟦"NYC"⟧_{postal} = entity:nyc@USPS
2. Compute denotation: ⟦"New York City"⟧_{postal} = entity:new_york_city@geocoder
3. Different entity IDs; search for equivalence
4. Found: nyc@USPS ~_{postal_scope} new_york_city@geocoder
5. Check proof obligation: postal_context ∈ postal_scope? YES
6. Coercion valid
TransportReceipt = {
  substitution: ("NYC", "New York City"),
  in_context: postal_context,
  denotations: (entity:nyc@USPS, entity:new_york_city@geocoder),
  equivalence_used: "nyc@USPS ~_{postal_scope} new_york_city@geocoder",
  witness: postal_authority_ruleset,
  properties_preserved: [delivery_zone, mailing_address],
  properties_not_preserved: [source_id]
}
The postal substitution is not trivial. The terms denote different entity IDs from different sources. A30 bridges them with a scoped equivalence, produces a receipt, and documents which properties survive transport. The system earns the substitution; it does not assume it.
Compare to the alternatives:
- String matching would substitute unconditionally. Silent conflation.
- Equivalence table without scope would find the postal equivalence and apply it everywhere. Same failure.
- No equivalence recorded would treat the terms as different everywhere. Silent separation, missing valid matches in postal context.
A30 navigates between conflation and separation by making scope the arbiter.
Composition Rules
Identity graphs compose. If $a \sim_{S_1} b$ and $b \sim_{S_2} c$, what is the relationship between $a$ and $c$?
Without explicit rules, transitivity leaks scope. The system might conclude $a \sim c$ with unbounded scope, reintroducing the very conflation A30 was designed to prevent.
Transitive Closure:
If $a \sim_{S_1} b$ and $b \sim_{S_2} c$, then:
$$a \sim_{S_1 \cap S_2} c$$
The composed equivalence holds only where both original equivalences hold. Scope is intersection, not union.
Conditions:
- Transport certificates must compose (preserved properties intersect)
- Witnesses must be compatible (not contradictory)
If conditions fail: no composed equivalence. The system stores the path but does not claim the composition.
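A minimal sketch of the composition rule follows. Several simplifications are assumptions of this sketch rather than claims about A30: scopes are plain sets of context names, "certificates compose" is read as "the preserved-property sets have a non-empty intersection", an empty intersected scope is treated as a failed composition, and witness compatibility is not modeled at all. The entities `a`, `b`, `c` and the context names are placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Equiv:
    lhs: str
    rhs: str
    scope: FrozenSet[str]        # contexts where the equivalence holds
    preserved: FrozenSet[str]    # properties that survive transport


def compose(ab: Equiv, bc: Equiv) -> Optional[Equiv]:
    """Compose a~b and b~c; scope and preserved properties both intersect."""
    if ab.rhs != bc.lhs:
        return None                        # not a chain through a shared entity
    scope = ab.scope & bc.scope            # intersection, never union
    preserved = ab.preserved & bc.preserved
    if not scope or not preserved:
        return None                        # store the path, claim nothing
    return Equiv(ab.lhs, bc.rhs, scope, preserved)


ab = Equiv("a", "b", frozenset({"postal_context", "legal_context"}),
           frozenset({"delivery_zone", "mailing_address"}))
bc = Equiv("b", "c", frozenset({"legal_context", "tax_context"}),
           frozenset({"mailing_address"}))
print(compose(ab, bc))
```

The composed scope can only shrink along a chain, which is the property that keeps transitivity from leaking.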
Kind Meet:
When composing equivalences of different kinds, the result is the weaker kind:
| Kind 1 | Kind 2 | Composed Kind |
|---|---|---|
| identity | identity | identity |
| identity | isomorphism | isomorphism |
| identity | approximation | approximation |
| isomorphism | approximation | approximation |
Transport becomes more restrictive: the intersection of preserved properties.
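The kind meet reduces to picking the weaker of two kinds in a total order. The sketch below assumes the order in which the kinds are listed in ScopedEquivalence (identity, isomorphism, equivalence, approximation) runs from strongest to weakest. The table above confirms the relative positions of identity, isomorphism, and approximation; the position of `equivalence` between isomorphism and approximation is an assumption.

```python
# Assumed strength order, strongest first. Only the relative positions of
# identity, isomorphism, and approximation are confirmed by the table.
KIND_ORDER = ["identity", "isomorphism", "equivalence", "approximation"]


def kind_meet(k1: str, k2: str) -> str:
    """Return the weaker of the two kinds (the later one in KIND_ORDER)."""
    return max(k1, k2, key=KIND_ORDER.index)


for pair in [("identity", "identity"), ("identity", "isomorphism"),
             ("identity", "approximation"), ("isomorphism", "approximation")]:
    print(pair, "->", kind_meet(*pair))
```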
Multiple Equivalences:
If $a \sim_S b$ and $a \sim_T b$ with $S \ne T$ (same pair, different scopes), the system selects the tightest scope containing the current context. Scopes are ordered by set inclusion; "tightest" means minimal-by-inclusion among scopes that contain $C$.
- If $C \in S$ and $C \notin T$: use S-equivalence
- If $C \in S$ and $C \in T$ and $S \subset T$: use S-equivalence (tighter)
- If $C \notin S$ and $C \notin T$: ScopeViolation with both as candidate_witnesses
These rules prevent the "just compose everything" failure. Scope flows correctly through chains. The system never claims more than the evidence supports.
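Selection among multiple equivalences for the same pair can be sketched as a filter followed by a minimality test. The function name `select`, the result dictionaries, and the witness names `S` and `T` are hypothetical. The text does not say what happens when two applicable scopes are incomparable, so this sketch simply returns every minimal candidate and leaves that decision to the caller.

```python
def select(candidates: dict, context: str) -> dict:
    """candidates maps a witness name to its scope (a frozenset of contexts)."""
    applicable = {name: scope for name, scope in candidates.items()
                  if context in scope}
    if not applicable:
        # No scope contains the context: report every candidate witness.
        return {"status": "ScopeViolation",
                "candidate_witnesses": sorted(candidates)}
    # Keep scopes with no strictly smaller applicable scope (minimal by inclusion).
    minimal = sorted(name for name, scope in applicable.items()
                     if not any(other < scope for other in applicable.values()))
    return {"status": "use", "witnesses": minimal}


candidates = {
    "S": frozenset({"postal_context"}),
    "T": frozenset({"postal_context", "legal_context"}),
}
print(select(candidates, "postal_context"))
print(select(candidates, "legal_context"))
print(select(candidates, "real_estate_context"))
```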
Admitting Equivalences
Scoped equivalences are not declared by fiat. They are admitted with evidence, following the same discipline as predicate admission (A29).
EquivalenceAdmission requires:
- Positive Evidence: Witness supporting the equivalence (authority ruling, ontology, measurement, declaration)
- Counterexamples: Known contexts where the equivalence does NOT hold. These define the scope boundary.
  CounterexampleWitness = { context: C, reason: "different denotations" | "different properties" | "authority conflict", evidence: Reference }
- Scope Declaration: Explicit scope $S$, defined as a region in the context poset. Scope must exclude counterexample contexts.
- Transport Certificate: Properties preserved and not preserved under substitution.
- Authority Tier: Who can admit this equivalence (local, organizational, global).
Result:
- Admitted, EquivalenceReceipt
- Rejected(reason, conflicting_evidence)
- Provisional(conditions)
Scope widening follows the same pattern as predicate promotion. To extend $a \sim_S b$ to $a \sim_{S'} b$ where $S \subset S'$:
- New evidence required for $S' \setminus S$
- If counterexample exists in $S'$: widening rejected
ScopeWideningRequest = {
  equivalence: "nyc@USPS ~_{postal_scope} new_york_city@geocoder",
  current_scope: postal_scope,
  requested_scope: all_contexts
}
Result: REJECTED
CounterexampleWitness = {
  context: real_estate_context,
  reason: "In real estate, 'NYC' denotes manhattan@MLS;
           'New York City' denotes city_of_new_york@geocoder.
           Different entities.",
  evidence: [real_estate_listing_analysis_ref]
}
Remediation: "Widening to all_contexts impossible while counterexample exists.
              You may admit a separate witness for legal_scope, giving two
              equivalences whose combined coverage is (postal ∪ legal)."
Counterexamples are first-class artifacts. They define scope boundaries. "Widening to (postal ∪ legal)" means admitting two witnesses with evidence for each scope, not unioning a single witness. The system stores multiple scoped equivalences; union is a property of their combined coverage, not an operation on witnesses.
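The widening check can be sketched as follows. This is an illustration under assumptions: scopes are plain sets of context names, `all_contexts` is a made-up three-element enumeration, `shipping_context` is a hypothetical context used only to show a successful request, and treating missing evidence as a rejection (rather than a provisional result) is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counterexample:
    context: str
    reason: str


def request_widening(current, requested, counterexamples, evidence_contexts):
    """Return (status, detail) for widening an equivalence from current to requested."""
    if not current < requested:
        return ("REJECTED", "requested scope must strictly contain current scope")
    blocking = [cx for cx in counterexamples if cx.context in requested]
    if blocking:
        return ("REJECTED", blocking)      # a counterexample lies inside the new scope
    uncovered = (requested - current) - set(evidence_contexts)
    if uncovered:
        return ("REJECTED", f"no new evidence for {sorted(uncovered)}")
    return ("ADMITTED", sorted(requested))


postal_scope = frozenset({"postal_context"})
all_contexts = frozenset({"postal_context", "legal_context", "real_estate_context"})
known = [Counterexample("real_estate_context", "different denotations")]

print(request_widening(postal_scope, all_contexts, known, {"legal_context"}))
```

The counterexample is checked before the evidence: no amount of new evidence for other contexts can admit a scope that contains a known failure.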
Three Failure Modes
T7 named the pathology: context-free equivalence handling. A30 prevents three specific failure modes:
Silent Conflation: The system treats $a \sim b$ unconditionally. In some contexts, this is wrong. The real estate integration conflates manhattan and city_of_new_york because it lacks scope. Users see wrong results. No artifact records the error.
A30 prevents conflation by requiring scope checks. The coercion fails if the context is not in scope. The failure is explicit.
Silent Separation: The system treats $a \not\sim b$ unconditionally. In some contexts, this is wrong. A postal search fails to find "New York City" records when querying "NYC" because no equivalence is recorded. Users miss valid results. No artifact records the gap.
A30 prevents separation by recording scoped equivalences. Where the equivalence holds (postal context), the system expands queries and merges results. The expansion is witnessed.
Scope Leak: The system uses transport along $a \sim_S b$ in context $C$ where $C \notin S$. The equivalence is recorded but applied too broadly. Scope leak is reusing a valid equivalence in the wrong scope. This is the integration team's error: they found a postal equivalence and applied it in real estate context. The system does not catch the mismatch.
A30 prevents scope leak by making scope a proof obligation. The coercion requires $C \in S$. If the check fails, the system refuses. No silent leak.
The NYC example generalizes: "blue" and "navy" are equivalent in search context (queries expand, recall improves) but distinct in inventory context (counts must not conflate). "Dr. Smith" in a hospital and "Dr. Smith" in a university share a string but denote different entities; the system resolves this at the denotation step before equivalence is ever considered. A28 handles sense disambiguation; A30 handles scope enforcement. Together, they give a complete discipline for contextual sameness.
What the User Sees
Query: "Properties in NYC"
Candidate contexts: [real_estate_context, postal_context] // inferred from data sources
Status: Ambiguous; user must select declared context
System response:
ContextDisambiguation = {
  term: "NYC",
  denotation_by_context: {
    real_estate_context: entity:manhattan@MLS,
    postal_context: entity:nyc@USPS
  },
  message: "Query spans multiple contexts where 'NYC' denotes different entities.",
  options: [
    { context: real_estate_context, meaning: "Manhattan", result_preview: "847 listings" },
    { context: postal_context, meaning: "All five boroughs", result_preview: "4,231 listings" }
  ],
  default_action: "none (context declaration required)"
}
The user sees choices, not conflation. The system does not guess. It presents the divergence and requests a decision. The decision is recorded. The result is scoped. The entire chain is auditable.
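The disambiguation step can be sketched as a check on whether a term's denotation varies across the candidate contexts. The function name `disambiguate`, the `resolved` status, and the dictionary layout are assumptions of this sketch; the entity IDs and the default action string come from the example above. Result previews are omitted because they would require querying real data.

```python
def disambiguate(term: str, candidate_contexts, denotations: dict) -> dict:
    """Return a resolved entity, or a disambiguation request when contexts disagree."""
    by_context = {c: denotations[(term, c)]
                  for c in candidate_contexts if (term, c) in denotations}
    if len(set(by_context.values())) <= 1:
        # Zero or one distinct denotation: nothing for the user to choose.
        return {"status": "resolved", "denotation_by_context": by_context}
    return {
        "status": "ambiguous",
        "term": term,
        "denotation_by_context": by_context,
        "message": f"Query spans multiple contexts where '{term}' "
                   "denotes different entities.",
        "default_action": "none (context declaration required)",
    }


denotations = {
    ("NYC", "real_estate_context"): "manhattan@MLS",
    ("NYC", "postal_context"): "nyc@USPS",
}
result = disambiguate("NYC", ["real_estate_context", "postal_context"], denotations)
print(result["status"], result["denotation_by_context"])
```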
Consequence
Equivalence is not a fact about strings but a relation between entities, indexed by context, constrained by scope, and enforced by proof obligation.
A system that treats "NYC" and "New York City" as unconditionally equivalent will conflate where it should separate. A system that treats them as unconditionally different will separate where it should conflate. A system that tracks scope will do neither. It will substitute where substitution is valid, refuse where it is not, and produce artifacts that explain why.
Manual normalization is an unscoped equivalence with no proof obligation. A30 replaces it with witnessed, scoped coercions. The arc from T7 ("equivalence requires manual normalization") to A30 ("equivalence is a typed, scoped, witnessed artifact") is complete.
T7 is resolved. Contextual equivalence is no longer a pathology but a discipline.
Chapter 29 asks the final touchstone question: what happens when named entities participate in events? "Alice introduced Bob to Carol" is not three strings plus a verb. It is a structured relation with roles. The same Bob, in the same event, may be introducer or introducee depending on how the sentence is parsed. T10 awaits.