Adjunctions: The price of translation
The text of Cervantes and that of Menard are verbally identical, but the second is almost infinitely richer.
This chapter formalizes the price of translation when perfect equivalence is unavailable. Where Chapter 7 (A8) characterized sameness through invertible maps, this chapter introduces A9 (Adjunction) to characterize optimal approximation under structural mismatch: the left adjoint constructs the minimal extension, the right adjoint computes the maximal restriction, and the unit and counit measure the cost of each direction. The chapter develops the Galois connection as the preorder special case before stating the full categorical adjunction, and shows how API versioning, color taxonomy coarsening, and abstract interpretation all instantiate the same universal property. The institutional and philosophical context for understanding lossy translation as a structured rather than arbitrary phenomenon is developed in Vol I, Chapter 4 (Empire of Strings); the formal treatment here is self-contained.
The Imperfect Translation
Backward compatibility is a promise. Sometimes the promise is kept imperfectly—and the imperfection is asymmetric. Translation in one direction may preserve everything; translation back may lose something that cannot be recovered.
Consider the common case. A client built for v1 must communicate with a v2 server. The compatibility layer looks straightforward: promote incoming v1 requests to v2 format, demote outgoing v2 responses back to v1. The adapters are short, the tests pass, the release notes say "backward compatible."
promote : v1.Response → v2.Response
demote : v2.Response → v1.Response
Version 1 returns {user_id, name}. Version 2 returns {user_id, name, created_at, metadata}. The promote function adds default values for the new fields. The demote function drops them.
Now run the round-trip test from Chapter 7.
Start with a v1 record. Promote it to v2. Demote it back to v1. Compare the result to the original.
demote(promote(r1)) = r1 ✓
The v1 record survives. Promoting adds defaults; demoting strips them; the original fields return unchanged. So far, so good.
Now run the test in the other direction. Start with a v2 record. Demote it to v1. Promote it back to v2. Compare.
promote(demote(r2)) ≠ r2 ✗
The v2 record does not survive. Demoting loses created_at and metadata. Promoting adds defaults, not the original values. Information that left the server does not return.
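The asymmetry can be made executable. A minimal sketch of the two adapters and the two round-trip tests, assuming dict records and None as the "unknown" default (the field names follow the example; everything else is illustrative):

```python
# Hypothetical v1/v2 adapters. None stands for "unknown / unspecified".

def promote(r1):
    """v1 -> v2: add default values for the new fields."""
    return {**r1, "created_at": None, "metadata": None}

def demote(r2):
    """v2 -> v1: drop the fields v1 does not know about."""
    return {"user_id": r2["user_id"], "name": r2["name"]}

r1 = {"user_id": 7, "name": "ada"}
r2 = {"user_id": 7, "name": "ada", "created_at": 2024, "metadata": {"tier": "pro"}}

assert demote(promote(r1)) == r1   # v1 survives the round trip
assert promote(demote(r2)) != r2   # v2 does not: created_at and metadata are gone
```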
This is not a bug but a structural asymmetry. One direction is lossless; the other is lossy. The two schemas are not isomorphic. Yet these adapters are not arbitrary. They are the universal compatibility layer, the one that every engineer would write, the one that satisfies a property we have not yet named.
The question is not "can we make this symmetric?" We cannot, not without inventing information. The question is: what makes this particular asymmetric pair principled? Why is this the "right" way to translate between v1 and v2 when perfect equivalence is unavailable?
The answer is that these adapters satisfy a universal property. They are the best possible monotone translations under the declared refinement order, given the constraint that v2 has more structure than v1. And "best possible" has a precise meaning that mathematicians formalized in 1958.
Optimal Under Constraints
"Best" is meaningless without a criterion. Best for whom? Best under what constraints? In engineering, "best" often means "we tried several things and this one works." In mathematics, "best" has a sharper definition: X is the best solution to problem P if every other solution factors through X.
Consider the problem of rounding real numbers to integers. There are many ways to do it: truncate toward zero, round to nearest, round up, round down. Each has uses. But ceiling and floor are distinguished by a universal property.
The ceiling of x, written ⌈x⌉, is the smallest integer greater than or equal to x. Not just any integer above x, but the smallest one. Any other integer that bounds x from above must be at least as large as the ceiling. The ceiling is optimal.
The floor of x, written ⌊x⌋, is the largest integer less than or equal to x. Any other integer that bounds x from below must be at most as large as the floor. The floor is optimal from the other direction.
These optimality conditions are not independent. Ceiling and floor are connected by a relationship:
⌈x⌉ ≤ n if and only if x ≤ n (for integers n)
n ≤ ⌊x⌋ if and only if n ≤ x (for integers n)
The ceiling gives the best integer approximation from above. The floor gives the best from below. And the relationship between them is not accidental; it is a Galois connection.
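The two conditions can be spot-checked numerically over a grid of samples (a sanity check, not a proof):

```python
import math

# Spot-check the Galois conditions relating ceil/floor to the
# inclusion of the integers into the reals.
xs = [k / 10 for k in range(-50, 50)]   # sample reals
ns = range(-6, 7)                        # sample integers

for x in xs:
    for n in ns:
        # ceiling is left adjoint to inclusion:  ceil(x) <= n  iff  x <= n
        assert (math.ceil(x) <= n) == (x <= n)
        # inclusion is left adjoint to floor:    n <= floor(x)  iff  n <= x
        assert (n <= math.floor(x)) == (n <= x)
```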
This pattern appears everywhere translations occur between structures of different granularity. The formal name is adjunction, and it captures something precise: when you translate between two levels of structure, there is often a uniquely determined "best" way to go each direction (when such an adjoint exists), and these two best ways are related by a universal condition. In the special case of preorders, this structure was first systematically studied by Ore under the name "Galois connexion" (Ore 1944). [Oystein Ore, "Galois Connexions," Transactions of the American Mathematical Society 55, no. 3 (1944): 493–513.]
Daniel Kan introduced adjoint functors in 1958 (Kan 1958). [Daniel M. Kan, "Adjoint Functors," Transactions of the American Mathematical Society 87, no. 2 (1958): 294–329.] The insight was that many seemingly unrelated constructions share the same shape: free groups, tensor products, limits, colimits, localizations. Each is "optimal" in a specific sense, and the optimality conditions fit a single template. Adjunctions are that template.
The Galois Connection
Before the full categorical machinery, consider the case that matters most for systems: preorders.
A preorder is a set with a reflexive, transitive relation. "Less than or equal to" on numbers. "Is a subtype of" on types. "Refines" on schemas. "Contains more information than" on records.
When both categories are preorders, an adjunction becomes something simpler: a Galois connection. The definition:
Let P and Q be preorders. A Galois connection between them consists of monotone functions:
- F : P → Q (left adjoint)
- G : Q → P (right adjoint)
satisfying, for all a ∈ P and b ∈ Q:
F(a) ≤ b ⟺ a ≤ G(b)
The equivalence says: comparing F(a) to b in Q is the same as comparing a to G(b) in P. You can do the comparison in either world; the answer is the same.
This is the "best approximation" property made literal. If F(a) ≤ b, then F(a) is below b in the ordering. If a ≤ G(b), then a is below the image of b pulled back through G. The Galois connection says these conditions are equivalent. In precise terms: F(a) is the least b in Q such that a ≤ G(b), and G(b) is the greatest a in P such that F(a) ≤ b.
Ceiling and floor on integers form a Galois connection with the real numbers. The inclusion of integers into reals sits between them:
- Ceiling ⌈·⌉ is left adjoint to inclusion
- Inclusion ι is left adjoint to floor ⌊·⌋
The conditions ⌈x⌉ ≤ n ⟺ x ≤ n and n ≤ ⌊x⌋ ⟺ n ≤ x are exactly the Galois connection equations.
The v1/v2 API problem has the same shape. The integers sit inside the reals as a "coarser" structure; v1 sits inside v2 in the same way. Promote is like ceiling into richer structure (the least completion that contains the original). Demote is like projection back (the greatest restriction that fits).
For API versioning, the preorder is "information content." One schema refines another if its records can be embedded without loss. The promote function sends v1 records to v2 records by adding defaults. The demote function sends v2 records to v1 records by dropping fields.
A critical constraint: the adjunction holds only when "default" means the bottom element in the information ordering—unknown, unspecified, null. If promote inserts a specific value (e.g., created_at = 0), the Galois condition fails. Suppose r2 has created_at = 2024. Then promote(r1) with created_at = 0 is not below r2 in information content; the specific default contradicts the actual data. The adjunction requires that promote adds minimal information, not specific information. This is why the adapters that work are the ones that leave new fields semantically empty, not the ones that populate them with convenient constants.
The Galois connection condition says: promote(r1) ≤ r2 if and only if r1 ≤ demote(r2). In plain terms, a promoted v1 record is below a v2 record in information content exactly when the original v1 record is below the demoted v2 record. Any comparison you want to make can be done in either world; the structure is preserved.
Given that refinement order, the universal property pins the adapters down. Promote gives the minimal v2 record containing a v1 record's information. Demote gives the maximal v1 record contained in a v2 record's information. Formally: once we order representations by refinement (information inclusion) and restrict to monotone translations, F and G are determined by the adjunction law. In a preorder, if an adjoint exists it is unique; in general categories it is unique up to unique isomorphism. Any other pair of monotone adapters would either lose more information or add structure that violates minimality.
The Full Adjunction
When categories have non-trivial morphisms beyond mere ordering, the Galois connection generalizes to the full adjunction (Mac Lane 1971, ch. IV). [Saunders Mac Lane, Categories for the Working Mathematician (New York: Springer-Verlag, 1971), ch. IV.]
Let C and D be categories. An adjunction F ⊣ G consists of:
- A functor F : C → D (left adjoint)
- A functor G : D → C (right adjoint)
- A natural bijection: for all A ∈ C and B ∈ D,
Hom_D(F(A), B) ≅ Hom_C(A, G(B))
The bijection is natural in both A and B.
The Hom-bijection says: maps from F(A) to B in the "rich" category correspond exactly to maps from A to G(B) in the "simple" category. The mapping problem does not disappear; it relocates. A morphism in one category corresponds to a morphism in the other, and the correspondence is natural and bijective at the level of morphisms.
Why does this matter? In many systems adjunctions, F behaves like a "promote" or "free" construction and G behaves like "project" or "forget." The adjunction is the guarantee that these are not arbitrary choices: they are optimal relative to the structure you declared. The Hom-bijection says: to give a map out of F(A) is exactly to give a map out of A after forgetting along G. That equivalence is the universal property that makes the translation principled.
The adjunction comes with two natural transformations that measure the cost of translation:
- Unit η : Id_C → G ∘ F
- Counit ε : F ∘ G → Id_D
The unit η_A : A → G(F(A)) sends A into the image of G applied to F(A). In systems terms: promote then demote. If η is an isomorphism, the round-trip is lossless from the "simple" side.
The counit ε_B : F(G(B)) → B sends the image of F applied to G(B) back to B. In systems terms: demote then promote. If ε is an isomorphism, the round-trip is lossless from the "rich" side.
In many adjunctions of practical interest, η is an isomorphism or close to one, while ε is not. The simple side survives round-tripping; the rich side loses information. This asymmetry is not a defect. It reflects the structural difference between the categories.
The unit and counit satisfy two triangle identities:
- ε_F(A) ∘ F(η_A) = id_F(A) for every A in C
- G(ε_B) ∘ η_G(B) = id_G(B) for every B in D
The triangle identities are the coherence laws of approximation. They say the adjunction's two directions fit together without drift: the "promote then forget" embedding behaves like a section, and the "forget then promote" approximation behaves like its retraction on the image. There is a well-defined comparison back to the original, not accumulating distortion.
In the posetal (Galois) case, this coherence becomes literal stabilization: the closure G ∘ F and interior F ∘ G operators are idempotent. Promote, demote, promote again yields the same result as promote once. In systems terms: once you have collapsed information to the v1 shadow, repeating the collapse does not create a new kind of shadow; you get the same approximation.
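The stabilization is easy to witness in the ceiling case: the closure operator induced by the ceiling adjunction is idempotent (a spot check, under the same setup as before):

```python
import math

# Closure operator induced by ceiling ⊣ inclusion: collapse a real to
# the least integer above it, viewed back inside the reals.
def closure(x):
    return float(math.ceil(x))

for k in range(-40, 40):
    x = k / 7                                   # sample reals
    assert closure(closure(x)) == closure(x)    # idempotent: one collapse suffices
```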
The Receipt Metaphor
Every non-isomorphic translation has a cost. Isomorphism is a free transaction: nothing lost, nothing gained, perfect round-trip. Adjunction is a transaction with fees, but the fees are accounted for.
The unit η records the fee for promotion. The counit ε records the fee for demotion. Whichever deviates from identity is where the cost shows up. In the v1/v2 example, η is identity (v1 survives) and ε is lossy (v2 loses fields).
Adjunctions do not eliminate cost. They do something more useful: they eliminate arbitrariness. Given the constraint that v2 has more structure than v1, there is a uniquely characterized way to translate each direction (when an adjoint exists). The promote is the minimal extension; the demote is the maximal restriction. Any other choice would either lose more or add extraneous structure.
This is the emotional thesis of the chapter:
Adjunctions do not fix loss. They fix arbitrariness.
The notary's bill of exchange from Vol I's prologue was not an isomorphism. Currency conversion with fees and exchange-rate drift is not symmetric. But the bill was not arbitrary either. It specified the conversion rate, the correspondent bank, the recourse if dishonored, the maturity date. It made the translation accountable.
Adjunctions are the mathematical version of that accountability. When a translation is adjoint, its cost is explicit, its structure is characterized, and its round-trip behavior is predictable.
When a translation is not adjoint, you are guessing. A promote that sets created_at = now() breaks both stability (time-dependence) and the adjunction itself (specific value, not bottom element). A demote that hashes metadata into a truncated string destroys composability: you cannot recover the hash's input. These are translations, but they are not adjoint. They work until they do not, and you cannot predict when.
Running Examples
API Versioning (Posetal Framing)
We order schemas by refinement: S₁ ⪯ S₂ if there exists a projection p : S₂ → S₁ and a section s : S₁ → S₂ such that p ∘ s = id. This is exactly the v1/v2 situation: every v1 record embeds into v2, and every v2 record projects back to v1. This is the clean split case; real migrations often have a projection without a canonical section, which is where the cost becomes nontrivial.
Order records by projection: r1 ≤ r2 when projecting r2 down to the v1 fields yields r1. This makes comparison verifiable rather than vague.
For v1 = {user_id, name} and v2 = {user_id, name, created_at, metadata}:
- Promote F : v1 → v2 adds default values for new fields
- Demote G : v2 → v1 drops new fields
The Galois condition: F(r1) ≤ r2 ⟺ r1 ≤ G(r2). Promoting r1 is below r2 in information order exactly when r1 is below the demotion of r2. This is verifiable: compare the field values after canonicalization.
The unit η: for a v1 record r1, we have η(r1) = G(F(r1)) = r1. Promote adds defaults; demote strips them; the original returns unchanged. The unit is identity.
The counit ε: for a v2 record r2, we have F(G(r2)) ≠ r2 in general. Demote loses created_at and metadata; promote adds defaults, not originals. The counit is not identity; it is the least-loss approximation the adjunction can produce.
The triangle identity in action: promote r1, demote to get r1, promote again. You get F(r1), not some further-distorted version. The distortion happens once.
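The record ordering and the checks above can be made executable. A sketch, assuming dict records and None as the information-free default (the `leq` encoding of the information order is one reasonable choice, not the only one):

```python
V1_FIELDS = ("user_id", "name")

def promote(r1):    # F: minimal v2 extension
    return {**r1, "created_at": None, "metadata": None}

def demote(r2):     # G: maximal v1 restriction
    return {k: r2[k] for k in V1_FIELDS}

def leq(a, b):
    """Information order: a ≤ b when b agrees with a on every field
    that a actually specifies (None = unspecified)."""
    return all(v is None or b.get(k) == v for k, v in a.items())

r1 = {"user_id": 7, "name": "ada"}
r2 = {"user_id": 7, "name": "ada", "created_at": 2024, "metadata": None}

assert leq(promote(r1), r2) == leq(r1, demote(r2))   # Galois condition
assert demote(promote(r1)) == r1                     # unit is identity
assert promote(demote(promote(r1))) == promote(r1)   # triangle: distortion happens once
```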
Color Taxonomy (Fine and Coarse)
Catalog A uses fine-grained colors: navy, royal_blue, sky_blue, teal, cerulean.
Catalog B uses coarse categories: blue, green, red.
Order colors by specificity: c1 ≤ c2 if c1 is a special case of c2. Navy ≤ blue. Royal_blue ≤ blue. But navy and royal_blue are incomparable (neither is more specific than the other).
Define F : Coarse → Fine by picking a canonical representative: blue ↦ navy. Define G : Fine → Coarse by collapsing: navy ↦ blue, royal_blue ↦ blue, sky_blue ↦ blue.
The Galois condition: F(c_coarse) ≤ c_fine ⟺ c_coarse ≤ G(c_fine). If navy (the canonical blue) is below some fine color c, then blue must be below G(c). This holds because G sends all blue shades to blue.
The unit: η(blue) = G(F(blue)) = G(navy) = blue. Coarse colors survive.
The counit: F(G(royal_blue)) = F(blue) = navy ≠ royal_blue. Fine distinctions collapse. If you care about the difference between navy and royal_blue, demoting to coarse categories loses it.
The practical consequence: queries at the coarse level are safe. "Show me blue items" will find items tagged navy, royal_blue, or sky_blue. But inventory reconciliation at the fine level will fail if one catalog uses navy and another uses royal_blue for the same physical item. The Galois connection tells you precisely where the loss occurs.
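The unit, counit, and coarse-query safety can be checked directly. A sketch; the text does not say where teal and cerulean land, so their coarse assignments here are assumptions, and red is omitted because catalog A lists no red shade:

```python
# Fine and coarse color taxonomies from the example. G collapses fine
# shades; F picks a canonical representative per coarse category.
# Assumed: teal -> green, cerulean -> blue.
TO_COARSE = {"navy": "blue", "royal_blue": "blue", "sky_blue": "blue",
             "cerulean": "blue", "teal": "green"}
CANONICAL = {"blue": "navy", "green": "teal"}

def G(fine):
    return TO_COARSE[fine]

def F(coarse):
    return CANONICAL[coarse]

# Unit: coarse colors survive the round trip.
assert G(F("blue")) == "blue"
# Counit: fine distinctions collapse to the canonical representative.
assert F(G("royal_blue")) == "navy"
# Coarse queries are safe: every blue shade lands in "blue".
assert all(G(c) == "blue" for c in ("navy", "royal_blue", "sky_blue"))
```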
Static Analysis (Abstraction and Concretization)
Abstract interpretation uses a Galois connection to relate concrete program states to abstract domains (Cousot and Cousot 1977). [Patrick Cousot and Radhia Cousot, "Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints," in Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '77) (ACM, 1977), 238–252.]
- Concrete domain: the set of all possible program states
- Abstract domain: a finite lattice of abstractions (e.g., signs: negative, zero, positive)
The abstraction function α maps a set of concrete states to the smallest abstract element that covers them all. The concretization function γ maps an abstract element to all concrete states it represents.
The Galois connection: α(C) ⊑ a ⟺ C ⊆ γ(a). A concrete set C is abstracted to something below a if and only if all states in C are covered by γ(a).
This is why abstract interpretation is sound: if you prove a property of α(C), it holds for all states in C. The Galois connection guarantees that abstraction does not lose relevant distinctions for the properties you can express in the abstract domain.
The unit η: C ⊆ γ(α(C)). Concrete states are covered by their abstraction. The approximation is an over-approximation.
The counit ε: α(γ(a)) ⊑ a. Concretizing and then abstracting again can only sharpen the result, never coarsen it: the outcome is at least as precise as a. In a Galois insertion it returns exactly the same abstract element.
Static analysis tools live on this Galois connection. The abstraction lets them reason efficiently; the connection guarantees the reasoning is sound. This is why soundness proofs for abstract interpreters look like adjunction arguments: the Galois structure is the mathematical content of "abstraction that does not lie."
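The sign domain makes the connection concrete. A minimal sketch, encoding abstract elements as frozensets of the signs present (so ⊑ is subset inclusion, ⊥ is the empty set, and ⊤ is all three signs); the encoding is illustrative:

```python
# Sign abstraction over finite sets of integers.

def sign(n):
    return "neg" if n < 0 else "zero" if n == 0 else "pos"

def alpha(concrete):
    """Smallest abstract element covering a set of integers."""
    return frozenset(sign(n) for n in concrete)

def gamma(abstract, universe):
    """All concrete states (within a finite universe) an element represents."""
    return {n for n in universe if sign(n) in abstract}

universe = set(range(-5, 6))
C = {-3, -1}                          # a concrete set of states
a = frozenset({"neg", "zero"})        # an abstract element

# Galois condition: alpha(C) ⊑ a  iff  C ⊆ gamma(a)
assert (alpha(C) <= a) == (C <= gamma(a, universe))
# Unit: over-approximation — C is covered by its abstraction.
assert C <= gamma(alpha(C), universe)
# Counit: re-abstracting a concretization is at least as precise as a.
assert alpha(gamma(a, universe)) <= a
```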
What Adjunctions Are Not
Not every pair of translations is an adjunction. Adjunctions are structured claims with verifiable properties. Most ad hoc adapters fail them.
Not all translations have an adjoint. Given an arbitrary function F : C → D, there may be no function G : D → C such that F ⊣ G. Adjointness is a strong condition. A promotion without a principled demotion, or a demotion without a principled promotion, is not part of an adjunction.
Adjunctions do not eliminate loss. They make it characterized and minimal. If you need lossless translation, you need isomorphism, not adjunction. An adjunction tells you the minimum loss required given the structural mismatch. It does not make the mismatch disappear.
Adjunctions are relative to categorical structure. A different choice of what counts as "structure-preserving" yields different adjunctions. The v1/v2 adjunction depends on treating schemas as preordered by information content. If you change the ordering, you change what "optimal" means.
Adjunctions are "best" within constraints, not absolutely. There is no universal "best translation." There is only the best translation for a given pair of categories with a given notion of morphism. The universality is local to the structure you have committed to.
The value of recognizing an adjunction is the guarantees it provides. When a translation is adjoint:
- The cost is explicit in the unit and counit
- The round-trip behavior is predictable (triangle identities)
- The canonicity is enforced by the universal property
When a translation is not adjoint, you have none of these guarantees. The adapters might work. They might even be the "right" adapters. But you cannot prove it, and you cannot predict how they will behave under composition.
Consequence
We now have three levels of machinery for relating representations:
- Invariants (Chapter 6): what properties survive transformation. The real properties, not the coordinate artifacts.
- Isomorphisms (Chapter 7): when two representations are interchangeable. Witnessed by a pair of mutually inverse maps.
- Adjunctions (Chapter 8): the price of translation when interchangeability fails. Witnessed by a pair of optimally approximating maps with explicit unit and counit.
Each level answers a different question. Invariants answer "what is real?" Isomorphisms answer "when are two things the same?" Adjunctions answer "what is the best trade when they are not?"
The hierarchy is not a ladder you climb and discard. Real systems use all three levels simultaneously. Some translations are isomorphisms; most are not. The ones that are not may be adjoint, in which case you can reason about them systematically. The ones that are neither isomorphism nor adjunction require ad hoc analysis.
Invariants say what is real. Isomorphisms say when two things are the same. Adjunctions say what it costs when they are not quite the same — and that cost is not a failure but an honest accounting of what translation destroys and what it preserves. Most real translations have a toll. The question is whether the toll is visible. A witness is what lets equivalence travel. An adjunction is what makes the toll explicit.
But we still lack machinery for composition and scope. If A translates to B via one adjunction, and B translates to C via another, what can we say about A and C? If an equivalence is valid in one context but not another, how do we track that? If a witness is partial or probabilistic, how does it compose? The three levels need to be synthesized into a single framework — witnessed sameness as a first-class artifact, with type, scope, cost, and transport rules that compose across steps.