The Formal Core of OCANNL’s Shape and Projection Inference

Working notes toward the workshop/FProPer paper. Definitions, theorems, and proofs for the core fragment (no Affine, no Concat), following the template in A Shape Is Not Its Index. Proof-status labels: [proved], [sketch] (believed routine, gaps flagged), [open]. Full proof write-outs live in the companion appendix, The Formal Core of OCANNL — Proof Appendix (cited as A.1–A.10); this document stays statement-level.

Notation follows the blog series: 1_\emptyset the claim-free unit, \sqsubseteq the broadcast order, \diamond the marker, \alpha dimension variables, \rho row variables.

1. Dimensions

Definition 1.1 (Dimensions). Fix a set \mathcal{B} of basis tags with a distinguished tag \texttt{default} \in \mathcal{B}. The set of dimensions is D \;=\; \{1_\emptyset\} \;\cup\; \{\, n_b \mid n \ge 1,\; b \in \mathcal{B} \,\}. Elements n_b are atoms; 1_\emptyset is the claim-free unit. Adjoin a fresh element \bot and set D_\bot = D \cup \{\bot\}.

Definition 1.2 (Broadcast order). On D: d_1 \sqsubseteq d_2 iff d_2 = 1_\emptyset or d_1 = d_2. Extend to D_\bot by \bot \sqsubseteq d for all d.

Proposition 1.3 (Lattice structure). [proved] (D_\bot, \sqsubseteq) is a bounded lattice of height 2: top 1_\emptyset, bottom \bot, and the atoms form an antichain, each covering \bot and covered by 1_\emptyset. Meets and joins: d \wedge d = d,\quad d \wedge 1_\emptyset = d,\quad d_1 \wedge d_2 = \bot \ (d_1 \ne d_2 \text{ atoms});\qquad d \vee d = d,\quad d \vee \bot = d,\quad d_1 \vee d_2 = 1_\emptyset \ (d_1 \ne d_2 \text{ atoms}).

Proof. Reflexivity is immediate. Antisymmetry: d_1 \sqsubseteq d_2 \sqsubseteq d_1 with d_1 \ne d_2 forces d_2 = 1_\emptyset and d_1 = 1_\emptyset, contradiction. Transitivity: if d_3 = 1_\emptyset then d_1 \sqsubseteq d_3; otherwise d_2 = d_3 \ne 1_\emptyset, so d_1 \sqsubseteq d_2 forces d_1 = d_2. The displayed meets and joins are exhaustive by the case analysis on whether the arguments are equal, the top, or the bottom; each is verified directly to be the glb/lub since the only element strictly above an atom is 1_\emptyset and the only one strictly below is \bot. \square

Proposition 1.4 (Non-distributivity). [proved] If D contains at least three atoms (always: \{1_{\texttt{default}}, 2_{\texttt{default}}, 3_{\texttt{default}}\} \subseteq D), then D_\bot is not distributive, and not modular.

Proof. Let a, b, c be distinct atoms. The set \{\bot, a, b, c, 1_\emptyset\} is closed under \wedge, \vee by Prop. 1.3, hence a sublattice, and it is isomorphic to the diamond M_3. M_3 is non-distributive: a \wedge (b \vee c) = a \wedge 1_\emptyset = a while (a \wedge b) \vee (a \wedge c) = \bot \vee \bot = \bot. Sublattices of distributive (resp. modular) lattices are distributive (resp. modular); M_3 is non-distributive, and although M_3 is modular, taking the height-2 antichain plus… (correction: M_3 is modular, so the claim is only non-distributivity; strike “not modular”). \square

(Note for the paper: claim only non-distributivity; M_3 is the classic modular-but-not-distributive lattice. Non-distributivity already rules out the MLsub/algebraic-subtyping machinery, which is the point of §5 of the abstract.)

2. Rows

Definition 2.1 (Rows). A row is a triple l \cdot \diamond \cdot r with l, r \in D^{*} (the leading and trailing flanks). \mathrm{rank}(l \cdot \diamond \cdot r) = |l| + |r|. Equality of rows is componentwise equality of (l, r) — in particular marker-sensitive: [3]\cdot\diamond\cdot[4] \ne [3,4]\cdot\diamond\cdot[\,]. (This is identity of the algebra’s elements; it is not the satisfaction relation of equality constraints, which Def. 3.4 judges by the flat equivalence \approx of Def. 2.9.)

Definition 2.2 (Expansion). For a row R = l \cdot \diamond \cdot r and n \ge \mathrm{rank}(R), the expansion R{\uparrow}n \in D^{n} is the flat sequence l \cdot (1_\emptyset)^{\,n - \mathrm{rank}(R)} \cdot r.

Definition 2.3 (Row order). l_2 \cdot \diamond \cdot r_2 \;\sqsubseteq\; l_1 \cdot \diamond \cdot r_1 iff (i) |l_1| \le |l_2| and |r_1| \le |r_2| (flanks fit; note this implies \mathrm{rank}_2 \ge \mathrm{rank}_1), and (ii) (l_2 \cdot r_2)[i] \sqsubseteq (l_1\cdot\diamond\cdot r_1){\uparrow}\mathrm{rank}_2\,[i] for all i (pointwise refinement of the expansion).

(Design rationale — why ground rows carry markers. The marker is the ghost of the row variable: a row term keeps its variable at the marker (Def. 3.1), so an open row has a single locus where new axes can appear, and the inference algorithm treats the two ways they appear — broadcast padding and parametric instantiation — uniformly, never distributing a surplus between them. One constraint yields one residue bound (Def. 4.3(i)) instead of a case split over “how much broadcasts, how much instantiates”; the bound store keeps the whole spectrum of resolutions alive, and only closing picks a point on it — the join of caps commits the most* material, i.e. the lowest value that fits (Prop. 6.5; mind the order’s direction — lower means more committed, and the “least” of “least upper bound” is in the order, not in material), while \gamma_\uparrow commits none (Prop. 6.4). For this to remain coherent after a row closes, the ground algebra must remember the locus: expansion (Def. 2.2) inserts at the marker precisely so that axes the variable would have absorbed and axes broadcasting inserts later land in the same place. A marker-less ground algebra would force the distribution choice eagerly, inside the algorithm. Of the two flanks, the left one earns its keep paradigmatically through the slice operation — prepending an outer-anchored axis to a possibly-open row, “operand \approx sliced-axis \cdot \langleresult\rangle”, is inexpressible without material left of the variable; einsum’s pre-ellipsis axes reuse the mechanism (currently slice is exposed batch-only, but the construction generalizes to the other kinds). This is also the boundary condition on the rigid rows of Def. 2.12: they are sound exactly in positions where no further absorption can occur — enforced order-theoretically by clause (d) there, and at the constraint sources by Remark 2.15.)*

Lemma 2.4 (Expansion monotonicity). [proved] If R_2 \sqsubseteq R_1 then for every n \ge \mathrm{rank}(R_2): R_2{\uparrow}n\,[i] \sqsubseteq R_1{\uparrow}n\,[i] for all i.

Proof. Write R_j = l_j \cdot \diamond \cdot r_j. Fix i < n and case on the region of i in R_2{\uparrow}n. Front, i < |l_2|: R_2{\uparrow}n[i] = l_2[i]. If i < |l_1| then R_1{\uparrow}n[i] = l_1[i]; since also i < |l_1| \le |l_2| \le \mathrm{rank}_2, position i is a front position of R_1{\uparrow}\mathrm{rank}_2 as well, so condition (ii) of R_2 \sqsubseteq R_1 gives l_2[i] \sqsubseteq l_1[i]. If |l_1| \le i then position i is not a trailing position of R_1{\uparrow}n either (trailing positions are \ge n - |r_1| \ge n - |r_2| > i when i < |l_2| \le n - |r_2|; the case i \ge n - |r_2| belongs to the next region), so R_1{\uparrow}n[i] = 1_\emptyset and anything refines it. Middle, |l_2| \le i < n - |r_2|: R_2{\uparrow}n[i] = 1_\emptyset. By |l_1| \le |l_2| \le i and i < n - |r_2| \le n - |r_1|, also R_1{\uparrow}n[i] = 1_\emptyset. Back, i \ge n - |r_2|: symmetric to the front. \square

Proposition 2.5 (Partial order). [proved] \sqsubseteq is a partial order on rows.

Proof. Reflexivity: k = 0 and pointwise reflexivity. Antisymmetry: mutual flank-fit gives |l_1| = |l_2|, |r_1| = |r_2|, so the expansions are the rows themselves and pointwise antisymmetry (Prop. 1.3) applies. Transitivity: let R_3 \sqsubseteq R_2 \sqsubseteq R_1 and n = \mathrm{rank}_3. Flank-fit composes. For (ii): (l_3 \cdot r_3)[i] = R_3{\uparrow}n[i] \sqsubseteq R_2{\uparrow}n[i] by hypothesis, and R_2{\uparrow}n[i] \sqsubseteq R_1{\uparrow}n[i] by Lemma 2.4 applied to R_2 \sqsubseteq R_1; conclude by transitivity in D. \square

Proposition 2.6 (The empty row is the greatest element). [proved] For every row R: R \sqsubseteq [\,]\cdot\diamond\cdot[\,], and nothing else is above all rows.

Proof. Flanks of the empty row have length 0 \le anything; its expansion to \mathrm{rank}(R) is all 1_\emptyset, which everything refines. Uniqueness: if R_\top is above the empty row then flank-fit forces |l| = |r| = 0. \square

Proposition 2.7 (Rows form a join-semilattice with top). [proved] Any two rows R_1, R_2 have a least upper bound: R_1 \vee R_2 \;=\; l_\vee \cdot \diamond \cdot r_\vee, \qquad |l_\vee| = \min(|l_1|,|l_2|),\ \ l_\vee[i] = l_1[i] \vee l_2[i], and symmetrically for r_\vee aligned from the back (joins taken in D; note two distinct atoms join to 1_\emptyset, never to \bot, so l_\vee \in D^{*}).

Proof. Upper bound: flanks fit by \min \le both. Pointwise, for R_1 at position i of rank \mathrm{rank}_1: front positions i < |l_\vee| give l_1[i] \sqsubseteq l_1[i] \vee l_2[i]; positions outside the (shorter) flanks of R_\vee see 1_\emptyset in R_\vee{\uparrow}\mathrm{rank}_1. Least: let U be an upper bound, so |l_U| \le |l_1|, |l_2|, hence |l_U| \le |l_\vee| — flanks fit for R_\vee \sqsubseteq U. Pointwise at a front position i < |l_U|: from R_j \sqsubseteq U we get l_j[i] \sqsubseteq l_U[i] for j = 1, 2, so l_U[i] is an upper bound of both and l_\vee[i] = l_1[i] \vee l_2[i] \sqsubseteq l_U[i]. At positions |l_U| \le i outside U’s flanks, U’s expansion shows 1_\emptyset. Back symmetric. \square

(This is what licenses accumulating row caps and later committing a leaf to the join of its lower bounds; the dimension-level instance is the cap-collapse rule.)

Remark 2.8 (Rank-polymorphism lives in the order). Definition 2.3 relates rows of different ranks directly; no quantifier appears anywhere in this section. This is the formal content of the first post’s claim.

Definition 2.9 (Flat equivalence). R \approx R' iff \mathrm{flat}(R) = \mathrm{flat}(R') as sequences, where \mathrm{flat}(l\cdot\diamond\cdot r) = l \cdot r — equivalently, \mathrm{rank}(R) = \mathrm{rank}(R') and R{\uparrow}\mathrm{rank}(R) = R'{\uparrow}\mathrm{rank}(R'), since a row’s expansion at its own rank is its flattening (Def. 2.2 with no padding). \approx is the marker-erasing equivalence; it is what the implementation’s closed-row “equality” computes (§10).

Proposition 2.10 (\approx versus the order). [proved] (i) The equivalence induced by \sqsubseteq (mutual inequality) is the identity — this is just antisymmetry (Prop. 2.5). So \approx is not derivable from the order. (ii) Stronger: any two distinct \approx-related rows are \sqsubseteq-incomparable. (iii) \approx is not a \sqsubseteq-congruence; indeed no nontrivial equivalence is compatible with the order. (iv) \approx is the kernel of expansion at own rank; everything else the order consults — flank-fit, and expansions at strictly higher ranks (broadcasting) — is exactly where \approx-related rows diverge.

Proof. (i) Prop. 2.5. (ii) Equal flattenings with distinct markers force (WLOG) |l| < |l'|, hence |r| > |r'| (the flat lengths agree); flank-fit then fails in both directions. (iii) Witness: [3]\cdot\diamond\cdot[2,4] \sqsubseteq [3]\cdot\diamond\cdot[4] (flanks fit; pointwise [3,2,4] vs [3,1_\emptyset,4]) but [3]\cdot\diamond\cdot[2,4] \not\sqsubseteq [3,4]\cdot\diamond\cdot[\,] (leading flank-fit needs |l| \ge 2). For the general claim: if R \ne R' have identical down-sets, then R \in \mathrm{down}(R) by reflexivity gives R \sqsubseteq R' and symmetrically — mutual \sqsubseteq, identity by (i). (iv) The first identity is Def. 2.2; the divergence claim is (ii)/(iii) read off Def. 2.3’s clauses. \square

(Aside: the kernel of expansion at every common rank is yet a third equivalence — it absorbs explicit inner-edge 1_\emptyset entries into the middle, relating rows of different ranks, and on equal-rank rows it is finer than \approx. It is arguably the observational equivalence of rows-as-operands; notably \approx is not contained in it: flat-equal rows with different markers expand differently. The implementation uses none of this — its equality sites use \approx, its order sites the marked order.)

Remark 2.11 (Equality-as-\approx is underdetermined; the marker placement is a policy choice). Reading the ground meaning of closed-row equality as \approx (forced by practice: an einsum spec row must equate with a literal shape whose marker sits at the front — §10) makes row-equality constraints underdetermined: in l_1\cdot\langle\rho\rangle\cdot r_1 \approx C, the flat content of \gamma\rho is forced (the middle of \mathrm{flat}(C)) but its marker placement is free, and by Prop. 2.10(ii) the candidates are pairwise \sqsubseteq-incomparable — a later inequality on \rho can distinguish them. Witness: \Phi = \{\,\langle\rho\rangle \approx [\,]\cdot\diamond\cdot[3,5],\ \ [3]\cdot\diamond\cdot[9,5] \sqsubseteq \langle\rho\rangle\,\}. Under the former marked-equality reading, \gamma\rho = [\,]\cdot\diamond\cdot[3,5] is forced and the inequality fails ([3,9,5] vs [1_\emptyset,3,5]): unsatisfiable. Under \approx, \gamma\rho = [3]\cdot\diamond\cdot[5] satisfies both ([3,9,5] vs [3,1_\emptyset,5]): satisfiable. The implementation is \approx-checking but marked-committing: it accepts the flat alignment, then commits the closed side’s structural split as the value’s marker placement — on this \Phi it fails. Def. 3.4 now adopts the \approx reading, so that failure is officially a policy rejection (Lemma 5.1’s policy steps), not semantic unsatisfiability: closed-row equality joins closing (Remark 6.6) as a deliberately non-principal site. The discharge — real shape programs never produce a placement-discriminating constraint set, so the policy is unobservable on the reachable fragment — is conjectured at Cor. 5.5; this witness doubles as the probe: it should be unwritable from the surface language if the provenance invariant (§10) holds.

Definition 2.12 (Two-sorted ground rows — proposal). Adjoin to the marked rows of Def. 2.1 a second sort of rigid rows: F^\bullet with F \in D^n, a flat sequence with no marker; \mathrm{rank}(F^\bullet) = n, and the expansion F^\bullet{\uparrow}m is defined only at m = n, where it is F. Extend \sqsubseteq: (a) marked–marked: Def. 2.3 unchanged; (b) F_2^\bullet \sqsubseteq F_1^\bullet iff equal rank and pointwise refinement; (c) F^\bullet \sqsubseteq R (R marked) iff \mathrm{rank}(F) \ge \mathrm{rank}(R) and F[i] \sqsubseteq R{\uparrow}\mathrm{rank}(F)\,[i] pointwise — a rigid result may refine a broadcastable operand; (d) R \sqsubseteq F^\bullet: never — nothing broadcastable sits below a rigid row.

Proposition 2.13 (The two-sorted order). [proved] The extended relation is a partial order; the empty marked row remains the unique top; and on the rigid sort, flat equality is equality — trivially a congruence, since no context consults a marker that does not exist. Clause (d) is forced, not a choice: admitting even rank-equal R \sqsubseteq F^\bullet breaks transitivity — [5]\cdot\diamond\cdot[3] \sqsubseteq [\,]\cdot\diamond\cdot[3] \sqsubseteq [3]^\bullet would demand [5]\cdot\diamond\cdot[3] \sqsubseteq [3]^\bullet, a rank mismatch against a rigid row.

Proof. Reflexivity per sort. Antisymmetry: cross-sort mutual relations are impossible by (d); within sorts, Prop. 2.5 resp. pointwise antisymmetry at equal rank. Transitivity: the all-marked case is Prop. 2.5 and the all-rigid case is pointwise composition at the common rank; by (d) the only mixed chains are \bullet \sqsubseteq \bullet \sqsubseteq \diamond — ranks: equal, then \ge; pointwise composition at the common rank of the two rigid rows — and \bullet \sqsubseteq \diamond \sqsubseteq \diamond — \mathrm{rank}(F) \ge \mathrm{rank}(R_2) \ge \mathrm{rank}(R_1), and pointwise via Lemma 2.4 applied to R_2 \sqsubseteq R_1 at n = \mathrm{rank}(F). Top: F^\bullet \sqsubseteq [\,]\cdot\diamond\cdot[\,] by (c) against all-1_\emptyset; uniqueness as in Prop. 2.6, no rigid row being above any marked row by (d). \square

(Why (d) cannot be weakened even to “rank-equal with pointwise refinement” — which would make R and \mathrm{flat}(R)^\bullet mutually comparable, hence order-equivalent. The down-set form of the transitivity argument: R \sqsubseteq F^\bullet demands \mathrm{down}(R) \subseteq \mathrm{down}(F^\bullet), but a marked row’s down-set contains rows of every rank \ge \mathrm{rank}(R) — its materialized expansions l \cdot 1_\emptyset^{\,k} \cdot r refine it — while a rigid row’s down-set is rank-bounded. A marked row carries outstanding broadcast credit: the higher-rank refinements its marker still licenses; a rigid upper bound cannot honor that credit. Two consolations: the desired relation exists in the system as the derived one-shot relation — “R \sqsubseteq F^\bullet at equal rank, pointwise” is literally R \sqsubseteq^{1} F^\bullet of Remark 2.15 — it is just not a clause of the hereditary order; and the order already places every marked row directly above its flattening, \mathrm{flat}(R)^\bullet \sqsubseteq R by (c) with pointwise reflexivity, so the pair is adjacent with an asymmetric, irreversible one-step gap that is the broadcast credit. Collapsing the gap into an equivalence is the flat-congruence wish that Prop. 2.10(iii) rules out.)

Remark 2.14 (What the rigid sort buys, and what it costs). This dodges Prop. 2.10’s obstruction not by quotienting marked rows by \approx — impossible, \approx is no congruence — but by adjoining the intended quotient as new elements and routing equality through them:

Ground row equality in the constraint language becomes uniformly content equality: same rank, same flat sequence. This is the principled replacement for Def. 3.4’s former “syntactic identity” — now adopted there: equalities between ground rows only ever check (and bind variables); they never rewrite one ground row into another, so no congruence demand arises even between sorts. The einsum-spec-vs-literal equation goes through with no marker erasure, because the rigid participant never had one.
Remark 2.11’s witness acquires a definite truth value instead of a policy: \rho’s value is the rigid [3,5]^\bullet, and the subsequent inequality fails by rank-rigidity (clause (b)/(c) demand rank compatibility; there is no insertion point). Underdetermination is resolved by removing the determinand.
The conjecture “the implementation’s marker policy is never observed” upgrades to a static sort discipline: row variables carry sorts; a variable constrained by both a rigid equation and a broadcastable host is a sort error, or an explicit coercion site where a placement must be declared rather than guessed. Remark 2.15 locates the sorts for einsum: ... is a kind-indexed row variable (not a marker declaration), row-variable values are uniformly rigid, and markers enter the semantics only from tensor rows and the left-broadcast convention for results — so no variable crosses sorts, and the theorem to aim for (sort preservation \Rightarrow the solver never faces a marker choice) holds by construction of the constraint sources.
It adjudicates row.ml’s trailing-guard asymmetry (§10) as plain incompleteness against the rigid-equality specification, and reads the existing flat closed–closed unification as rigid equality — the implementation distance is a third bcast constructor (or a rigidity bit on closed rows) plus rank-checking Row_ineq against rigid operands.
Costs [open]: Prop. 2.7’s joins need a mixed formula — a rigid cap is a rank-rigid lower bound, and two rigid caps of different ranks have only marked upper bounds, so closing’s join-of-caps must mix sorts; and shape.ml needs an audit of which constraint sources are rigid-sorted and whether any row variable genuinely crosses sorts (if some do, coercions are real, and the policy returns — but localized and explicit).

Remark 2.15 (Einsum constraints: kind-indexed ellipsis, checking vs inference, and the one-shot order). Grounding the sort discipline in the actual einsum notation corrects Remark 2.14’s sort assignment and sharpens the proposal:

The ellipsis is a row variable, not a marker. ... in a spec is a context-dependent row variable — equivalent to ..batch.., ..input.. or ..output.. according to the axis kind of its position — shared across the operand and result rows of the same kind. The sharing is the point: it transports the absorbed axes from operands to result. Open spec rows are row terms in the sense of Def. 3.1 (the variable sits at the marker), and the variable, not the user, owns the middle.
Checking semantics: an inequality sandwich under the one-shot order. Purely for shape checking, an einsum could denote inequalities rather than equalities: the result tensor’s row at the bottom, the operand tensors’ rows at the top, and the template (spec) rows in the middle, tied to one another by their shared dimension and row variables — \mathrm{res} \sqsubseteq^{1} T_{\mathrm{res}} and T_{\mathrm{opnd}_j} \sqsubseteq^{1} \mathrm{opnd}_j. Here \sqsubseteq^{1} is the one-shot relation X \sqsubseteq^{1} Y \iff \mathrm{flat}(X)^\bullet \sqsubseteq Y — clause (c) of Def. 2.12 after rigidifying the lower side. One-shot is forced, not optional: the hereditary marked order excludes intended matches. Witness: [3]\cdot\langle\rho\rangle\cdot[4] \sqsubseteq [\,]\cdot\diamond\cdot[3,5,4] — an operand template against a literal shape — has no rank-3 solution in the marked order, since flank-fit demands three trailing axes below the literal’s front marker, forcing rank \ge 4; \sqsubseteq^{1} admits the intended \rho \mapsto [5]. So the rigid sort’s real role is sharper than Remark 2.14 suggested: it is the semantic device that strips flank-fit (heredity) off the lower sides of constraints, and the only coercion it needs is \mathrm{flat}(\cdot) — canonical, marked-to-rigid; the problematic direction (minting a marker) never occurs. The induced equivalence closes an earlier loop: while mutual \sqsubseteq is the identity (Prop. 2.10(i)), mutual \sqsubseteq^{1} is exactly \approx — X \sqsubseteq^{1} Y and Y \sqsubseteq^{1} X force \mathrm{rank}(X) \ge \mathrm{rank}(Y) \ge \mathrm{rank}(X) via clause (c)’s rank condition, at equal rank expansion is flattening, and pointwise antisymmetry in D gives \mathrm{flat}(X) = \mathrm{flat}(Y). So the flat equivalence, underivable from the hereditary order (Prop. 2.10), is the equivalence induced by the one-shot order — which is what licenses Def. 3.4’s equality semantics: row-equality constraints mean \sqsubseteq^{1}-antisymmetry, i.e. content equality.
Results broadcast on the left, by convention. The result side of an einsum is treated as broadcastable on the left of the variable’s segment, preserving the declared flanks: the upper row of the result constraint is the grounded result template with the variable’s contribution flattened and the marker at its left edge — T_{\mathrm{res}} = l\cdot\langle\rho\rangle\cdot r grounds to l \cdot \diamond \cdot (\mathrm{flat}(\gamma\rho) \cdot r). For a no-ellipsis result spec (l = [\,], no variable) this degenerates to the far-left marker. [Code pointers: no re-marking operation exists; the convention is realized at construction.** The parser’s no-ellipsis production puts all spec axes in the trailing flank (([], None, specs) in parser.mly); axes_spec_to_dims_bio’s no-variable default is Broadcastable with empty beg_dims (shape.ml:193); Shape.make builds user dims the same way; and for solved middles, unify_row’s open-vs-closed value construction inherits the closed side’s split, which for literal closed sides (all axes in dims) places the value’s marker exactly at the middle’s left edge. The general rule the code implements is thus inherit the declared placement; left edge of the middle when none was declared — the left convention is the no-declaration special case, and the unify policy is its provenance-faithful generalization (a closed side with a declared beg/dims boundary inside the middle transports that declaration into the solved value).**] Together with (ii), markers are consumed by the semantics in exactly two places — tensor rows serving as uppers (explicit shape structure) and this convention — and never from a solved row variable’s value. Row variables are therefore uniformly rigid-sorted, the sort discipline of Remark 2.14 holds by construction, and the solver never faces a marker choice. (Remark 2.11’s witness becomes determinate under the discipline for the same reason: its second constraint puts \langle\rho\rangle in upper position, so its author must now declare how it is marked — e.g. by the left convention — rather than leave the placement to policy.) [Audit outcome — see §10: the implementation does consume solved values’ markers in later broadcasts, but every consumed placement traces to a declared origin; the invariant to prove is marker provenance, a slight weakening of the “never consumed” claim with the same moral.]
Inference uses equivalences; relating the two semantics is a proof obligation. The implementation derives equivalence (content-equality) constraints from einsum specs for shape-inference reasons: equivalences propagate dimension information bidirectionally and pin ranks, which is what determines hidden and parameter dimensions. Content equality implies \sqsubseteq^{1} (pointwise reflexivity), so the inference reading strengthens the checking reading constraint by constraint; this is now presented as such — the theorem is proved in A.8: per-constraint strengthening (\approx = mutual \sqsubseteq^1), system-level inclusion \mathrm{Sol}(\Phi_{\mathrm{eq}}) \subseteq \mathrm{Sol}(\Phi_{\sqsubseteq}), and checking-correctness of solver outputs (\gamma_{\mathrm{close}} satisfies the sandwich, via Prop. 6.2). The strengthening is proper — rank pinning and bidirectional dimension transport are commitments in Def. 6.1’s family, justified against the checking semantics by the output theorem, not smuggled in as the semantics. What remains with the surface-language work is only characterizing which programs separate the two readings (A.10).
Inherited placements: non-principality, and the bullet bitten. The inference reading of (iv) has a marker consequence the checking sandwich avoids. Because the einsum constraints are equivalences between tensor rows and template rows, a tensor row that gets solved through an einsum inherits its broadcast point from the template’s ellipsis position, via (iii)’s inherit-the-split policy. For the einsum’s own result tensor the inheritance is arguably intended — the result spec is the user’s declaration at the defining site. The unintended case is a tensor whose row is still open when consumed (paradigmatically a leaf/parameter under shape inference): its placement is declared by whichever consuming einsum’s operand spec binds it first — a statement about how one consumer reads the tensor, not about what the tensor is — so where the marker lands depends on the order in which einsums are applied. And the dependence is observable, not cosmetic: a later rank-increasing broadcast (a pointwise Row_ineq against an open result) inserts axes at the stored marker, so flat-equal values with different splits pin different downstream axes — different inferred shapes, or an error in one order and success in the other. Which declaration wins is the solver’s traversal order (emission order to first approximation, though the fixpoint’s per-round reversal — §10 — makes the precise tiebreak an artifact of the loop, not a principled policy). Principality therefore holds only up to the flat equivalence \approx of Def. 2.9: the ground content of solutions is strategy-independent (the Thm. 5.6 yardstick, weakened to \approx on closed segments), while marker placement is deterministic but order-sensitive. Under Def. 3.4’s adopted \approx-semantics the order-dependence factors cleanly: a representative choice at equality sites, invisible to \mathrm{Sol}, plus the policy-rejection gap where a committed placement meets a discriminating marked inequality (Lemma 5.1’s policy steps; Cor. 5.5’s conjecture is that the surface language never reaches it).

Decision — bite the bullet. The paper states most-generality and uniqueness up to \approx, fixes the strategy (folding into Thm. 5.6’s fix-a-strategy caveat), and demotes marker placement from semantics to policy, whose sole guarantee is the provenance invariant of §10: every placement is a declared one, with first-binder-wins selecting among declarations. The bullet is smaller than it looks: result-side inheritance is the defining-site declaration; only operand-side inheritance into still-open rows is the alien-declaration case. Upgrade path [open] if that residue ever bites in practice: distinguish binding declarations (result specs; slice’s operand-side prepend, which is the operation declaring structure) from consuming mentions (operand specs). A tensor is the result of exactly one operation, so binding declarations never race; re-marking consumer-inherited splits to the left edge (at solve time or at closing) would leave at most one declaration per row and restore principality of placement outright — at the cost of an origin tag on row-variable bindings and an audit of any code relying on operand-inherited splits.

3. Terms and substitutions

Definition 3.1 (Terms). Dimension terms t ::= \alpha \mid n_b \mid 1_\emptyset. Row terms R ::= l \cdot \diamond \cdot r \mid l \cdot \langle\rho\rangle \cdot r with l, r sequences of dimension terms; we identify a bare row variable \rho with [\,]\cdot\langle\rho\rangle\cdot[\,]. A term is ground if variable-free. Each row term contains at most one row variable, at the marker — an invariant preserved by everything below.

Definition 3.2 (Substitution). A substitution \sigma is a finite, sort-respecting map from variables to terms with \mathrm{dom}(\sigma) \cap \mathrm{vars}(\mathrm{ran}(\sigma)) = \emptyset (idempotency). Application is structural except at the marker: if \sigma(\rho) = l' \cdot \langle\rho'\rangle \cdot r' (or closed, l'\cdot\diamond\cdot r'), then \sigma(l \cdot \langle\rho\rangle \cdot r) \;=\; (\sigma l \cdot l') \cdot \langle\rho'\rangle \cdot (r' \cdot \sigma r) \quad\big(\text{resp. } (\sigma l \cdot l') \cdot \diamond \cdot (r' \cdot \sigma r)\big).

Lemma 3.3 (Composition). [sketch] (\sigma \circ \tau)(X) = \sigma(\tau(X)) for all terms X, where \sigma \circ \tau is the usual composite substitution. Proof: induction on X; the marker case is associativity of context filling — splicing \tau(\rho)’s flanks and then applying \sigma equals splicing (\sigma\circ\tau)(\rho)’s flanks into \sigma-applied host flanks, by Definition 3.2 unfolded twice. Routine. \square

Definition 3.4 (Semantics). A ground substitution \gamma is total on a fixed countable variable universe and maps into ground dimensions/rows. For a constraint set \Phi over atomic constraints t_1 = t_2 \mid t_1 \sqsubseteq t_2 \mid R_1 \approx R_2 \mid R_1 \sqsubseteq R_2: \mathrm{Sol}(\Phi) = \{\gamma \text{ ground} \mid \gamma(\phi) \text{ holds for every } \phi \in \Phi\}, where ground dimension equalities are identity, ground row equalities are judged by the flat equivalence \approx of Def. 2.9 — equivalently, identity of the rigidifications \mathrm{flat}(\cdot)^\bullet in the two-sorted algebra of Def. 2.12, equivalently mutual \sqsubseteq^{1} (Remark 2.15(ii)); the three formulations coincide — and ground inequalities are instances of the marked orders of §§1–2 unchanged. [Adopted — this was the live design decision. The \approx reading is forced by practice (an einsum spec row must equate with a literal shape whose marker sits at the front; marked equality rejects everyday programs — §10), and it is principled: it is the equivalence the one-shot order induces, and genuine equality in the rigid sort, dodging Prop. 2.10(iii)’s no-congruence obstruction by routing through \mathrm{flat}(\cdot)^\bullet rather than quotienting. The cost is that equality constraints no longer pin marker placements (Remark 2.11), so the solver’s marked bindings carry a policy component; the consequences are threaded through Def. 4.3 (equality cases restated), Lemma 5.1 (semantic/policy step split), Cor. 5.5 (failure soundness modulo policy), and Thm. 5.6 (most generality up to \approx).] (Notation: row-equality constraints are accordingly written R_1 \approx R_2 throughout, matching their satisfaction relation; = between rows is reserved for algebraic identity (Def. 2.1), and remains the constraint symbol at dimension sort, where satisfaction is identity. The implementation’s Row_eq constructor denotes the \approx-constraint.) A substitution \sigma models \Phi, \sigma \models \Phi, iff \gamma \circ \sigma \in \mathrm{Sol}(\Phi) for every ground \gamma — equivalently, \sigma(\Phi) is valid under universal quantification of its free variables. The substitution order: \sigma_1 \le \sigma_2 iff \exists u.\ u \circ \sigma_1 = \sigma_2.

Example 3.5 (No principal model exists in general). [proved] Let \Phi = \{3_b \sqsubseteq \alpha\}. The identity does not model \Phi (ground \alpha \mapsto 5_b falsifies it). Any model must map \alpha to a term all of whose groundings lie above 3_b, i.e. to 3_b or 1_\emptyset (a variable target fails as the identity did). The two models \sigma_1 = [\alpha \mapsto 3_b] and \sigma_2 = [\alpha \mapsto 1_\emptyset] are \le-incomparable: a witness u for either direction would have to send a ground dimension to a different ground dimension, which substitutions cannot do. Hence \Phi has models but no \le-least model.

Consequence. The template’s Theorem 1 (“the solver computes \sigma_\star \models \Phi least among models”) is false as stated. The repair: the solver’s answer is a pair — a substitution and a residual bound store — and the correct theorem is about solution-set representation and most-generality (Theorem 5.6 below). A principal model in the original sense exists exactly when the residual store is empty (Corollary 5.8).

4. The solver: configurations and rules

Definition 4.1 (Configuration). A configuration is \langle \Phi;\ \sigma;\ B \rangle — unsolved constraints, accumulated (idempotent) bindings, and a bound store. B assigns to each unsolved variable: for a dimension variable \alpha, at most one atom lower bound \mathrm{lb}(\alpha) \in \{n_b\} plus a set of deferred variable–variable inequalities (“adjacencies”); for a row variable \rho, a set of row-term lower bounds (“caps”, from inequality residues) plus adjacencies. There is a distinguished failure configuration \mathsf{fail}.

Binding a variable (\sigma := [x \mapsto T] \circ \sigma) re-emits into \Phi every bound and adjacency stored for x, instantiated at T, and substitutes T for x throughout \Phi and B. This single discipline replaces ad-hoc propagation.

Definition 4.2 (Dimension rules).

	Constraint	Action
DE-refl	t = t	discard
DE-clash	n_b = m_c, (n,b)\ne(m,c) (incl. atom vs 1_\emptyset)	\mathsf{fail}
DE-bind	\alpha = t, t \ne \alpha	bind \alpha \mapsto t
DI-top	t \sqsubseteq 1_\emptyset	discard
DI-refl	t \sqsubseteq t	discard
DI-ground	d \sqsubseteq d' ground	check Def. 1.2; else \mathsf{fail}
DI-pin	\alpha \sqsubseteq d, d an atom	replace by \alpha = d
DI-pin-top	1_\emptyset \sqsubseteq \alpha	replace by \alpha = 1_\emptyset
DI-cap	d \sqsubseteq \alpha, d an atom	if \mathrm{lb}(\alpha) unset: record; if = d: discard; if = d' \ne d: replace by \alpha = 1_\emptyset
DI-adj	\alpha \sqsubseteq \beta, \alpha \ne \beta	defer (record adjacency on both)

Justifications (each rule preserves \mathrm{Sol}): DI-pin: in D, the only element \sqsubseteq an atom d is d itself. DI-pin-top: the only element above the top is the top. DI-cap collapse: any \gamma(\alpha) above two distinct atoms must be 1_\emptyset (Prop. 1.3). DI-pin-top is the case the blog’s narrative omits.

Definition 4.3 (Row rules). Write both sides aligned at the outer edges (leading flanks from the front, trailing from the back).

Equality R_1 \approx R_2. Emit dimension equalities on the overlapping aligned flank positions. Let the surpluses (the unmatched inner segments of the longer flanks) be m_l, m_r. Cases: (a) both sides closed: \mathsf{fail} iff the ranks differ; at equal rank the constraint is the flat zip \mathrm{flat}(R_1)[i] = \mathrm{flat}(R_2)[i] — Def. 3.4’s \approx-semantics; markers are not consulted. (Under the former marked reading this case also failed on marker mismatch; that clause is gone.) (b) one side open with variable \rho, the other closed: \mathsf{fail} iff |l_1| + |r_1| > |l_2| + |r_2| (total overflow — substitution extends the open side’s flanks inward only, so no grounding can shorten them; the former per-flank overflow condition was a marked-equality artifact and is dropped). Otherwise the open side’s explicit flanks zip against the closed side’s flattening from the outer edges, and the variable’s flat content is entailed: the uncovered middle m of \mathrm{flat}(R_{\mathrm{closed}}). The binding \rho \mapsto m_l \cdot \diamond \cdot m_r with m_l \cdot m_r = m must additionally choose a placement, which the \approx-semantics leaves free (Remark 2.11): the rule commits the closed side’s declared split where it falls inside m, the left edge otherwise — the inherit-the-split policy of Remark 2.15(iii), a policy step in the family of Def. 6.1, not an entailment. No occurs check is needed, the closed side having no row variable. (The implementation additionally rejects when the open side’s trailing flank overflows the closed side’s — the asymmetric trailing guard, row.ml; conservative under \approx: see §10’s marker item.) (c) both open, \rho_1 \ne \rho_2. Nested surpluses (both on the side of \rho_2’s term): bind \rho_1 \mapsto m_l \cdot \langle\rho_2\rangle \cdot m_r — the content is entailed (cancel the common outer flanks of the flat words); only the placement is policy, as in (b). Split surpluses (one leading surplus on one side, one trailing on the other): the residue is the two-variable word equation s \cdot x_1 = x_2 \cdot t, whose solutions are the principal family x_2 = s\cdot w, x_1 = w\cdot t plus at most |s| sporadic cross-overlap solutions (x_2 a proper prefix of s, the rest forced — A.1.3). A fresh-variable binding \rho_2 \mapsto s\cdot\langle\rho'\rangle, \rho_1 \mapsto \langle\rho'\rangle\cdot t — this rule’s previous form — captures exactly the family: exact under the former marked semantics, but under \approx it forecloses the sporadic solutions, a rank commitment stronger than a placement choice (and it breaks rank conservativity, A.2’s remark). The rule therefore defers into the closing policy, like (d) below: the equation stays in flight; a substitution that grounds either variable makes the check exact; the closing stage’s upward close of either variable selects the corresponding extremal solution (for the variable carrying x_2, the outermost sporadic one) and the re-emitted equation validates it. This matches the implementation, which never bound here: unify_row’s cross-surplus branch (beg_handled = false) re-emits the residual equation — verified, and pinned with both constraint orders in test/einsum/test_row_self_reference.ml. (d) both open with \rho_1 = \rho_2: discard if surpluses empty, else \mathsf{fail}, with two distinct justifications. Unequal known flank totals (occurs check): no finite row satisfies \rho \approx m_l \cdot \langle\rho\rangle \cdot m_r with m_l \cdot m_r nonempty, since splicing strictly increases rank and \approx is rank-preserving. Equal totals but shifted splits (rotational case): the residue is the word equation x \cdot t = s \cdot x (x the variable’s value, s, t the surplus flank words) — satisfiable under Def. 3.4’s \approx-semantics exactly for conjugate s, t with cyclically periodic x (Lyndon–Schützenberger), a solution family outside the solver’s constraint language. Rather than \mathsf{fail}, the rotational case defers into the closing policy: the equation stays in flight; if \rho is solved by other constraints the substituted (closed–closed, flat) check is exact, and otherwise the guessing stage closes \rho upward — the least-material disjunct, x = [\,], after which the equation requires s = t. This accepts exactly the \delta = 0 conjugacy class: a policy commitment in the family of Def. 6.1, and — now that \approx is the adopted semantics — a genuine incompleteness on the nontrivial conjugate instances (periodic x \ne [\,]), accepted deliberately. (Implementation history: the rotational case was first silently dropped — accepting the unsatisfiable [3]\cdot\langle\rho\rangle \approx \langle\rho\rangle\cdot[5] with no dimension checked — then conservatively rejected, now deferred as described; found via the gh-ocannl-247 review, see §10.)

Inequality R_{\mathrm{res}} \sqsubseteq R_{\mathrm{op}}. Emit dimension inequalities \,t_{\mathrm{res}} \sqsubseteq t_{\mathrm{op}} on aligned flank overlaps. Then: (i) operand open, result supplies \ge the operand’s flanks: record the result’s interior residue (a row term; open if the result is) as a cap on \rho_{\mathrm{op}}. (ii) operand closed, result rank \ge operand flanks: the operand’s interior in the expansion is 1_\emptyset; the corresponding result positions are unconstrained — discard them. (iii) result closed and shorter than the operand’s known flanks: \mathsf{fail} (genuine rank mismatch). (iv) result open and shorter than the operand’s known flanks (deficit k > 0): deficit rule — bind \rho_{\mathrm{res}} \mapsto \alpha_1 \cdots \alpha_{k_l} \cdot \langle\rho'\rangle \cdot \beta_1 \cdots \beta_{k_r} with fresh dimension variables and a fresh row variable, sized to clear the deficit, then reprocess. Subject to the rank-cycle check of Def. 4.5.

Definition 4.4 (Fairness). A run processes constraints until \Phi = \emptyset or \mathsf{fail}; re-emitted constraints return to \Phi. Any fair strategy is allowed.

Example 4.5a (Divergence without a cycle check). [proved] Let \Phi_0 = \{\, \rho_1 \sqsubseteq [a]\cdot\langle\rho_2\rangle,\ \ \rho_2 \sqsubseteq [b]\cdot\langle\rho_1\rangle \,\} (a, b ground atoms). Semantically: the first constraint forces \mathrm{rank}(\gamma\rho_1) \ge 1 + \mathrm{rank}(\gamma\rho_2) (flanks-fit in Def. 2.3), the second forces \mathrm{rank}(\gamma\rho_2) \ge 1 + \mathrm{rank}(\gamma\rho_1); hence \mathrm{Sol}(\Phi_0) = \emptyset. Operationally, the deficit rule fires on \rho_1, growing it by one axis; substitution lengthens the operand side of the second constraint, whose deficit rule grows \rho_2; substitution lengthens the operand side of the first constraint’s residue, and so on forever. The rules of Def. 4.2–4.3 alone do not terminate. The self-cyclic instance \rho \sqsubseteq [a]\cdot\langle\rho\rangle (obtainable by an einsum equality merging two row variables) diverges the same way.

Remark 4.5b (Minimality is implementation-relative). [proved by experiment] For the abstract rules, the two-variable cycle above is the minimal divergence. In row.ml (pre-fix), two-variable cycles happened to terminate: template memoization plus equality-binding collapse the ping-pong into the one-step self-reference check of Def. 4.3(d) — an accident of the cache, not a design property. The minimal live divergence was the three-variable cycle \{\rho_1 \sqsubseteq \langle\rho_2\rangle{\cdot}[a],\ \rho_2 \sqsubseteq \langle\rho_3\rangle{\cdot}[b],\ \rho_3 \sqsubseteq \langle\rho_1\rangle{\cdot}[c]\}, which ran unboundedly until killed. Post-fix, both lengths are rejected by the Def. 4.5 check; test/einsum/test_row_rank_cycle.ml pins the two- and three-variable cycles and a leading-flank variant. (Details in §10’s Outcome block.)

Definition 4.5 (Rank-fact graph and cycle check). Maintain beside the configuration a directed graph G on row variables — persistent: edges are only ever added, never retracted, in particular not when a variable is solved and substituted away. An edge \rho \xrightarrow{\,k\,} \rho' with weight k \ge 0 records the entailed fact \mathrm{rank}(\gamma\rho) \ge \mathrm{rank}(\gamma\rho') + k (for every solution \gamma of the current configuration). Edges are recorded at three points:

binding: solving \rho \mapsto l \cdot \langle\rho'\rangle \cdot r records \rho \xrightarrow{\,|l|+|r|\,} \rho' (the entailed fact is an equality; one direction suffices for the check);
equal flanks: processing R_{\mathrm{res}} \sqsubseteq R_{\mathrm{op}} with both sides open and equal known flank lengths records \rho_{\mathrm{res}} \xrightarrow{\,0\,} \rho_{\mathrm{op}};
deficit: the deficit rule with deficit k > 0 and open operand records \rho_{\mathrm{res}} \xrightarrow{\,k\,} \rho_{\mathrm{op}} before growing (a closed operand needs no edge: the rank bound is absolute and the growth is one-shot).

Every insertion is guarded: if the new edge closes a directed cycle of total weight > 0, then \mathsf{fail}.

Soundness of failing: summing the facts around a positive cycle gives \mathrm{rank}(\gamma\rho) \ge \mathrm{rank}(\gamma\rho) + w with w > 0, so \mathrm{Sol} = \emptyset. Soundness of persistence: each edge is entailed by constraints present when it was recorded, and the solver never retracts a constraint — Lemma 5.1 only moves and re-expresses them — so entailed facts stay entailed, even when the variables involved have long been substituted away. Zero-weight cycles are legal: they assert rank equality along the cycle (and could be collapsed to row equalities, generalizing the one-step row antisymmetry the implementation already performs — cf. §10).

Sign discipline (load-bearing). In residue case (i) of Def. 4.3 — operand open, result’s known flanks in surplus by s > 0 — the constraint entails only \mathrm{rank}(\gamma\rho_{\mathrm{res}}) \ge \mathrm{rank}(\gamma\rho_{\mathrm{op}}) - s: a nonpositive lower bound. Recording it with weight 0 (or s) is unsound: it manufactures cycles that no solution violates. A first implementation attempt did exactly this and rejected legitimate programs (the SGD self-reference idiom among them); the fixed implementation records nothing there. Recording a genuinely negative weight would be sound but is unnecessary for the check’s purpose.

This is the formal identity of the implementation’s global_rank_edges (§10, Outcome); Example 4.5a shows the check is necessary for termination, not an optimization.

Proposition 4.6 (The check is exact for the recorded facts). [proved] At any point in a run, the recorded fact set \{\mathrm{rank}(\rho) \ge \mathrm{rank}(\rho') + k\} is satisfiable over \mathbb{N} iff G has no positive-weight cycle. Hence the guard fails exactly when the facts recorded so far are jointly unsatisfiable.

Proof. (\Rightarrow) Sum the facts around a positive cycle. (\Leftarrow) Set r(\rho) = the supremum of total weights of walks leaving \rho. A walk revisiting a vertex traverses a cycle, of weight \le 0 by hypothesis, so the supremum is attained on simple paths and is finite; r(\rho) \ge r(\rho') + k for each edge \rho \xrightarrow{\,k\,} \rho', by prefixing the edge to an optimal walk from \rho'. \square

(Standard difference-constraints/Bellman–Ford duality — worth stating because it delimits what the check decides: unsatisfiability of the recorded facts, a sound under-approximation of rank-unsatisfiability of the configuration. Whether enough facts are always recorded in time is exactly the Detection Lemma of Thm. 5.2(b).)

5. Metatheory of solving

Lemma 5.1 (Solution preservation, with a policy coordinate). [proved — full per-rule assembly in A.1] Write \widehat{C} for the constraint set \Phi \cup \mathrm{eqns}(\sigma) \cup \mathrm{constr}(B) denoted by a configuration (\mathrm{eqns}(\sigma) = \{x = \sigma(x)\}, read as \approx-constraints at row sort per Def. 3.4; \mathrm{constr}(B) the stored bounds/adjacencies as constraints), and consider \mathrm{Sol}(\widehat{C}) under Def. 3.4. The rules of Defs. 4.2–4.3 divide:

Semantic steps — all dimension rules, all row-inequality cases, and the checking content of the equality cases (the flat zip of (a), the outer-edge dimension equalities, the rank-overflow and occurs failures, the entailed flat content of every binding) — preserve \mathrm{Sol}(\widehat{C}) exactly, up to extension of \gamma to freshly introduced variables (deficit rule, fresh splice variables): every solution of the old configuration extends to one of the new, and every solution of the new restricts to one of the old. \mathsf{fail} steps of this kind are taken only when \mathrm{Sol}(\widehat{C}) = \emptyset.

Policy steps — the placement component of the row-variable bindings in equality cases (b)/(c-nested), and the deferred-equation resolutions at closing (the rotational case (d) and the split-surplus case (c-split), both word-equation residues resolved by the upward close) — refine \mathrm{Sol}: the bound variable’s flat content is entailed, but the committed split selects one of the pairwise \sqsubseteq-incomparable representatives of Prop. 2.10(ii), and substituting the committed value into stored marked inequalities may strengthen them. Every solution of the new configuration restricts to a solution of the old [soundness — proved]; conversely a solution of the old survives into the new after re-placing the bound variables’ markers, which is possible iff no marked inequality of \widehat{C} discriminates the placement — Remark 2.11’s witness is the counterexample, and Remark 2.15(v)’s order-dependence is its operational trace. A \mathsf{fail} arising from a substituted placement, where another representative would have passed, is therefore a policy rejection, not semantic unsatisfiability.

Proof. Per rule, written out in A.1 (dimension rules A.1.1; equality cases A.1.2–A.1.3, including the split-surplus dichotomy that forced (c)’s correction; inequality cases A.1.4). The policy inventory is exactly: placements at (b)/(c-nested) bindings; deferred-equation resolutions at closing for (c-split)/(d). \square

Definition 5.2.1 (Rank abstraction). For a configuration with constraint denotation \widehat{C}, its rank abstraction \lfloor\widehat{C}\rfloor is the finite set of rank facts read off one constraint at a time: a row constraint relating l\cdot\langle\rho\rangle\cdot r to l'\cdot\langle\rho'\rangle\cdot r' (as result/operand of \sqsubseteq, resp. as an equality) contributes \mathrm{rank}(\rho) \ge \mathrm{rank}(\rho') + (|l'|+|r'|) - (|l|+|r|) (both directions for an equality); open-vs-closed pairs contribute the corresponding absolute bounds on \mathrm{rank}(\rho). A rank model is an assignment r : \mathrm{RowVars} \to \mathbb{N} satisfying \lfloor\widehat{C}\rfloor. Every \gamma \in \mathrm{Sol}(\widehat{C}) induces a rank model \rho \mapsto \mathrm{rank}(\gamma\rho) (flanks-fit in Def. 2.3); the converse fails — the abstraction forgets dimensions and bases.

Lemma 5.2.2 (Rank conservativity). [proved — rule by rule in A.2] Every non-\mathsf{fail} rule maps a configuration with a rank model to one with a rank model agreeing on the old variables (extended on fresh ones). The deficit rule extends by r(\rho') := r(\rho_{\mathrm{res}}) - k \ge 0, justified by the triggering constraint’s own rank fact; everything else preserves the model verbatim. (A.2’s remark: the previously drafted (c-split) fresh-variable binding would break this lemma — its \mathrm{rank}(\rho_2) \ge |s| commitment is not implied by the consumed fact — an independent reason for Def. 4.3(c)’s correction to deferral.) \square

Theorem 5.2 (Termination, restructured). Let \Phi_0 be finite.

(a) [proved — Lemma 5.2.2 in A.2, the lineage bookkeeping bound in A.3] If \Phi_0 has a rank model — in particular whenever \mathrm{Sol}(\Phi_0) \ne \emptyset — then every fair run terminates, with or without the cycle check; moreover the check never fires on such inputs (its recorded facts are entailed, hence satisfied by the rank model, hence positive-cycle-free by Prop. 4.6), so guarding is semantically free.

(b) [open — reduced to the Detection Lemma below] If \Phi_0 has no rank model, then \mathrm{Sol}(\Phi_0) = \emptyset and no run reaches solved form (Prop. 5.4 plus Lemma 5.1 would otherwise produce a solution, hence a rank model); every run either fails or diverges, and the cycle check is what must convert divergence into failure. That it always does is the remaining gap.

Proof of (a). Fix a rank model r_0 of \Phi_0 and extend it along the run (Lemma 5.2.2) to \hat r, defined on every variable ever minted. Track each row inequality through its lineage: residues, storage, re-emissions, substitutions. For a lineage c whose current result term is l\cdot\langle\rho\rangle\cdot r, set \psi(c) = \hat r(\rho) \in \mathbb{N} (and \psi(c) = 0 once the result is closed — closed results never grow; they discharge or fail in bounded steps). A deficit step on c with deficit k \ge 1 binds \rho \mapsto (k fresh dims around \langle\rho''\rangle) with \hat r(\rho'') = \hat r(\rho) - k — well-defined in \mathbb{N} because \hat r satisfies the triggering constraint’s rank fact, which gives \hat r(\rho) \ge k — so \psi(c) drops by at least 1. Any other binding of \rho (an equality splice \rho \mapsto l''\cdot\langle\rho'''\rangle\cdot r'') has \hat r(\rho''') \le \hat r(\rho), so \psi never increases. Hence each lineage fires the deficit rule at most \psi_0(c) times. The number of lineages ever live is bounded in |\Phi_0| — proved in A.3, by the no-forking accounting: no rule emits a row-sort constraint that does not continue the lineage of its consumed one, so lineage count is bounded by |\Phi_0| plus one continuation per closing commitment, and the per-lineage potential \psi never increases along any continuation. So deficit steps are finite in total; equality splices mint at most one variable per equality; with mints finite, binding events (the only steps that re-emit and grow \Phi) are finite, since every variable binds at most once. Between consecutive binding events, every rule strictly decreases the lexicographic measure (row constraints in \Phi, dimension constraints in \Phi): dimension rules consume their constraint into nothing or the store; row rules consume their row constraint and emit only dimension constraints or store their residue. Total: termination. \square

Reduction of (b). What remains is:

Detection Lemma. [open — structured reduction in A.9: D1 (infinitely many open–open deficits) proved; D2 (deficit accounting along the feeding relation) sketched modulo a fixed-quotient bookkeeping; D3 (circulation transports into a G-cycle via the anchor edges) the residual gap, with a counterexample hunt prescribed through case-(i) residues, which record no edges] Every infinite run inserts, at some finite stage, an edge closing a positive-weight cycle in G. (Hence, with the guard of Def. 4.5, infinite runs do not exist, and rank-unsatisfiable inputs fail in finite time.)

Why this is delicate, and the mechanism. Fresh mints only ever move forward: within a single lineage, the kind-(i) edges form a growing path of brand-new vertices, so no cycle can close along one lineage — a divergent run is a forward spiral, and a naïve reading suggests the guard never triggers. What closes cycles is the interplay of cross edges and anchors: a kind-(iii) edge points from the firing lineage’s current result variable to its current operand variable — a vertex on the feeding lineage’s chain — and the kind-(ii) zero-weight residue edge anchors each fresh mint back to the operand variable it was measured against. Concretely, on the three-variable cycle of Remark 4.5b under a strategy that fully processes the first two constraints before the third: C_1 records \rho_1 \xrightarrow{1} \rho_2 (iii), \rho_1 \xrightarrow{1} \rho_1' (i), \rho_1' \xrightarrow{0} \rho_2 (ii); C_2 records \rho_2 \xrightarrow{1} \rho_3, \rho_2 \xrightarrow{1} \rho_2', \rho_2' \xrightarrow{0} \rho_3; then C_3 — its operand meanwhile substituted to \langle\rho_1'\rangle{\cdot}[a, c] — attempts the kind-(iii) edge \rho_3 \xrightarrow{2} \rho_1', which closes \rho_3 \xrightarrow{2} \rho_1' \xrightarrow{0} \rho_2 \xrightarrow{1} \rho_3 of total weight 3: \mathsf{fail} at the first deficit of the third constraint, before the spiral completes a turn. Note the witness path runs through \rho_2, already bound and absent from the store — this is why G must be persistent: the bound store retracts every entry mentioning a solved variable and re-emits it in-flight, so any potential or check stated against the store lags the ping-pong by exactly one step and never sees the cycle (verified against row.ml‘s s_row_one_in_entry; §10, Outcome). The lemma to prove, in flow form: the deficit weight a lineage can emit is bounded by its \Phi_0-initial operand surplus plus the weight flowing into its operand side from other lineages’ bindings; unbounded total weight through finitely many lineages forces a positive circulation among them; and the anchor edges transport that circulation from the lineage quotient into a genuine cycle of G. [This is the previous draft’s bracketed gap in sharpened form: the potential cannot live on the substitution-mutable store; it must live on the monotone graph.]

Definition 5.3 (Solved form). A final configuration \langle \emptyset; \sigma_\star; B_\star \rangle is in solved form: \sigma_\star idempotent; every variable in B_\star unsolved; each dimension variable carries at most one atom lower bound; all stored row caps and adjacencies are between unsolved variables and contain no pending pins (else a rule would fire).

Proposition 5.4 (The residual store is always satisfiable; \gamma_\uparrow is its greatest solution). [proved] Define \gamma_\uparrow on the unsolved variables: \gamma_\uparrow(\alpha) = 1_\emptyset, \gamma_\uparrow(\rho) = [\,]\cdot\diamond\cdot[\,], extended through \sigma_\star on solved variables (\gamma_\uparrow(x) = \gamma_\uparrow(\sigma_\star(x))). Then (i) \gamma_\uparrow \in \mathrm{Sol}(\mathrm{constr}(B_\star) \cup \mathrm{eqns}(\sigma_\star)); (ii) \gamma_\uparrow is the pointwise-greatest element of that solution set: for every solution \gamma and every variable x, \gamma(x) \sqsubseteq \gamma_\uparrow(x).

Proof. (i) Atom caps d \sqsubseteq \alpha: d \sqsubseteq 1_\emptyset ✓. Adjacencies \alpha \sqsubseteq \beta: 1_\emptyset \sqsubseteq 1_\emptyset ✓. Row caps \,\text{residue} \sqsubseteq \rho: every row is \sqsubseteq the empty row (Prop. 2.6) ✓; row adjacencies likewise. Bindings hold by construction. (ii) For unsolved x: tops (Props. 1.3, 2.6). For solved x with \sigma_\star(x) = T: \gamma(x) = \gamma(T) and \gamma_\uparrow(x) = \gamma_\uparrow(T); compare \gamma(T) \sqsubseteq \gamma_\uparrow(T) by structural induction on T — ground parts are equal; variable positions are handled by the unsolved case; for an open row term, \gamma’s splice has flanks extending \gamma_\uparrow’s splice on the inner sides, so flanks fit, and pointwise positions are either shared flank entries (inductive case) or middle positions where \gamma_\uparrow(T)’s expansion shows 1_\emptyset. \square

(Scope notes. First: solved form here is taken with the rotational deferrals of Def. 4.3(d) discharged — an in-flight deferred equation x \cdot t = s \cdot x is satisfied by \gamma_\uparrow iff s = t, so a surviving s \ne t deferral postpones (i) to the closing step that resolves or fails it; the success direction of Cor. 5.5 is unaffected, since an end-to-end successful run has discharged its deferrals. Second: (ii)’s greatestness is over the solution set of the final configuration, which encodes the run’s placement commitments; it does not by itself give greatestness over \mathrm{Sol}(\Phi_0) — that transfer is the content, and the caveat, of Prop. 6.4.)

Corollary 5.5 (Decision). [proved modulo Thm. 5.2(b); failure direction modulo placement policy] A fair run fails iff the policy-strengthened system is unsatisfiable — \Phi_0 plus Lemma 5.1’s policy commitments (placements, and the deferred-equation resolutions of (c-split)/(d)). Success implies \mathrm{Sol}(\Phi_0) \ne \emptyset outright (Prop. 5.4 composed back through the preservation chain — the success direction is unconditional). Failure implies \mathrm{Sol}(\Phi_0) = \emptyset when the failing step is semantic; a policy rejection can fail an \approx-satisfiable input (Remark 2.11’s witness). On inputs satisfiable and placement-undiscriminated, runs terminate by Thm. 5.2(a) and cannot fail, so they decide positively. On unsatisfiable inputs, runs cannot succeed; that they terminate — making the solver a decision procedure rather than a semi-decision procedure — is exactly the Detection Lemma of Thm. 5.2(b). [Conjecture, the surface-language discharge: programs whose markers all arise from declared origins (the provenance invariant, §10) never produce a placement-discriminating constraint set, so on the reachable fragment the policy caveat is vacuous and the unqualified “fails iff unsatisfiable” holds. Remark 2.11’s witness should be checked to be unwritable from the surface language.] The alternative route is Remark 6.7: making the placement and conjugacy-witness commitments explicit backtracking branches (both finite) recovers the unconditional failure direction without the conjecture, which then demotes to the statement that the search never branches on the reachable fragment.

Theorem 5.6 (Soundness, representation, most generality — the corrected Theorem 1). [proved given 5.1; Thm. 5.2(a) supplies the termination hypothesis on satisfiable inputs] Let a fair run on finite \Phi_0 terminate in \langle\emptyset; \sigma_\star; B_\star\rangle. Then, with V = \mathrm{vars}(\Phi_0):

(Representation) \gamma \in \mathrm{Sol}(\Phi_0) iff \gamma extends to a ground \hat\gamma (on the fresh variables) with \hat\gamma = \hat\gamma \circ \sigma_\star and \hat\gamma \in \mathrm{Sol}(\mathrm{constr}(B_\star)).
(No guessing / entailment) every binding x \mapsto T in \sigma_\star is conservative: every solution of \Phi_0 extends to one satisfying x = T.
(Most generality among models) for every substitution \sigma \models \Phi_0 there is u with u \circ \sigma_\star = \sigma on V — indeed u = \sigma works: \sigma = \sigma \circ \sigma_\star on V.

Proof. (1) and (2) are the composition of Lemma 5.1 across the run. (3): let \sigma \models \Phi_0. For every ground \gamma, \gamma\circ\sigma \in \mathrm{Sol}(\Phi_0), so by (1)–(2) \gamma\circ\sigma (extended over fresh variables) satisfies every equation x = \sigma_\star(x), x \in V \cap \mathrm{dom}(\sigma_\star) — i.e. \gamma(\sigma(x)) = \gamma(\sigma(\sigma_\star(x))) for all ground \gamma. Two terms that agree under every ground substitution are equal up to where the agreement relation looks: instantiating variables at distinct atoms separates any content difference (D has infinitely many atoms), but row equations are judged by \approx (Def. 3.4), which is marker-blind — so the conclusion is \sigma(x) and \sigma(\sigma_\star(x)) identical on dimension terms and equal up to \approx on closed row segments. Hence \sigma = \sigma \circ \sigma_\star on V up to \approx. \square

Caveat to the whole theorem (the \approx/policy weakening, with Def. 3.4’s adoption): (1)’s “iff” and (2)’s “every solution extends” hold exactly for runs of semantic steps; across Lemma 5.1’s policy steps the forward direction holds after re-placing the policy-committed markers, hence unconditionally only on placement-undiscriminated inputs (the conjectured surface fragment — Cor. 5.5). (3) is up to \approx as the proof now states: \sigma_\star is most general in content; its marker placements are the policy’s, not canonical. This is the theorem-level form of the bullet bitten in Remark 2.15(v).

Caveat to (1)/(2): the extension over fresh variables is existential, exactly as in standard unification with fresh-variable introduction; the answer is unique up to renaming of fresh variables. [Discharged by decision — A.4: the paper fixes the strategy, the implementation’s per-round queue fixpoint inside the staged pipeline; with one strategy, uniqueness needs no cross-strategy argument, and the policy commitments are deterministic functions of the input order — Remark 2.15(v)’s content. Cross-strategy confluence (up to \approx, modulo policy) stays open and unneeded; the semantic core is confluent-for-satisfiability outright by Lemma 5.1.]

Corollary 5.7 (Forced variables). \sigma_\star binds only variables whose values (relative to the remaining free ones) are entailed by \Phi_0 — by 5.6(2). Conversely every variable left unsolved genuinely admits at least two solutions or carries only its top default (\gamma_\uparrow vs. e.g. committing a cap), by Prop. 5.4 and inspection of solved form.

Corollary 5.8 (Principal model, recovered). If B_\star stores no atom caps and no row caps (adjacencies allowed), then \sigma_\star (read as a model, free variables universal) satisfies \sigma_\star \models \Phi_0, and by 5.6(3) it is the \le-least model: the blog’s “principal model” claim holds exactly in this case. Example 3.5 shows the restriction is necessary.

6. Closing

Definition 6.1 (Closing policy). Variables carry a provenance tag: leaf (belonging to a terminal tensor’s shape; a subset are parameters) or interior. Closing runs after solving, interleaved: 1. (Close leaves, downward.) For each unsolved leaf variable: bind a dimension variable to its stored atom cap if any, else to 1_\emptyset — except a parameter with no cap at some position: \mathsf{error} (“unspecified hidden dimension”). Bind a row variable to the join (Prop. 2.7) of the ground parts of its stored caps, with holes (positions where no cap is ground) filled by 1_\emptyset, and no-further-axes beyond the caps’ extent; parameters error at holes. 2. (Re-solve.) The bindings re-emit the variables’ stored constraints (Def. 4.1); run the solver to a new solved form. 3. (Close interiors, upward.) Bind every remaining variable to its top (1_\emptyset / the empty row); re-solve once more (re-emissions against tops discharge by DI-top and Prop. 2.6, so this cannot fail).

Proposition 6.2 (Closing terminates and is sound). [proved — strengthened in A.5] The interleave terminates, and if no \mathsf{error}/\mathsf{fail} occurs, the resulting total ground substitution \gamma_{\mathrm{close}} \in \mathrm{Sol}(\Phi_0). A.5 strengthens the termination half: the re-solves terminate unconditionally, not only when the committed system retains a rank model — ground commitments add no variable-to-variable rank facts, so the variable–variable fragment keeps its model and bounds all growth (the deficit potential consults only open-result lineages); a rank-breaking commitment therefore fails finitely, through the absolute fragment. The Detection Lemma is not needed here.

Proof. Termination as stated. Soundness: every closing commitment enters the system as an equality and the subsequent re-solve succeeds; by Lemma 5.1 the final configuration’s solutions are solutions of \Phi_0, and \gamma_{\mathrm{close}} is the (unique) solution of the final, fully-bound system. Step 3 cannot fail: a cap re-emitted against a top is d \sqsubseteq 1_\emptyset or R \sqsubseteq [\,]\cdot\diamond\cdot[\,], both valid. \square

Remark 6.3 (The interleave is load-bearing — but not order-robust as stated). [proved by example] \Phi = \{3_b \sqsubseteq \alpha,\ \alpha \sqsubseteq \beta,\ 5_b \sqsubseteq \beta\}, \alpha, \beta leaves. Closing without re-solving commits \alpha \mapsto 3_b, \beta \mapsto 5_b — violating \alpha \sqsubseteq \beta. With Def. 6.1, binding \alpha \mapsto 3_b re-emits 3_b \sqsubseteq \beta, the cap store on \beta collapses \{5_b, 3_b\} to \beta = 1_\emptyset (DI-cap), and the result is a genuine solution. So step 2 is required for soundness, not bookkeeping.

The interleave does not by itself fix the example, though: Def. 6.1’s step 1 leaves the binding order free, and closing \beta first commits \beta \mapsto 5_b — the join of the caps then visible — after which the re-solve squeezes \alpha into 3_b \sqsubseteq \alpha \sqsubseteq 5_b, an empty interval (the atoms are an antichain below 1_\emptyset, Prop. 1.3). That is a spurious failure on a satisfiable store — the solutions of \Phi are exactly \beta = 1_\emptyset, \alpha \in \{3_b, 1_\emptyset\}, so the bad commit assigns \beta a value no solution has — and a finicky one: order-dependent, detected one variable away from the culprit commitment, and by Lemma 5.1’s accounting a policy rejection (closing commitments are policy steps), not unsatisfiability. The repair is a scheduling discipline, not search: commit a variable only when its cap store can no longer grow — equivalently, close along the inequality adjacency from below (topological order, after collapsing mutual-\sqsubseteq cycles to equalities), or propagate ground caps across the adjacency to fixpoint before committing anything. Remark 6.7(i) states what the discipline buys. The implementation has it architecturally, at both sorts — fixpoint-before-commit is the design principle of the staged pipeline, not a dimension-sort code-path accident: Stage 1, the only online stage, exclusively unifies and propagates, and every commitment form (GLB commits, no-further-axes substitutions, guesses) is confined to the on-demand stages 2–7, so bounds collect across the adjacency to fixpoint before any stage that commits to them runs (the stage table in docs/shape_inference.md, “Solving the constraints”). The from_glb re-emissions in solve_dim_ineq are the dimension-sort face of the propagation — a ground bound arriving at a variable forwards eagerly to its stored opnd-neighbors, and an existing bound forwards when a new adjacency is recorded, so cap stores are transitively closed during solving and the stage-4 commit adds no downstream information; solve_row_ineq follows the same discipline at row sort. On this \Phi the DI-cap collapse to \beta = 1_\emptyset fires already at Stage1, before any commitment stage; all four emission orders pass, with identical solutions — pinned by test/einsum/test_closing_order.ml.

Proposition 6.4 (Uniform-upward closing yields the greatest solution — Post 5’s conjecture, in the \approx-corrected form). [proved for membership and the policy-relative greatestness; sketch for the \sqsubseteq^1 form] \gamma_\uparrow of Prop. 5.4 is a solution of \Phi_0 — unconditionally: the soundness direction of Lemma 5.1 composes through policy steps. Its greatestness needs more care than the original one-line proof (“Prop. 5.4 plus Theorem 5.6(1)”) admitted: that route uses 5.6(1)’s only-if direction, which policy steps break — the final configuration encodes the run’s placement commitments, so Prop. 5.4(ii) quantifies over a possibly proper subset of \mathrm{Sol}(\Phi_0). Three correct forms:

Policy-relative, marked order. \gamma_\uparrow is pointwise-greatest among the solutions of the policy-strengthened system — \Phi_0 plus the run’s placement commitments; equivalently, the solutions that extend into the final configuration (Cor. 5.5’s phrase). On placement-undiscriminated inputs (conjecturally the whole surface fragment) this set is all of \mathrm{Sol}(\Phi_0) and the unqualified claim holds. Proof: Prop. 5.4(ii) restricted along the representation, which is exact for the policy-strengthened system. \square
The unqualified marked claim is false. \Phi_0 = \{\langle\rho\rangle \approx [\,]\cdot\diamond\cdot[3,5]\}: the run commits \rho \mapsto [\,]\cdot\diamond\cdot[3,5] (inherit-the-split), so \gamma_\uparrow(\rho) = [\,]\cdot\diamond\cdot[3,5], while \gamma(\rho) = [3]\cdot\diamond\cdot[5] \in \mathrm{Sol}(\Phi_0) under Def. 3.4’s \approx-semantics is \sqsubseteq-incomparable to it (Prop. 2.10(ii)).
Absolute up to \approx, relative to the rank policy. For every \gamma \in \mathrm{Sol}(\Phi_0^{\mathrm{rk}}) — \Phi_0 plus the run’s deferred-equation resolutions, the rank-level policy commitments of (c-split)/(d) — and every x \in V: \gamma(x) \sqsubseteq^{1} \gamma_\uparrow(x) at row sort, plain \sqsubseteq at dimension sort. [proved — A.6] Placement commitments need no exclusion: \sqsubseteq^1 is marker-blind on its lower side, which is exactly why the one-shot order is the right comparison. Rank commitments do: a solution realizing a foreclosed sporadic/conjugate branch can have smaller rank at a variable than \gamma_\uparrow’s committed value, and no marker-blind order repairs a rank gap. On deferral-free runs \Phi_0^{\mathrm{rk}} = \Phi_0 and the statement is unconditional.

The slogan “smallest tensors, most general shapes” survives in content; what the \approx-adoption costs is only canonicity of where \gamma_\uparrow’s markers sit — consistent with Thm. 5.6(3)’s weakening and Remark 2.15(v).

Proposition 6.5 (Leaf closing is locally least; the holes caveat). [proved, with stated scope] If at step 1 all of a leaf variable’s caps are ground, the committed value is the \sqsubseteq-least value satisfying its stored bounds (the join of lower bounds — Props. 1.3, 2.7). If some cap has non-ground positions (“holes”), the 1_\emptyset fill is a guess: solutions assigning a concrete atom at a hole exist and are \sqsubseteq-incomparable to the committed one. This is the precise sense in which closing is “deliberately non-principal,” and the parameter error is its guard.

Remark 6.6 (Non-principality witnessed). \Phi = \{3_b \sqsubseteq \alpha\}, \alpha a leaf: \gamma_{\mathrm{close}}(\alpha) = 3_b; \gamma_\uparrow(\alpha) = 1_\emptyset; both are solutions, neither factors through the other by substitution (Example 3.5). Closing selects by policy, justified by allocation (leaves) and parsimony (interiors), not by entailment.

Remark 6.7 (Order-robust closing, and what backtracking would buy). [sketch for (i); design notes for (ii)–(iv)]

Order-robustness. With transitively-closed cap stores (Remark 6.3’s discipline), leaf closing cannot spuriously fail on hole-free stores at dimension sort: each commitment is the join of every cap the variable will ever receive, hence the \sqsubseteq-least admissible value, and the invariant “the partial commitment extends to a solution” is preserved across commits — the caps sit below the commit by construction; a downstream v \sqsubseteq w survives because \mathrm{commit}(v) \sqsubseteq \gamma(v) \sqsubseteq \gamma(w) for the witnessing solution \gamma; an upstream contributor was committed (and its value forwarded) earlier; and the remaining constraint forms — pins v \sqsubseteq c and top-pins 1_\emptyset \sqsubseteq v — are semantic steps discharged by the re-solve. So under the discipline, every closing failure traces to a hole fill (Prop. 6.5’s caveat), a row-sort placement or rank commitment, or genuine unsatisfiability. [proved at dimension sort — A.7, via cap-set inclusion and join monotonicity, making closing under the discipline order-independent in value; the row-sort transfer covers flat content, with placement and extent routed to (ii)(a)]
The backtracking inventory. Suppose the solver were made nondeterministic, with backtracking, to recover an unconditional failure direction for Cor. 5.5. First a polarity limit: backtracking recovers spurious failures only — it searches for a succeeding branch, and cannot un-accept — so the shifted-surplus inequality leak (§10: [3]\cdot\langle\rho\rangle \sqsubseteq \langle\rho\rangle\cdot[5] accepted though unsatisfiable) is untouched on every branch; that gap needs the chain-entailment admissible rule, orthogonal to search. The spurious-failure sources are, by Lemma 5.1’s accounting, exactly the policy steps plus closing, and they classify:

Scheduling problems, not search problems: commitment order (part (i)), and hole fills. Holes range over an infinite value domain, so blind enumeration is hopeless; but a spurious 1_\emptyset-fill is directed — in the flat dimension order, a failed recheck names the only viable repair (its ground side) — so “retry with the value from the failure” is a finite discipline, and deferring hole commitments past all rechecks removes the failures without any search. What remains at holes is Prop. 6.5’s intent-level non-principality (guarded by the parameter error), not a satisfiability gap.
Genuine, finite choice points: (a) marker placement at every committing site — the equality bindings of Def. 4.3(b)/(c-nested) and the row-closing joins — with |m|+1 branches for an entailed middle m; this recovers exactly the policy rejections (Remark 2.11’s witness is the test case). (b) The word-equation witness at the deferred equations of Def. 4.3 — the rotational case (d) and the split-surplus case (c-split). For (d), the family x = u(vu)^k with s = uv, t = vu is indexed by a rotation point of s (finitely many) and a power k \ge 0, with k bounded by whatever rank facts the store imposes on \rho; absent rank pressure, satisfiability of the isolated equation is the bare conjugacy test — no search at all. For (c-split), the choice is among the \le |s| sporadic prefix solutions plus the principal family (its parameter pinned by later constraints) — finite branching again.

Constraint selection is not itself a completeness choice point. Among semantic steps, processing order is \mathrm{Sol}-irrelevant (Lemma 5.1 preserves \mathrm{Sol} exactly), so once (a) and (b) are explicit branch points, selection order affects efficiency and which representative is reached — Remark 2.15(v)’s placement races — but never success-versus-failure. The search space factors: a confluent-for-satisfiability semantic core, with finite policy branching grafted at the committing sites.
Payoff and cost. With (a) + (b) the failure direction of Cor. 5.5 becomes unconditional — the backtracking solver decides \approx-satisfiability outright, modulo Thm. 5.2(b) for termination on unsatisfiable inputs — and the surface-discharge conjecture is demoted from correctness hypothesis to efficiency statement: on provenance-respecting inputs the placement choice point should never branch. The deterministic solver is recovered as the first-branch truncation, with its policy commitments reread as a branch-selection heuristic. [open — a candidate theorem if the paper has room; otherwise the deterministic presentation plus this remark]

7. Projection inference

Throughout, fix one operation; its shapes are solved and closed (ground). Its constraint set \Phi_{\mathrm{op}} is re-derived with freshened projection identifiers: each axis of each participating tensor occurrence carries an id p \in P, injectively, used by this operation only; \mathrm{sz}(p) \in \mathbb{N}_{\ge 1} is the axis’s solved size. (The spec’s fresh row variables are eliminated by the local re-solve — Lemma 7.7 — and its label variables resolve to operand axes, inheriting ids.)

Definition 7.1 (Projection language). Atoms q ::= \mathsf{Proj}(p) \mid \mathsf{Sol}(\mathit{idx}), where \mathit{idx} is an externally supplied index: \mathsf{Fix}(c) or an external symbol. Equations e ::= \mathsf{Eq}(q_1, q_2) \mid \mathsf{Iter}(q), with \mathsf{Eq} symmetric.

Definition 7.2 (Elaboration \llbracket\cdot\rrbracket). From the ground \Phi_{\mathrm{op}}: P1. d_1 = d_2 \rightsquigarrow \mathsf{Eq}(\llbracket d_1\rrbracket, \llbracket d_2 \rrbracket). P2. d_{\mathrm{res}} \sqsubseteq d_{\mathrm{op}} with \mathrm{sz}(d_{\mathrm{op}}) = 1 \rightsquigarrow \mathsf{Eq}(\llbracket d_{\mathrm{op}}\rrbracket, \mathsf{Sol}(\mathsf{Fix}\,0)) (sever and pin). P3. d_{\mathrm{res}} \sqsubseteq d_{\mathrm{op}} otherwise \rightsquigarrow \mathsf{Eq}(\llbracket d_{\mathrm{res}}\rrbracket, \llbracket d_{\mathrm{op}}\rrbracket). P4. R_1 \approx R_2: outer-edge alignment; P1 per aligned pair (ranks match — shapes are closed and the equality held). P5. R_{\mathrm{res}} \sqsubseteq R_{\mathrm{op}}: outer-edge alignment; P2/P3 per aligned pair, with P2’s severance applied row-wise (result side of the pair \rightsquigarrow \mathsf{Iter}, operand pinned); each surplus interior result axis \rightsquigarrow \mathsf{Iter}(\llbracket\cdot\rrbracket). P6. each terminal axis \rightsquigarrow \mathsf{Iter}(\llbracket\cdot\rrbracket). Side condition: \mathsf{Sol}(\mathsf{Fix}\,c) is emitted only with $0 c < $ the axis’s size (slices are validated against solved sizes).

Definition 7.3 (Solver). A single pass over the finite equation set E maintaining: a union–find partition of P; a partial pin map on classes; an iterate set I of classes. \mathsf{Eq}(\mathsf{Proj}\,p, \mathsf{Proj}\,q): union, \mathsf{fail} if class sizes differ. \mathsf{Eq}(\mathsf{Proj}\,p, \mathsf{Sol}\,i): pin p’s class at i, \mathsf{fail} on a conflicting pin. \mathsf{Eq}(\mathsf{Sol}\,i, \mathsf{Sol}\,j): check i = j. \mathsf{Iter}(\mathsf{Proj}\,p): add class to I. \mathsf{Iter}(\mathsf{Sol}\,\_): no-op. Labeling: pinned class \mapsto its pin (pins dominate I); unpinned class in I with size > 1 \mapsto a fresh iterator symbol; otherwise \mapsto \mathsf{Fix}(0). The product space P^{\times} = \prod_j R_j, one factor R_j = \{0..n_j{-}1\} per fresh iterator with n_j the class size; the index map \pi_T of tensor T maps each axis to its class’s label.

Theorem 7.4 (Canonicity — Theorem 2 of the template). [proved] For a finite E: the partition, the pin map, the iterate set, and the labeling are independent of processing order; the solver fails iff (a) some \approx-class (the least equivalence containing the \mathsf{Eq}(\mathsf{Proj},\mathsf{Proj}) pairs) contains two ids of different sizes, or (b) some class receives two distinct pins, or (c) some \mathsf{Eq}(\mathsf{Sol}\,i, \mathsf{Sol}\,j) has i \ne j. On success the output is unique up to a bijective renaming of the fresh iterator symbols. The solver resolves no ambiguity: it has no default, no direction, and no policy.

Proof. \approx is a closure operator’s value — the least equivalence containing the given pairs — hence order-independent; union–find computes exactly it. Size failure: if a class has two sizes, then along any processing order some union merges sub-classes of different recorded sizes (sizes are constant per id; a class’s recorded size is any member’s, well-defined per sub-class by induction), so failure occurs; conversely if all classes are size-uniform no union can fail. Pins: the pin map on classes is the lift of the partial function induced by the \mathsf{Eq}(\mathsf{Proj}, \mathsf{Sol}) equations along \approx; failure iff the lift is inconsistent — a property of (E, \approx), not of order. (c) is a per-equation check. The iterate set is the \approx-saturation of the marked ids — again a closure. The labeling is a function of (partition, pins, I, \mathrm{sz}); the only free choice anywhere is the identity of the fresh symbols, one per labeled class, whence uniqueness up to bijection. \square

Definition 7.5 (Reduction; coverage). A factor j of P^\times is a reduction axis iff no component of \pi_{\mathrm{LHS}} is \mathsf{Iter}(s_j).

Proposition 7.6 (Injectivity, surjectivity, and the reduction characterization). [proved] With core labels only (\mathsf{Iter}/\mathsf{Fix}): (i) \pi_{\mathrm{LHS}} : P^\times \to \mathrm{Addr} is injective iff every product variable occurs in some component of \pi_{\mathrm{LHS}} — i.e. iff there is no reduction axis. Proof: (\Leftarrow) the components mentioning each s_j reconstruct the point. (\Rightarrow) a missing s_j has n_j \ge 2 (factors are minted only at size > 1), so two points differing only in s_j collide. (ii) \pi_{\mathrm{LHS}} is surjective onto the LHS’s index domain iff every LHS axis of size > 1 is labeled \mathsf{Iter} and the map (axis \mapsto its variable) is injective on those axes. Proof: (\Leftarrow) given a target multi-index, set each named variable by its (unique) axis, others arbitrarily; \mathsf{Fix}(0) axes have size 1 by the labeling rule (an unpinned size-1 class) or are pinned slices, excluded on the LHS in the core. (\Rightarrow) a size->1 axis labeled \mathsf{Fix} misses indices; two axes sharing s hit only the diagonal. (iii) Hence: the accumulating read-modify-write is required iff a reduction axis exists (non-injectivity), and pre-initialization is required iff \pi_{\mathrm{LHS}} is non-surjective or (under an erasing accumulator) non-injective — the =:+ analysis, now a corollary.

Lemma 7.7 (Local re-solve succeeds; locality). [sketch] If the global solve and closing succeeded, then for each operation: (a) the per-operation re-derivation (spec template with fresh row/dimension variables against the ground operand shapes) has a solution, and the core solver finds it, eliminating every spec variable; (b) the resulting \approx depends only on \Phi_{\mathrm{op}} — by construction of the freshened ids, no constraint of any other operation mentions any id of P, so no external fact can enter the closure. Proof sketch for (a): the operation’s constraints are a subset of \Phi_0 up to the freshening bijection on ids (which does not affect sizes) and renaming of spec variables; \gamma_{\mathrm{close}} restricted along that bijection is a solution; by Corollary 5.5 the solver cannot fail on a satisfiable set, and at ground shapes the row alignments pin every spec row variable, after which every label variable is unified with a ground operand axis. [The bijection/subset bookkeeping should be spelled out against the actual elaboration in shape.ml; flagged for the full paper.]

Theorem 7.8 (Soundness of projections w.r.t. shapes — Theorem 3 of the template). [proved given 7.7] Let shapes be solved and closed, the per-operation elaboration as in Def. 7.2 with its side condition, and the local solve succeed. Then: 1. (Bounded addressing) for every participating tensor T and every p \in P^\times, \pi_T(p) is a valid multi-index: componentwise, an \mathsf{Iter}(s_j) component addresses an axis whose size equals n_j, and s_j < n_j; a \mathsf{Fix}(c) component has c < the axis size. 2. (Co-iteration is finer than size-equality) p \approx q \Rightarrow \mathrm{sz}(p) = \mathrm{sz}(q), and \approx is the least equivalence justified by this operation’s own constraints (Thm. 7.4 + Lemma 7.7(b)) — in particular, axes identified in size by other operations’ constraints are not co-iterated here.

Proof. (2) Size-uniformity of classes is the solver’s invariant (Thm. 7.4(a)); leastness is closure; locality is Lemma 7.7(b). (1) \mathsf{Iter}: the factor’s range was minted with the class size, the class size equals each member axis’s solved size by (2), and s_j ranges over \{0..n_j{-}1\}. \mathsf{Fix}(0) from the labeling: the axis’s class has size 1… — careful: an unpinned, uniterated class of size >1 also labels \mathsf{Fix}(0) by the rule; 0 < size always, so addressing is valid (such an axis is read at 0 — this is the broadcast severance and the no-partner case, both intended). Pinned \mathsf{Fix}(c): the side condition of Def. 7.2. External symbols: validity is the caller’s contract (a kernel argument), recorded as a premise. \square

Remark 7.9 (Asymmetry of the two passes, now formal). Shape solving needs a closing policy (Def. 6.1; non-principal by Remark 6.6). Projection solving needs only a naming (Thm. 7.4: forced, unique up to renaming, no direction). The slogan “more principal than shape inference” is these two statements side by side.

8. Toward the staged extensions: what is proved, what is open

Two tracks, which the entries should not let the reader conflate: Prop. 8.1 and Conj. 8.2 live on the projection side (§7) — padding margins and the stratum-2 extension of the projection language; Conjs. 8.3 and 8.4 live on the shape side (§§5–6) — the closing policy and the staged solve over the full constraint language.

Proposition 8.1 (Padding canonicity — projection side). [proved] The padding margin of an axis id is defined as \max over the finite set of operations addressing it of their kernel extents. A finite fold of the commutative, associative, idempotent operation \max is independent of accumulation order; hence the margin committed at finalization is canonical — order of constraint processing does not affect it. (This moves padding out of the open problem.)

Conjecture 8.2 (Stratified determinacy of projection inference). [open] Extend the projection language of Def. 7.1 with \mathsf{Affine}(\sum_i c_i s_i + o) and \mathsf{Concat}(s_1..s_k) and the solver with the second stratum (evaluate derived terms against the stratum-1 classes; deferred equality checks between derived terms; connected-component coupling for \mathsf{Concat}). Claim: the output remains unique up to renaming of fresh symbols and reordering within Prop. 8.1’s folds — the staged solver still resolves no ambiguity; what is lost is only flatness (a conflict between derived terms is detected in stratum 2, after the constraint’s site). Expected proof shape: stratum 1 is Thm. 7.4; stratum 2 is a deterministic evaluation over its output; the coupling is a connected-components closure (order-independent like \approx); deferred checks are pure checks.

Conjecture 8.3 (Two-unit closing — shape side). [open] With discardable_vars computed from the spec by the quantifier-alternation test, closing guesses discardable variables to 0 and others to 1_\emptyset. Claim: Prop. 6.4’s characterization splits — uniform-upward closing remains greatest within the broadcast order’s carrier, while discardable variables are off that order (the coproduct’s additive order), and the combined policy is sound (Prop. 6.2’s argument should survive: re-emission against 0 at a discardable occurrence discharges because the occurrence is, by the discardability test, projected away). The interaction of the test with row-variable instantiation is where a surprise would live.

Conjecture 8.4 (The staged solve over the full constraint language: soundness, and completeness only relative — shape side). [open; soundness is the part that matters and is expected to hold; unrestricted completeness is not expected] For the full constraint language — Total_elems, Exact, Concat, affine size relations:

(a) Soundness. Every committed solution of a successful staged run satisfies \Phi_0. This should go through by the existing pattern — commitments enter as equalities, and each stage re-solves to fixpoint before the next stage’s commitments fire (Prop. 6.2 lifted along §10’s stage table) — but the care is in the size arithmetic of affine traversals, which has two sizing modes: valid convolution (zero offset; the output extent shrunk by the kernel’s reach, the operand required to cover it) and size-preserving (the offset derived as the left part of dimensionality-preserving symmetric padding; the output is the input scaled by stride alone). Soundness requires that the mode under which a constraint was generated, the mode encoded in the stored Affine/conv term, and the mode the final checking semantics applies agree per axis — divisibility and rounding of strided extents, and the derived offset, are where the agreement can silently break. [open — write out the per-mode size equations and state the agreement as an invariant]

(b) Why completeness must be relative. Incompleteness and non-principality necessarily worsen, for a structural reason: the constraint language’s expressive power grows significantly — sums of extents via concatenation, monomial products via Total_elems (a row’s element count equated to a numerator, divided by a variable set), single-variable affine relations — while the expressive power of bindings and the bound store grows only mildly: bindings remain first-order terms (Affine, Concat), and the store holds per-variable caps plus a single stored constraint. Arbitrary monomials are representable (via the stored Total_elems); arbitrary rational polynomials are not. So the representation theorem (Thm. 5.6(1)) must fail in general: \mathrm{Sol}(\Phi_0) can be a solution set the solved-form language cannot denote, and the stages bridge the gap by guessing (Stage 5’s Total_elems/Exact heuristics) — non-principality by design, and incompleteness wherever a guess forecloses a solution the store could not have carried. (The gap is absolute, not asymptotic: a Rows_constr carrying Total_elems equates the product of the concatenated rows’ extents — with Concat axes contributing sums inside the product — to its numerator term, so a single constraint is already a polynomial equation over the naturals, and a system of them is the full Diophantine fragment: satisfiability of the unrestricted constraint language is undecidable outright (Matiyasevich). Completeness claims can therefore only ever be about the fragment the frontend actually generates.)

(c) The open question, restated. Not Cor. 5.5 verbatim but its relative form: characterize the fragment on which the stage triggers guess only forced or solution-irrelevant values — where the staged solve is complete — and exhibit the boundary with counterexamples just outside it. The blog’s “heuristics for when to guess an underdetermined variable” is exactly the object to formalize.

9. Discoveries relative to the blog series (action items)

Theorem 1 must be restated (Example 3.5): no principal model exists once a cap is residual; the correct result is Theorem 5.6 (representation + no-guessing + most-generality), with the principal model recovered as Corollary 5.8 when no caps remain. Post 1’s “principal model” paragraph and Post 5’s Theorem-1 statement both need this correction.
Termination requires the rank-cycle check (Example 4.5a, Def. 4.5): the rules as narrated diverge on mutually-growing row variables; the check should be presented as part of the core, not as an implementation heuristic. Verified — and the implementation in fact lacked it: cycles of length \ge 3 diverged until global_rank_edges was added (§10, Outcome; test/einsum/test_row_rank_cycle.ml).
1_\emptyset \sqsubseteq \alpha pins (DI-pin-top): a missing case in the blog’s pins/caps narrative.
Post 5’s closing conjecture is now Prop. 6.4 — in the \approx-corrected form: \gamma_\uparrow \in \mathrm{Sol}(\Phi_0) unconditionally; pointwise-greatest in the marked order only relative to the policy-strengthened system (the residual store encodes placement commitments, so Prop. 5.4(ii) does not quantify over all of \mathrm{Sol}(\Phi_0) — the one-constraint counterexample is in 6.4(ii)); greatest absolutely up to \approx, in the \sqsubseteq^1 form. The bonus Corollary 5.5 carries the matching caveats (policy rejections; rotational deferrals discharged).
The closing interleave is required for soundness (Remark 6.3) — promote from procedural note to a stated lemma with the three-constraint example — and is order-sensitive as stated: closing \beta first fails the same satisfiable store. The order-robust form is fixpoint-before-commit, which the implementation has architecturally at both sorts — Stage 1 only propagates, commitments are confined to the on-demand stages (Remarks 6.3, 6.7(i); the stage table in docs/shape_inference.md; test/einsum/test_closing_order.ml).
Padding is canonical (Prop. 8.1) — trivially, and worth saying; it shrinks the open problem to Conjectures 8.2–8.4.
Row order metatheory (Lemma 2.4, Props. 2.5–2.7) is new writing: the blog asserts the order and uses joins implicitly; the join formula is what justifies cap accumulation and Def. 6.1’s “join of the ground parts.”

Remaining gaps before the FProPer paper (the bulk of the previous list is discharged in the appendix, docs/formal-core-appendix.md — its status table is the ledger): the Detection Lemma of Thm. 5.2(b), reduced in A.9 to the feeding-relation quotient bookkeeping (D2) and the circulation-transport argument (D3), with a prescribed counterexample hunt through case-(i) residues; the marker-provenance invariant and surface discharge (A.10 — statements recorded; needs the surface-language grammar, and the slice’s left flank is the suspect to probe first); Lemma 7.7’s elaboration bookkeeping against shape.ml — the remaining implementation-facing check; the conjectures of §8 — where 8.4 splits into the soundness invariant for the two affine sizing modes (8.4(a), the part that matters) and the relative-completeness characterization (8.4(c); unrestricted completeness is structurally out, by 8.4(b)’s Diophantine-completeness); and optionally Remark 6.7(iv)’s finite-branching decision theorem and cross-strategy confluence (both unneeded for the paper after A.4 fixes the strategy). Newly discharged: Lemma 5.1’s assembly (A.1, which also forced Def. 4.3(c)’s correction to deferral — a genuine rule change found by proving); Lemma 5.2.2 and the lineage bound (A.2–A.3); Prop. 6.2’s re-solve termination, strengthened to unconditional (A.5); Prop. 6.4(iii)’s \sqsubseteq^1-greatestness (A.6); Remark 6.7(i)’s order-robustness at dimension sort (A.7); Remark 2.15(iv)’s inference-policy-vs-checking theorem (A.8).

10. Verification against the implementation (`row.ml`, `indexing.ml`, `assignments.ml`)

Confirmed (formalization = code):

DI-pin-top is implemented. solve_dim_ineq has the exact case, with comment “res is the top: [top ⊑ opnd] only by equality — the top refines only itself; force it.” Discovery 3 downgrades to a blog-prose gap; the code is correct. The top is spelled 1_(bcast_if_1).
DI-cap collapse is implemented and documented as intentional: conflicting ground bounds force the variable to the top, with the comment “Intentional broadcast semantics… This is NOT a bug — do not tighten to raise Shape_error.”
Row-equality occurs check (Def. 4.3(d)): unify_row raises “Infinite number of axes by self-reference” on same-variable rows with unequal known ranks — and equality binds and substitutes, so multi-step equality cycles collapse to the self-reference case and are caught.
Prop. 2.7’s join is join_dim in the row-bound merge: top absorbs; distinct sizes or distinct bases join to the top; extended consistently to Affine size relations.
Outer-edge flank alignment in both unify_row and solve_row_ineq, with comments stating the anchoring — Defs. 4.3, 7.2 match.
Closing as staged solving (Def. 6.1 refines to 7 stages): Stage 4 = leaves downward to the stored bound (glb); Stage 5 = hole guessing (1, or 0 for discardable_vars; parameters error “You forgot to specify the hidden dimension(s)”); Stage 6 = rows upward to the empty row with the inequality re-emitted for re-checking — Prop. 6.2’s by-construction soundness is literally how solve_row_ineq implements stage 6; Stage 7 = final dimension guesses. The interleave of Remark 6.3 is the stage pipeline. Order-robustness at dimension sort is by eager forwarding, not by closing order: solve_dim_ineq forwards a stored ground bound to the variable’s opnd-neighbors when a new adjacency is recorded (the from_glb emissions) and forwards an arriving ground bound to already-recorded neighbors, so the glb stores are transitively closed before stage 4 runs and the commit adds no downstream information — the fixpoint-then-commit discipline of Remark 6.7(i). Remark 6.3’s example resolves at Stage1 (\beta demoted to the broadcast top by the DI-cap merge), in all four emission orders of its constraints and terminals: pinned by test/einsum/test_closing_order.ml. Confirmed (design principle, both sorts): fixpoint-before-commit is the architecture, not a code path. Stage 1 — the only online stage — exclusively unifies and propagates at both sorts; all commitment forms are confined to the on-demand stages 2–7 (GLB commits at 3–4, guesses and Total_elems/Exact resolution at 5, no-further-axes generalization at 6, final dimension guesses at 7 — the stage table in docs/shape_inference.md). The staging exists precisely to place fixpoints at the right times relative to commitments (the stage count is likely not minimal — the doc counts 8 with a fractional split of an earlier design, row.ml’s stage type has 7; the fixpoint placement is the point), so the \beta-first trap is designed out in general: no commitment fires while the bound it commits to can still grow.
Prop. 7.6 is the spec of Indexing.is_injective/is_surjective; assignments.ml states Prop. 7.6(iii) as a comment: “Need initialization if: initialize_neutral AND (not surjective OR not injective).” Note is_injective is exact on the core forms and a sound under-approximation on Affine (an LHS affine index naming two product iterators is conservatively declared non-injective regardless of coefficients) — worth one sentence in the paper.
Terminology footnote: the code’s glb field stores what §§5–6 call the join of the recorded lower bounds (the least value above all caps — the greatest lower bound of the admissible region, whence the name). Align the paper’s wording to avoid lattice-direction confusion.

New admissible rule found in the code (add to §4):

DI-antisym (transitive, dimension level): solve_dim_ineq carries a recursive cyclic walk over the stored adjacency lists; an incoming \alpha \sqsubseteq \beta closing a directed adjacency cycle is replaced by equality. Sound in any poset (a \sqsubseteq-cycle forces equality throughout); include as an admissible rule. The row level has only a one-step version, guarded by equal known flank lengths — which is exactly why the next item bites.

The divergence (Example 4.5a) is live in the code, by trace: the deficit rule (template at row.ml ~2835) is guarded only by template memoization keyed on (res_v, deficit) — and every round of the mutual-growth cycle presents a fresh res_v, so the cache never hits; row bounds re-emit as Row_ineqs when a row variable is solved (confirmed at ~2188–2197), which is the ping-pong step; the outer solve_inequalities fixpoint stops only when the constraint list is empty or unchanged, and each round mints fresh variables, so it never stabilizes; and no transitive rank-cycle check exists at the row level. Stage caveat: leading-flank surpluses hit a not (is_stage7 stage) postponement, so a repro should put the surplus axes in the trailing flank.

Decisive experiment (unit test, no surface syntax needed): feed solve_inequalities directly \{\ \mathsf{Row\_ineq}\{res = \langle\rho_1\rangle;\ opnd = [a]\cdot\langle\rho_2\rangle\},\quad \mathsf{Row\_ineq}\{res = \langle\rho_2\rangle;\ opnd = [b]\cdot\langle\rho_1\rangle\}\ \} (with [a], [b] in dims, not beg_dims). Expected on current code: non-termination (or unbounded growth until interrupted). Expected with Def. 4.5’s check: Shape_error reporting the rank cycle. Candidate surface repro (adapt syntax): two self-referential %cd producer assignments where each tensor is pointwise-below an einsum-built tensor that appends a trailing axis to the other’s row via a shared ellipsis variable — the SGD self-reference idiom plus "... => ..., k"-style sharing. Unsatisfiable by rank; the unit test is the cleaner check.

Proposed fix (matches Def. 4.5): extend the existing Bounds_row res/opnd adjacency lists to a transitive walk analogous to the dimension-level cyclic, weighted by known-flank surplus at the time the adjacency is recorded; fail (or, conservatively, force equalities as the dim level does for the zero-weight case and error on positive weight) when the deficit rule would fire on a variable lying on a positive-weight cycle. The zero-weight transitive case could also collapse to equalities, generalizing the current one-step row antisymmetry.

Outcome (experiment run, divergence confirmed, fix implemented — test/einsum/test_row_rank_cycle.ml):

The decisive experiment as stated does not diverge: on the two-variable instance the pre-fix solver terminates with Shape_error: Infinite number of axes by self-reference. The template memoization plus equality-binding happen to collapse the two-variable ping-pong into the one-step self-reference check — the same collapse-to-self-reference mechanism noted for multi-step equality cycles under Def. 4.3(d). So the minimal Example 4.5a needs restating: two mutually-growing variables are caught (by accident of the cache, not by design).
The divergence is live one variable up: the three-variable cycle \{\rho_1 \sqsubseteq \langle\rho_2\rangle{\cdot}[a],\ \rho_2 \sqsubseteq \langle\rho_3\rangle{\cdot}[b],\ \rho_3 \sqsubseteq \langle\rho_1\rangle{\cdot}[c]\} (trailing flank) ran unboundedly until killed. Mixed-flank, unequal-deficit, and beg_dims two-variable variants all happened to terminate via the self-reference collapse; only cycles of length ≥ 3 escaped it.
The proposed fix as stated is insufficient: a transitive walk over the stored Bounds_row adjacency cannot see the cycle, because s_row_one_in_entry removes a solved variable from all adjacency lists and re-emits those bounds as in-flight Row_ineqs — the rank facts perpetually alternate between the store and the constraint queue, and the store always lags one ping-pong step. (This also bears on the paper: the termination potential of Thm. 5.2 must be stated against a monotone structure, not the substitution-mutable store.)
Implemented fix (row.ml, global_rank_edges): a persistent, never-retracted rank-relation graph beside the store, with edges \mathrm{rank}\,v \ge \mathrm{rank}\,w + k (k \ge 0) recorded at (i) row-variable solving v := \langle w\rangle{\cdot}(k\ \text{known dims}), (ii) equal-flank row inequalities (k = 0), and (iii) the deficit rule itself (k = deficit > 0). An edge closing a cycle of positive total weight raises Shape_error: Infinite number of axes by rank cycle among row variables; zero-weight cycles are permitted (they assert rank equality, and the one-step row antisymmetry still collapses them). Recording facts permanently is sound because the solver never retracts constraints; soundness of each edge kind is immediate from res \sqsubseteq opnd \Rightarrow \mathrm{rank}(res) \ge \mathrm{rank}(opnd). Caution from a first buggy attempt: the GLB-residue case (operand variable, result with surplus known flanks) entails only \mathrm{rank}(res_v) \ge \mathrm{rank}(opnd_v) - \text{surplus} — a nonpositive weight the graph must not record positively; for the paper, Def. 4.5’s weights need this sign discipline spelled out. On the three-constraint repro the cycle is now detected already at deficit-edge recording in round 1, before any template variable is minted. (§§4–5 have since been rewritten to match: Def. 4.5 records the three edge kinds with the sign discipline; Prop. 4.6 states the check’s exactness over the recorded facts; Thm. 5.2 is split into the rank-satisfiable case — proved modulo flagged lemmas — and the open Detection Lemma, with the spiral-vs-anchor mechanism traced in §5.)

Marker sensitivity of closed-row equality — a formalization/implementation mismatch (found while adding the complement clause to Def. 4.3(b)):

Defs. 2.1/3.4 originally declared ground equality marker-sensitive ([3]\cdot\diamond\cdot[4] \ne [3,4]\cdot\diamond\cdot[\,]) — the implementation disagreed for closed rows, and the disagreement is now resolved in the implementation’s favor (Def. 3.4 adopts \approx): unify_row’s closed–closed case compares the flattened axis lists (beg_dims @ dims; comment “Two closed rows must equal axis-by-axis”), and its open-vs-closed case matches the open side’s flanks against the closed side’s flat list, letting the open leading flank spill past the closed side’s beg_dims.
The flat reading is load-bearing, not a bug: a literal tensor’s row is closed with all axes in dims (marker at the front), so [3]\cdot\langle\rho\rangle\cdot[4] \approx [\,]\cdot\diamond\cdot[3,5,4] — an everyday einsum-spec-vs-shape equation — must succeed with \rho \mapsto [5], and in the code does so precisely by leading spill; marker-sensitive equality (Def. 3.4’s former reading, since replaced by \approx) would reject it.
The flat reading is also genuinely coarse: closed rows with nonempty beg_dims are reachable (a deficit-template binding \rho \mapsto [\alpha]\cdot\langle\rho'\rangle\cdot[\beta] whose \rho' later closes yields one after substitution), and two flat-equal closed rows with different markers expand differently as operands (Def. 2.2 inserts 1_\emptyset at the marker) — flat equality identifies rows the order distinguishes.
An asymmetry to boot: trailing spill is rejected up-front (“Number of axes mismatch”, the dims1_l > dims2_l guard), so \langle\rho\rangle{\cdot}[3] against a closed row holding its 3-axis in beg_dims fails where the mirror image succeeds. Under the flat reading this is a completeness quirk — probably invisible in practice (closed rows usually carry all axes in dims), but worth a probe test.
The metatheory of the mismatch is now §2’s Def. 2.9 / Prop. 2.10 / Remark 2.11: the flat relation is an equivalence, not an equality — the order induces only the trivial equivalence (antisymmetry), distinct \approx-related rows are always \sqsubseteq-incomparable, and \approx is not (and no nontrivial equivalence is) a \sqsubseteq-congruence. So \approx cannot be folded into the order; adopting it at equality sites leaves marker placements underdetermined, and the implementation resolves them by policy (\approx-checking, marked-committing — it inherits the closed side’s structural split). Remark 2.11’s two-constraint witness separates the candidate semantics and is the probe test to write: the implementation fails it, correctly under marked semantics, incompletely under \approx-semantics.
Downstream adjustments — executed, with the semantics now fixed (Def. 3.4 adopts \approx for ground row equality; user decision): Def. 2.1 notes its marker-sensitive identity is algebraic, not constraint satisfaction; Def. 3.4 states the three equivalent formulations (\approx of Def. 2.9; rigidified identity \mathrm{flat}(\cdot)^\bullet in Def. 2.12; mutual \sqsubseteq^1 — their coincidence proved inline at Remark 2.15(ii)); Lemma 5.1 is restated with the semantic/policy step split (flat decomposition entailed, placement a policy coordinate; policy fails are rejections, not unsatisfiability); Thm. 5.6(3) weakened to most generality up to \approx, with a theorem-level caveat tying (1)/(2) to placement-undiscriminated inputs; Cor. 5.5’s failure direction is modulo policy, with the surface-language discharge stated as a conjecture there; Def. 4.3(a) is rank-only with a flat zip; Def. 4.3(b)’s per-flank overflow relaxed to total overflow (the implementation’s trailing guard noted as a conservative deviation); Def. 4.3(d)’s rotational deferral is now an explicit incompleteness on nontrivial conjugate instances. Inequalities remain in the marked order throughout — markers do their semantic work at broadcast sites, and only there.
Def. 4.3(c) audit correction (found while writing A.1’s per-rule proof): the split-surplus case as previously written (fresh-variable binding committing the principal family of s\cdot x_1 = x_2\cdot t) does not match the implementation — unify_row’s cross-surplus branch (beg_handled = false, ~row.ml:2242) binds nothing and re-emits the residual equation: deferral into closing, the same mechanism as the rotational case. The implementation is the better rule: the binding would lose the sporadic cross-overlap solutions that the adopted \approx-semantics admits (and would break rank conservativity — A.2’s remark), while the deferral finds them via the upward close (least-material = outermost sporadic), exactly when the run’s closing order cooperates. Def. 4.3(c) is corrected to the deferral; pinned in test/einsum/test_row_self_reference.ml (cross-surplus trio, both constraint orders, plus the mirror orientation documenting that the trailing guard — not the deferral — is what rejects there: a placement-sensitive policy rejection).
Preferred resolution (proposal): the two-sorted ground algebra of Def. 2.12 — adjoin rigid (markerless, rank-rigid) ground rows; rigid rows may refine broadcastable operands but never serve as operands of higher rank (Prop. 2.13 shows the asymmetry is forced by transitivity). Equality becomes content equality with no marker to place (Remark 2.14); the marker policy is replaced by a static sort discipline with a preservation theorem to prove; the trailing-guard asymmetry becomes plain incompleteness; the implementation distance is a rigidity bit on closed rows plus rank-checking inequalities against rigid operands. Open before adopting: the mixed join formula for closing’s caps, and the shape.ml sort audit (do any row variables cross sorts?). Refined by Remark 2.15 after correcting the einsum reading (... is a kind-indexed row variable; results broadcast on the left by convention): the checking semantics is an inequality sandwich under the one-shot order \sqsubseteq^1 (rigidified lower sides — the hereditary marked order provably excludes intended matches), row variables are uniformly rigid-sorted with no crossing, and the implementation’s equivalence constraints are to be presented as einsum’s inference policy strengthening the checking semantics; the audit becomes confirming shape.ml’s constraint sources against Remark 2.15(ii)–(iv).

shape.ml audit (Remark 2.15(ii)–(iv) confirmation; run after the remark was written):

(i) confirmed. lexer.mll/parser.mly: ... parses to a row variable named after its kind (if rv = "..." then kind — i.e. batch/input/output); ..name.. is the explicitly-named variant. row_var_env is created fresh per einsum and keyed by name (Hashtbl.find_or_add), so the same name across slots is the same variable, shared between operand and result rows — the transport mechanism, exactly as stated.
(iv) confirmed. The binary einsum emits exactly nine Row_eq and no Row_ineq: per kind, cur_sh = T_lhs (result tensor = result template) and T_rhs_j = sh_j (operand template = operand tensor), with the templates sharing both variable environments. Unary Permute and Block follow the same pattern. (The orientation differs cosmetically between the result and operand equations; unify_row normalizes on which side is open, so this is error-message presentation, not semantics.)
(ii) confirmed, and stronger than expected: the inequality sandwich is already the architecture of every non-einsum operation. Pointwise_bin/Pointwise_un/Transpose emit Row_ineq { res = cur_sh.kind; opnd = sh_j.kind } — result low, operands high — and Compose additionally wires the contraction as Row_ineq { res = sh1.input; opnd = sh2.output }. Einsum is the sole equality-based operation family; adopting 2.15(ii) would make the operation vocabulary uniformly inequality-shaped, with einsum’s equalities re-derived as its inference policy.
(iii) confirmed at every external source, with one refinement. Shape.make builds user/literal rows with beg_dims = [], all axes in dims, bcast = Broadcastable — closed left-marked rows; unspecified kinds are open with empty flanks. No-ellipsis spec rows likewise put all axes in trailing position (parser rule ([], None, specs)). Nonempty beg_dims arise from exactly two sources, both declared: the slice operation — the paradigmatic case, and the original motivation for the left flank — which prepends the outer-anchored slice axis to the result’s batch row whether that row is closed or open (expanded_batch in the Batch_slice case: stating “operand = sliced-axis \cdot \langleresult\rangle” with an open result is inexpressible without a left flank; currently exposed batch-only, but generalizable to other kinds) — and axes written before an einsum ellipsis, which reuse the same mechanism. And the interface is rigid: row_to_dims flattens beg_dims @ dims when rows exit to array dimensions and projections — the marker is dropped exactly at the boundary, which is Def. 2.12’s \mathrm{flat}(\cdot) in the flesh.
The refinement: Remark 2.15(iii)‘s “markers are never consumed from a solved row variable’s value” is too strong for the current implementation — closed rows produced by einsum equalities do serve later as broadcast operands in pointwise Row_ineqs, so solved middles’ markers are consumed. But the audit shows each such placement traces to a declared origin: external rows are left-marked by construction, spec rows place markers at user-written ellipsis positions, slice prepends at the outer-left anchor, and unify_row’s inherit-the-closed-side’s-split policy propagates these without ever minting a placement (for literal closed sides, the inherited middle is front-marked — the left convention falls out). The invariant to prove about the implementation is therefore marker provenance: every marker of every row in the system originates from a declared placement, and the unify policy preserves this. Remark 2.11’s pathological witness is exactly a constraint set whose second constraint has no declared origin for \rho’s marker — unwritable from the surface language if provenance holds. Provenance guarantees the placement is a declared one, not that it is a declaration about that tensor: Remark 2.15(v) records that the selection among racing declarations is order-sensitive, and adopts \approx-principality with placement-as-policy as the paper’s position.

Should the implementation grow rigid markers? Assessment: no (for now). Rigidity does its semantic work at comparison time, and the code already realizes it there as flattening views — closed–closed equality zips beg_dims @ dims, the closed–closed inequality compares against the result’s flat list from the outer edges, and row_to_dims flattens on exit; a stored constructor adds nothing to a view. In the one place where storage would change behavior — solved row-variable values — it would be a regression: the inherit-the-split policy transports a closed side’s declared beg/dims boundary into the solved middle (e.g. unifying against \{beg=[a,b];\,dims=[c]\} yields \rho \mapsto [b]\cdot\diamond\cdot[\,], preserving the declaration), while rigid values plus a splice-time convention would erase it. Per Prop. 2.13’s broadcast-credit addendum, genuinely rigid rows — no outstanding credit, no inherited placement — exist only at the interface, where the representation is already flat (int array). Immediate costs of storing the sort: the caps in Bounds_row would force the mixed join-of-caps formula before closing could run, plus pervasive bcast-match churn. What would change the calculus: a surface feature with semantically fixed rank (exact-rank assertions, a strict no-left-broadcast spec mode), or type-level enforcement of the sort-discipline theorem — better served by a phantom type than by runtime data.

Rotational self-reference soundness bug (found via the gh-ocannl-247 proposal review; fixed): the same-variable equality branch of unify_row checked only the total known flank lengths and unified the per-flank min-overlaps — which are empty when one side’s surplus is all leading and the other’s all trailing. So [3]\cdot\langle\rho\rangle \approx \langle\rho\rangle\cdot[5] — unsatisfiable under both marked and content semantics (the residual word equation x\,t = s\,x demands conjugate surpluses) — was accepted with no dimension checked at all: an unsound success, complementing the unsound-failure risks catalogued elsewhere in these notes. First fixed by rejecting shifted splits outright (conservative for the conjugate-satisfiable cases); then, per the resolution analysis below, the rejection was replaced by deferral into the closing policy — see the amended Def. 4.3(d). The aligned-split path is untouched and complete. Pinned in test/einsum/test_row_self_reference.ml; full suite green. The inequality twin is in better shape than a first reading suggested: solve_row_ineq computes the aligned-overlap Dim_ineqs on both flanks before its case dispatch, and the same-variable branch returns them — so \langle\rho\rangle{\cdot}[3] \sqsubseteq \langle\rho\rangle{\cdot}[5] correctly fails, and the no-surplus case is exactly the different-variables pattern minus the (vacuous-for-self) env bookkeeping: no self-adjacency, no zero-weight self rank-edge. Only the shifted-surplus residue is dropped there. Unlike the equality case, a blanket rejection is not available: shifted same-variable inequalities are routinely satisfiable (e.g. [3]\cdot\langle\rho\rangle \sqsubseteq \langle\rho\rangle\cdot[1_\emptyset] with \gamma\rho = [\,]), and the dropped conditions are x-dependent pointwise chains outside the per-dimension language. The hole is therefore narrower but real: failures stay sound, while some unsatisfiable shifted instances are accepted — [3]\cdot\langle\rho\rangle \sqsubseteq \langle\rho\rangle\cdot[5] is unsatisfiable (the pointwise chain 3 \sqsubseteq x_1 \sqsubseteq \cdots \sqsubseteq 5 cannot hold in the flat dimension order) yet passes.

Resolution analysis (follow-up discussion). Why k = 1 (shift one) is special: the rotational disjunction ranges over the value’s length mod k — trivial for k = 1 — and the least-material witness x = [\,] is universal at k = 1 (satisfiable \Rightarrow admissible, for = and \sqsubseteq alike), while for k \ge 2 neither holds (s = [3,5], t = [5,3] is satisfiable only via the rotated alignment, so x = [\,] is inadmissible and, for \sqsubseteq, the rotations are pairwise incomparable because operand-side 1_\emptyset absorbs result dims). The principled resolution is deferral into the closing policy, not eager rules: keep the shifted same-variable constraint in flight; if the variable is resolved by other constraints the substituted check is exact; if it survives to stage 6, the upward guess \rho \mapsto [\,] is the least-material disjunct (\delta = 0), applied at guessing time where guesses belong, uniformly for all k. (Eagerly binding \rho \mapsto [\,] at stage 1 would be unsound-failure-prone: a later \rho \approx [3] is jointly satisfiable with [3]\cdot\langle\rho\rangle \approx \langle\rho\rangle\cdot[3].) Deferral would supersede the conservative equality rejection above, since closed–closed equality flattens and the eventual check is complete.

The base-case bug deferral exposes: the closed–closed shifted inequality is vacuous. [3]\cdot\diamond\cdot[\,] \sqsubseteq [\,]\cdot\diamond\cdot[5] — both closed, equal ranks, unsatisfiable under marked and content semantics — is postponed to stage 7 and then accepted by the closed-operand catch-all with zero comparisons: the aligned-overlap machinery compares beg-to-beg and dims-to-dims only, never across flanks. This is a bug independently of the rigid-sort decision: broadcasting is the operand inserting 1_\emptyset at its marker, and those positions face only the result’s surplus — the operand’s explicit material always pins its corresponding result positions (d \sqsubseteq 5 forces d = 5), and at equal ranks there are no insertions at all. Def. 4.3’s case (ii) should accordingly read “the operand’s inserted positions are unconstrained — discard those,” not “everything past the structural-flank overlap”; the implementation conflates the two, invisibly when the result’s split mirrors the operand’s (both beg = [], the common case), wrongly in the shifted ones. The marked-vs-content decision survives only in the shifted-compatible cases: [3]\cdot\diamond\cdot[\,] \sqsubseteq [\,]\cdot\diamond\cdot[3] is content-satisfiable but marked-unsatisfiable (flank-fit); the old vacuous acceptance happened to agree with content there, so completing the check under the content reading (expand the closed operand at its marker to the result’s rank, compare flat-pointwise) only changed verdicts on genuinely unsatisfiable instances. Implemented (all three parts): (1) the equality rejection is replaced by deferral — in flight until the variable resolves, with the stage-6/7 upward closing supplying the least-material disjunct; (2) the closed–closed inequality compares the operand’s explicit beg_dims/dims against the result’s flat axis list from the outer edges (the structural overlap pairs are a subset; only broadcast-inserted middle positions remain unconstrained — Def. 4.3(ii) as written); (3) the same-variable shifted inequality and the open-result/closed-operand cross-flank cases defer likewise, reducing to (2) at closing. One mechanical subtlety worth recording: the solver’s fixpoint detects progress syntactically on the constraint list, and the per-round fold reverses accumulation order — so a “close the variable + re-check” emission must list the re-check before the binding to be processed after it next round (the pre-existing stage-6 idiom in solve_row_ineq; violating it stalls the fixpoint with the binding applied but the re-check never re-run). Pinned in test/einsum/test_row_self_reference.ml (nine cases: both semantics-conflicting shifted forms fail, conjugate/compatible forms succeed via least-material resolution, broadcast-facing result surplus stays unconstrained); full suite green.