Shape Inference After the Boilerplate: A Literature Survey from OCANNL

2026-06-23

Shape Inference After the Boilerplate

A literature survey from OCANNL’s point of view. By GPT 5.5 Pro.

Shape inference is an overloaded phrase. In one community it means a static analysis over TensorFlow or PyTorch code, looking for a mismatch before a training run explodes. In another it means a dependent or refinement type system for array sizes. In compiler IRs it often means local propagation of tensor metadata through an operator graph. In APL-derived languages it means the static account of rank polymorphism: how a scalar or cell-level operation is lifted over extra axes. In user-facing Python libraries it may mean runtime-checked annotations such as Float[Tensor, "batch channels"].

OCANNL sits near all of these, but not quite inside any of them. Its starting point is not merely that shape errors should be caught. That is table stakes. The deeper annoyance is that modern tensor code is full of manual shape bureaucracy: reshape, unsqueeze, expand, transpose, axis numbers, reduction lists, padding arithmetic, and temporary shape variables. OCANNL’s embedded DSL tries to make tensor operations look closer to the algebra they express. A pointwise outer product is written as an index relation, a contraction as shared indices, a convolution as affine addressing, and concatenation as a shape-level sum. From those operation specifications, OCANNL infers shapes and then derives projection maps for code generation.

That goal matters for the related work. OCANNL is not best understood as “another tensor shape checker.” It is closer to a small compiler whose missing front-end pass is shape inference. The pass must infer ranks, sizes, broadcasts, reductions, affine traversals, and later loop-index projections. It must also decide when a shape variable is still underdetermined and how to close it into something allocatable. This leads to a design that borrows vocabulary from type systems but does not live entirely there: broadcasting is an order; row variables represent unknown rank segments; sizes are solved by constraints rather than carried only as types; and projection inference is separated from shape solving.

This post maps the surrounding literature with that distinction in mind. The question is not “who also has shapes?” Almost everyone has shapes. The useful question is: what is the job assigned to shape information in each system? Is it a type? A refinement? A runtime assertion? A symbolic expression attached to an IR value? A source of loop bounds? A witness that an index expression is in bounds? OCANNL’s answer is unusual because it wants all of the following at once: less user-written boilerplate, rank-polymorphic notation, NumPy-style broadcasting, size arithmetic, and a code-generation account of which loop variable indexes which tensor axis.

OCANNL’s reference point

The formal core gives a compact way to say what OCANNL is trying to do. At the dimension level, OCANNL treats broadcasting as an order. The claim-free unit 1_{\emptyset} is the top in the broadcast order used in the formal core: every concrete dimension refines it, and two incompatible concrete dimensions meet in error. At the row level, a shape is a sequence with a marker: leading flank, broadcast point, trailing flank. Row variables live at that marker, so an unknown middle segment can absorb axes without immediately deciding how many axes are explicit and how many are broadcast padding.

This is rank polymorphism without a type scheme. A row can be more or less specific than another row of different rank because the order itself relates them. Extra axes are not introduced by an explicit map node or an explicit unsqueeze; they are introduced by refinement against a row whose marker admits them. In the core calculus, this gives row inequalities, row equalities judged by flat content, dimension caps, and a residual bound store. The corrected theorem is not that every satisfiable system has a principal ground model. A single residual cap such as 3 \sqsubseteq \alpha already has incomparable minimal ground models. The solver’s answer is instead a substitution plus residual bounds, representing the solution set without guessing. A principal model is recovered when no caps remain.

That correction is important when comparing OCANNL with type-inference systems. OCANNL does have a most-general story, but the story is not the Hindley–Milner slogan transplanted unchanged. The shape pass has semantic steps, which preserve the solution set, and policy steps, which choose a representative when the constraints do not force one. Closing is one of those policy steps: a terminal tensor needs a concrete allocatable shape, while an interior variable can often be closed upward to 1_{\emptyset}. Projection inference is different. Once shapes are solved and closed for an operation, OCANNL freshens projection identifiers locally and solves an equivalence problem. Equality unions axes, broadcasting severs and pins a size-one operand axis, concatenation couples components, and reduction emerges when an output projection omits an iterator. This projection pass is canonical up to renaming of fresh iterator symbols. Shape solving needs a closing policy; projection solving needs only a naming.

That asymmetry is one of OCANNL’s strongest claims. Many systems track or check shapes. Fewer make explicit that the shape pass and the index-projection pass are different mathematical objects. A shape can say two axes have the same length; a projection says whether this operation co-iterates them. Those are not the same fact. Two axes can be equal in size because some other constraint equated them, but they should not be fused into one loop for the current operation unless the current operation’s local index structure equates them. This is the discipline the earlier OCANNL notes summarize as “global sizes, local identities.”

That phrase also explains why the attached comparison with Bachurski, Mycroft, and Orchard’s Star paper is so useful. Star’s slogan is that a shape is not its index. OCANNL agrees, but operationalizes the slogan differently. Star types the correspondence; OCANNL computes along it.

The older lineage: shape separated from data

A good survey should not start with deep learning. The key separation is older: a program has a data computation and, sometimes, an independently computable shape computation. Jay and Cockett’s work on shapely types and shape polymorphism is an early expression of this idea. Shapely types separate the structure of data from its contents: the shape part can be evaluated statically for programs whose result shape depends only on argument shapes. That line continues through FISh and into later discussions of shape-generic array programming.

This matters because it frames the problem as more than array bounds checking. Shape information is not merely a safety predicate. It can be a program in its own right: a computation over shapes that may be run before the data computation. OCANNL fits this lineage. When a convolution output size is derived from input extent, stride, dilation, and kernel span, the shape computation is not a decoration on the data computation. It is a separate computation whose result is needed both for allocation and for loop generation.

APL and J contribute another important axis: rank polymorphism. In APL-family languages, an operation written for cells of some rank is lifted over frames of larger rank. This programming model reduces explicit loops and makes array programs compact, but dynamic APL systems typically report shape errors at runtime. The later PL problem is to make this model static enough for type checking and compilation without losing its concision.

OCANNL’s relationship to this history is selective. It wants the concision of rank-polymorphic array notation, but it is not trying to be a full general-purpose APL. It is an embedded DSL for tensor kernels. Its ... and named row variables play the role of rank-polymorphic frame variables, but they are scoped by tensor-axis kinds and operation specifications rather than by an entire language of array functions. This narrower setting is a strength: a specialized shape solver can exploit facts about tensor operations that a general-purpose rank-polymorphic language cannot assume.

Remora and Refined Remora: rank polymorphism as a type system

The closest theoretical relative is Remora. Slepak, Shivers, and Manolios designed Remora as a statically typed, higher-order language for APL-style rank polymorphism. Its type system tracks array shapes and the rank-polymorphic lifting mechanism; the later paper on rank polymorphism as a constraint problem studies Remora type checking and inference as logical constraints over type indices. Remora is especially relevant because it has both dimension variables and shape-sequence variables, so it directly confronts the same general shape as OCANNL’s row-variable problem.

The computational model is worth spelling out because it is the point at which Remora and OCANNL nearly touch. A Remora function has a cell shape: the shape it actually consumes. If the actual argument has extra leading dimensions, those dimensions form a frame and the function is lifted over the frame. A scalar function such as add1 can be applied to a scalar, a vector, or a matrix, because its cell is empty and every actual shape is all frame. A vector function such as sum consumes a one-dimensional cell and can be lifted over a matrix by treating each row as a cell. With multiple arguments, the function position itself carries frame information, so Remora computes a principal frame by joining the function-array frame with the argument frame. That principal frame is the iteration space of the lifted application.

This is not merely a user-level convenience. In Remora, rank polymorphism is the dynamic semantics of function application and the static semantics of types. The typing rule for application checks that the non-frame part of the argument shape equals the function parameter shape; it then appends the principal frame to the result shape. Shape equality and frame join are therefore in the type system’s critical path. OCANNL has an analogous pressure, but distributes it differently: row inequalities and row equalities are generated by tensor operations, broadcasting is an order over rows, and projection inference happens after shape solving. Remora’s frame/cell split is a language-level discipline. OCANNL’s row marker is a solver-level device.

This difference shows up in polymorphism. In Remora, a function’s type records how it lifts. In OCANNL today, an OCaml function that constructs an OCANNL operation re-emits constraints at each call site with fresh variables. That is a poor man’s type scheme, but not necessarily a bad one. For an embedded DSL, the host language can handle ordinary abstraction while the DSL solver handles the shape facts created by each use. The price is that OCANNL does not get Remora’s clean type-theoretic story for free. The benefit is that OCANNL can keep the surface closer to tensor algebra: a user writes an einsum-like spec or an affine convolution spec, and the solver specializes it to the concrete context.

Refined Remora is the most useful recent development in this line because it identifies exactly where ordinary Remora’s decidable index language becomes too small. Original Remora permits dimension addition and shape concatenation, but many ordinary array facts require more: subtraction, multiplication, division, rank, pointwise relations over shapes, and side conditions such as positivity or divisibility. The paper’s toy append5 example needs a type saying that the lengths of two vectors sum to five. Without refinements, one can describe the inputs as vectors, but not the equation m + n = 5. The convolution example is more relevant to OCANNL: an unrefined Remora type can awkwardly quantify over n-k because addition is available, but stride requires division and multiplication, and the natural output formula is 1 + (@n - @k + p + p) / st. Refined Remora writes that formula directly in a shape refinement, with constraints that stride is positive, input and kernel shapes have the same rank, and the kernel fits inside the padded input.

This is the type-system version of a problem OCANNL solves operationally. OCANNL also wants to know that a convolution output extent is computed from input extent, kernel span, padding, dilation, and stride. But it does not ask the user to put that fact in a refinement type. It generates the affine size relation from the convolution-like operand addressing and solves it as part of the DSL pipeline. Refined Remora says: enrich the index language and ask an SMT solver to prove the needed equalities. OCANNL says: generate the index/shape constraints from the tensor spec and use a tensor-specific solver, with staging and closing policies, to make them executable.

The standard-library examples in Refined Remora sharpen the comparison. Functions such as reduce, flatten, unravel, and indices-of become more natural once the type system can express multiplication, rank, non-emptiness, and pointwise shape constraints. The reduce example is especially revealing. If reduce requires a non-empty frame, an unrefined type system may have to encode the size as n + 1, forcing a minus one pattern to propagate upward through dot, lift-dot, and matmul. Refinements let the programmer write the natural dimension n and constrain it by 0 < n. OCANNL’s corresponding move is not to ask for a refined type of reduce, but to let the operation’s generated constraints accumulate caps and then close leaf variables downward or interior variables upward as needed. The same annoyance is being removed, but at a different layer: Refined Remora removes unnatural type encodings; OCANNL removes explicit shape plumbing in the tensor expression.

Refined Remora’s implementation details are also instructive. Dimensions are translated to symbolic integers with non-negativity constraints. Shape variables become symbolic sequences, because a shape variable has unknown length. Subkinding and index equality are discharged by converting the index environment, shape expressions, and refinements into SMT obligations. This gives a powerful and compact formal account, but it also imports the solver’s strengths and weaknesses. The paper is explicit that the refinement logic is undecidable in general. It reports that practical examples terminate, while a seemingly simple equality between two arbitrary-length shapes whose elements are all one can make Z3 time out badly under the chosen encoding.

That failure mode is close to OCANNL’s own formal-core discoveries. OCANNL’s row solver avoided a general sequence-SMT encoding, but then had to confront its own row pathologies: no principal ground model in the presence of residual caps, the need for a rank-cycle check, flat row equality versus marked row order, and the fact that marker placement is policy rather than semantics. Refined Remora and OCANNL therefore make opposite bets about solver generality. Refined Remora delegates expressive arithmetic and sequence reasoning to SMT, then studies where that is usable. OCANNL writes a bespoke solver for a narrower generated fragment, then has to prove that the bespoke rules preserve the intended solution set except at declared policy points.

One operational detail is worth making explicit in the comparison. OCANNL does not treat all underdetermined tensor shapes alike. Ordinary intermediate variables may close upward to the claim-free top, and ordinary terminal leaves may close downward to the least shape justified by their caps. Weight-parameter tensors are different: an unconstrained hidden dimension is not a license to pick the smallest shape that happens to type-check. It is usually a modeling error, so OCANNL reports it as an omitted hidden dimension rather than silently inventing a parameter extent. This is a small but useful contrast with systems that resolve imprecision by erasing a predicate, accepting an unknown, or inserting a runtime check. OCANNL’s policy is allowed to be pragmatic because it sits after constraint derivation: it acts only where the declarative constraints have stopped determining the answer.

The deepest lesson from Remora, Refined Remora, and OCANNL together is that there are at least three layers one can choose for shape facts. In Remora, shape facts live in a rank-polymorphic type system. In Refined Remora, extra shape facts live in refinements over the index language and are proved by SMT. In OCANNL, operation specifications declaratively derive constraints, and the solver is a separate phase that propagates, represents, and eventually closes those constraints. This distinction matters. OCANNL should not be read as introducing shape facts ad hoc in the solver; the solver introduces commitments only where the declarative constraint semantics underdetermines a representative, or where the full constraint language forces a pragmatic staged choice. Closing, marker placement, parameter-hidden-dimension errors, and heuristics for hard arithmetic constraints live in that second category. The three approaches are not mutually exclusive. A future OCANNL paper could borrow Refined Remora’s vocabulary for the facts that OCANNL currently derives and solves internally: pointwise lifting of shape operations, rank constraints, positivity and divisibility side conditions, and the distinction between decidable core inference and deliberately more expressive arithmetic. But OCANNL’s center of gravity remains different. It is less a type system for arrays than a compiler front end whose type-like constraints are there because code generation needs them.

Automap is a useful neighboring point. It infers implicit map and replicate operations for statically typed array programs, using integer linear programming to choose rank-polymorphic applications. Its motivation is similar to OCANNL’s: remove boilerplate introduced by explicit lifting. Its mechanism is different: infer structural lifting operations compatible with a Hindley–Milner-like setting. OCANNL does not infer map nodes as such. It infers a row order relation, then later derives projections. But both systems show the same pressure: array programmers should not have to spell out every mechanical axis manipulation.

Size-dependent arrays: DML, Qube, Futhark, staging

Another branch starts from dependent or indexed types. Xi and Pfenning’s Dependent ML is the classic reference for enriching ML with restricted dependent types over a decidable index domain. The point is not tensor programming in the modern deep-learning sense, but the architecture is relevant: ordinary programs are typed with additional static indices, and constraints over those indices are discharged separately from runtime execution. Array bounds and sizes are the archetypal use case.

Qube and related dependently typed array languages apply that idea more directly to array shapes. They show that APL-style shape polymorphism and dependent types can coexist, using solvers to verify size constraints and index operations statically. This family marks one end of the design spectrum: make shapes precise at the type level, then require a type checker or solver strong enough to prove the needed facts.

Futhark offers a modern, pragmatic version of size-dependent array typing. Its size-dependent types track array sizes in an ML-style array language. The 2021 and 2023 Futhark papers emphasize a restricted form of size dependency: powerful enough to express common array-size consistency properties, but not full dependent typing. Futhark supports existential sizes and size expressions, and its type checker can enforce equalities and constraints needed by data-parallel array programs.

For OCANNL, Futhark is a close cousin but not a twin. Both care about high-performance array compilation and both want size information early. But Futhark’s type system is about making array-size consistency part of function types in a whole language. OCANNL’s solver is about inferring shape relations from tensor operation specifications in an embedded DSL. Futhark would like a function’s type to expose size relations such as [n]i32 -> [n]i32; OCANNL would like the user not to write the shape relation at all if it can be reconstructed from "i => i", "m n; n p => m p", or an affine convolution spec.

The recent staged shape-dependent-types paper gives another point on the spectrum. Instead of asking a proof engine to establish shape equality or subtyping, it uses staging: shape consistency conditions are extracted and evaluated during compile-time computation. The paper explicitly notes the practical friction of proof obligations and automated proving, then proposes compile-time assertions and best-effort inference of implicit shape-related arguments. That is close in spirit to OCANNL’s willingness to treat shape computation as a compiler phase rather than a purely declarative type proof. Still, the staged system is framed as compile-time tensor shape checking. OCANNL’s additional burden is projection inference: not just “this program has consistent shapes,” but “this operation’s loop nest indexes these operands in this way.”

The clean way to position OCANNL against this branch is: dependent and refinement systems put sizes in types; staged systems run size computations before data computations; OCANNL puts tensor-specific sizes in a constraint solver and then uses the result to derive code. That makes it less general as a programming-language discipline and more direct as a tensor compiler front end.

Star: algebraic shapes and the shape/index split

Bachurski, Mycroft, and Orchard’s Structuring Arrays with Algebraic Shapes is the most interesting comparison because it agrees with OCANNL on a deep structural point while making almost opposite surface choices. The paper introduces Star, a small pointful array calculus whose shapes are built from structural records and variants. Product shapes model named multidimensional axes. Concatenation shapes model sums, padding, and variants. The corresponding index type is not identical to the shape type; it is computed by a metafunction. A shape and its index are distinct but isomorphic.

That is the same sentence OCANNL needed for projection inference. A size is not an iterator. A shape is not an index map. A tensor row can tell us how many axes exist and how large they are, but an operation still needs to say which local iterator addresses each operand axis. OCANNL’s projection identifiers and fresh local projection solve are the operational version of Star’s shape/index separation.

The first convergence is broadcasting. Star presents broadcasting as a meet in its shape lattice. It is useful to orient OCANNL’s current formal order the same way: a more specific, more material shape sits below a less committed one, and the broadcast result is the greatest lower bound of its operands. Nothing deep depends on calling the operation a meet rather than a join; with the opposite, more domain-theoretic information order, 1_{\emptyset} would be the least-information bottom and the same operation would be named dually. The meet wording is a deliberate alignment with the type-theoretic perspective, not the discovery of a new law of broadcasting. Where the orders really differ is distributivity. Star uses a distributive lattice, fitting the algebraic-subtyping tradition and the promise of ML-style principal type inference. OCANNL’s individual-axis order is intentionally non-distributive: concrete sizes form an antichain between top and bottom. For a single axis, two distinct concrete sizes join to the claim-free top and meet in error. Rows are richer because they add sequences, markers, flank fit, one-shot equality, and residual caps; the dimension-level diamond is only the smallest witness that concrete size tracking does not fit the distributive MLsub story.

The second convergence is concatenation. Star’s concatenation shapes are variant-indexed sums: a padded matrix row may be indexed by Top, Centre, or Bottom. OCANNL’s concatenation story reads the same structure as a coproduct of ranges with injection offsets. A variant is a coproduct is a sum type. Star reaches this object from the typing side: make the index a variant and pattern-match on it. OCANNL reaches it from the loop-nest side: split the iteration space into disjoint summands, then generate offsets and cover checks. The mathematics is the same; the operational layer is different.

The third convergence is the slogan itself: a shape is not its index. Star formalizes the correspondence by an index-type metafunction. OCANNL formalizes it by a second pass. Shape solving determines dimensions and rows; projection inference freshens local identities and computes equivalence classes of projection identifiers. In Star, the shape/index relation is typed. In OCANNL, it is computed.

The divergence follows from the job each system assigns to shape information. Star is a type system. It asks programmers to write structural names and variant tags in exchange for static safety. It explicitly avoids resolving arithmetic size problems by using richer structural shape types. Its paper is candid that full NumPy-style size-one broadcasting is hard in this setting, because the type system would need to express mixtures of conjunctions and disjunctions over sizes. Star sidesteps that difficulty by broadcasting over named-axis presence rather than by proving arithmetic facts about size-one axes.

OCANNL cannot sidestep that case. Size-one broadcasting is everyday tensor programming. A vector, a batch dimension, a singleton axis, and an omitted axis must interact in the same shape pass. Thus OCANNL does not try to keep concrete sizes out of the problem. It models 1_{\emptyset} as the claim-free broadcast unit and solves size constraints directly. A conflict between two distinct non-one atoms is an error meet; a residual lower bound becomes a cap; two incompatible caps collapse to the broadcast top when the only common upper value is the claim-free unit.

Star also differs on boilerplate. Its examples make array structure explicit with records and variants. That is good type-system design, but it is not the same optimization target as OCANNL. OCANNL’s primary deliverable is not a more expressive type annotation. It is code the user did not write: omitted expand_dims, omitted reshape, omitted loop bounds, omitted contraction loops, omitted convolution index arithmetic. In a Star-style calculus, the programmer earns safety by naming structure. In OCANNL, the programmer earns concision by writing a compact operation specification and letting the solver recover structure.

The fair summary is therefore not “Star lacks inference” or “OCANNL lacks algebraic shapes.” Star and OCANNL are two points on one map. Star is the type-theoretic point: structural shapes, structural indices, subtyping, type safety, future ML-style inference. OCANNL is the compiler point: broadcast-aware row solving, shape closing, projection inference, affine and concatenation constraints, code generation. The attached comparison’s title is exactly right: a shape is not its index. The rest is a choice about what to do with the correspondence.

Axon and gradual tensor shape systems

Axon is the closest modern deep-learning-specific shape-inference language. It is explicitly a language for shape and rank inference in deep-learning graphs. Tensor dimensions are symbolic expressions built from variables, constants, and arithmetic operators; tensor shapes can be sequences, symbolic shape variables, or appends of other shapes. The language is functional in style and its type system resembles Standard ML extended with shape information. Its shape inference algorithm is constraint-based and aims to infer tensor shapes from both programmer-supplied information and graph structure.

This overlaps strongly with OCANNL. Shape variables, rank variables, append-like shape structure, and arithmetic dimension expressions are all in the same conceptual neighborhood. Axon also shares the goal of making shapes implicit in a functional setting, rather than requiring the user to annotate every tensor.

The difference is that Axon is a language for dynamic shapes in deep-learning graphs, while OCANNL is a tensor-operation DSL with an index-notation surface and a code-generation backend. OCANNL has a more specific account of broadcasting as an order and a more explicit separation between shape inference and projection inference. Axon is a useful comparison when writing about shape inference as constraint solving; it is less directly comparable when writing about contraction, affine operand addressing, or deriving loop-index maps.

Gradual Tensor Shape Checking deserves a deeper comparison because it makes the opposite engineering compromise from OCANNL in a very principled way. The paper starts from two difficulties. First, shapes are first-class objects in real tensor programs: Tensor.zeros takes a runtime list of integers and returns a tensor whose shape is that list, so a precise static account wants a dependent or refinement type such as S : int list -> { r : tensor | shape(r) = S }. Second, checking realistic shape constraints can require non-linear integer arithmetic and list reasoning, as with reshape, whose input and output element products must agree. Full inference over such constraints is not a good target.

The system, GraTen, therefore combines three ideas: best-effort inference, hybrid checking, and gradual typing. Best-effort inference means the system does not try to infer most-general dependent/refinement types. It infers what it can. Hybrid checking means it inserts dynamic assertions at program points where static information is too imprecise. Gradual typing means users can add annotations to improve precision, and the system satisfies a gradual guarantee: less precise programs remain typeable with weaker static information and more dynamic checks, while more precise programs can eliminate checks when the added information is correct.

The introductory example is a good microcosm. A model composes a function f, which maps a vector of length n to a vector of length n / s, with a function g, which expects a vector of length 10. In the simple case, GraTen can infer that the input vector must satisfy nth(0, shape(x)) / s = 10. After adding a conditional branch whose two branches produce shapes the inference procedure cannot reconcile precisely, GraTen falls back to a less precise tensor type for the intermediate value and inserts an assertion before the call to g: dynamically check that y.shape = [10]. If the user adds a suitable type annotation to y, the original precise static type is recovered and the assertion disappears.

That is a clean design. It gives the type checker a safety valve. If static reasoning cannot prove the shape relation, the transformed program checks it at runtime. OCANNL does not have that same safety valve, at least not at the core tensor-kernel layer. A generated kernel either has a solved, closed shape and a valid projection map, or compilation should reject it. Runtime assertions may be useful at a higher API boundary, but the loop nest itself cannot be half-shaped. This is why OCANNL’s policy steps are different from gradual casts. Closing a leaf downward or an interior upward is not inserting a runtime check; it is choosing a representative so allocation and code generation can proceed. A mistaken choice can be a policy rejection or an inference limitation, but it is not a deferred contract in the same sense as GraTen’s inserted assertion.

GraTen’s implementation reinforces the difference. Its type checking has three phases: Hindley–Milner simple type inference, best-effort refinement inference, and consistent-subtyping checking with assertion insertion. During refinement inference, it creates predicate variables for unknown refinements, decomposes subtyping constraints into constrained Horn clauses, and solves them with a handmade heuristic solver rather than relying wholly on SMT or CHC solvers. If a predicate variable cannot be solved, it is assigned true, so it will not make later consistent subtyping impossible. The third phase checks consistent subtyping constraints, simplifies them where possible, falls back to Z3 with timeouts when necessary, and inserts assertions.

This is surprisingly close to OCANNL’s corrected formal story at the meta-level. Both systems refuse to pretend that complete inference is available. Both have an inference phase that is intentionally incomplete. Both keep a distinction between semantic information and policy or fallback behavior. But the fallback target is different. GraTen uses dynamic checks and user-supplied annotations to move a program along a static/dynamic boundary. OCANNL keeps underdetermination inside the compiler’s representation for as long as possible: the residual bound store records what is still open, staged solving propagates what later stages can make visible, and closing commits only when allocation, parameter creation, or code generation require a representative. GraTen’s unknown predicate can become true; OCANNL’s underdetermined interior row can close upward to 1_{\emptyset}, while a parameter tensor with an unconstrained hidden dimension is treated as an error rather than guessed.

The evaluation patterns in GraTen are also useful for an OCANNL related-work discussion. The authors report that imprecision often comes from branches, recursive functions, higher-order shape-polymorphic arguments, record definitions, and imprecise primitive signatures. This is exactly the kind of complexity OCANNL avoids by narrowing its input language. OCANNL is not trying to infer refinements for arbitrary OCaml control flow, recursive functions, or lists of layers. It asks users to express tensor operations inside a DSL whose constraint sources are known. The trade-off is not that OCANNL lacks a whole-program ambition in the abstract; it is that shape and projection inference run at operator call sites along the OCaml paths that are actually traversed. The cost is therefore practical: errors in library tensor logic are caught as early as the test suite or client code exercises those operations, not at the definition of every possible library path. The benefit is that the solver can be specialized to a structured tensor fragment rather than to arbitrary host-language control flow.

The 2024 work on generalizing shape analysis with gradual types extends the gradual direction into migration and tooling. It asks how one can move from less precise to more precise shape information, using gradual types and constraint solving to support annotations, validation, and transformations. That paper is best grouped with GraTen but not collapsed into it. GraTen is the better source for the formal hybrid-checking story and the OCaml-Torch implementation; the migration work is the better source for the broader tooling agenda. Together they mark a major axis in the design space: accept a static/dynamic boundary and help the user move it. OCANNL marks a different axis: design a DSL whose generated shape constraints are meant to be solved before code generation, so the static/dynamic boundary lies outside the kernel.

The distinction matters for paper claims. OCANNL should not claim universal shape inference. Its full constraint language, with total-element constraints over concatenated rows, can encode polynomial equations over natural numbers; unrestricted satisfiability is Diophantine-complete. The right claim is relative completeness on a frontend-generated fragment, not general decidability. Gradual Tensor Shape Checking is a useful reminder that mature related work treats undecidability as a design fact, not a footnote.

Static analyzers for Python tensor programs

Pythia, PyTea, ShapeTracer, and ShapeIt form another cluster. Their target is existing Python tensor code, not a new array calculus.

Pythia performs static analysis of TensorFlow programs, tracking tensor shapes across Python library calls and modeling TensorFlow operator semantics. It reports shape mismatches and error-prone patterns, with evaluation on real TensorFlow bugs. PyTea analyzes PyTorch code by tracing possible execution paths, collecting tensor-shape constraints for each path, and asking whether those constraints are unsatisfiable. ShapeTracer, in industrial TensorFlow settings, emphasizes constraint solving for shape-related errors and contrasts itself with analyses that cannot progress when ranks or dimensions remain unknown. ShapeIt is dynamic: it infers likely symbolic tensor shapes in Python machine-learning programs by observing executions and annotating dimensions with symbolic names.

These tools answer a different question from OCANNL. They ask: given code written against a large dynamic framework, can we discover likely or guaranteed shape errors before the user wastes time? OCANNL asks: can we design a notation whose generated constraints make the intended tensor operation explicit enough that shape inference and code generation become part of the same pipeline?

The Python analyzers are still important related work for three reasons. First, they document the practical importance of shape errors. If shape bugs were rare, these systems would not exist. Second, they show that real tensor programs require path sensitivity, library semantics, and constraint solving; a simplistic local propagation story is not enough. Third, they mark a boundary: OCANNL avoids much of Python’s whole-program analysis problem by not analyzing arbitrary Python control flow. It analyzes the DSL operations it generated. That is a restriction, but also a liberation. The solver can be more semantic because the source of constraints is narrower and better understood.

A good related-work paragraph should therefore treat these analyzers as motivation and contrast, not as direct competitors. OCANNL does not replace PyTea for arbitrary PyTorch training code. It replaces a different piece of workflow: handwritten tensor shape plumbing inside kernels that could have been expressed algebraically.

Tensor comprehensions and index-notation compilers

Tensor Comprehensions is a natural citation because it shares OCANNL’s desire for a compact notation close to mathematical tensor formulas. TC uses a syntax generalizing Einstein notation, supports shape and size inference, and compiles to high-performance CUDA kernels using polyhedral machinery and autotuning. It is one of the clearest predecessors for the idea that deep-learning tensor operations should be specified declaratively and compiled.

The surface model is close enough to OCANNL that the details are worth reviewing. In TC, index variables are introduced implicitly by using them in tensor accesses, and their ranges are inferred from what they index. An index that appears on the right-hand side but not on the left-hand side is a reduction dimension. In a matrix-vector product,

def mv(float(M,K) A, float(K) x) -> (C) {
  C(i) +=! A(i,k) * x(k)
}

i ranges over M, k ranges over K, and k is a reduction variable because it is absent from C(i). The +=! form is an initializing reduction: initialize with the reduction operator’s neutral element and then accumulate. This is exactly the kind of tensor notation OCANNL wants to make feel ordinary. Shared indices induce co-iteration; omitted right-hand-side indices induce reduction; output indices define the visible result axes.

TC also has a carefully stated semantics for multiple statements and in-place updates. It preserves a functional, whole-tensor reading: right-hand sides are read in full before assigning the left-hand side. That matters when the output tensor also appears on the right, because the compiler must reject liveness interference such as an in-place transposition unless the dependence is safe. OCANNL’s current formal core discusses a related problem in different language: projection inference determines when the left-hand-side projection is injective or surjective and hence whether accumulation or initialization is required. Both systems have to connect index notation to update semantics. TC states that connection at the tensor-comprehension language level; OCANNL states it through the projection map and its injectivity/surjectivity properties.

TC’s range inference is the most important comparison point. Tensor comprehensions are concise because loop ranges are usually inferred from context. The paper initially considered a more general Presburger/polyhedral inference story, but deliberately chose a simpler rule for usability: infer rectangular ranges that are as large as possible without reading out of bounds on inputs; if there is more than one way to do this, reject and ask the user for a where clause. That is a very OCANNL-like design instinct. The authors explicitly prefer an inference rule a programmer can mentally emulate over a mathematically maximal but surprising solver.

The one-dimensional convolution example shows the rule. For

def conv1d(float(M) I, float(N) K) -> (O) {
  O(i) = K(x) * I(i + x)
}

there are at least two ways to maximize ranges. If i is maximized first, x becomes nearly useless; if x is maximized from K(x), then i is constrained by I(i+x) and the output becomes the expected valid convolution of length M-N+1 in inclusive-endpoint terms. TC resolves this by inference rounds. In the first round, only accesses with a single unresolved variable are used, so K(x) fixes the kernel index. In the next round, I(i+x) has only one unresolved variable, and the already-known range of x constrains i. The resulting condition is universal over x: for all kernel positions, i+x must be in bounds. The paper explicitly contrasts this with an existential projection semantics that would produce a triangular or projected domain. TC chooses rectangular, uniform domains because they are intuitive to ML users and lead to compact code without conditionals.

This is extremely relevant to OCANNL’s affine story. OCANNL’s convolution-as-contraction view also needs to decide whether an affine operand access describes a valid convolution, a size-preserving padded convolution, or an underconstrained indexing pattern. Like TC, OCANNL should not simply expose the most general polyhedral semantics and call that inference. It needs a convention that users can predict. The difference is that OCANNL folds this into a broader broadcast-aware shape solver. TC range inference starts from declared input tensor shapes and infers loop ranges and temporary output shapes. OCANNL also infers hidden ranks, broadcasts, and sometimes leaf shapes from operation constraints. TC’s where clause is an explicit disambiguator. OCANNL’s analogues are named row variables, explicit size annotations, and the closing policy that commits what remains underdetermined.

TC’s max-pooling example exposes the same boundary. The window variables kw and kh are underconstrained because no input access alone determines how wide the pooling window should be. The TC program therefore writes where kw in 0:2, kh in 0:2. OCANNL’s convolution or pooling surface likewise cannot infer a kernel window from nothing. If the affine expression contains an index variable used only inside a local window, some constraint or annotation must tell the solver its extent. There is no shame in that. The interesting design question is which omissions are ordinary boilerplate and which omissions are genuine missing information.

TC’s examples also show the compiler payoff of declarative tensor notation. A strided convolution is just a tweak to the subscript expression, sh * h + kh and sw * w + kw. Grouped convolution is not a separate primitive; it is a tensor expression that splits the channel dimension into a group dimension and a smaller reduction dimension. The paper reports grouped-convolution experiments where the generated kernels bridge the gap between a research operation and a tuned library implementation. In the production-model section, TC uses fused lookup-table embeddings and fused low-latency MLP layers to reduce kernel-call overhead. These examples are not about shape inference in isolation. They are about the reason shape and range inference matter: once the compiler understands the tensor expression, it can fuse, schedule, and specialize.

That is also OCANNL’s strongest kinship with TC. Both systems start from the conviction that a tensor expression should be closer to mathematics than to a sequence of library calls. Both systems treat ranges and reductions as inferable from index structure. Both systems need annotations when inference is underconstrained. Both systems are compiler front ends, not just checkers.

The workshop examples make the notational contrast sharper. TC signatures write element types such as float(M,K) at the boundary. OCANNL can often omit even that: numerical precision is inferred at the call site, so the notation is not merely rank-polymorphic but precision-polymorphic in the ordinary use case. The multi-head-attention example uses parameter tensors such as { w_q }, { w_k }, { w_v }, and { w_o } without spelling their full shapes; the attention-score contraction names the semantically relevant axes s, t, h, and d, while batch rank, parameter shapes, score shape, and projection maps are inferred. The convolution example is even closer to TC’s subscript arithmetic but more rank-polymorphic: stride*oh + kh and stride*ow + kw are affine operand addresses, kh and kw are contracted with the kernel, ..ic.. abstracts over an input-channel row, ..oc.. over an output-channel row, and ... over arbitrary context axes. OCANNL also commits to the valid-versus-padded interpretation in the surface language: either directly in the specification with a < versus = marker next to the affine +, or implicitly by requiring a use_padding boolean binding in scope of the specification. The SGD example shows the same concision outside a pure einsum-style expression: state-tracking tensors such as { sgd_delta } and { sgd_momentum } are introduced in a computation without spelling their shapes or precisions. This is not just range inference; it is a compact notation for a family of kernels and tensor computations.

The differences are equally important. TC’s tensor signatures name concrete ranks, dimensions, and element types at the boundary: float(B,IP,H,W) or float(N,G,C,H,W). OCANNL leans harder on row variables, broadcasting, and call-site precision inference. A single OCANNL spec can abstract over leading batch axes, activation rows, channel rows, and precision without rewriting the tensor signature. TC’s inference is primarily range inference for iteration variables and output temporaries; OCANNL’s shape inference is also rank inference, broadcast inference, parameter-shape inference, and projection inference. TC’s reduction variables are syntactically those absent from the output; OCANNL agrees at the projection level, but the formal core phrases reduction as non-injectivity or omission in the output projection after local projection solving. TC’s ambiguity response is mostly a where clause. OCANNL’s ambiguity responses are more varied: explicit size annotations such as Shape.set_dim, named row variables, a residual store, staged solve, parameter-hidden-dimension errors, and closing policies.

Tensor comprehensions in SaC provide another useful comparison. SaC has a long history of high-level array programming, shape-invariant comprehensions, and compilation of multidimensional arrays. The tensor-comprehension work in SaC proposes notation for data-parallel operators and includes a formal shape-inference process. This is related to OCANNL both in the surface ambition and in the compiler ambition: a compact array notation should support shape inference and lower to efficient loops.

TACO and sparse tensor algebra compilers are adjacent. TACO’s core contribution is compiling tensor algebra expressions over dense and sparse formats into efficient kernels. It is primarily about index notation, formats, iteration spaces, and sparse code generation, not about broadcast-aware rank-polymorphic shape inference. Still, it is relevant whenever OCANNL talks about lowering tensor algebra to loop nests. More recent work on compiling shape operators on sparse arrays is relevant to OCANNL’s concatenation and reshape interests: it treats shape operators such as reshape and concatenate as first-class compilation problems, especially when layout conversion matters. OCANNL’s current story is dense and shape-inference oriented, but the moment concatenation and layout become compilation objects, sparse-array shape-operator work becomes a useful neighbor.

The broader lesson from this cluster is that index notation is not enough. A compiler must know ranges, reductions, layouts, storage formats, update modes, and aliases between index variables. OCANNL’s shape/projection split is a way to make one part of that knowledge explicit: first determine the tensor shapes, then determine the operation-local projection structure that maps product-space iterators to tensor addresses. TC teaches the same lesson from the opposite direction. Its notation is compact because ranges are inferred, but the compiler story only works because the range inference, update semantics, and lowering pipeline are part of the language design rather than an afterthought.

Production shape propagation: JAX, ONNX, PyTorch, MLIR, StableHLO

Production frameworks and compiler IRs often use “shape inference” to mean propagation of shape metadata through a graph. This is related to OCANNL but lives at a different layer.

JAX export shape polymorphism allows exported functions to work for families of input shapes. Symbolic dimensions are parsed into dimension-expression objects, and arithmetic on those dimensions can be used in many shape positions. The motivation is tied to export: after a function is exported and the Python source is unavailable, the lowered representation still needs to describe a family of valid shapes. OCANNL shares the symbolic-dimension flavor, especially for affine and size arithmetic, but its constraints are generated by tensor operation specs rather than by tracing Python/JAX programs.

ONNX shape inference is graph-level metadata propagation. Its documentation explicitly states that shape inference is not complete: dynamic behavior, such as reshape to a dynamically provided shape, can block inference, and not every operator is required to have shape inference. This is a useful reality check. Production shape inference is deliberately operator-local and partial. It is designed to support tooling and optimization over a shared model format, not to be a formal calculus of rank-polymorphic tensor inference.

PyTorch dynamic shapes are another engineering point. The PyTorch compiler uses symbolic shapes to avoid recompilation across varying dimensions and tracks constraints through internal machinery such as symbolic integers and ShapeEnv. The concern is not to eliminate shape boilerplate from a source DSL, but to compile dynamic PyTorch programs robustly and efficiently. Again, the relation is motivational. OCANNL can cite PyTorch dynamic shapes to show that symbolic shape reasoning is central to modern tensor compilation, while being clear that OCANNL’s solver is smaller, more specialized, and more semantic.

MLIR is perhaps the most important compiler-IR comparison. The Shape dialect contains operations for shape inference, and Linalg operations explicitly carry indexing maps and iterator types. This is close to OCANNL’s projection story, but in an IR rather than a frontend inference system. MLIR/Linalg asks the IR producer to provide indexing maps. OCANNL tries to derive the maps from a compact tensor spec after solving shape constraints. In that sense, OCANNL can be read as a frontend that wants to generate the kind of structured information an IR such as Linalg needs.

StableHLO’s type-inference documentation is also relevant as an engineering practice reference. It frames high-quality shape functions and verifiers as operator-level obligations. OCANNL’s formal core is more ambitious in one dimension and narrower in another: more ambitious because it tries to characterize a solver and projection pass; narrower because it covers the DSL-generated fragment rather than all operators in a general-purpose IR ecosystem.

Python annotations and the documentation end of the spectrum

PEP 646 and libraries such as jaxtyping occupy the user-facing annotation end of the spectrum. PEP 646 introduced variadic generics through TypeVarTuple, enabling array classes generic in an arbitrary-length shape. jaxtyping and torchtyping-style libraries let users write annotations for tensor shape and dtype, sometimes with runtime checking and sometimes with static-tooling support.

This work is useful because it shows demand. Programmers want to write down shape relationships close to code. But it is far from OCANNL’s center. An annotation such as Float[Tensor, "batch channels"] documents and checks a contract. It does not by itself infer a loop nest for "i, f; j, f => i, j, f", decide where rank-polymorphic broadcasting inserts axes, or derive a convolution input projection from stride and dilation.

The annotation ecosystem is still worth citing briefly because it expresses the mainstream Python response to the shape problem: add names, comments, and runtime/static checks around existing tensor code. OCANNL’s response is more radical but narrower: replace a slice of tensor code with a DSL whose syntax is already a shape-and-index specification.

What OCANNL uniquely combines

After walking the literature, OCANNL’s contribution can be stated more carefully.

First, OCANNL treats broadcasting as an order over dimensions and rows, not as a collection of ad hoc special cases. This order includes a claim-free unit 1_{\emptyset}, concrete size atoms, an error bottom, and row markers that determine where rank broadening can occur. That gives a compact semantics for NumPy-style broadcasting and rank polymorphism. It also explains why OCANNL does not inherit algebraic-subtyping machinery wholesale: concrete sizes make the individual-axis lattice non-distributive, and rows add still more structure through sequences and marker placement.

Second, OCANNL has row variables with left and right flanks. This is the key to expressing tensors whose unknown axes are sandwiched between known leading and trailing axes. It matters for einsum ellipses, batch axes, slices, and operations where the user wants to name the ends of a shape while abstracting over the middle. This resembles shape-sequence variables in Remora and Axon, but OCANNL’s marked-row order is tailored to broadcasting and closing.

Third, OCANNL puts arithmetic shape relations in the constraint pipeline. Futhark and dependent-type systems put sizes in types. JAX and PyTorch carry symbolic dimensions through compilation. Star avoids size arithmetic in its type system. OCANNL’s choice is to derive arithmetic constraints from tensor specs and then solve them in a dedicated staged engine. This is what enables convolution and affine operand addressing to feel like generalized contraction rather than like a separate library primitive with hand-coded shape rules, while still leaving room for pragmatic failures when a generated Diophantine-style problem is outside the supported fragment.

Fourth, OCANNL separates shape solving from projection inference. This is perhaps the cleanest conceptual contribution. A shape solution says what sizes exist. A projection solution says how a particular operation maps local iterators to tensor axes. Projection inference is local, canonical, and unique up to iterator renaming. Shape solving is global, may leave residual bounds, and eventually needs a closing policy. This separation is easy to miss because many tensor systems blur shape equality and loop co-iteration.

Fifth, OCANNL’s concatenation story treats concatenation as a coproduct. This lines up beautifully with Star’s variant-indexed concatenation shapes and with ordinary sum types. It also exposes two units: the broadcast/product unit and the concatenation/sum zero. Once concatenation is in the constraint language, however, the metatheory becomes harder. Total element constraints over concatenated rows can encode polynomial equations, so the full language cannot have unrestricted complete inference. That limitation should be stated as a strength of the paper, not a weakness: the honest problem is to characterize the frontend-generated fragment.

Sixth, OCANNL has an implementation-informed formal core. The notes do not merely present an ideal calculus. They record corrections found by trying to prove the theorems and checking them against code: no principal ground model in the presence of residual caps; a necessary rank-cycle check for termination; row equality as flat content rather than marker identity; closing as policy; projection canonicity as a separate theorem. This is a good research story because it shows the formalization doing work.

Seventh, OCANNL combines shape inference with other inference that ordinary tensor notations often leave explicit. The workshop examples rely not only on inferred ranks and extents, but also on call-site numerical precision inference and parameter-shape inference. This is part of the same design goal: remove the bureaucracy that does not carry mathematical intent, while still forcing the user to state genuinely semantic choices such as the number of heads, hidden widths, kernel extent, output channels, and whether a convolution is valid or padded.

How to write the related-work section

For a paper, I would not present the whole survey as a flat list. I would organize it by design choice.

One paragraph should cover rank-polymorphic array languages: APL/J as the programming model, shapely types as the shape/data separation, and Remora as the strongest typed rank-polymorphic predecessor. The contrast: OCANNL is not a full rank-polymorphic language; it embeds rank-polymorphic rows into tensor operation inference.

A second paragraph should cover size-dependent and refinement approaches: DML, Qube, Futhark, Refined Remora, and staged shape-dependent types. The contrast: these systems put shape facts into types, refinements, or staged assertions; OCANNL derives tensor-specific constraints from operation specs, then solves and closes them to feed code generation. Refined Remora deserves a sentence of its own because it is the cleanest typed answer to convolution-size arithmetic: shape operations and side conditions become refinements over dimensions and shape sequences, checked by SMT.

A third paragraph should cover Star. This deserves more than a citation because it identifies the same shape/index split. The contrast should be generous: Star types algebraic shapes with structural records and variants; OCANNL derives shapes and projection maps from tensor specs. Broadcasting-as-meet and concatenation-as-sum are shared insights.

A fourth paragraph should cover deep-learning shape systems and analyzers: Axon, Gradual Tensor Shape Checking, PyTea, Pythia, ShapeTracer, and ShapeIt. The contrast: these systems infer or check shapes for DL graphs or existing Python/OCaml-Torch code; OCANNL controls the frontend and can generate a more structured constraint fragment. Gradual Tensor Shape Checking is especially useful as a foil: it handles incompleteness by inserting runtime assertions and letting annotations improve precision, whereas OCANNL handles incompleteness inside the compiler pipeline by representing residual bounds, propagating them through staged solving, and committing representatives only at closing or other policy sites.

A fifth paragraph should cover tensor-comprehension and compiler-IR work: Tensor Comprehensions, SaC tensor comprehensions, TACO, shape operators on sparse arrays, MLIR/Linalg, JAX, ONNX, PyTorch dynamic shapes, and StableHLO. The contrast: these systems propagate shapes or compile index notation; OCANNL’s specific contribution is the broadcast-aware inference layer that produces both shapes and local projection maps. Tensor Comprehensions deserves more than a name because its range-inference rule — infer obvious maximal rectangular ranges and require where annotations for underconstrained cases — is one of the closest antecedents to OCANNL’s own “infer the boilerplate, but do not guess invisible intent” stance.

A final short paragraph can mention Python annotation ecosystems, but only as context. PEP 646 and jaxtyping are important to users, but they are not central technical predecessors.

Here is a possible compact related-work paragraph, paper-style:

OCANNL is closest to typed rank-polymorphic array languages such as Remora and to size-dependent array languages such as Futhark, but differs in deriving tensor-specific shape constraints from operation specs rather than assigning shape reasoning to a general type system. Remora models APL-style lifting statically with dimension and shape-sequence indices; Futhark tracks size consistency through restricted size-dependent types; Refined Remora and staged shape-dependent systems add refinements or compile-time shape computations. OCANNL instead treats broadcasting as an explicit order over dimensions and rows, uses row variables for rank-polymorphic tensor specifications, and separates declarative constraint derivation from solving, residual stores, and closing policies that commit allocatable shapes. Star’s algebraic shapes are especially close in spirit: both systems can present broadcasting as a meet under the Star-aligned orientation and identify concatenation with a sum/variant, and both separate shapes from their indices. Star types this correspondence; OCANNL computes along it, deriving local projection maps after shape solving. Static analyzers such as Pythia and PyTea, and production mechanisms such as ONNX shape inference, JAX shape polymorphism, PyTorch dynamic shapes, and MLIR shape functions, address shape propagation or bug finding in existing frameworks; OCANNL instead provides a DSL-generated constraint fragment whose solutions feed code generation.

An annotated bibliography

This list is intentionally selective. It is a starting bibliography for a paper or blog series, not an exhaustive survey of every array language.

Jay and Cockett, “Shapely Types and Shape Polymorphism” (ESOP 1994). The early shape/data separation. Useful for the historical point that shape computation can be evaluated separately from data computation when result shapes depend only on input shapes.

Slepak, Shivers, and Manolios, “An Array-Oriented Language with Static Rank Polymorphism” (ESOP 2014); “The Semantics of Rank Polymorphism”; “Rank Polymorphism Viewed as a Constraint Problem” (ARRAY 2018). The core Remora lineage. Essential for rank-polymorphic typing, shape-sequence variables, and constraint-based views of type checking and inference.

Matviichuk and Shivers, “Refined Remora: Constraining Array Shapes” (ARRAY 2026). Recent refinement-style extension of Remora shape types, with SMT-checked constraints over dimensions and shape sequences. Important for three reasons: it shows where ordinary Remora’s index language becomes awkward (append5, convolution, reduce, indices-of); it gives the type-system answer to pointwise shape arithmetic, rank, positivity, divisibility, and stride; and it exposes solver/erasure subtleties that parallel OCANNL’s own formal-core corrections.

Bailly, Henriksen, and Elsman, “Shape-Constrained Array Programming with Size-Dependent Types” (2023), and Henriksen/Elsman, “Towards Size-Dependent Types for Array Programming” (2021). The modern Futhark size-dependent type story. Strong comparison for ML-style array sizes and existential sizes.

Xi and Pfenning, “Dependent Types in Practical Programming” (POPL 1999), and Dependent ML. Foundational restricted dependent typing over index domains. Useful background for the idea of separating index constraints from ordinary runtime terms.

Trojahner and Grelck / Qube. Dependent types for array programming and APL-style shapes. Useful as the older dependently typed array-language branch.

Bachurski, Mycroft, and Orchard, “Structuring Arrays with Algebraic Shapes” (ARRAY 2025). Star. The most conceptually illuminating comparison: structural records and variants for shapes, broadcasting as meet, concatenation as variant/sum, and the shape/index isomorphism.

Collins and Grover, “Axon: A Language for Dynamic Shapes in Deep Learning Graphs” (2022). A DL graph language with symbolic dimension expressions, shape variables, shape appends, and constraint-based shape/rank inference.

Hattori, Kobayashi, and Sato, “Gradual Tensor Shape Checking” (2022/2023), and Migeed et al., “Generalizing Shape Analysis with Gradual Types” (ECOOP 2024). Best-effort shape inference, gradual typing, and dynamic checks. The first is the deeper source for hybrid checking and OCaml-Torch: infer what is practical, insert assertions when static information is too weak, and let annotations remove checks. The second is the broader migration/tooling version of the same gradual-shape agenda. Together they are the best foil for OCANNL’s choice to close shapes before code generation rather than insert dynamic tensor-shape contracts inside generated kernels.

Lagouvardos et al., “Static Analysis of Shape in TensorFlow Programs” / Pythia (ECOOP 2020); Jhoo et al., PyTea (2021); ShapeTracer; ShapeIt (2024). Static or dynamic analysis over existing Python tensor programs. Useful for motivation and contrast.

Vasilache et al., “Tensor Comprehensions” (2018), and Scholz et al., “Tensor Comprehensions in SaC” (2020). Declarative tensor/index notation with range or shape inference and compilation to loops. TC is close on both surface motivation and compiler ambition: implicit index variables, RHS-only reduction indices, range inference, affine subscripts, strided convolution, grouped convolution, and fused production kernels. Its where annotations mark the same boundary OCANNL must mark: some extents are boilerplate and should be inferred; others encode real user intent and must be supplied.

Kjolstad et al., TACO (OOPSLA 2017), and Root et al., “Compilation of Shape Operators on Sparse Arrays” (OOPSLA 2024). Tensor algebra and shape-operator compilation, especially relevant when discussing lowering and future concatenation/layout work.

JAX shape polymorphism, ONNX shape inference, PyTorch dynamic shapes, MLIR Shape/Linalg, StableHLO type inference. Production ecosystems for symbolic shapes, operator shape functions, and indexing maps. These are not formal predecessors in the same sense, but they locate OCANNL in the engineering landscape.

PEP 646, jaxtyping, torchtyping, NumPy typing. User-facing annotation systems. Mention briefly as evidence that programmers want shape information near source code, but keep them out of the core technical comparison.

Conclusion

The literature around shape inference is rich because “shape” is doing many jobs. It can be a type-level index, a refinement, a runtime assertion, a static-analysis fact, a graph metadata field, a symbolic compiler expression, a loop bound, or an index domain. OCANNL’s niche is defined by refusing to collapse those jobs into one. A solved shape is not a projection map. A size equality is not a local co-iteration fact. A principal residual solution is not the same thing as an allocatable closed shape. A concatenated axis is not merely a larger integer range; it is a sum with injections.

That perspective makes the related work easier to read. Remora explains the rank-polymorphic type-theory neighborhood. Futhark and DML explain the size-dependent neighborhood. Star explains the algebraic shape/index split. Axon and gradual tensor checking explain modern DL shape inference. PyTea and Pythia explain the static-analysis pressure from real Python code. Tensor Comprehensions, SaC, TACO, MLIR, JAX, ONNX, PyTorch, and StableHLO explain the compiler and production-IR environment.

OCANNL is not the first system to infer shapes, type shapes, propagate shapes, or compile tensor notation. Its claim should be narrower and stronger: it combines broadcast-aware row inference, tensor-specific size arithmetic, rank-polymorphic operation specs, and canonical local projection inference in one DSL pipeline. From OCANNL’s perspective, shape inference is not a prelude to checking. It is the first half of code generation. The second half is remembering that a shape is not its index.