Regular (4 tasks)

1. Cycle Navigation (L=32). Navigate a circular ring of states
using a sequence of forward and backward instructions. Starting at state 0 on a cycle of
num_states positions, each instruction moves the pointer one step clockwise
or counter-clockwise. The output is the final state index. This tests whether a model
can learn modular counting.
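As a minimal reference sketch (function and encoding names are my own, not from the benchmark), the ground-truth behavior reduces to a single modular accumulator:

```python
def cycle_navigate(num_states, moves):
    """Fold +1 (clockwise) / -1 (counter-clockwise) moves into one
    modular count. Starts at state 0; returns the final state index."""
    state = 0
    for step in moves:
        state = (state + step) % num_states
    return state
```

The entire task state fits in one residue mod `num_states`, which is why a finite automaton with `num_states` states recognizes it exactly.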
2. Even Pairs (L=32). Given a sequence of symbols,
determine whether the entire sequence consists of identical adjacent pairs. For example,
a a b b a a is “yes” because every consecutive pair matches,
while a b a a is “no”. Odd-length sequences are always
“no”. This is a classification task that outputs a single binary label.
3. Modular Arithmetic (L=31). Evaluate flat arithmetic expressions
(no parentheses) under a modulus. The input is an alternating sequence of operands and
operators (+, -, *) evaluated strictly
left-to-right, and the output is the result mod n. Because the state space
is bounded by the modulus, a finite automaton suffices.
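A reference evaluator can be sketched as follows (assuming tokens alternate between Python ints and operator strings; the names are illustrative):

```python
def eval_flat_mod(tokens, n):
    """Evaluate an alternating operand/operator sequence strictly
    left-to-right (no precedence), reducing mod n after every step."""
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b}
    acc = tokens[0] % n
    for i in range(1, len(tokens), 2):
        acc = ops[tokens[i]](acc, tokens[i + 1]) % n
    return acc
```

Because the accumulator is always reduced mod n, only n distinct states are ever reachable, matching the finite-automaton argument above.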
4. Parity Check (L=32). Count how many times a specific target symbol appears in the input sequence and output whether that count is even or odd. The model must track the running parity of occurrences — a task that maps directly to a two-state finite automaton.
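The two-state automaton can be written out directly (the 'even'/'odd' output encoding is an illustrative choice, not the benchmark's):

```python
def parity_of_target(seq, target):
    """Flip a single bit each time `target` is seen; the bit is the
    automaton's entire state."""
    bit = 0
    for tok in seq:
        if tok == target:
            bit ^= 1
    return 'odd' if bit else 'even'
```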
Context Free (5 tasks)

5. Dyck N (n=4) (L=32). Determine whether a sequence of
brackets is properly balanced, using n distinct bracket types. A valid Dyck
word requires every opening bracket to be matched with the correct closing bracket in
proper nesting order. This is the prototypical context-free language — recognizing
it requires a stack to track which bracket types are open.
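The stack-based recognizer is short enough to state in full; the bracket alphabet below is an illustrative choice for n=4:

```python
def is_dyck(seq, pairs=(('(', ')'), ('[', ']'), ('{', '}'), ('<', '>'))):
    """Stack-based Dyck-word recognizer: push the required closer on
    every opener, and check each closer against the stack top."""
    close_of = dict(pairs)               # opening -> matching closing
    closers = set(close_of.values())
    stack = []
    for tok in seq:
        if tok in close_of:
            stack.append(close_of[tok])  # remember the required closer
        elif tok in closers:
            if not stack or stack.pop() != tok:
                return False
        else:
            return False                 # symbol outside the alphabet
    return not stack                     # everything must be closed
```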
6. Nested Modular Arithmetic (L=32). Like Modular Arithmetic, but expressions now include nested parentheses up to a configurable depth. The model must evaluate inner sub-expressions first and carry intermediate results back to outer expressions, requiring a stack to track nesting structure and pending computations.
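A minimal recursive-descent sketch of the ground truth, assuming well-formed token lists of ints, '+', '-', '*', '(' and ')' (the recursion depth plays the role of the stack):

```python
def eval_nested_mod(tokens, n):
    """Evaluate a parenthesized expression left-to-right (no operator
    precedence), reducing mod n after every step."""
    def operand(i):
        if tokens[i] == '(':
            val, i = expr(i + 1)
            return val, i + 1            # skip the ')'
        return tokens[i] % n, i + 1
    def expr(i):
        acc, i = operand(i)
        while i < len(tokens) and tokens[i] in ('+', '-', '*'):
            op = tokens[i]
            rhs, i = operand(i + 1)
            if op == '+':
                acc = (acc + rhs) % n
            elif op == '-':
                acc = (acc - rhs) % n
            else:
                acc = (acc * rhs) % n
        return acc, i
    return expr(0)[0]
```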
7. Reverse String (L=32). Given an input sequence of tokens,
produce the same tokens in reverse order. Reversing requires the model to buffer the
entire input before emitting any output — a canonical stack operation. The input
a b c d should produce d c b a.
8. Solve Equation (L=32). Find the value of x in
a simple linear equation a * x + b = c, where all values are bounded
integers. The model must implicitly invert the linear relationship — a form of
symbolic reasoning over a constrained algebraic structure.
9. Stack Manipulation (L=32). Execute a sequence of explicit
push(x) and pop instructions on a stack, outputting the
top-of-stack value after each step (0 if empty). This directly tests LIFO data structure
semantics, since correct output at each timestep depends on the full history of pushes
and pops.
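A reference simulator is a direct transcription of the task statement (the instruction encoding here — tuples for push, the string 'pop' — is my own assumption):

```python
def run_stack_program(instructions):
    """Execute push/pop instructions, emitting the top-of-stack value
    (0 if empty) after every instruction."""
    stack, outputs = [], []
    for instr in instructions:
        if instr == 'pop':
            if stack:                 # pop on an empty stack is a no-op
                stack.pop()
        else:                         # ('push', x)
            stack.append(instr[1])
        outputs.append(stack[-1] if stack else 0)
    return outputs
```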
Context Sensitive (8 tasks)

10. Associative Recall (L=12). Given key-value pairs followed by a query key, retrieve the associated value. This tests content-addressable memory: the model must store all pairs, then perform a lookup by matching the query key — the core operation of an associative memory or hash table.
18. Count N (L=30). Validate that a sequence consists of
n symbol types each appearing exactly k times in consecutive
blocks: s1^k s2^k … sn^k. This is a
classic context-sensitive language — recognizing it requires comparing counts
across multiple symbol groups simultaneously.
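A checker for the language can be sketched directly from the definition (assuming the benchmark requires the n block symbols to be distinct, as "n symbol types" suggests):

```python
def is_count_n(seq, n, k):
    """Accept iff seq is s1^k s2^k ... sn^k: exactly n consecutive
    blocks, each k copies of one symbol, with distinct block symbols."""
    if len(seq) != n * k:
        return False
    symbols = []
    for b in range(n):
        block = seq[b * k:(b + 1) * k]
        if len(set(block)) != 1:      # block must be one repeated symbol
            return False
        symbols.append(block[0])
    return len(set(symbols)) == n     # all block symbols distinct
```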
19. Deduplicate Inputs (L=32). Given a stream where each symbol is
repeated k times consecutively, extract the unique sequence. The model must
learn to skip redundant repetitions — a form of streaming compression that
requires tracking position within each group.
20. Duplicate String (L=32). Given an input string
w, produce ww — concatenation of the string with itself.
This is context-sensitive because the model must copy the input verbatim while
remembering where the first copy ends and the second begins, requiring more than a
single stack.
21. Missing Duplicate (L=32). Given a sequence where every element appears exactly twice except one singleton, identify the unpaired element. The input is shuffled, so the model must maintain counts or use XOR-like cancellation across the entire input to isolate it.
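The XOR cancellation mentioned above is worth spelling out (assuming integer tokens; for arbitrary symbols a `collections.Counter` serves the same role):

```python
def missing_duplicate(seq):
    """Since x ^ x == 0 and XOR is commutative, all paired values
    cancel and only the singleton survives."""
    acc = 0
    for x in seq:
        acc ^= x
    return acc
```

The shuffle is irrelevant to this solution precisely because XOR is order-independent — which is what makes the task a clean memory test rather than an ordering test.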
22. Odds First (L=32). Reorder a sequence so all odd-valued elements come first (preserving relative order), followed by even-valued elements (also in order). This is a stable partition — the model must route elements to two groups based on a predicate while preserving within-group ordering.
23. Repeat Copy N (L=10). Reproduce an input pattern
N times, where N is provided as the first token. This
generalizes Duplicate String to a variable repeat count, testing whether the model can
learn a counting loop that controls how many times to replay the memorized pattern.
24. Square Root (L=19). Compute the integer floor square root of a number given as LSD-first decimal digits. The model must reconstruct the number, compute the square root, and re-encode the result as digits — a multi-step numerical computation.
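A reference pipeline for the three steps (decode, root, re-encode), assuming the LSD-first convention from the task description:

```python
import math

def sqrt_lsd(digits):
    """Floor square root over LSD-first decimal digits:
    [5, 2, 1] encodes 125; the result is re-encoded the same way."""
    value = sum(d * 10 ** i for i, d in enumerate(digits))
    root = math.isqrt(value)          # exact integer floor sqrt
    if root == 0:
        return [0]
    out = []
    while root:
        out.append(root % 10)
        root //= 10
    return out
```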
Binary (7 tasks)

11. Binary Addition (8-bit) (L=8). Add two binary numbers in LSB-first
format. Operands are interleaved as [a0, b0, a1, b1, ...] and the output is
the sum bits including carry. The model must learn ripple-carry addition, propagating
carry information across all bit positions.
12. Binary Addition (16-bit) (L=16). As above, with 16-bit operands.
13. Binary Addition (32-bit) (L=32). As above, with 32-bit operands.
14. Binary Addition (64-bit) (L=64). As above, with 64-bit operands.
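The ripple-carry ground truth for all four widths can be sketched in a few lines (assuming the interleaved LSB-first encoding described above; the extra output bit holds the final carry):

```python
def add_lsb_first(interleaved):
    """Ripple-carry addition over LSB-first bits.
    Input [a0, b0, a1, b1, ...]; output is sum bits, LSB first,
    one bit longer than the operands."""
    a = interleaved[0::2]
    b = interleaved[1::2]
    carry, out = 0, []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & 1)     # sum bit at this position
        carry = s >> 1        # carry into the next position
    out.append(carry)
    return out
```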
15. Binary Multiplication (8-bit) (L=8). Multiply two binary numbers in LSB-first format with interleaved input encoding. The output is the product bits. This is substantially harder than addition, requiring shift-and-add operations with carries that depend on partial products from all input positions.
16. Binary Multiplication (16-bit) (L=16). As above, with 16-bit operands.
17. Binary Multiplication (32-bit) (L=32). As above, with 32-bit operands.
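The shift-and-add structure can be sketched as follows (the `width` parameter and the use of a plain integer accumulator are implementation conveniences, not part of the task encoding):

```python
def mul_lsb_first(interleaved, width):
    """Shift-and-add multiplication over LSB-first bits.
    Input [a0, b0, a1, b1, ...]; returns 2*width product bits,
    LSB first."""
    a = interleaved[0::2]
    b = interleaved[1::2]
    acc = 0
    for i, bit in enumerate(b):
        if bit:
            # add operand a, shifted left by i, into the accumulator
            acc += sum(x << (i + j) for j, x in enumerate(a))
    return [(acc >> k) & 1 for k in range(2 * width)]
```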
Data Processing (3 tasks)

25. Mini Shrdlu (L=32). A blocks-world planning task. Given an initial grid of numbered blocks stacked under gravity and a set of spatial constraints (above, below, left-of, right-of), produce the target board configuration. The target is reachable via valid moves (pick top block, place atop another column). Tests spatial reasoning and implicit planning.
26. Python Execution (L=32). Predict the output of a simple
Python-like program: x = a; for _ in range(n): x = x op b; print(x). The
model must simulate loop execution step by step, tracking the accumulator through
multiple iterations — a test of sequential program trace prediction.
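The simulator the model must internalize is a few lines of Python (the `mod` option for bounding values is my assumption about how the benchmark keeps outputs in-vocabulary):

```python
def trace_loop(a, n, op, b, mod=None):
    """Simulate: x = a; for _ in range(n): x = x <op> b; return x."""
    ops = {'+': lambda x, y: x + y,
           '-': lambda x, y: x - y,
           '*': lambda x, y: x * y}
    x = a
    for _ in range(n):
        x = ops[op](x, b)
        if mod is not None:
            x %= mod
    return x
```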
27. Sort (L=32). Sort a sequence of integers in ascending order. While conceptually simple, learning to sort from examples alone requires the model to discover comparison-based ordering — an algorithmic capability that must generalize across sequence lengths.
Graphs & Geometry (6 tasks)

28. Convex Hull (L=30). Given a set of 2D points, output a binary mask indicating which points lie on the convex hull — the smallest convex polygon enclosing all points. The model must determine whether each point is an extreme point that cannot be expressed as a convex combination of others.
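One way to produce such a mask is Andrew's monotone chain, sketched below (a standard algorithm, not necessarily the one the benchmark's generator uses; this variant assumes distinct points and excludes collinear boundary points):

```python
def hull_mask(points):
    """Binary hull mask via Andrew's monotone chain: build lower and
    upper chains, popping points that make a non-left turn."""
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    pts = sorted(set(points))
    if len(pts) <= 2:
        hull = set(pts)
    else:
        lower, upper = [], []
        for p in pts:
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        for p in reversed(pts):
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        hull = set(lower[:-1] + upper[:-1])   # endpoints shared, drop dups
    return [1 if p in hull else 0 for p in points]
```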
29. Delaunay (L=30). Given a set of 2D points, output the Delaunay triangulation as triangle vertex triples. In a Delaunay triangulation, no point lies inside the circumscribed circle of any triangle — this maximizes the minimum angle. The model must learn to partition the plane into triangles satisfying this geometric optimality criterion.
30. Graph Traversal (L=30). Given an unweighted graph and a source
node, output the BFS visit rank of each node. The model must learn level-by-level
expansion — exploring all nodes at distance d before moving to
d+1.
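A reference implementation of the visit ranking (assuming 0-based ranks, sorted neighbor expansion for determinism, and -1 for unreachable nodes — all implementation choices of this sketch):

```python
from collections import deque

def bfs_ranks(adj, source):
    """BFS visit rank per node from `source`. `adj` maps each node to
    its neighbor list; FIFO order guarantees level-by-level expansion."""
    rank = {v: -1 for v in adj}
    rank[source] = 0
    queue = deque([source])
    nxt = 1
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if rank[v] == -1:        # first discovery fixes the rank
                rank[v] = nxt
                nxt += 1
                queue.append(v)
    return rank
```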
31. MST Prim (L=30). Given a weighted undirected graph, output its minimum spanning tree edges. Ground truth uses SciPy. The model must learn to greedily select the lowest-weight edge connecting a new node to the growing tree without forming cycles.
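The greedy selection can be sketched with a binary heap (a textbook Prim variant, not SciPy's implementation; assumes a connected graph and a node-0 start):

```python
import heapq

def prim_mst(adj):
    """Grow the tree from node 0, always taking the lightest edge to a
    node outside the tree. `adj` maps node -> [(neighbor, weight), ...].
    Returns MST edges as (u, v) pairs in selection order."""
    visited = {0}
    edges = []
    pq = [(w, 0, v) for v, w in adj[0]]
    heapq.heapify(pq)
    while pq and len(visited) < len(adj):
        w, u, v = heapq.heappop(pq)
        if v in visited:              # stale entry; would form a cycle
            continue
        visited.add(v)
        edges.append((u, v))
        for x, wx in adj[v]:
            if x not in visited:
                heapq.heappush(pq, (wx, v, x))
    return edges
```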
32. Shortest Path (L=30). Given a weighted undirected graph and a source-target pair, output the shortest path as a node sequence. Ground truth uses Dijkstra’s algorithm. The model must learn implicit graph search — propagating distance estimates to reconstruct the optimal path.
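A compact Dijkstra sketch with predecessor reconstruction (assuming nonnegative weights and that a path exists; the adjacency-list format is my own):

```python
import heapq

def dijkstra_path(adj, src, dst):
    """Shortest path as a node list. `adj` maps node -> [(nbr, w), ...].
    Lazy-deletion heap: stale entries are skipped on pop."""
    dist = {src: 0}
    prev = {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                  # stale heap entry
        if u == dst:
            break
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    path, node = [], dst              # walk predecessors back to src
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return path[::-1]
```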
33. TSP (L=30). Given 2D cities with integer coordinates, output a tour using the nearest-neighbor heuristic: starting from city 1, always visit the closest unvisited city. The model must learn to simulate this greedy algorithm — selecting the minimum-distance unvisited neighbor at each step.
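The greedy target tour can be reproduced as follows (0-based indices, so index 0 is "city 1"; squared distances and index tie-breaking are assumptions of this sketch):

```python
def nearest_neighbor_tour(cities):
    """Greedy tour: start at the first city, repeatedly hop to the
    closest unvisited city (ties broken by lowest index)."""
    def d2(a, b):
        return (a[0]-b[0])**2 + (a[1]-b[1])**2   # squared Euclidean
    tour = [0]
    unvisited = set(range(1, len(cities)))
    while unvisited:
        cur = cities[tour[-1]]
        nxt = min(unvisited, key=lambda i: (d2(cur, cities[i]), i))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour
```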