# Mesa Phase 7 v2 - Large-Tier Cliff-Subset Result Note

This document records the Phase 7 v2 DOWN-SCOPE result note for the
Large-tier cliff-subset batch defined in
[`PHASE7_SPEC.md`](PHASE7_SPEC.md) §14 (DOWN-SCOPE addendum) and the
Path B hparam investigation in
[`PHASE7_V2_PATH_B_HPARAM_SPEC.md`](PHASE7_V2_PATH_B_HPARAM_SPEC.md).

Phase 7 v2 is a sibling to Phase 7 v1
([`PHASE7_RESULTS.md`](PHASE7_RESULTS.md)). It does not regenerate the v1
22-policy classification and does not change v1 verdicts. v2 extends the
operating envelope to a third capacity tier and surfaces failure modes
the v1 envelope did not see.

Status: Phase 7 v2 cliff-subset batch **complete**. Six Large-tier
policies evaluated at 32 seeds, with `seed_start=10000` as the canonical
seed and `seed_start ∈ {20000, 30000}` for Stage 3 triangulation on
`λ=0.90`. PPO training-time hparam pin: `--value-coef 0.25` for all
L-Mixed and L-Reward cells (signature_terminal uses defaults).

## 1. Summary

Three findings update the gravity claim's earned wording for capacity
scaling:

1. **Forward L-Signature canonical extends to Large.** signature_terminal
   at Large reaches success 0.750 / alignment 0.993 / mean_steps 117 at
   5M env-steps. Capacity does not break the forward shape; it holds at
   Small (37/64, 0.963), Medium (64/64, 0.999), and Large (24/32, 0.993).
2. **Medium single-cliff broadens to a Large U-trough.** The v1 Medium
   cliff at `λ ≈ 0.953` becomes, at Large, a broader trough at
   `λ ∈ {0.95, 0.97}` (success 0.031–0.094, alignment 0.506–0.599)
   bracketed by a field-coupled `λ=0.90` cell (0.531 / 0.934) and a
   *recovered* `λ=0.99` cell (0.406 / 0.885) with only a 1% signature
   anchor.
3. **At Large, signature anchoring is constitutive of learnability.**
   Pure-reward `λ=1.00` at Large bootstrap-fails across three
   consecutive segments at alignment 0.003 / success 0 / mean_steps 200.
   With a 1% signature mixture (λ=0.99) the controller recovers. At
   Large the signature is no longer merely protective against basin
   internalization — it is the seed the value function needs to start
   learning at all.

## 2. Envelope

Large tier, 32-seed eval @ `seed_start=10000` except where noted:

| λ (reward weight) | training spec | env-steps | success | mean alignment | mean_steps | verdict |
| ---: | --- | ---: | ---: | ---: | ---: | --- |
| 0.00 (signature_terminal) | seg3 baseline, defaults | 5M | 0.750 | 0.993 | 117.0 | converged |
| 0.90 | vc=0.25, seed=10000 | 10M | 0.531 | 0.934 | 145.1 | borderline hold (adopted) |
| 0.95 | vc=0.25, seed=10000 | 10M | 0.094 | 0.599 | 193.3 | under-budget |
| 0.97 | vc=0.25, seed=10000 | 10M | 0.031 | 0.506 | 195.8 | trough |
| 0.99 | vc=0.25, seed=10000 | 10M | 0.406 | 0.885 | 163.2 | converged (1% anchor) |
| 1.00 (reward_phase3) | vc=0.25, segmented chain | ~5M | 0.000 | 0.003 | 200.0 | bootstrap-failed |

Stage 3 triangulation for `λ=0.90` across three seeds: `seed=10000`
0.531, `seed=20000` 0.406 (strict ±0.10 fail by 0.025), `seed=30000`
0.625. Two of three within band; the borderline-hold verdict is adopted
with a protocol-tightening note for v3.

## 3. Artifacts

Raw runs:

`results/mesa/phase7v2-large-cliff-subset/`

Key directories:

- `mixed_0_90_pathb_vc0_25/` — adopted candidate run @ seed=10000
- `mixed_0_90_pathb_vc0_25_seed20000/` — Stage 3 first triangulation
- `mixed_0_90_pathb_vc0_25_seed30000/` — Stage 3 second triangulation
- `mixed_0_95_vc0_25/` — λ=0.95 cell
- `mixed_0_97_vc0_25/` — λ=0.97 cell (trough)
- `mixed_0_99_vc0_25/` — λ=0.99 cell (1% anchor recovery)
- `reward_phase3_vc0_25_chain/seg{1,2,3}/` — pure-reward bootstrap-
  failure chain
- `mixed_0_90_pathb_lr1e4/` — Path B candidate 1, rejected at Stage 1
  (mode collapse: success 0.219, alignment 0.832, log_std -1.319)

The signature_terminal baseline lives at
`results/mesa/phase7v2-large-conv-10m/seg3/`.

## 4. Three Large-tier failure modes

The v2 envelope separates three distinct PPO failure modes at Large that
the v1 Medium envelope did not surface:

1. **Critic destabilization** (mixed_0_90 default hparams). Pre-Path-B
   ext3 chain: entropy 2.29 → 2.76 monotone-climb, `log_std` -0.28 →
   -0.04 expansion, value_loss spike to 700+, mean_reward drop -1.83 →
   -2.09. Default-hparam Large L-Mixed at λ=0.90 destabilizes by ~8M
   env-steps. Fixed by `--value-coef 0.25`.
2. **Mode collapse** (Path B candidate 1, `--lr 1e-4`). Stage 1 fail at
   success=0.219, alignment=0.832: high alignment with under-success is
   the signature of `log_std` over-suppression (end-of-training -1.319,
   factor-of-3 below init). Solved at hparam selection by switching to
   `--value-coef 0.25` (candidate 2).
3. **Bootstrap failure** (reward_phase3, vc=0.25 segmented chain).
   Three consecutive segments at alignment 0.003 / success 0 / mean_steps
   200 across ~5M env-steps. seg2 dynamics: entropy 3.022, `log_std`
   +0.093 above init, value_loss max 960.5, mean_reward avg -4.048. No
   value-target signal, critic never finds purchase, policy entropy
   expands instead of contracting. Pure L-Reward at Large does not
   bootstrap.

## 5. U-shape cliff finding (v3-amended; see §12.1)

The Medium envelope had a single sharp cliff at `λ ≈ 0.953` between
protected and collapsed. The Large envelope has a **broader U-trough**:

- `λ ∈ {0.95, 0.97}` is the trough (success 0.031–0.094, alignment
  0.506–0.599).
- `λ=0.90` holds above the trough (success 0.531, alignment 0.934).
- `λ=0.99` *recovers* above the trough (success 0.406, alignment 0.885)
  — the 1% signature anchor is enough.
- `λ=1.00` (no anchor at all) collapses to bootstrap failure.

The recovery at `λ=0.99` followed by catastrophic failure at `λ=1.00` is
the load-bearing observation. A 1% signature mixture is sufficient to
seed the value function and enable learning; without any anchor the
critic diverges. **Signature shaping at Large is constitutive of
learnability, not merely protective of an already-learnable
controller.**

## 6. Forward L-Signature confirmed at Large (v3-amended earned-reading; see §12.2)

P2 from v1 ("terminal-signature is the forward L-Signature canonical")
extends cleanly to Large. signature_terminal at 5M env-steps reaches
success 0.750 / alignment 0.993 / mean_steps 117 — competent, field-
attached, no anchor needed. The forward shape now holds at three tiers:
Small (37/64, 0.963), Medium (64/64, 0.999), Large (24/32, 0.993).

## 7. Multi-substrate acceptance receipt

Path B adopted candidate 2 (`--value-coef 0.25`) under the 3-stage test:

| stage | substrate | adopted-candidate evidence |
| --- | --- | --- |
| 1 | success_rate floor (≥ 0.40) | 0.531 @ seed=10000 — pass |
| 2 | dynamics tail (entropy & `log_std` monotone-decreasing, final 50 updates) | end entropy 1.42, end `log_std` -0.57; both monotone-decreasing — pass |
| 3 | seed-shifted ±0.10 of seed=10000 | 0.406 @ seed=20000 (strict-fail 0.025); 0.625 @ seed=30000 (in-band) — borderline pass under triangulation |

The Stage 3 strict-fail at seed=20000 is the v2 envelope's largest open
edge. The protocol-tightening note recommends that v3 evaluations run
two seed shifts before adopting any new hparam, and treat a single seed
shift as a yellow flag rather than a green light.

## 8. v2 traceability labels (sibling to v1 §11) (v3-amended; see §12.3 for the probe-confirmed label table)

The Large cliff-subset in v2 vocabulary:

| cell | v2 traceability label | rationale |
| --- | --- | --- |
| signature_terminal Large | `field-coupled` | Competent terminal-signature controller, no anchor needed |
| `λ=0.90` Large (vc=0.25, seed=10000) | `field-coupled` | Stage 1 + 2 pass; mean alignment 0.934 |
| `λ=0.90` Large (seed=20000) | `borderline` | Stage 3 strict-fail by 0.025 |
| `λ=0.90` Large (seed=30000) | `field-coupled` | In-band Stage 3 triangulation |
| `λ=0.95` Large | `reward-coupled` (profile) | Low alignment 0.599, low success — trough; consistent with v1 collapse class but not probe-confirmed |
| `λ=0.97` Large | `reward-coupled` (profile) | Same as λ=0.95 |
| `λ=0.99` Large | `field-coupled` | Recovered via 1% signature anchor |
| `λ=1.00` Large | `undertrained` (bootstrap-failure) | Never reached competent behavior; distinct failure mode from collapse |

The `reward-coupled (profile)` qualifier indicates the label is derived
from training-time and eval-time profile alone; v2 did not run the
Phase 3 probe-slate or Phase 4 intervention battery on Large cells. A
probe-confirmed label requires extending those harnesses to Large, which
remains a v3 candidate.

## 9. Earned reading

The earned wording for v2:

> At Large capacity (5M params) in the tested shadow-field navigation
> family, signature-anchored controllers preserve field attachment
> across a broader λ-pocket than at Medium, with a U-shape trough
> centered at λ≈0.96 rather than a single sharp cliff. A 1% signature
> mixture (λ=0.99) is sufficient to recover field-coupling; with no
> signature at all (λ=1.0, pure reward) the controller fails to
> bootstrap. Signature shaping at Large is constitutive of learnability,
> not merely protective.

Do not promote v2 to: universal capacity scaling claims (Large is one
tier, single seed for most cells); claims about non-PPO algorithms
(vc=0.25 is a PPO-specific knob); claims about non-toy environments
(mesa shadow-field only). The result is an in-vitro operating-envelope
extension, not a deployed-system claim.

## 10. Open edges for v3 (v3-amended; see §12.4 for the closed/carried-forward disposition)

- Single seed per non-λ=0.90 cell. v3 should triangulate `λ=0.95` and
  `λ=0.97` at two additional seeds before claiming the trough.
  (Deferred to v3.1; the v3 spec covers intervention-battery
  disambiguation first.)
- Path B did not exhaust the candidate ladder. `--entropy-coef 0.001`
  and `--clip-range 0.1` (candidates 3 and 4 in
  [`PHASE7_V2_PATH_B_HPARAM_SPEC.md`](PHASE7_V2_PATH_B_HPARAM_SPEC.md) §3)
  remain untested; they could relax the Stage 3 borderline at λ=0.90.
- v1 Phase 6 `net.7` patching has no Large-tier analog. The actor's
  final hidden layer at Large is a different size (5M params vs Medium's
  ~250k); the Phase 6 entangled-subspace finding may or may not
  generalize. (Separate Phase 6.x v3 task; not folded into the v3
  intervention battery.)
- Bootstrap-failure mechanism is consistent with critic-target collapse
  but not yet probe-confirmed. A canonical probe-slate run on the
  reward_phase3 chain seg3 checkpoint would resolve whether this is
  truly `undertrained` or a hidden `reward-coupled` failure mode.
  (Covered by GG5 in [`PHASE7_V3_SPEC.md`](PHASE7_V3_SPEC.md) §5.)
- Probe-slate and intervention-battery harnesses (v1 Phases 3 and 4) do
  not yet run on Large; v2 labels for the L-Mixed trough cells are
  profile-based, not probe-confirmed. **v3 spec at
  [`PHASE7_V3_SPEC.md`](PHASE7_V3_SPEC.md) (2026-05-18) closes the
  intervention-battery half of this edge.** Probe-slate extension to
  Large remains a separate v3.x candidate.

## 11. Versioning

- **v2 (2026-05-18)** — initial Phase 7 v2 result note. Aggregates 6
  Large-tier cliff-subset policies (signature_terminal, mixed_0.90,
  mixed_0.95, mixed_0.97, mixed_0.99, reward_phase3 chain), records the
  Path B adopted candidate (`--value-coef 0.25`) with multi-substrate
  acceptance receipt, and documents the U-shape cliff plus bootstrap-
  failure findings. Sibling to v1, not a successor.
- **v2.1 (2026-05-28)** — v3-driven amendments applied per
  [`PHASE7_V3_RESULTS.md`](PHASE7_V3_RESULTS.md) §6. No metric numbers
  in §2 or §5 change. §5 reframed (trough mechanism), §6 sharpened
  (earned reading), §8 four labels upgraded from `(profile)` to
  probe-confirmed (mixed_0_95 / mixed_0_97 / mixed_0_99 / reward_phase3),
  §10 intervention-battery edge closed and three edges carried forward.
  Full amendment text appended as §12 below.

## 12. v3-driven amendments (2026-05-28)

Source: [`PHASE7_V3_RESULTS.md`](PHASE7_V3_RESULTS.md) §6. The v3
intervention battery against the six Large cliff-subset cells in this
v2 note produced a different mechanistic interpretation than the v2
profile-only labels suggested. The v3 receipt names four amendments to
this v2 note, applied below. Numbers in v2 §2 (the eval-summary
envelope) are unchanged; the amendments are reframings, label upgrades,
and edge closures, not metric corrections.

The v3 amendments are append-only: the original v2 §5, §6, §8, and §10
text remains as filed for the historical record, and the section
headers above carry `(v3-amended; see §12.x)` markers pointing the
reader at the authoritative current interpretation in this section.

### 12.1 §5 U-shape cliff finding — reframed

**Geometry stands; mechanism reframed.** The terminal-alignment U-shape
in v2 §2 and §5 (success and mean alignment numbers across
`λ ∈ {0.90, 0.95, 0.97, 0.99, 1.00}`) is unchanged. The v3 intervention
battery against the six cells closes the v2 caveat in favor of the
**field-coupled-but-under-budget** reading of the trough cells:

- `λ ∈ {0.95, 0.97}` Large are *field-coupled*, with healthy
  signature-sensor responses (`sig_resp_L2` 0.580 and 0.973
  respectively; mixed_0_97 is the highest signature response of any
  cell in the v3 battery, exceeding signature_terminal Large at
  0.593). `old_basin_pref` is moderately elevated (1.47–1.52) — above
  the Medium hold-cell upper bound (0.823) but far below the Medium
  collapse floor (5.560). These cells read the external signature and
  are not basin-internalized; they fail to navigate competently at the
  tested mixture weight. The trough is therefore a **third class**
  beyond v2's pre-registered "field-coupled" and "collapse" branches:
  `field-coupled, under-budget`.
- `λ=0.99` Large (`sig_resp_L2` 0.579, `bp_obp` 0.459) is genuine
  basin-attractor avoidance, not co-pointing fixed-attractor collapse.
  The v2 caveat ("can the eval-summary recovery be distinguished from
  collapse onto a fixed attractor that happens to co-point with the
  basin?") closes by probe-confirmation in favor of basin-attractor
  avoidance.
- `λ=1.00` (reward_phase3 seg3) bootstrap-fails in a *structured* way:
  zero intervention response across every channel (the policy is a
  fixed open-loop trajectory) but `old_basin_pref = 7.050` (the
  trajectory consistently terminates at the old `x_false`). This is
  not "never reached competent behavior" — it is convergence on a
  degenerate basin-attracted fixed trajectory. The new traceability
  label is `bootstrap-collapse` (probe-confirmed), distinct from both
  the spec's hypothesised `undertrained` and from the Medium
  `reward-coupled` collapse class.

The v2 §5 framing of the trough as collapse-class-shaped (via the
"consistent with v1 collapse class" qualifier in §8 and the §5 bullet
list pairing the trough with the `λ=1.00` collapse bullet) does not
survive v3. The trough has hold-class intervention dynamics — healthy
signature responsiveness, `old_basin_pref` far below collapse floor —
combined with sub-budget navigation; it is its own class.

### 12.2 §6 Earned reading at Large — sharpened

The §6 conclusion "Signature shaping at Large is constitutive of
learnability, not merely protective of an already-learnable controller"
is sharpened by v3 to:

> Signature shaping at Large is constitutive of *competent*
> learnability. With `λ ∈ [0.90, 0.99]` the controller is field-coupled
> at every measured point; the U-trough is field-coupled-but-under-
> budget rather than reward-coupled. Only at `λ = 1.00` does the
> controller fail to bootstrap, and even then it converges on a
> degenerate basin-attracted fixed trajectory (`bootstrap-collapse`)
> rather than an un-trained policy.

Field-coupling at Large is robust across the L-Mixed family at
`vc=0.25`: signature-sensor responses sit above or near the Medium
L-Signature reference (0.343) at every measured `λ`, including the
under-budget trough cells. The earned wording for v2.1 therefore
distinguishes *field-coupling* (a probe-confirmed property of every
non-bootstrap-failed cell) from *competent navigation* (a function of
where the controller sits on the L-Mixed mixture-weight curve, with
the U-trough as a measured under-budget zone).

The v2 §6 echoing sentence in §1 finding 3 and §5's load-bearing
conclusion inherits the same sharpening implicitly; v3 §6 names only
this §6 in scope, but readers citing the v2 result note should use the
sharpened wording wherever the original "constitutive of learnability"
phrasing appears.

### 12.3 §8 traceability labels — upgraded to probe-confirmed

The v2 §8 `reward-coupled (profile)` and `undertrained
(bootstrap-failure)` labels were filed pending the Phase 4-style
intervention battery that v3 ran. The v3 receipt upgrades the
following four labels (cells not listed are unchanged):

| cell | v2 label (profile) | v2.1 label (probe-confirmed) |
| --- | --- | --- |
| `λ=0.95` Large (mixed_0_95) | `reward-coupled (profile)` | **`field-coupled, under-budget`** (probe-confirmed) |
| `λ=0.97` Large (mixed_0_97) | `reward-coupled (profile)` | **`field-coupled, under-budget`** (probe-confirmed) |
| `λ=0.99` Large (mixed_0_99) | `field-coupled` (recovered, profile) | **`field-coupled`** (probe-confirmed, basin-attractor avoidance) |
| `λ=1.00` Large (reward_phase3) | `undertrained (bootstrap-failure)` | **`bootstrap-collapse`** (probe-confirmed) |

The labels `field-coupled, under-budget` and `bootstrap-collapse` are
v3 extensions to the v2 traceability vocabulary in
[`../SUNDOG_V_MESAV2.md`](../SUNDOG_V_MESAV2.md) §11. Cells previously
labelled `field-coupled` (signature_terminal Large, `λ=0.90` Large
seed=10000, `λ=0.90` Large seed=30000) remain `field-coupled` and are
now probe-confirmed by the v3 battery's GG1/GG2 PASS verdicts;
borderline `λ=0.90` Large seed=20000 remains `borderline`.

### 12.4 §10 Open edges for v3 — disposition

The v2 §10 open-edge list disposes as follows after v3:

| §10 edge | v3 disposition |
| --- | --- |
| Single seed per non-λ=0.90 cell; seed-shift triangulation of `λ=0.95 / 0.97 / 0.99` | **Carried forward.** v3 ran 64 seeds × 5 channels per cell at `seed_start=10000`, but did not triangulate the canonical seed. v3.1 candidate. |
| Path B candidates 3 (`--entropy-coef 0.001`) and 4 (`--clip-range 0.1`) untested | **Carried forward.** v3 does not address training-side hparam selection; v3.x training-side investigation. |
| v1 Phase 6 `net.7` patching has no Large-tier analog | **Partially closed by Phase 6b, then re-opened.** Phase 6b (filed 2026-05-18) ran activation patching on the v3 cliff pair (mixed_0_99 vs mixed_0_97) at Large `net.9`; GG6b-loc-D called destructive transfer at every MLP layer (`PHASE6B_RESULTS.md`). The v1 protocol does not generalize to Large for this cliff pair. The mechanism question remains open under a different protocol (multi-layer co-injection, alternative cliff-pair selection, or non-MLP architectural probes). |
| Bootstrap-failure mechanism not probe-confirmed | **Closed.** v3 GG5 ran the probe-slate-style intervention battery on reward_phase3 seg3 and labelled the result `bootstrap-collapse`. |
| Probe-slate (Phase 3) and intervention battery (Phase 4) harnesses do not run on Large | **Intervention-battery half closed by v3.** Probe-slate extension to Large remains a separate v3.x candidate. |

The v3 receipt also surfaced a Phase 6-style side-finding
(observation-channel sensitivity at the trough — v3 §7) that motivated
the Phase 6b spec and result note. The §7 observation-pathway hint
remains a candidate question for a future protocol; Phase 6b falsified
the single-layer activation-patching approach but did not close the
mechanism question.