# Sundog Gravity Ledger

Working hook:

> Sundog is gravity for agents under partial observability: a geometry-derived
> background field that deflects policy without ever being a target.

Sundog Gravity is the staging ledger for the most outlandish public claim in
the program — that signature-driven control sidesteps Goodhart's law because
the agent is moving through a field rather than optimizing a metric — and for
the high-cost horizon proof targets that could falsify it.

This document is not a roadmap. It is a holding pattern for ambition. Each
candidate listed below would, if completed, either ratchet the gravity claim
into earned language or push it back to the "unsupported universal" pile in
`presentation/claims-and-scope.md`. None of them have run yet.

The closest existing anchors are the photometric mirror-alignment experiment in
this repo and the three-body workbench operating-envelope result. Both
demonstrate the structural shape — indirect signal, transformed signature,
bounded control — without making the gravity claim. This ledger is where the
claim is staged for later, expensive defense.

## Promotions

The following candidates have been promoted out of this ledger into roadmap
documents. Their entries below are retained for archival reference and for
their detailed Sundog-expression blocks, but they are no longer first-priority
targets here:

- **Candidate 1 - Formal Separability Theorem** → promoted as Appendix A of
  [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md).
- **Candidate 2 - Mesa-Optimization Trap** → promoted as the empirical front
  of [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md).
- **Candidate 7 - Causal Intervention Test** → promoted as Phase 4 of
  [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md).

The following candidates depend on infrastructure built by
`SUNDOG_V_MESA.md` and are sequenced accordingly:

- **Candidate 4 - Manipulation-Cost Ladder** — inherits Phase 3 probe slate,
  Phase 4 intervention battery, Phase 7 sweep harness.
- **Candidate 5 - Adversarial Signature Benchmark** — inherits
  controller-family architecture and operating-envelope harness.
- **Candidate 6 - Cross-Domain Invariance Battery** — uses mesa controller
  families and probe slates as one domain entry.
- **Candidate 10 - Conservation-Law Domain** — uses the controller-family
  architecture and harness once a specific physical domain is chosen.

## Claim Boundary

This document does **not** claim that Sundog has demonstrated reward-hacking
immunity, adversarial robustness, or general agent safety under hostile
conditions. It claims that:

1. there is a coherent structural argument — the Goodhart sidestep — for why
   signature-driven control should differ from metric-driven optimization in
   adversarial settings;
2. that argument is currently defended by analogy and by the bounded three-body
   operating-envelope result, not by adversarial benchmarks;
3. the proof targets that would test the argument are expensive enough that
   they need to live in a ledger before they live in a roadmap.

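If a candidate below is promoted into a full roadmap document like
`SUNDOG_V_BALANCE.md` or the three-body roadmap, it leaves this ledger's
active list. Its entry may be retained here for archival reference, as the
Promotions section describes.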

## The Gravity Claim

Stated as a hook:

> Reinforcement learning gave agents a reward to optimize. Sundog gives them a
> field to fall through. The first invites Goodhart. The second has no metric
> to corrupt.

Sister formulations:

> An agent that reads tidal gradients cannot reward-hack gravity. The mass is
> where the mass is.

> Optimization made the agent face the target. Sundog makes the target part of
> the geometry the agent moves in.

Stated as a structural argument:

A reward-trained agent and a signature-trained agent inhabit different threat
models. Reward optimization places the objective inside the agent: a learned
value function, a critic, a numerical target the agent can corrupt by finding
inputs that score high without satisfying the designer's intent. Signature
control places the objective inside the environment's geometry. The tidal
tensor at a point is a function of the masses, not the agent's policy. The
shadow centroid is a function of the light source and the body, not the
controller. The agent is deflected by the field, not pulled toward a score.

The Sundog claim, in its most ambitious form, is that this structural
difference matters: an agent moving through a field cannot reward-hack the
field the way an agent staring at a metric can corrupt the metric.

## The Goodhart Sidestep

The original unhinged intuition of the Sundog program — preserved in early
manifesto language and only recently reconnected to the controlled work — was
that indirect-signal control should be structurally less susceptible to
specification gaming than direct-reward control. That intuition was set aside
during the discipline pass because the program had no controlled result to
attach it to. The three-body operating-envelope result and the photometric
mirror-alignment experiment provide enough scaffolding to put it back on the
table as a research target rather than a slogan.

The mechanism, named carefully:

- A target metric `R(s, a)`, defined over state `s` and action `a`, is a
  function the agent participates in. Mesa-optimization, reward-model
  exploitation, and proxy collapse are all variants of the same failure: the
  agent finds policies that score high on `R` while failing the designer's
  actual objective.
- A signature observation `S(x)` is a function of environmental state `x`
  alone. The agent reads `S` and selects actions whose effect on `S` is
  favorable by a fixed geometric definition (move toward lower tidal
  magnitude, hold shadow centroid in band, ride the pressure gradient). The
  agent does not select `S`.
- For Goodhart to bite, the agent must have a path to alter the measurement
  itself. In `R(s, a)` that path is built into the definition. In `S(x)`,
  altering the measurement requires altering the environment's geometry, which
  is expensive or impossible.
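
The contrast in the list above can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than Sundog code: the `World` class, the `tamper` action, and the numeric values are assumptions chosen only to show that `R` takes the action as an argument while `S` does not.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class World:
    """Hypothetical environment state x: an agent position and a fixed mass."""
    position: float
    mass_at: float

def signature(x: World) -> float:
    """S(x): computed from environmental state only (distance to the mass)."""
    return abs(x.position - x.mass_at)

def reward(x: World, action: str) -> float:
    """R(s, a): the action is an argument, so the agent has a handle on the score."""
    base = -abs(x.position - x.mass_at)
    # Assumed exploit, for illustration: a 'tamper' action inflates the score
    # without moving the agent or the mass.
    return base + (10.0 if action == "tamper" else 0.0)

x = World(position=3.0, mass_at=0.0)
honest = reward(x, "hold")
gamed = reward(x, "tamper")
```

Here `gamed` exceeds `honest` while `signature(x)` is identical in both cases, because no action reaches it except by changing `x`.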

This is not a proof of adversarial robustness. It is a structural argument for
why the threat model differs. The candidates below are designed to test whether
that structural difference produces a measurable behavioral difference or a
cleaner formal boundary.

## The Three-Body Wedge

The three-body workbench is the audience-conceptualizable entry point for the
gravity claim because the gravity is literal. A controller reads the tidal
tensor — the local second derivative of a real gravitational potential — and
maintains regime in a high-velocity near-escape pocket. The audience does not
need to grant a metaphor. They watch a controller fall through real gravity
without being told where the masses are, and they understand on first contact
what is meant by "the field is the objective."
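
For readers who want the quantity spelled out, here is a minimal sketch of the tidal tensor for point masses, taken as the Hessian of the Newtonian potential. The function name, the `G=1` units, and the sign convention (Hessian of the potential rather than its negative) are assumptions for illustration; the workbench may use a different convention. The mass list is simulator-side truth, and a controller would only ever see the returned tensor.

```python
import math

def tidal_tensor(point, masses, G=1.0):
    """Hessian of Phi = -sum(G*m/r) at `point`.

    `masses` is a list of (mass, (x, y, z)) tuples. Units with G=1 are an
    illustrative assumption.
    """
    T = [[0.0] * 3 for _ in range(3)]
    for m, src in masses:
        r = [p - s for p, s in zip(point, src)]
        d = math.sqrt(sum(c * c for c in r))
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                # d^2 Phi / dx_i dx_j for a single point mass
                T[i][j] += G * m * (delta / d**3 - 3.0 * r[i] * r[j] / d**5)
    return T

# One unit mass at the origin, evaluated one unit away on the x-axis.
T = tidal_tensor((1.0, 0.0, 0.0), [(1.0, (0.0, 0.0, 0.0))])
trace = T[0][0] + T[1][1] + T[2][2]
```

The tensor is symmetric and trace-free away from the masses, which gives a cheap sanity check on any estimator built on top of it.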

The wedge is rhetorical: lead public communication with the three-body
controller, then say "now imagine this is what we mean by 'gravity for agents'
in every other partially-observed environment we name."

The wedge is also methodological. The three-body sensor-tier discipline —
privileged versus accelerometer-proxy versus delayed versus
micro-maneuver — is the template for how each empirical candidate in this
ledger must separate the signature from the simulator state. Without that
separation, a "signature" quietly becomes a "reward" wearing a costume, and
the Goodhart sidestep collapses.

## Falsification Surface

The gravity claim breaks if any of the following are demonstrated:

1. **Field manipulation is cheap.** An adversary can shape the indirect
   signature itself (rearrange masses, paint a fake shadow, jam the tidal
   proxy) cheaply enough to steer a signature-driven controller into a chosen
   bad regime at lower cost than steering an equivalently-trained reward
   agent.
2. **The signature is a reward in costume.** Any candidate signature `S(x)`
   can be decompiled into a target-equivalent scalar that the agent is
   effectively optimizing, restoring the full Goodhart threat model.
3. **Mesa-optimization re-emerges.** A signature-driven agent, trained or
   selected at sufficient scale, develops internal optimization over a learned
   reward proxy and inherits the same failure modes.

Mode (2) is the most dangerous because it is reachable by hostile review
without running an experiment. The defense is geometric: a signature is
acceptable under this ledger only if its value at a point can be written as a
function of environmental state alone, with no dependence on agent policy.
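
A rough executable form of that admission rule, as a sketch: evaluate a candidate signature at the same states under different policies and reject it if the value moves. The function names and the two candidate signatures are hypothetical. A finite check like this can refute policy independence but cannot prove it.

```python
def is_policy_independent(signature_fn, states, policies):
    """True if signature_fn returns one value per state across all policies."""
    for x in states:
        values = {signature_fn(x, pi) for pi in policies}
        if len(values) > 1:
            return False
    return True

# Hypothetical candidates. Each accepts (state, policy) so the check can
# probe for a dependence the author may not have intended.
def shadow_centroid(x, policy):
    return round(0.5 * x, 6)      # ignores the policy entirely

def costumed_reward(x, policy):
    return x + policy(x)          # value shifts with the agent's chosen action

policies = [lambda x: 0.0, lambda x: 1.0]
states = [0.0, 1.0, 2.5]

admitted = is_policy_independent(shadow_centroid, states, policies)
rejected = not is_policy_independent(costumed_reward, states, policies)
```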

Each candidate below must name which of (1), (2), or (3) it is attacking.

## Evaluation Criteria

A horizon proof target earns a place in this ledger if it satisfies most of the
following:

- **Real-cost target:** the experiment or proof is expensive enough that the
  gravity claim cannot be dismissed as cheap. A speculative claim that costs a
  weekend to test is a slogan, not a target.
- **Falsifiable in finite time:** the target can produce a result that ratchets
  the claim either up or down inside a defensible horizon.
- **Adversarial surface:** there is a named red team or named perturbation
  schedule, not just a passive baseline.
- **Sensor-tier discipline:** the signature is a function of environmental
  state, separable from the simulator's privileged state, in the same way the
  three-body workbench separates accelerometer-proxy tiers from oracle state.
- **Matched comparison:** at least one reward-trained or metric-driven baseline
  is run on the same slate under the same adversarial pressure.
- **Public legibility:** a non-expert can be told what was tested, what was
  measured, and what would have falsified the claim.

## Shortlist Recommendation

Current order of merit, with promotion status marked. The mesa front
(Candidates 1, 2, 7) is now staged in [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md);
downstream candidates depend on infrastructure built there.

1. **Formal Separability Theorem** — *Promoted to SUNDOG_V_MESA.md Appendix A.*
   Attacks the hinge of the gravity claim: when is a signature structurally
   different from a reward?
2. **Mesa-Optimization Trap** — *Promoted to SUNDOG_V_MESA.md empirical front.*
   First-priority red-team experiment; attacks falsification mode (3).
3. **Spacecraft Trajectory Under Unmodeled Perturbation** — first-priority
   *physical* horizon experiment. Inherits three-body tooling; cleanest attack
   on falsification mode (1). Does not depend on the mesa roadmap.
4. **Manipulation-Cost Ladder** — *Depends on SUNDOG_V_MESA.md Phases 3–4, 7.*
   Becomes a sweep extension once mesa Phase 7 ships.
5. **Adversarial Signature Benchmark** — *Depends on SUNDOG_V_MESA.md
   architecture and harness.* Primarily an environment-design problem once
   mesa infrastructure is built.
6. **Cross-Domain Invariance Battery** — *Depends on SUNDOG_V_MESA.md as one
   domain entry.* Best pursued after at least one physical or adversarial
   candidate has a real result.
7. **Causal Intervention Test** — *Promoted to SUNDOG_V_MESA.md Phase 4.*
8. **Fluid / Wake Navigation** — strongest new physical signature candidate
   after spacecraft. Independent of the mesa roadmap.
9. **Embodied Robotics Under Denied State** — high-investment public proof
   that field-reading is not just simulation. Independent of the mesa roadmap.
10. **Conservation-Law Domain** — *Soft dependency on SUNDOG_V_MESA.md
    controller-family architecture.* Engineering-heavy; promote only after
    one concrete physical domain is chosen.
11. **Side-Channel Defense (stretch)** — highest stir, longest horizon.
    Independent of the mesa roadmap.

---

## Candidate 1 - Formal Separability Theorem

Working hook:

> Goodhart needs a handle. Sundog proves where the handle is missing.

### Why it is strong

This is the theoremic heart of the gravity claim. The empirical candidates can
show that signature-driven control behaves differently on particular slates,
but the formal separability theorem would name the structural condition under
which a signature is not merely a reward in costume.

The target is a conditional result, not a universal safety proof. A useful
version would define environment state `x`, action `a`, transition dynamics,
reward proxy `R(x, a)` or `R(o, a)`, signature map `S(x)`, and adversary
interventions over the reward channel, observation channel, signature sensor,
and environment geometry. The theorem would then bound the ways an agent or
adversary can change `S` without paying the cost of changing `x`.

That directly attacks falsification mode (2): if the signature can be
decompiled into an agent-selectable scalar objective, the gravity claim loses
its central distinction.

### Why it is weaker

A theorem can become too narrow to matter or too broad to be true. The danger
is publishing a proof of a sanitized toy condition and letting readers infer a
general immunity claim. The theorem has to carry its counterexamples with it:
controllable sensors, policy-dependent signatures, low-cost environmental
rewrites, and learned internal proxies must all be named as exits from the
guarantee.

### Sundog variant

Build a formal note and small executable witness suite:

- Define a class of environments where `S: X -> Sigma` is independent of the
  agent policy except through the real transition dynamics over `X`.
- Define matched reward proxies whose values can depend on agent action,
  policy, learned evaluator state, or measurement channel state.
- Define adversary budgets over reward editing, observation editing, signature
  sensor corruption, and environment-geometry manipulation.
- Prove a separation bound: within the stated class, reward corruption can be
  cheap while signature corruption requires either sensor compromise or
  environmental work.
- Ship counterexamples next to the theorem, so the boundary is visible rather
  than hidden in the assumptions.
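
One way the target statement might be written down, purely as a notational sketch. The cost functions `c_R`, `c_sens`, and `c_geom`, the distance `d`, the corrupted reading `\hat S_\delta`, and the threshold `\varepsilon` are hypothetical placeholders introduced here. The final inequality is the thing to be proved or refuted within the stated class, not an established result.

```latex
% Admissibility: the signature has no policy or action argument.
S : X \to \Sigma, \qquad
\forall \pi, \pi',\ \forall x \in X:\quad S(x \mid \pi) = S(x \mid \pi') = S(x)

% Minimum adversary cost to shift the reward reading by at least \varepsilon.
C_R(\varepsilon) = \inf_{\delta}\,\{\, c_R(\delta) : \lVert R_\delta - R \rVert \ge \varepsilon \,\}

% Minimum cost to shift the signature: corrupt the sensor or move the geometry.
C_S(\varepsilon) = \min\!\Big(
  \inf_{\delta}\,\{\, c_{\mathrm{sens}}(\delta) : d(\hat S_\delta, S) \ge \varepsilon \,\},\;
  \inf_{\Delta x}\,\{\, c_{\mathrm{geom}}(\Delta x) : d(S(x + \Delta x), S(x)) \ge \varepsilon \,\}
\Big)

% Separation target within the stated environment class.
C_R(\varepsilon) < C_S(\varepsilon)
```

The counterexamples listed above correspond to cases where one of the two infima in `C_S` becomes small: a cheap sensor edit, or a cheap geometry edit.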

### Sundog expression

- **Hidden target:** designer-intended environmental regime.
- **Indirect signal:** a signature map `S(x)` with no direct policy or action
  argument.
- **Transformation:** policy over signature-derived structure, not over a
  learned scalar reward target.
- **Actionable output:** theorem conditions, counterexamples, and executable
  toy witnesses.
- **Failure boundary:** the signature depends on policy, the sensor is cheap to
  corrupt, or the environment geometry can be cheaply rearranged.

### Falsification target

Mode (2): signature-is-reward-in-costume. A null result is "no meaningful
separation can be stated without assuming away the real Goodhart problem." A
positive result is a conditional theorem, shipped with its counterexamples,
that cleanly separates signature manipulation from reward manipulation.

### Current recommendation

*Promoted to [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md) Appendix A.* First-priority
intellectual target. This does not replace experiments; it tells the
experiments what they are trying to earn. The roadmap appendix is the active
working location; this entry is retained for archival reference.

---

## Candidate 2 - Mesa-Optimization Trap

Working hook:

> If Sundog secretly grows a reward inside itself, catch it doing so.

### Why it is strong

The current gravity claim already names mesa-optimization as a falsification
surface, but the first candidate list did not give it a direct experiment.
This candidate does. It asks whether a signature-driven agent, trained or
selected hard enough, eventually learns an internal proxy objective and inherits
the same Goodhart failure modes as reward-trained control.

That makes it the serious red-team companion to the formal theorem. If the
theorem says the external field is not a reward, the mesa trap asks whether the
agent reconstructs a reward internally anyway.

### Why it is weaker

This is methodologically hard. A failed attempt to find a mesa-objective is not
evidence that one cannot emerge. Interpretability tools may be too weak to
distinguish "tracks the signature robustly" from "tracks a learned shortcut
that happened not to break yet." The result must be framed as stress evidence,
not a proof of absence.

### Sundog variant

Construct a family of signature-control tasks and scale the learning pressure:

- **Small agents:** hand-coded or shallow signature trackers with little room
  for internal proxy formation.
- **Medium agents:** learned policies trained only on signature observations
  and success/failure selection.
- **Large agents:** higher-capacity policies trained across richer worlds,
  larger seed sets, and longer horizons.
- **Probe slate:** distribution shifts where common shortcuts decouple from
  the true external signature.
- **Intervention slate:** causal edits to suspected internal proxies while
  holding the external signature fixed.
- **Baseline:** matched reward-trained agents with comparable capacity and
  sample budget.
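
The probe-slate idea can be sketched in a few lines: build cases where a shortcut feature and the true signature disagree, then measure which one the policy follows. The two policies below are hand-written stand-ins for trained agents, and all names are hypothetical.

```python
def follows_signature_rate(policy, probe_cases):
    """Fraction of probe cases where the action agrees in sign with the signature.

    Each case is (signature, shortcut), with the two set to disagree, so
    agreeing with one implies disagreeing with the other.
    """
    hits = sum(1 for s, c in probe_cases if policy(s, c) * s > 0)
    return hits / len(probe_cases)

# Hypothetical stand-ins for learned policies.
def signature_tracker(s, c):
    return 1.0 if s > 0 else -1.0

def shortcut_learner(s, c):
    return 1.0 if c > 0 else -1.0

# Probe slate: the shortcut is the negation of the signature.
probe = [(s, -s) for s in (-2.0, -0.5, 0.5, 2.0)]

tracker_rate = follows_signature_rate(signature_tracker, probe)
shortcut_rate = follows_signature_rate(shortcut_learner, probe)
```

On the training distribution, where shortcut and signature coincide, the two policies would be indistinguishable. Only the split probe separates them, which is why the slate has to be designed before training rather than after.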

### Sundog expression

- **Hidden target:** true external regime represented by the environment's
  signature.
- **Indirect signal:** `S(x)` observed through the same sensor-tier discipline
  as the non-learned Sundog controller.
- **Transformation:** learned signature tracking under increasing capacity and
  selection pressure.
- **Actionable output:** action policy, representation probes, causal
  interventions, and failure-mode comparison against reward agents.
- **Failure boundary:** the learned policy follows an internal proxy when that
  proxy splits from the real signature.

### Falsification target

Mode (3): mesa-optimization re-emerges. A null result is "large
signature-trained agents fail under proxy-splitting shifts in the same way
reward-trained agents do." A positive result is not immunity; it is a measured
capacity range where signature tracking remains tied to the external field
more strongly than matched reward training.

### Current recommendation

*Promoted to [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md) as the empirical front
(Phases 0–8).* First-priority red-team experiment; staged early so the gravity
claim is stress-tested before public language hardens. The roadmap is the
active working location; this entry is retained for archival reference.

---

## Candidate 3 - Spacecraft Trajectory Under Unmodeled Perturbation

Working hook:

> A controller that holds an orbit family by reading the tidal field cannot be
> spoofed by a falsified ephemeris.

### Why it is strong

The three-body workbench already establishes the sensor-tier discipline and
the operating-envelope shape. The spacecraft variant extends the same harness
into a regime where adversarial perturbation is naturally available: unmodeled
third-body effects, solar-radiation pressure, RF-jamming-degraded GPS, or
deliberately falsified position telemetry. The signature — local tidal field —
is structurally inaccessible to a typical sensor-jamming threat model because
rearranging the gravitational geometry of the solar system is not on the
adversary's action surface.

This makes it a clean attack on falsification mode (1): if the field cannot be
cheaply shaped, signature-driven control should retain regime longer than
ephemeris-driven control under matched jamming.

### Why it is weaker

The engineering burden is significant. A high-fidelity GMAT or STK harness is
defensible but not cheap; a real cubesat deployment is a multi-year program.
The adversarial model also has to be carefully drawn — a jammer that can
degrade *both* the privileged ephemeris and the accelerometer-proxy is not
testing the gravity claim, it is testing sensor redundancy.

### Sundog variant

Construct a matched orbit-holding task in a high-fidelity simulator:

- **Baseline A:** privileged ephemeris controller, full state from the
  simulator's truth model.
- **Baseline B:** ephemeris controller fed through a degraded perception
  pipeline (GPS-equivalent with adversarial jamming or spoofing).
- **Sundog:** an accelerometer-proxy controller running guarded TRACK,
  extending the three-body workbench architecture into the chosen spacecraft
  regime.
- **Adversary:** a named perturbation schedule — RF jamming, ephemeris
  spoofing, unmodeled solar-radiation events — applied to all controllers on
  matched seeds.

### Sundog expression

- **Hidden target:** orbit-family geometry under unmodeled perturbation.
- **Indirect signal:** local tidal tensor estimate from accelerometer-array or
  micro-maneuver proxy.
- **Transformation:** SCAN/SEEK/TRACK over tidal magnitude and direction, with
  guard quantiles inherited from the three-body Phase 11 hazard-gate result.
- **Actionable output:** delta-v command toward signature-favorable regime.
- **Failure boundary:** controller saturates if perturbation drives the tidal
  field outside the trained signature range, or if accelerometer noise exceeds
  the Phase 8 calibration envelope.
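
As a sketch of how an accelerometer-array proxy could recover the tidal signal, the gradient of acceleration can be estimated by central differences between paired readings a small baseline apart. This is an idealized, noise-free illustration: `accel` samples the field directly and stands in for real hardware, and the baseline, `G=1` units, and function names are assumptions. Note that the acceleration gradient is the negative of the potential Hessian.

```python
import math

def accel(point, masses, G=1.0):
    """Newtonian acceleration at `point`; an idealized stand-in for a reading."""
    a = [0.0, 0.0, 0.0]
    for m, src in masses:
        r = [p - s for p, s in zip(point, src)]
        d = math.sqrt(sum(c * c for c in r))
        for i in range(3):
            a[i] -= G * m * r[i] / d**3
    return a

def tidal_estimate(center, masses, baseline=1e-3):
    """Central-difference estimate of da_i/dx_j from paired readings."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        plus, minus = list(center), list(center)
        plus[j] += baseline
        minus[j] -= baseline
        a_p, a_m = accel(plus, masses), accel(minus, masses)
        for i in range(3):
            J[i][j] = (a_p[i] - a_m[i]) / (2.0 * baseline)
    return J

# Unit mass at the origin, array centered one unit away on the x-axis.
J = tidal_estimate((1.0, 0.0, 0.0), [(1.0, (0.0, 0.0, 0.0))])
```

Adding sensor noise to `accel` and shrinking the baseline is the quickest way to see where an estimate like this would leave a calibration envelope.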

### Falsification target

Mode (1): field-manipulation cost. A null result is "ephemeris-based control
under jamming holds orbit at least as long as signature-based control on the
matched slate." A positive result is a measured time-to-regime-loss advantage
for the signature controller, with named conditions.
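
Time-to-regime-loss needs an unambiguous definition before it can be compared across controllers. A minimal sketch, assuming regime is defined as a tracked scalar staying inside a fixed band; the band, the step size, and the two traces are made-up values that only exercise the function and are not results.

```python
def time_to_regime_loss(trace, low, high, dt=1.0):
    """Elapsed time until the tracked quantity first leaves [low, high].

    Returns None if the quantity stays in band for the whole trace.
    """
    for step, value in enumerate(trace):
        if not (low <= value <= high):
            return step * dt
    return None

# Placeholder traces on one matched seed (e.g. a normalized orbit radius).
trace_a = [1.00, 1.02, 1.05, 1.04, 1.08, 1.21]
trace_b = [1.00, 1.06, 1.15, 1.30, 1.60, 2.10]

t_a = time_to_regime_loss(trace_a, 0.9, 1.2)
t_b = time_to_regime_loss(trace_b, 0.9, 1.2)
```

Returning `None` for a trace that never leaves the band keeps censored runs distinct from runs that failed at the final step.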

### Current recommendation

First-priority physical horizon experiment after the formal and mesa tracks.
It inherits the most existing discipline and tooling, and remains the closest
path from current Sundog work to a defensible adversarial result.

---

## Candidate 4 - Manipulation-Cost Ladder

Working hook:

> If you can fake the score cheaply but cannot fake the field cheaply, the
> difference is measurable.

### Why it is strong

The Goodhart sidestep is fundamentally a claim about attack cost. A reward can
be exploitable because the agent or adversary has a cheap handle on the proxy.
A genuine environmental signature should require a more expensive intervention:
move the masses, reshape the shadow source, alter the pressure field, or
corrupt the sensor.

This candidate turns that structural argument into a curve. Rather than asking
whether Sundog "wins" one benchmark, it asks how much adversarial budget is
needed to break each control family across reward, observation, sensor, and
geometry interventions.

### Why it is weaker

The ladder is only as strong as its budget accounting. If the experiment
assigns arbitrary costs to attacks, reviewers will read the result as a tuned
benchmark. The adversary model has to be simple enough to audit and physical
enough that the costs mean something.

### Sundog variant

Build a matched environment family with four intervention channels:

- **Reward edit:** alter the scalar score or learned evaluator.
- **Observation edit:** alter what the agent sees without changing the world.
- **Signature-sensor edit:** corrupt the measured signature while leaving
  geometry fixed.
- **Geometry edit:** change the underlying environmental state that generates
  the signature.

Sweep adversary budgets on each channel and measure the budget at which each
controller suffers catastrophic regime loss.
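
The sweep itself is simple to sketch: for each channel, find the smallest budget at which the controller no longer survives. The survive/fail functions and their tolerance values below are placeholders, not measurements. In a real harness each would be an episode run under the named attack.

```python
def failure_threshold(survives, budgets):
    """Smallest budget at which the controller fails, or None if it never does."""
    for b in sorted(budgets):
        if not survives(b):
            return b
    return None

# Hypothetical stand-in: a controller that survives any budget below a
# fixed tolerance. Real tolerances would come from running the harness.
def make_controller(tolerance):
    return lambda budget: budget < tolerance

channels = {
    "reward_edit": make_controller(2.0),
    "observation_edit": make_controller(5.0),
    "signature_sensor_edit": make_controller(5.0),
    "geometry_edit": make_controller(50.0),
}
budgets = [1, 2, 4, 8, 16, 32, 64]

curve = {name: failure_threshold(fn, budgets) for name, fn in channels.items()}
```

The hard part is not this loop but making the budget units comparable across channels, which is where the budget accounting has to be auditable.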

### Sundog expression

- **Hidden target:** intended environmental regime.
- **Indirect signal:** geometry-derived `S(x)`.
- **Transformation:** signature tracking with guard bands and abstention.
- **Actionable output:** failure-threshold curves by attack channel.
- **Failure boundary:** signature corruption becomes as cheap as reward
  corruption, or geometry manipulation becomes cheap inside the chosen domain.

### Falsification target

Modes (1) and (2): field-manipulation cost and reward-in-costume collapse. If
signature-driven and reward-driven controllers fail at the same adversary cost
across the ladder, the gravity claim has not earned its core distinction.

### Current recommendation

*Depends on infrastructure from [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md)
Phases 3–4, 7.* High-priority empirical bridge between the formal theorem and
the benchmark suite. Once mesa Phase 7 ships, this becomes a sweep extension
of the existing harness rather than a standalone build. Could still be merged
with Candidate 5 if implementation pressure favors one shared harness.

---

## Candidate 5 - Adversarial Signature Benchmark

Working hook:

> One team trains agents to spec-game. The other team trains agents to fall
> through a field. Score over a shared adversarial slate.

### Why it is strong

This is the cheapest expensive experiment. It does not require a spacecraft or
a high-fidelity orbital simulator. It requires a partially-observed gridworld
or continuous-control environment, a matched red team, and the discipline to
define a signature and a reward that are not the same function in disguise.

It is also the most direct attack on falsification mode (2): if a signature
can be decompiled into a reward, that decompilation should show up here under
adversarial pressure, because spec-gaming a signature that decompiles to a
reward should be exactly as easy as spec-gaming the matched reward.

### Why it is weaker

The risk is that the chosen environment is too toy to be persuasive. A
reviewer can always say "of course the signature controller held up — your
adversary had no path to the field." The environment design is the
load-bearing part of the experiment, and getting it wrong wastes the whole
run.

### Sundog variant

Build a harness in the style of the three-body operating-envelope runner:

- A partially-observed environment with a separable signature `S(x)` and a
  matched reward `R(s, a)` whose argmax under no adversary coincides with
  `S`'s argmax.
- Two agent families: signature-tracking (Sundog architecture) and
  reward-optimizing (matched RL baseline, comparable parameter count and
  sample budget).
- An adversary that can perturb (a) the observation channel, (b) the reward
  channel, and (c) the environment geometry, with named budgets per channel.
- Primary metric: rate of catastrophic policy failure under each adversary
  channel. Secondary metric: terminal performance under no adversary.
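
The primary metric reduces to a grouped failure rate. A minimal aggregation sketch, assuming each episode is logged as a `(agent_family, channel, failed)` record; the record shape and the sample log are hypothetical and exist only to exercise the function.

```python
from collections import defaultdict

def failure_rates(episodes):
    """Aggregate (agent_family, channel, failed) records into failure rates."""
    counts = defaultdict(lambda: [0, 0])   # key -> [failures, total]
    for family, channel, failed in episodes:
        counts[(family, channel)][0] += int(failed)
        counts[(family, channel)][1] += 1
    return {key: f / n for key, (f, n) in counts.items()}

# Placeholder episode log, not experimental data.
episodes = [
    ("signature", "observation", False),
    ("signature", "observation", True),
    ("reward", "observation", True),
    ("reward", "observation", True),
]

rates = failure_rates(episodes)
```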

### Sundog expression

- **Hidden target:** designer-intended policy.
- **Indirect signal:** environmental signature `S(x)` separable from agent
  policy.
- **Transformation:** SCAN/SEEK/TRACK on signature gradient.
- **Actionable output:** action selected by signature ascent.
- **Failure boundary:** signature can be reshaped within adversary budget, or
  signature collapses to reward under decompilation.

### Falsification target

Mode (2): signature-is-reward-in-costume. If the signature agent and the
reward agent show indistinguishable failure rates under matched adversary
budgets, the Goodhart sidestep is not earning its keep.

### Current recommendation

*Depends on [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md) controller-family
architecture and operating-envelope harness.* Still the cheapest expensive
experiment in absolute terms; primarily an environment-design problem once
mesa infrastructure is built. Shares infrastructure with the manipulation-cost
ladder and benefits from being scheduled alongside it.

---

## Candidate 6 - Cross-Domain Invariance Battery

Working hook:

> The same controller shape survives when the world changes clothes.

### Why it is strong

This is the best defense against the criticism that Sundog is a collection of
clever one-offs. The proof is not that one controller solves everything. The
proof is that the same grammar keeps recurring: hidden target, environmental
signature, transformation, bounded action, explicit failure surface.

The battery would connect the photometric mirror result, three-body workbench,
Pressure Mines, Shadow Fleet, fluid/wake navigation, and at least one
engineering domain under a shared reporting template.

### Why it is weaker

This is more broadcast proof than theoremic proof. A hostile reviewer can
still say that each domain was hand-shaped to fit the pattern. The battery
therefore needs strict admission rules and at least one negative case where the
signature fails.

### Sundog variant

Run a small slate of domain workbenches under a common schema:

- hidden target;
- indirect signature;
- transformation;
- action;
- sensor tier;
- matched baseline;
- observability sweep;
- failure boundary.

The result is a cross-domain table of operating envelopes rather than one
large benchmark score.
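
The common schema could be pinned down as a typed record so that incomplete domain entries are caught mechanically. This is a sketch: the class and function names are hypothetical, and the field names simply mirror the list above. The example row reuses the Fluid / Wake Navigation expression block from this ledger, with the fields that block does not yet specify left blank.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class WorkbenchReport:
    """One row of the cross-domain table."""
    hidden_target: str
    indirect_signature: str
    transformation: str
    action: str
    sensor_tier: str
    matched_baseline: str
    observability_sweep: str
    failure_boundary: str

def missing_fields(report):
    """Names of schema fields left blank in a report."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]

wake = WorkbenchReport(
    hidden_target="obstacle/source geometry and safe flow regime",
    indirect_signature="wake, pressure, vorticity, dye trace, or velocity gradient",
    transformation="SCAN/SEEK/TRACK over flow-derived signatures",
    action="steering, station-keeping, abort, or probe",
    sensor_tier="",
    matched_baseline="",
    observability_sweep="",
    failure_boundary="turbulence, diffusion, sensor delay, or overlapping wakes",
)

gaps = missing_fields(wake)
```

A domain with a non-empty `gaps` list would not yet qualify for the battery under strict admission rules.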

### Sundog expression

- **Hidden target:** domain-specific latent state.
- **Indirect signal:** domain-specific trace, field, gradient, wake, or
  distortion.
- **Transformation:** shared SCAN/SEEK/TRACK/REACQUIRE grammar where
  applicable.
- **Actionable output:** domain action plus standardized operating-envelope
  report.
- **Failure boundary:** one or more domains fail the signature test or collapse
  into ordinary metric optimization.

### Falsification target

Mode (2): signature-is-reward-in-costume, plus the broader "one-off gimmick"
critique. A null result is "the grammar does not travel without bespoke
machinery." A positive result is a set of bounded, non-identical domains where
the same structure remains useful.

### Current recommendation

*Uses [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md) mesa controller families and
probe slates as one domain entry.* Medium-priority broadcast proof. Best
pursued after at least one more physical or adversarial candidate has a real
result, so the battery has more than one nontrivial domain to compose.

---

## Candidate 7 - Causal Intervention Test

Working hook:

> A reward can be made to lie. A field must be moved.

### Why it is strong

This candidate turns the gravity claim into causal surgery. Instead of only
observing failures, it intervenes on reward values, observation channels,
signature sensors, and environment geometry, then asks which interventions
actually steer the controller.

The output would be unusually legible: a causal graph of where control
authority lives for reward-trained agents versus signature-driven agents.

### Why it is weaker

The experiment can become artificial if the interventions are too clean. Real
attackers do not arrive as labeled causal operators. The intervention slate
must therefore be paired with a real adversary or perturbation schedule.

### Sundog variant

Use the same environment as Candidate 4 or Candidate 5, but add explicit causal
edits:

- hold the world fixed and edit reward;
- hold the world fixed and edit observations;
- hold the signature fixed and edit unrelated state;
- hold observations fixed and move the true geometry;
- corrupt only the signature sensor.

Measure which edits change policy, regime retention, and failure mode.
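
The five edits above map naturally onto an intervention-response matrix: one row per controller, one column per edit, each cell recording whether the action changed. In the sketch below the two controllers are hand-written stand-ins, so the matrix illustrates the output format only and says nothing about how trained agents behave. All field and function names are hypothetical.

```python
# Baseline world. `observation` is what the observation channel reports and
# starts out equal to the true geometry.
baseline = {"geometry": 1.0, "observation": 1.0, "reward": 0.0,
            "unrelated": 0.0, "sensor_bias": 0.0}

interventions = {
    "edit_reward": {"reward": 5.0},
    "edit_observation": {"observation": 3.0},
    "edit_unrelated_state": {"unrelated": 9.0},
    "move_geometry": {"geometry": 2.0},      # observation held at baseline
    "corrupt_sensor": {"sensor_bias": 1.0},
}

def signature_controller(state):
    """Stand-in: acts on the sensed signature (geometry plus sensor bias)."""
    return state["geometry"] + state["sensor_bias"]

def reward_controller(state):
    """Stand-in: acts on the observation channel and the reward value."""
    return state["observation"] + state["reward"]

def response_matrix(controllers, baseline, interventions):
    """Map controller -> intervention -> whether the action changed."""
    matrix = {}
    for name, ctrl in controllers.items():
        base_action = ctrl(baseline)
        matrix[name] = {
            label: ctrl({**baseline, **edit}) != base_action
            for label, edit in interventions.items()
        }
    return matrix

matrix = response_matrix(
    {"signature": signature_controller, "reward": reward_controller},
    baseline, interventions,
)
```

The cleanest failure named below would show up as a `True` in the signature row under `edit_reward` or `edit_observation`.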

### Sundog expression

- **Hidden target:** true environmental regime.
- **Indirect signal:** causal child of environment geometry.
- **Transformation:** controller response to the signature rather than to
  reward-channel edits.
- **Actionable output:** intervention-response matrix and causal graph.
- **Failure boundary:** the signature controller follows edited proxies even
  when the external signature is unchanged.

### Falsification target

Modes (1), (2), and (3), depending on which intervention succeeds. The cleanest
failure is a cheap reward-like edit that steers the signature controller without
moving the field or corrupting the sensor.

### Current recommendation

*Promoted to [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md) Phase 4 as the Causal
Intervention Battery.* High-value proof instrument; less a standalone
workbench than a battery attached to the mesa harness. The roadmap is the
active working location; this entry is retained for archival reference.

---

## Candidate 8 - Fluid / Wake Navigation

Working hook:

> The obstacle is hidden. The flow remembers it.

### Why it is strong

Fluid wakes are real environmental signatures. A hidden object, source, sink,
or current can be invisible while still shaping local velocity, pressure,
vorticity, or dye traces. This makes the candidate a natural physical cousin
to Shadow Fleet and a more defensible engineering domain than a purely
game-native wake metaphor.

The public image is strong: the agent does not see the obstacle, but the water
or air carries the shape of its absence.

### Why it is weaker

Fluid simulation can become expensive and fragile. If the harness is too
simple, reviewers may call it a toy; if it is too realistic, the work may turn
into a computational fluid dynamics project instead of a Sundog proof.

### Sundog variant

Construct a 2D or 3D flow field with hidden obstacles or sources. The agent
receives local flow measurements, pressure samples, or tracer histories and
must navigate, station-keep, or choose safe passages without privileged access
to obstacle geometry.
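
As a sketch of the cheapest possible end of that harness, the block below uses the textbook 2D potential-flow solution for a uniform stream past a cylinder. This is an assumption chosen for brevity: potential flow is inviscid and steady, so it has no true wake or vorticity, and it only stands in for "the obstacle shapes the local velocity". The names `local_flow`, `disturbance`, and `U_INF` are hypothetical.

```python
import math

U_INF = 1.0  # free-stream speed along +x (assumed)


def local_flow(x, y, cx=0.0, cy=0.0, radius=1.0):
    """Velocity (u, v) at (x, y) for ideal flow past a hidden cylinder.

    Uses the complex potential w(z) = U (z + a^2 / z), centred on (cx, cy).
    """
    dx, dy = x - cx, y - cy
    r2 = dx * dx + dy * dy
    a2 = radius * radius
    if r2 < a2:
        raise ValueError("sample point lies inside the obstacle")
    u = U_INF * (1.0 - a2 * (dx * dx - dy * dy) / (r2 * r2))
    v = -U_INF * 2.0 * a2 * dx * dy / (r2 * r2)
    return u, v


def disturbance(x, y, **obstacle):
    """Toy signature: magnitude of the deviation from the free stream."""
    u, v = local_flow(x, y, **obstacle)
    return math.hypot(u - U_INF, v)


# The agent never reads `hidden`; it only receives local samples.
hidden = {"cx": 0.0, "cy": 0.0, "radius": 1.0}
for r in (1.5, 2.0, 4.0):
    print(r, disturbance(r, 0.0, **hidden))
```

In this idealized model the disturbance magnitude falls off as `U_INF * radius**2 / r**2`, so the signature fades quickly with distance. A viscous or unsteady solver would be needed before turbulence, diffusion, or overlapping wakes could be studied at all.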

### Sundog expression

- **Hidden target:** obstacle/source geometry and safe flow regime.
- **Indirect signal:** wake, pressure, vorticity, dye trace, or velocity
  gradient.
- **Transformation:** SCAN/SEEK/TRACK over flow-derived signatures.
- **Actionable output:** steering, station-keeping, abort, or probe.
- **Failure boundary:** turbulence, diffusion, sensor delay, or overlapping
  wakes erase the useful signature.

### Falsification target

Mode (1): field-manipulation cost. If cheap perturbations or fake wakes steer
the controller more easily than matched reward or map-based baselines, the
gravity analogy weakens.
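
One possible way to make "more easily" measurable is the minimum perturbation budget that flips a controller's decision. The sketch below is an assumption about how that could be operationalized, not a definition this ledger commits to. `min_flip_budget`, `decide`, and `perturb` are hypothetical names, and the search assumes the decision flips at most once as the budget grows.

```python
def min_flip_budget(decide, perturb, max_budget, tol=1e-6):
    """Smallest budget in [0, max_budget] that changes the decision.

    decide:  maps a controller input to a discrete decision.
    perturb: maps a budget b >= 0 to the perturbed controller input.
    Assumes a monotone response (at most one flip). Returns None if
    max_budget is not enough to flip the decision.
    """
    baseline = decide(perturb(0.0))
    if decide(perturb(max_budget)) == baseline:
        return None
    lo, hi = 0.0, max_budget
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if decide(perturb(mid)) == baseline:
            lo = mid
        else:
            hi = mid
    return hi


# Toy stand-in with an arbitrary threshold; the numbers carry no
# empirical weight.
def steer_left(reading):
    return reading > 0.5


budget = min_flip_budget(steer_left, lambda b: 0.2 + b, max_budget=1.0)
print(budget)
```

Comparing a signature controller with a reward or map-based baseline this way only means something if both budgets are expressed in commensurable units, for example energy injected into the flow versus effort spent editing the reward channel.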

### Current recommendation

Strong second-wave physical candidate. It has better public feel than many
benchmarks and better scientific footing than most game metaphors.

---

## Candidate 9 - Embodied Robotics Under Denied State

Working hook:

> The robot does not know the object. It reads what the object does to the
> world.

### Why it is strong

This is the highest public-legibility move outside spacecraft. A robot
grasping, balancing, pouring, or navigating from tactile slip, vibration,
acoustic echo, shadow, airflow, pressure, or deformation makes the gravity
claim concrete without asking the audience to accept a metaphor first.

It also attacks the assumption that Sundog is only a simulation idea. Physical
worlds naturally generate signatures that are hard to fake without doing real
work.

### Why it is weaker

Hardware turns every clean claim into a maintenance project. Sensor
calibration, repeatability, cost, and safety could dominate the research. A
robotics result also risks being read as "better tactile control" unless the
signature/reward comparison is designed from the beginning.

### Sundog variant

Choose one narrow task:

- grasp unknown objects from tactile slip fields;
- navigate low-light or smoke-like occlusion from acoustic/airflow returns;
- balance contact from pressure gradients;
- pour or transfer material from sound and weight-shift signatures.

Run a signature controller against a vision/classifier or reward-trained
baseline under occlusion, distribution shift, and sensor perturbation.

### Sundog expression

- **Hidden target:** object state, contact regime, or navigable geometry.
- **Indirect signal:** slip, vibration, sound, pressure, deformation, shadow,
  or airflow.
- **Transformation:** signature tracking with confidence gating and abort
  states.
- **Actionable output:** grasp, move, tilt, slow, retry, or abstain.
- **Failure boundary:** sensor noise, contact ambiguity, material variation, or
  cheap spoofing collapses the signature.

### Falsification target

Modes (1) and (2): if the physical signature can be cheaply spoofed or if the
signature controller reduces to a tuned task reward, the grand claim does not
gain support.

### Current recommendation

High-investment public proof, not first-wave. Best after a simulator and a
theoretical foundation have already made the comparison precise.

---

## Candidate 10 - Conservation-Law Domain

Working hook:

> The signature is protected by physics.

### Why it is strong

Conservation laws give the gravity claim its most defensible non-gravity
analogue. Mass, momentum, charge, energy, flow continuity, and structural modes
constrain what signatures can be faked cheaply. A controller reading those
signatures is not merely reading a learned proxy; it is reading something the
world must pay to change.

Possible domains include thermal diffusion, leak detection from pressure decay,
electrical grid phase/frequency stability, structural health from vibration
modes, and flow continuity around hidden obstructions.

### Why it is weaker

The candidate can become too diffuse. "Use physics" is not an experiment. It
needs one carefully selected domain with a measurable adversary budget and a
baseline that makes Goodhart pressure visible.

### Sundog variant

Select one conservation-governed system and build a matched task:

- **Thermal:** hidden heat source tracked from diffusion signatures.
- **Pressure:** hidden leak or blockage inferred from pressure decay.
- **Grid:** unstable operating regime detected from phase/frequency signatures.
- **Structure:** damage inferred from modal vibration shifts.

Compare signature-driven control or detection against a metric-trained or
label-trained baseline under spoofing and perturbation.
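
To make the **Pressure** option concrete, here is a minimal sketch that assumes a deliberately simple model: pressure relaxes exponentially toward a known ambient value, and the decay rate is the signature. The model, the noise-free samples, and the function names are all illustrative assumptions. A real vessel or pipe would need a domain-specific leak model.

```python
import math


def simulate_pressure(p0, p_ambient, leak_rate, times):
    """Toy model: pressure relaxes exponentially toward ambient."""
    return [
        p_ambient + (p0 - p_ambient) * math.exp(-leak_rate * t)
        for t in times
    ]


def estimate_leak_rate(times, pressures, p_ambient):
    """Negated least-squares slope of ln(P - P_ambient) against time."""
    ys = [math.log(p - p_ambient) for p in pressures]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    den = sum((t - t_mean) ** 2 for t in times)
    return -num / den


# Arbitrary units and rates, chosen only to exercise the estimator.
times = [0.5 * i for i in range(20)]
tight = simulate_pressure(200.0, 100.0, leak_rate=0.01, times=times)
leaking = simulate_pressure(200.0, 100.0, leak_rate=0.15, times=times)

print(estimate_leak_rate(times, tight, 100.0))
print(estimate_leak_rate(times, leaking, 100.0))
```

The detector reads a quantity the physics constrains (the decay rate) rather than a label. The open experimental question is what it costs an adversary to make a leaking system produce the decay curve of a tight one.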

### Sundog expression

- **Hidden target:** physical regime governed by a conservation constraint.
- **Indirect signal:** conserved-flow residue, diffusion trace, phase drift, or
  modal distortion.
- **Transformation:** regime tracking over physically constrained signatures.
- **Actionable output:** control, flag, throttle, isolate, or abstain.
- **Failure boundary:** an adversary can fake the conserved signature at lower
  cost than expected, or the signature is not unique enough to guide action.

### Falsification target

Mode (1): field-manipulation cost. A positive result shows that physics
increases the cost of spoofing the signature relative to manipulating a metric
or label channel.

### Current recommendation

*Soft dependency on [`SUNDOG_V_MESA.md`](SUNDOG_V_MESA.md) controller-family
architecture and harness, once one concrete physical domain is chosen.*
Long-horizon engineering candidate. Keep it in the ledger as a principled
family; the domain pick (thermal, pressure, grid, structural) is the gating
decision before any roadmap promotion. Reusing mesa infrastructure is the
cheapest implementation path once a domain is selected.

---

## Candidate 11 - Side-Channel Defense *(stretch)*

Working hook:

> The detector does not classify attacks. It reads the disturbance the system
> casts when something is trying to corrupt it.

### Why it is strong

High stir. Cybersecurity is an industry currently shipping label-trained
classifiers that are observably being Goodharted in production through
adversarial-example crafting and APT-style behavior masking. A signature-driven
detector that reads syscall residue, EM/timing side-channels, or network flow
geometry rather than labeled-attack examples puts the gravity claim inside a
domain where the failure mode is visible.

### Why it is weaker

Domain-expertise burden is high and the public is already saturated with "AI
for cybersecurity" pitches. The misread risk is severe: collapsing into "we
built better intrusion detection" loses the point entirely.

### Sundog variant

A monitored process exposes a signature derived from syscall sequence
geometry, timing residue, or network flow shape. The signature is defined
without reference to known attack labels. A signature-tracking controller
flags regime departures; a matched supervised baseline classifies on labeled
attack samples. An active red team crafts adversarial patterns against both.
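
A minimal sketch of a label-free signature of this kind, assuming syscall bigram frequencies as the geometry and Jensen-Shannon divergence from a baseline window as the departure measure. Both choices, the `0.3` threshold, and the example traces are hypothetical illustrations rather than the design this candidate commits to.

```python
import math
from collections import Counter


def ngram_distribution(trace, n=2):
    """Normalized frequency of length-n windows in a syscall trace."""
    if len(trace) < n:
        raise ValueError("trace is shorter than the n-gram length")
    grams = Counter(
        tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)
    )
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    m = {g: 0.5 * (p.get(g, 0.0) + q.get(g, 0.0)) for g in keys}

    def kl(a):
        return sum(pa * math.log2(pa / m[g]) for g, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def departs(baseline, window, threshold=0.3, n=2):
    """Flag a regime departure; no attack labels are consulted."""
    return js_divergence(
        ngram_distribution(baseline, n), ngram_distribution(window, n)
    ) > threshold


# Made-up traces for illustration only.
baseline = ["open", "read", "read", "close"] * 25
routine = ["open", "read", "read", "close"] * 5
unusual = ["open", "mmap", "mprotect", "write", "close"] * 5

print(departs(baseline, routine), departs(baseline, unusual))
```

The red-team question for this sketch is how cheaply an adversary can pad or reorder calls to pull the divergence back under the threshold, compared with the cost of evading the matched supervised classifier.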

### Sundog expression

- **Hidden target:** intent of the process running in the system.
- **Indirect signal:** geometric residue (syscall n-gram structure, timing
  distribution, flow autocorrelation).
- **Transformation:** signature departure from baseline regime.
- **Actionable output:** flag, throttle, escalate, abstain.
- **Failure boundary:** the signature can be flattened by an adversary at lower
  cost than the matched classifier can be evaded.

### Falsification target

Modes (1) and (2) jointly: in a domain where the adversary is unusually
sophisticated, both field-manipulation and signature-decompilation pressure
should be high. If the gravity claim is real, signature-driven detection
should outlast labeled-attack classifiers on a matched red-team slate. If
not, this is where the costume falls off.

### Current recommendation

Long-term aspiration, not first-wave. Listed here because the broadcast value
of a positive result would be disproportionate to the cost of ratcheting the
claim, and because the cybersecurity framing is the most natural translation
of the gravity claim into a domain where reviewers care.

---

## Promotion Guidance

A candidate leaves this ledger and earns a `SUNDOG_V_*.md` roadmap document
only when:

- the proof or experiment design names a specific falsification mode from
  §Falsification Surface;
- the signature is structurally separated from privileged state in the
  three-body sensor-tier style;
- for empirical candidates, a matched reward-trained or metric-driven baseline
  is committed to;
- for empirical candidates, a named adversary or perturbation schedule is
  committed to;
- the boundary language in `presentation/claims-and-scope.md` is updated to
  reflect what the candidate, if completed, would ratchet.

Until those are in place, the gravity claim remains the program's most
outlandish published frame and the most carefully bounded one. Public
communication may use it (see `PROMO_HIGHLIGHTS.md` §The Gravity Claim) but
must mark it speculative and link to this ledger and to the controlled
three-body and photometric results that anchor the analogy.

## Broadcast-Aligned Summary

For public communication, the gravity-family framing summarizes this way:

> The photometric experiment and the three-body workbench demonstrate that
> useful control is possible from indirect environmental signatures rather
> than from privileged state. The gravity claim is the structural argument
> for why this matters: a controller that tracks a property of the
> environment's geometry, rather than optimizing a designer-specified metric,
> inhabits a different threat model than reward-trained control. The
> proof targets that would test this difference are listed in this ledger and
> are deliberately expensive. None has been completed.