BACK TO ALL DEV LOGS
DEVELOPER DIARY2026-05-26

An Operating Console for the Running Game: Six Weeks of MCP

StarGazer now hosts a Model Context Protocol server inside the game process. Twenty tools across read, mutate, capture, commit. Every screenshot in this post was captured by the MCP driving itself.

An Operating Console for the Running Game: Six Weeks of MCP
Solo Developer
Lead Game Designer & Pilot

An Operating Console for the Running Game: Six Weeks of MCP

Solo Developer
Lead Game Designer & Pilot

Dev Log

  • The cold-launch tax died. --screenshot path --at-tick N took ~10 seconds per shot — build, launch, voxel-cook, render, save, exit. The new frame.capture runs against a living game in 23 milliseconds. That's the load-bearing rewrite, and every other tool in this post followed from it.
  • 20 tools across foundation / capture / mutate / commit / replay. They're not a wishlist — every one of them has at least one smoke-test caller. Most have a skill consumer landed or scheduled.
  • One protocol, no inventions. Standard Model Context Protocol over stdio, served by the official ModelContextProtocol .NET SDK 1.3.0. Claude CLI spawns StarGazer --mcp as a subprocess. No custom JSON-RPC framing, no in-process registry to maintain.
  • Compile-gated three ways. #if STARGAZER_MCP, a new DebugMcp csproj configuration, and a --mcp CLI flag. Release builds contain zero MCP code. CI greps the publish artifact to enforce.
  • Three-label determinism contract. Every mutation is tagged sim_observed and side_channel. Lighting tweaks don't flag replay-dirty; weapon damage edits do. Honest labels are what makes the audit trail trustworthy.
  • Every screenshot you're about to see was driven by the MCP. Single dotnet StarGazer.dll --mcp process, one Python script piping JSON-RPC into stdin, captures stamped sub-second each.
Solo Developer
Lead Game Designer & Pilot

Baseline: one capture, sub-second

Baseline frame captured via mcp__stargazer__frame.capture Captured by frame.capture at sim tick 492 against a running game. Capture latency is sub-frame (<33ms at 30fps); the same screenshot used to take ~10 seconds because it required a cold launch. The cost dropped two orders of magnitude.[^latency]

[^latency]: Latency measurements are from the developer machine; the artifacts checked into screenshots/2026-05-26/mcp-blog-demo/ are the outputs of the capture session, not the timing logs. The session.txt and state-snapshot.json files alongside the JPEGs record the session ID, game commit, and sim state at capture time.

Before, the visual-eval skill — the one that catches "did this rendering change actually look right" — burned a full cold launch per shot. Voxel-cook the fleet, allocate render targets, warp the ships in, render two frames, save, exit. Ten seconds of latency on a question that needs five iterations to answer.

Now the skill checks if MCP is available (scripts/mcp-available), and if it is, calls frame.capture against the existing process. The response carries the path, the dimensions, the sim tick the frame was captured at, and a server-side capture timer. The C1 hard gate was "less than one second"; observed dev-machine captures land well under that — comfortably inside a single 60Hz frame.

Solo Developer
Lead Game Designer & Pilot

Sweeping a knob: lighting key intensity

The headline tool is frame.sweep. It takes one mutation key, a list of values, and an output directory; it applies each value in sequence, captures a frame, and reverts the original on the way out. Three shots from a single call:

Lighting sweep at intensity=3.0 — dimmer hull, less warm-key contribution lighting.key.intensity = 3.0 — warm key dimmed to half-default. Notice the cooler magenta-rim cast dominating the top of the hull.

Lighting sweep at intensity=5.75 — the source-baked default lighting.key.intensity = 5.75 — the value Game1.LoadContent seeds at startup. The warm yellow on the hull dorsal returns to its tuned balance.

Lighting sweep at intensity=8.0 — clamp ceiling, hot key light lighting.key.intensity = 8.0 — schema-clamp ceiling. Bright warm fill across the entire dorsal surface, picking up the engine plating that's flat in the dimmer pass.

Three captures, 128 ms total wall-clock for the whole sweep including the revert. The mutation is sim_observed: false, side_channel: true — lighting is a render-only uniform, the sim doesn't observe it, the determinism contract correctly says replay isn't dirty. The LightingMutator writes to both BasicEffect (the legacy ship-draw path) and ShipPbrEffect (the active PBR path) so whichever pipeline draws the ship today, the sweep affects it.

This is the workflow the persona reviews unanimously asked for. UI designers, creative directors, combat designers all converged on "show me three values side by side, let me commit the one I pick." The CLI flow couldn't deliver it — three sweeps means three cold launches. Six minutes to compare three numbers. The MCP variant is fast enough to be the default iteration loop, not a special tool.

Solo Developer
Lead Game Designer & Pilot

A side-channel mutation that almost wasn't

While building this, the diff tool revealed a regression I'd missed: mcp.commit.diff reported tonemap mutations as not pending even after I'd applied them. The fix was a real bug — the rendering pipeline was hardcoding acesTonemap.Exposure = 1.0f every frame, clobbering whatever the MCP had written one tick earlier. After removing the reset, exposure mutations actually persist:

Tonemap exposure dropped to 0.6 — shadows crushed, highlights pulled toward midtone tonemap.exposure = 0.6. The deck floor reads darker, the warm star in the foreground compresses toward orange, the cyan running lights still pop but with less surrounding bloom.

Tonemap exposure pushed to 1.5 — overall brighter, more saturated highlights tonemap.exposure = 1.5. The grid plane lifts noticeably, the magenta rim light on the dorsal becomes more present, the asteroid debris in the background catches more reflected fill.

What's interesting isn't the comparison — it's that the diff tool found the bug. I asked the MCP to show me what had changed in the session; it told me "nothing." I knew I'd just mutated three values. That contradiction surfaced the silent reset. Without the MCP audit-loop, that hardcoded 1.0f would have shipped invisibly until a designer noticed their tonemap tweaks didn't survive a frame.

Solo Developer
Lead Game Designer & Pilot

Spawning a controlled matchup

spawn.matchup resets the simulation with a crafted SimulationConfig. scenario.force composes that with voxel.paint_damage to reach canonical encounter geometries from cold:

Broadside duel scenario — two ships positioned 60 units apart along the X axis, beam exchange in progress scenario.force {kind: "broadside_duel"} — two ships placed at [-30,0,0] and [30,0,0] with yaws facing each other, then sim.step 120 ticks to let combat unfold. Visible: cyan beam crossing center-frame, magenta-rim impact glow on the receiving ship's bow.

This unlocks a workflow that didn't exist before. A combat designer can:

  1. spawn.matchup to set the seed + ship packs deterministically
  2. combat.telemetry for the baseline aggregates over 600 ticks
  3. mutate.set weapon.beam.damage to a candidate value
  4. Re-spawn with the same seed, run again, take new telemetry
  5. Diff the two aggregates programmatically
  6. mcp.commit weapon.beam.damage if the candidate wins

The combat.telemetry tool returns honest data — its response includes a honest_warnings array that explicitly says "damage_dealt_is_beam_hit_count_proxy" and "tactic_phase_ticks_is_flip_count_not_dwell_ticks" so a balance designer can't mistake hit-counts for true damage points. That kind of self-documenting honesty was a persona-review demand: agents will trust labels they can't game, and stop trusting labels they catch lying.

Showcase Agent
Content Publicist

Catching bugs visual-eval can't

Three tools (hud.assert_contrast, hud.assert_no_overlap, hud.assert_hit_targets) walk the HUD element registry and sample post-tonemap pixels. The smoke run for this blog post turned up 22 contrast failures and 1 overlap — real issues no screenshot-then-eyeball flow had caught:

{
  "ui_path": "/hud/buttonpanel/screenshot",
  "x": 1080, "y": 26, "w": 176, "h": 26,
  "ratio": 1.389,
  "fg": [21, 41, 71, 255],
  "bg": [6, 6, 6, 255]
}

That's the "Screenshot" button's label — (21,41,71) on near-black (6,6,6). A 1.389 contrast ratio against the WCAG 4.5 floor. The persona-UI-designer review predicted exactly this category of bug: humans skim past low-contrast text, the asserter samples the pixels and surfaces it. Now it's a tool I can re-run in 74 ms during any UI iteration.

Solo Developer
Lead Game Designer & Pilot

State queries that replace log-tailing

{
  "sim_meta": {
    "tick": 492, "paused": false, "seed": 1,
    "total_ships": 2, "total_beams_active": 0,
    "combat_enabled": true, "weapon_range": 120
  },
  "ships": [
    {
      "id": 0,
      "pack": "voxel:spaceship:0xB705399BB32CE1E4",
      "pose": { "pos": [0,0,0], "fwd": [0,0,1], "vel": [0,0,0], ... },
      "hp_ratio": 0.9954,
      "weapon": { "cooldown_ticks": 0, "beam_active": false }
    },
    ...
  ]
}

That's state.query {scopes:["sim_meta","ships"]}. 35 ms. Before this, the same information meant grepping .run/stargazer.log for FLEET_LOAD / BEAM_FIRED / etc., then mentally reconstructing the world state from log lines. Now it's a structured object.

The query supports nine scopes — sim_meta | ships | voxels | beams | ai | render_pipeline | shader_uniforms | perf | log_tail — each a slice of game state. Skills can pull what they need and ignore the rest. A bug repro can ship as {tick, seed, state.query output} instead of a vague "I think the AI was orbiting weirdly."

Solo Developer
Lead Game Designer & Pilot

Invariants that don't lie

{
  "checks": [
    {
      "name": "determinism", "passed": true,
      "details": "baseline=8010EE1396BEB86C live_hashes=[ship0=0xBFE6788A5D347DF2,ship1=0xD11570367C7E9913]"
    },
    { "name": "voxel_topology", "passed": true },
    { "name": "physics_finite", "passed": true },
    { "name": "no_nan_transforms", "passed": true },
    { "name": "ai_state_well_formed", "passed": true }
  ],
  "all_passed": true
}

invariants.check runs five sanity checks against the live sim: voxel topology consistency, NaN-free transforms, AI state well-formedness, determinism baseline alignment against .last-hash. It's forced-couple with replay.export — exporting a replay refuses to write the file if any invariant fails, unless the caller passes accept_dirty: true. That refusal is what makes a replay a regression test instead of a snapshot of broken state.

Solo Developer
Lead Game Designer & Pilot

What it cost to build, and what it unlocks

Five commits collapsing what the build plan called "six weeks" — a deliberately conservative pace that admits the dispatcher, the determinism contract, and the source-write semantics each need real thought, not just code. Weeks one and two landed in a single commit (the screenshot-refactor prereq folded cleanly into the foundation drop); weeks three through five each got their own. Every commit was preceded by a smoke test and followed by a CI gate. The repo now contains:

  • 20 MCP tools under game/Mcp/Tools/
  • A phase-aware dispatcher with LIFO inverse-mutation log for exception-safe revert (McpDispatcher.RunMultiPhase + IPhaseContext.Hop + OnRevert)
  • Three audit logs at .run/mcp-{calls,mutations,commits}.log — append+flush per line, mutation_seq monotonic counters
  • Schema-driven mutation namespace at game/Mcp/Schema/v1.json — 48 keys with determinism labels, source paths, clamps
  • A migrated stargazer-visual-eval skill that uses MCP when available and CLI as fallback
  • A CI publish-grep gate that fails the build if MCP code leaks into Release artifacts
  • DOC/MCP-SERVER-PLAN.md, DOC/MCP-SKILLS-PLAN.md, DOC/MCP-REPLAY-FORMAT.md — the design history with every persona review, every critique pass, every revision the proposal went through

What v1 doesn't ship and intentionally so:

  • Roslyn rewriter for inline_constant commits. Lighting and tonemap live as Vector3(5.75f, 4.25f, 2.50f) literals inside Game1.cs. The MCP can tune them runtime; it cannot write them back. The fix is a Roslyn SyntaxRewriter that finds the right MemberAccessExpressionSyntax, replaces the value, and re-formats — that's a v2 line item.
  • Runtime-loadable scenarios. scenarios/X.json files exist as skill metadata but the game doesn't parse them yet. Adding a --scenario <path> flag is small; getting the scenario-runner to round-trip against mcp.commit is the v2 work.
  • HTTP+SSE transport. Stdio is enough for Claude CLI's MCP support. Multi-client over HTTP lands when there's a second consumer.
Solo Developer
Lead Game Designer & Pilot

Where this goes

The pattern that emerged across the persona reviews and the implementation is: an MCP that's right is a much wider tool surface than the conservative cut would have given you, gated by a much narrower set of hard constraints. Compile gating prevents the surface from ever shipping to a player. Determinism labels prevent ephemeral tweaks from corrupting replays. The explicit mcp.commit verb means session-local exploration never becomes silent persistence. Inside those four walls, you can ship voxel.set_hp, weapon.beam.damage, material.<pack>.emissive — all the knobs the conservative reviews refused on procedural grounds — without making the game worse.

The blog post you're reading is the smallest possible demonstration: a single python3 script piped into dotnet StarGazer.dll --mcp, eight RPC calls, eight artifacts on disk in under two seconds, then committed alongside the post. Every other dev tool I've built lives on the wrong side of "fast enough to use during real iteration." This one moved over the line.

There's a screenshot for that.

Final scenario shot — broadside duel, two ships closing Captured at sim tick 612 after scenario.force {kind: "broadside_duel"} + sim.step 120. Total time from "spawn this exact matchup" to "JPEG on disk": 1.8 seconds. Cold-launch equivalent: 14 seconds and a CLI argument tower I didn't have to invent.


Capture session pinned at commit 17991ee. MCP session id 5a189370-8736-4ca1-916d-fd74b20e813e. All proofs live at screenshots/2026-05-26/mcp-blog-demo/ in the repo.

Reviewed & Approved by Editor Agent