NYX / ULTRON PROGRAM / ARCHITECTURE REPORT
Computer Use, Micro-VMs, and the Sandbox Substrate
2026-07-03 — written after a multi-hour failure to build/run Organ F (full-desktop computer-use) live on the Mac. Separates what was actually broken from what merely looked broken, decides whether computer-use should exist, and designs the provisioning substrate: fast micro-VMs + LunarFS + node_modules handling.
- Nothing about the computer-use runtime is broken. Every failure was in image provisioning — pulling and building the container image — not in Xvfb, the MCP server, the governor wiring, or teardown. Those are sound.
- The feature should exist, but scoped and relocated. Browser use (Organ A) is the 80% case and already runs natively with zero containers. Full desktop (Organ F) is a genuine but rare last-resort capability. Neither should be built or hosted on the 16GB Mac coordinator.
- The real fix is provisioning discipline, not a fancier VM: bake the image once, pull it once (serialized), keep a warm pool, run it on the Linux/WSL2 fleet — not Firecracker-on-Mac. "Spin up" drops from minutes (build) to ~1s (run cached).
- LunarFS belongs in the coding-worker fan-out, not the desktop image. Its instant copy-on-write fork solves the git-worktree + pnpm install cost and the node_modules-pollution bug class. The desktop image is baked, not forked, so LunarFS does little for it directly.
- node_modules is the crux, and the answer is to keep it out of the fork path: the container image owns installed dependencies; LunarFS forks only the source tree. node_modules never goes through per-file CoW hydration.
1. Why it was unreliable — the actual root causes
Today's timeline, deduplicated to root causes (all provisioning-layer):
| Symptom | Root cause | Layer |
| --- | --- | --- |
| 2-hour "still building", no image | docker build hung at FROM python on parallel layer-download stalls | registry pull |
| Docker Desktop VM docker info hangs, EIO on bootstrap | Docker Desktop's macOS VM backend is fragile without an interactive GUI session | host VM |
| Colima pull of any image hangs | containerd-snapshotter pull path broken in this Colima build | image store |
| Colima build times out at FROM even after classic pull works | buildkit has its own image resolver with a short metadata deadline | builder |
| The thing that finally worked | max-concurrent-downloads: 1 (serialize layers) + pre-pulled base + classic builder | registry pull |
The single recurring villain across both Docker Desktop and Colima was parallel layer downloads stalling. Raw network throughput from the VM was fine (small HTTP requests worked, and even a 10 MB bulk download ran at 9 MB/s), but dockerd wedged whenever it opened several blob connections at once. Serializing fixed it instantly.
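The serialization part of that fix is a one-key change to the daemon config. A minimal sketch, assuming the daemon reads a JSON config file (the file location differs between Docker Desktop, Colima, and a plain Linux dockerd, so it is passed in); `serialize_layer_downloads` and `apply_to_file` are hypothetical helper names, not part of any existing tooling:

```python
import json
from pathlib import Path


def serialize_layer_downloads(config: dict) -> dict:
    """Return a copy of a dockerd config with layer downloads serialized."""
    merged = dict(config)  # keep every setting the operator already has
    merged["max-concurrent-downloads"] = 1  # one blob connection at a time
    return merged


def apply_to_file(path: Path) -> None:
    """Rewrite the daemon config in place; a missing file is treated as empty."""
    current = json.loads(path.read_text()) if path.exists() else {}
    updated = serialize_layer_downloads(current)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

The daemon has to reload or restart before the new value takes effect. The pre-pulled base image and the classic builder from the last table row are separate steps and are not covered here.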
The deeper architectural faults these symptoms expose
- A1 — Build-at-use-time. The image is (re)built by hand when needed. There is no registry, no cached artifact, no "build once in CI." A capability that a worker invokes should never trigger a multi-hundred-MB image build. This is the biggest design smell.
- A2 — Docker-as-hard-dependency on macOS. On a Mac, every Linux container runs inside a heavyweight VM (Docker Desktop or Colima/Lima). That VM is the least reliable and most memory-hungry component in the whole system, and it competes with the moderator for the 16GB the operator is already trying to protect.
- A3 — No warm pool. Even with the image present, each session cold-starts a container. Fine at ~1-2s, but there is no pre-warming and no reuse ceiling tuned for latency.
- A4 — Wrong host. Computer-use is a Linux workload. The Mac is the worst place to run it. The Windows laptops (Docker Desktop/WSL2) and any Linux box are dramatically better hosts, and the fleet design already exists to reach them.
2. Should computer-use exist at all?
Yes — as a tiered, last-resort capability, not a headline feature. The honest framing is the resourcefulness hierarchy the Ultron design already names:
direct API / SDK → existing MCP tool → browser (Organ A) → full desktop (Organ F) → park / ask
cheapest, most reliable ―――――――――――――――――――――――――――――――――― most expensive, least reliable
- Organ A (browser): KEEP + INVEST. This is the 80% of "use a computer", and it runs natively (proven: real Chrome, human cursor, native venv, zero containers). This is where most real value is.
- Organ F (full desktop): ESCAPE HATCH. For the genuine minority case: a native GUI app with no web version and no API. Highest-cost, lowest-frequency, highest-security-surface capability. Reachable, not prominent, never the Mac's job.
Killing Organ F would be wrong (it is the only answer for native-only software), but treating it as a first-class always-ready feature is also wrong. Tier it.
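The tiering can be read as a simple fall-through: try the cheapest tier first and only reach Organ F when nothing above it applies. A minimal sketch; the tier identifiers and the `can_handle` callback are hypothetical stand-ins for however the scheduler actually decides that a tier fits a task:

```python
from typing import Callable, Optional

# Ordered cheapest / most reliable first, mirroring the hierarchy above.
TIERS = ["direct_api", "mcp_tool", "browser_organ_a", "desktop_organ_f"]


def choose_tier(can_handle: Callable[[str], bool]) -> Optional[str]:
    """Return the first tier that can handle the task, or None to park/ask."""
    for tier in TIERS:
        if can_handle(tier):
            return tier
    return None  # nothing fits: park the task or ask the operator
```

A task that both the browser and the desktop could handle resolves to the browser, which is the point of ordering the list by cost.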
3. The provisioning fix (the part that actually matters)
MOVE 1 — Bake the image once; never build at use-time
Build nyx-desktop-mcp and nyx-browser-mcp once (CI, or a one-time local build) and publish to a registry the fleet can pull, or ship a docker save tarball. Pin max-concurrent-downloads: 1 in the daemon config the fleet uses. A worker session does docker run on a cached image: seconds, not the 2-hour cliff.
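The "never build at use-time" rule can be enforced in the provisioning path itself. A minimal sketch, assuming the image is reachable either in the local store or from a registry; `ensure_image`, `run_cmd`, and the injectable `Runner` type are hypothetical names introduced here, and only the `docker image inspect` and `docker pull` subcommands are real CLI calls:

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # runs a command, returns its exit code


def run_cmd(argv: Sequence[str]) -> int:
    """Default runner: execute the command and report its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def ensure_image(image: str, run: Runner = run_cmd) -> str:
    """Make sure `image` is in the local store without ever building it."""
    if run(["docker", "image", "inspect", image]) == 0:
        return "cached"  # already present: nothing to download
    if run(["docker", "pull", image]) == 0:
        return "pulled"  # one-time pull from the registry
    # Deliberately no `docker build` fallback: a missing image is an error.
    raise RuntimeError(f"{image} is not cached and could not be pulled")
```

The runner is injectable so the decision logic can be checked without a Docker daemon. A tarball-based variant would swap the pull step for `docker load`.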
MOVE 2 — Warm pool + reuse
Keep N pre-started desktop/browser containers idle (memory-capped) so a session attaches to a warm one instead of cold-starting. Tear down on session end as today; replenish asynchronously. Turns ~1-2s into ~0.
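The pool logic is small enough to sketch. This is an illustration under assumptions, not the existing teardown code: `WarmPool` is a hypothetical class, `start` and `stop` stand in for whatever starts and tears down a container, and replenishment runs inline here where the real version would run asynchronously:

```python
from collections import deque
from typing import Callable, Deque


class WarmPool:
    """Keep up to `size` pre-started containers ready for sessions."""

    def __init__(self, size: int, start: Callable[[], str], stop: Callable[[str], None]):
        self.size = size
        self._start = start  # starts a container and returns its id
        self._stop = stop    # tears a container down
        self._idle: Deque[str] = deque()
        self.replenish()

    def replenish(self) -> None:
        """Top the pool back up to `size` idle containers."""
        while len(self._idle) < self.size:
            self._idle.append(self._start())

    def acquire(self) -> str:
        """Hand out a warm container, cold-starting only if the pool is empty."""
        return self._idle.popleft() if self._idle else self._start()

    def release(self, container_id: str) -> None:
        """Containers are not reused across sessions: tear down, then refill."""
        self._stop(container_id)
        self.replenish()
```

An empty pool degrades to today's cold start rather than blocking, so the pool only ever improves latency.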
MOVE 3 — Relocate off the Mac
Route all containerized computer-use to the fleet worker-hosts (Windows/WSL2 today, Linux later), never the coordinator. Add a desktop label to the scheduler's routing and pin Organ E/F WorkItems to it. The Mac coordinator stays Docker-free.
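Label-based routing with a hard coordinator exclusion might look like the following. The `Host` record, its fields, and the `route` function are hypothetical; the scheduler's real data model is not shown in this report:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Host:
    name: str
    labels: Set[str] = field(default_factory=set)
    is_coordinator: bool = False


def route(required_label: str, hosts: List[Host]) -> Optional[Host]:
    """Pick the first worker-host carrying the label; never the coordinator."""
    for host in hosts:
        if host.is_coordinator:
            continue  # the coordinator stays Docker-free, whatever its labels
        if required_label in host.labels:
            return host
    return None  # no eligible host: the caller should park the WorkItem
```

Skipping the coordinator before checking labels means a mislabeled Mac can never receive a desktop WorkItem.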
4. Faster micro-VMs — honest options
- Pre-baked image + warm pool (do this first): 95% of the perceived speed win, runtime-agnostic. The 2-hour pain was build, not boot.
- Firecracker / Cloud Hypervisor microVMs (Firecracker: ~125ms boot; it is what Fly.io and AWS Lambda use): the right long-term substrate for the fleet, but needs a Linux KVM host. Does not run on macOS directly. Excellent on the Linux/WSL2 fleet, irrelevant for the Mac.
- libkrun / krunvm (OCI images as microVMs; on macOS uses Hypervisor.framework): promising but young and operationally rough. Not worth betting the reliability story on right now.
- Persistent Lima/Colima VM + many containers (what we have): boots slowly once, containers are fast after. With Move 1-2 this is perfectly adequate as the interim substrate.
Recommendation: Move 1-3 now (image discipline + warm pool + fleet). Firecracker as the fleet's microVM engine later. Do not try to make the Mac a fast-microVM host.
5. Where LunarFS fits
LunarFS (github.com/Emotions-Research/LunarFS): BLAKE3 content-addressed store; fork_workspace returns an instant copy-on-write mount (32-byte root hash, O(1) regardless of repo size — 13ms to fork the 2GB / 94,695-file Linux kernel vs 7.4s for git worktree add, 548x). Mounts via the OS NFS client on macOS/Linux; files hydrate lazily on first read. MCP tools: fork_workspace, mount, list_workspaces, push, grant_access, destroy. Engine AGPL-3.0, client Apache-2.0.
Its home is the coding-worker fan-out, not the desktop image. Two different problems:
- Coding runtime (strong fit). Today each parallel subtask worker gets its own git worktree and its own node_modules, which is the direct source of the recurring node_modules-pollution bug and the per-worker pnpm install setup cost. Replacing worktree-per-worker with fork_workspace-per-worker gives each worker an instant, isolated, deduped CoW copy of the repo. Highest-value place to adopt LunarFS.
- Computer-use sandbox (weak fit). The desktop OS + apps come from a baked Docker image, which LunarFS does nothing for. LunarFS could mount a forked workspace into the container, but that's a nice-to-have layered on top, not a replacement.
The clean picture: two orthogonal layers. Container image = OS + tools isolation (Docker/microVM) × LunarFS fork = instant repo materialization. A worker is a warm container that mounts a LunarFS fork of the source.
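Composing the two layers comes down to one bind mount at container start. A minimal sketch, assuming the fork is already mounted on the worker-host at some path (how LunarFS exposes that path is not assumed here); `worker_run_argv`, the `/workspace` layout, and the image and path names in the usage are hypothetical:

```python
from typing import List


def worker_run_argv(image: str, fork_mount: str, workdir: str = "/workspace") -> List[str]:
    """Build a `docker run` command that overlays a source fork on a baked image."""
    src_target = f"{workdir}/src"
    return [
        "docker", "run", "--rm",
        # Bind-mount only the forked source; everything else comes from the image.
        "--mount", f"type=bind,source={fork_mount},target={src_target}",
        "--workdir", src_target,
        image,
    ]
```

For example, `worker_run_argv("nyx-coding-worker", "/mnt/forks/task-1")` yields a command that starts the baked image with the fork visible at `/workspace/src`. Nothing in the image is rebuilt or copied per worker.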
6. node_modules — the crux, and the strategy
node_modules is the classic adversary for any CoW / overlay / network filesystem. Why it hurts LunarFS specifically:
- Millions of tiny files. Lazy per-file hydration over NFS means the first tool that stats the whole tree (a node resolve sweep, tsc, vitest collecting) triggers a cold-read storm with per-file round-trip overhead. "Instant fork" can turn into "slow first build."
- pnpm is itself a content-addressed symlink farm. Nesting one content-addressed store (pnpm) inside another (LunarFS) is redundant and fragile — symlink semantics over NFS are the exact thing that already breaks worktrees today.
- Native modules are ABI/arch-specific. better-sqlite3, node-pty compile to .node binaries. A fork made on Mac arm64 is wrong inside a Linux amd64 container.
- Write/mtime churn. Build caches (vite, tsc incremental) rewrite inside node_modules, generating CoW deltas and defeating dedup.
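The cold-read storm in the first bullet is easy to size with back-of-envelope arithmetic: if each file costs one round trip on first touch, the sweep time scales with file count. The function below is an illustrative model only, and the numbers in the usage note are made-up inputs, not measurements of LunarFS:

```python
def cold_sweep_seconds(file_count: int, round_trip_ms: float, parallelism: int = 1) -> float:
    """Estimate a first full-tree sweep when every file costs one round trip."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return file_count * round_trip_ms / 1000.0 / parallelism
```

With a hypothetical 200,000 files at 0.5 ms per round trip, `cold_sweep_seconds(200_000, 0.5)` gives 100.0 seconds when serialized, and 12.5 seconds with 8 requests in flight. Either way it is far from the "instant" that the fork itself delivers.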
Strategy — keep node_modules out of the fork path entirely. Ranked:
- Deps live in the container image; LunarFS forks only source (recommended). The worker image is built with pnpm install --frozen-lockfile already run (correct arch, native modules compiled for the container). Each worker mounts a LunarFS fork of the source tree only. Working tree = baked node_modules ∪ forked source. Deps change ⇒ rebuild the image (infrequent, and now baked-once per Move 1). Dissolves the tiny-file, symlink, and ABI problems at once.
- Shared read-only pnpm store, node_modules materialized per fork via --offline. Point pnpm's store-dir setting (for example via --store-dir) at a shared warm volume; each fork runs pnpm install --offline (seconds, no network, no compile if store is warm). Not instant, but robust.
- Fork node_modules through LunarFS but pre-warm the hot paths. Weakest option; fights the filesystem instead of avoiding the fight.
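For the second-ranked option, the per-fork step is a single pnpm invocation. A minimal sketch that only builds the command line; `offline_install_argv` is a hypothetical helper and the store path in the usage is a placeholder, while `--offline`, `--frozen-lockfile`, and `--store-dir` are existing pnpm install flags:

```python
from typing import List


def offline_install_argv(store_dir: str) -> List[str]:
    """Command to materialize node_modules in a fork from a warm store."""
    return [
        "pnpm", "install",
        "--offline",          # use only packages already in the store
        "--frozen-lockfile",  # fail rather than rewrite pnpm-lock.yaml
        "--store-dir", store_dir,
    ]
```

A worker would run it inside its fork, for example `subprocess.run(offline_install_argv("/mnt/pnpm-store"), cwd=fork_path, check=True)`. If the store is missing a package, the install fails fast instead of reaching for the network.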
The design: image owns dependencies, LunarFS owns source. Composes perfectly with Move 1 (baked images): the same "build deps once, run many" discipline solves both the computer-use provisioning problem and the node_modules problem.
7. Recommendation & proposed next steps
(for operator decision — not yet queued)
- Keep computer-use, tier it. Organ A (browser, native) is primary; Organ F (desktop, container) is the last-resort escape hatch, routed to the fleet, never the Mac.
- Marathon P1 (image discipline): bake nyx-browser-mcp + nyx-desktop-mcp once, publish/cache them, pin max-concurrent-downloads: 1, pull-once/run-cached/never-build-at-use-time. Add a desktop fleet label and pin Organ E/F to it.
- Marathon P2 (warm pool): pre-warmed container pool for browser/desktop sessions.
- Marathon P3 (LunarFS coding substrate): replace git-worktree-per-worker with fork_workspace-per-worker, node_modules kept in the image, verified against the exact worktree-pollution and ABI-stall bugs it is meant to kill.
- Do not invest in Firecracker-on-Mac or krunvm yet; adopt Firecracker on the Linux fleet after P1-P3 prove the model.
The through-line: the mistake was building/hosting a Linux sandbox lazily on a Mac. Bake once, run cached, on the fleet — and let LunarFS handle the source, not the deps.