# WoW Lab

> Simulation and theorycrafting tools for World of Warcraft

# Documentation

## Introduction

WoW Lab is a combat simulator for World of Warcraft. The engine runs in your browser via WebAssembly for free, and paid plans get slots on a hosted pool for faster runs.

## Overview

Three main pieces:

1. **Simulation Engine** - A deterministic Rust engine compiled to WebAssembly for the browser and native for the hosted pool
2. **Realtime Layer** - Centrifugo for WebSocket messaging, Supabase Postgres for jobs and results
3. **Portal** - A Next.js web application for character planning, rotation editing, and result analysis

## Documentation Structure

This documentation covers the technical architecture and implementation details of the WoW Lab platform:

- System design and component overview
- Simulation mechanics and WASM integration
- Realtime infrastructure and protocols

![Overview](./images/overview.jpg)

## System Architecture

WoW Lab splits simulation between the browser (free tier) and a hosted Fly pool (paid tiers). The portal orchestrates both paths through the same job records.
## Component Overview

```mermaid
flowchart TB
    subgraph Presentation["Presentation Layer"]
        Portal[Portal Web App]
    end
    subgraph Coordination["Coordination Layer"]
        Sentinel[Sentinel Service]
        Beacon[Centrifugo / Beacon]
    end
    subgraph Simulation["Simulation Layer"]
        Browser[Browser WASM Engine]
        Pool[Fly Pool Workers]
    end
    subgraph Storage["Storage Layer"]
        Supabase[(Supabase)]
    end
    Portal --> Browser
    Portal -->|WSS| Beacon
    Portal -->|HTTPS| Supabase
    Pool -->|HTTP| Sentinel
    Pool -->|WSS| Beacon
    Beacon -->|Callbacks| Sentinel
    Sentinel --> Supabase
    Sentinel --> Beacon
```

## Layer Responsibilities

| Layer        | Component | Responsibility                                  |
| ------------ | --------- | ----------------------------------------------- |
| Presentation | Portal    | UI, browser simulation, result visualization    |
| Coordination | Sentinel  | Job scheduling, pool worker health, Discord bot |
| Coordination | Beacon    | WebSocket connections, realtime messaging       |
| Simulation   | Browser   | WASM engine for free-tier sims                  |
| Simulation   | Pool      | Native engine workers on Fly for paid-tier sims |
| Storage      | Supabase  | User data, rotations, jobs, results             |

## Simulation Request

Free tier:

1. User submits sim in the portal
2. The browser WASM engine runs it locally
3. Results stream back into the UI as the browser completes chunks

Paid tier:

1. User submits sim in the portal
2. Portal creates a job record in Supabase
3. Sentinel assigns chunks to pool workers via Beacon
4. Workers run chunks and post results
5. Sentinel aggregates results into the job record
6. Portal subscribes to the job and updates live

## Domain Services

| Domain               | Service    | Purpose                                   |
| -------------------- | ---------- | ----------------------------------------- |
| `api.wowlab.gg`      | Supabase   | Portal database, auth, user data          |
| `sentinel.wowlab.gg` | Sentinel   | Pool coordination HTTP API                |
| `beacon.wowlab.gg`   | Centrifugo | WebSocket connections, realtime messaging |

## Design Principles

These principles guide how WoW Lab is built.
## Free engine, always

The full engine runs in the browser via WebAssembly. No sign-up needed, no setup, nothing to install. Paid plans add hosted pool access for speed, but the engine itself is the same code either way.

## Stateless services

Sentinel and Beacon instances are stateless and interchangeable. Any Sentinel can handle any request. Any pool worker can process any chunk. This enables horizontal scaling without session affinity. Load balancers can route requests to any available instance.

## Centrifugo owns connections

Centrifugo manages all WebSocket connections:

- Connection lifecycle and reconnection
- Presence tracking and subscription management
- Message routing and delivery guarantees
- Horizontal scaling across multiple instances

Services publish messages to Centrifugo rather than maintaining direct connections to clients.

## Idempotency everywhere

Every operation must be safe to retry:

- Chunk processing is deterministic given the same seed
- Progress updates use last-write-wins semantics
- Job state transitions are guarded by version checks
- Network failures never leave the system in an inconsistent state

## Eventual persistence

- Realtime updates flow through Centrifugo
- Supabase stores durable records
- User-facing data is eventually consistent within seconds
- Simulation results are batched for efficient storage

## Roadmap

_Dynamic page: content is rendered live at https://app.wowlab.gg/dev/docs/overview/roadmap and is not included here._

## Simulation Core

The simulation engine is written in Rust and compiles to both a native binary (for the hosted pool) and WebAssembly (for the browser).

## Architecture Overview
```mermaid
flowchart LR
    subgraph Input
        Config[Configuration]
        Rotation[Rotation]
        Data[Game Data]
    end
    subgraph Engine
        Sim[Simulation Loop]
        Events[Event Queue]
        State[Combat State]
    end
    subgraph Output
        Metrics[Metrics]
        Timeline[Timeline]
    end
    Config --> Sim
    Rotation --> Sim
    Data --> Sim
    Sim --> Events
    Events --> State
    State --> Sim
    Sim --> Metrics
    Sim --> Timeline
```

## Core Components

| Component     | Responsibility                                       |
| ------------- | ---------------------------------------------------- |
| Configuration | Character stats, talents, gear, encounter parameters |
| Rotation      | Priority list defining ability usage                 |
| Game Data     | Spell definitions, coefficients, scaling formulas    |
| Event Queue   | Priority queue of pending game events                |
| Combat State  | Current buffs, debuffs, resources, cooldowns         |
| Metrics       | DPS, resource usage, ability breakdowns              |

## Determinism

The engine produces identical results given identical inputs:

- Seeded random number generator for all stochastic events
- Deterministic floating-point operations
- Sorted iteration over collections
- No dependency on system time or external state

This lets chunks split across any pool worker, or run entirely in the browser, and still aggregate reliably.

## WASM Integration

The engine compiles to WebAssembly using wasm-pack:

```bash
cd crates/engine
wasm-pack build --target web --out-dir ../../packages/wowlab-engine/wasm
```

The generated WASM module is loaded asynchronously in the browser and provides the same API as the native binary.

## Combat Mechanics

The simulation engine implements World of Warcraft's combat mechanics with high fidelity to the live game.

## Damage Calculation
### Base Formula

Damage calculation follows the standard formula:

```
Base Damage = Spell Power × Coefficient × Versatility × (1 + Mastery)
Final Damage = Base Damage × (1 + Crit Bonus) × (1 + Target Modifiers)
```

### Coefficient Sources

| Source      | Description                                  |
| ----------- | -------------------------------------------- |
| Spell Data  | Base coefficient from game data              |
| Talents     | Multiplicative modifiers from talent effects |
| Auras       | Active buff/debuff modifications             |
| Set Bonuses | Tier set effect modifiers                    |

## Resource Systems

- **Mana**: Mana regenerates based on Spirit and in-combat regeneration rules. Base regeneration is 2% of maximum mana per second out of combat.
- **Energy**: Energy regenerates at a fixed rate of 10 per second baseline. Haste affects energy regeneration rate linearly.
- **Rage**: Rage is generated through damage dealt and received. Generation rates vary by spec and ability.

## Aura System

1. Check immunity, apply diminishing returns for CC effects.
2. Execute aura effects: stat modifiers, periodic damage, absorbs.
3. Track remaining duration, handle pandemic refresh rules.
4. Remove aura, trigger on-expire effects, clean up state.

## Proc System

Real Procs Per Minute (RPPM) uses a bad luck protection system that increases proc chance based on time since last proc.

## Realtime Infrastructure

The realtime layer pushes job progress to the portal as it happens and hands chunks to pool workers the moment they're ready.
## Connection Architecture

```mermaid
flowchart TB
    subgraph Clients
        Portal[Portal Web App]
        Worker1[Pool Worker 1]
        WorkerN[Pool Worker N]
    end
    subgraph Infrastructure
        Sentinel[Sentinel Service]
        Beacon[Centrifugo / Beacon]
    end
    Portal -->|WSS| Beacon
    Worker1 -->|WSS| Beacon
    WorkerN -->|WSS| Beacon
    Worker1 -->|HTTP| Sentinel
    WorkerN -->|HTTP| Sentinel
    Beacon -->|Proxy callbacks| Sentinel
    Sentinel --> Beacon
```

## Channel Structure

Centrifugo organizes communication through channels:

| Channel Pattern | Purpose                 | Subscribers                 |
| --------------- | ----------------------- | --------------------------- |
| `job:{id}`      | Job progress updates    | Portal clients watching job |
| `node:{id}`     | Pool worker assignments | Single pool worker          |
| `broadcast`     | System announcements    | All connected clients       |

## Message Flow

### Progress Updates

1. Pool worker completes chunk processing
2. Worker publishes progress to `job:{id}` channel
3. Centrifugo delivers to all subscribers
4. Portal updates progress UI in realtime

### Job Assignment

1. Sentinel picks a ready pool worker
2. Sentinel publishes chunk assignment to `node:{id}`
3. Worker receives assignment and begins processing
4. Worker acknowledges receipt via HTTP callback

## Scaling

Centrifugo instances share state and can scale behind a load balancer. Each Centrifugo instance handles ~100k concurrent connections.

## Pool Worker Protocol

Pool workers are the Fly-deployed binaries that run paid-tier sims. They talk to Sentinel over HTTP for lifecycle and Centrifugo over WebSocket for chunk assignments. Browser WASM workers use a different path and do not speak this protocol.

## Authentication

Pool workers authenticate with Ed25519 signatures:

1. Worker generates an Ed25519 keypair on first boot.
2. Private key is persisted to the worker's data volume.
3. Worker sends its public key to Sentinel via HTTP POST.
4. Sentinel returns a signed JWT for the Centrifugo connection.
5. Worker connects to Centrifugo with the JWT and subscribes to its assignment channel.

## HTTP Endpoints

| Endpoint          | Method | Purpose                     |
| ----------------- | ------ | --------------------------- |
| `/register`       | POST   | Initial worker registration |
| `/token`          | POST   | Refresh Centrifugo JWT      |
| `/heartbeat`      | POST   | Health check with metrics   |
| `/chunk/complete` | POST   | Report chunk completion     |

## Message Types

### Inbound (to Worker)

```json
{
  "type": "chunk_assign",
  "job_id": "uuid",
  "chunk_id": 42,
  "config": { ... },
  "iterations": 1000
}
```

### Outbound (from Worker)

```json
{
  "type": "chunk_progress",
  "job_id": "uuid",
  "chunk_id": 42,
  "completed": 500,
  "total": 1000
}
```

## Error Handling

Workers retry failed HTTP requests with exponential backoff. Maximum 5 retries with a 30 second cap.

## Connection Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Disconnected
    Disconnected --> Registering: Start
    Registering --> Connected: Success
    Registering --> Disconnected: Failure
    Connected --> Processing: Chunk assigned
    Processing --> Connected: Chunk complete
    Connected --> Disconnected: Connection lost
    Disconnected --> Registering: Reconnect
```

## Game Data API

Public Supabase Edge Functions serving hydrated WoW game data. No authentication required. All endpoints support CORS.

Base URL: `https://api.wowlab.gg/functions/v1/data`

## Endpoints

### GET /data/classes

Returns all 13 playable classes with their specs nested inside. Cached for 24 hours.

```
https://api.wowlab.gg/functions/v1/data/classes
```

Response:

```json
{
  "classes": [
    {
      "id": 1,
      "name": "Warrior",
      "color": "#C69B6D",
      "fileName": "classicon_warrior",
      "iconUrl": "https://api.wowlab.gg/functions/v1/icons/large/classicon_warrior.jpg",
      "specs": [
        {
          "id": 71,
          "name": "Arms",
          "role": 2,
          "orderIndex": 0,
          "fileName": "ability_warrior_savageblow",
          "iconUrl": "https://api.wowlab.gg/functions/v1/icons/large/ability_warrior_savageblow.jpg"
        }
      ]
    }
  ]
}
```

Spec roles: `0` = Tank, `1` = Healer, `2` = DPS.
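A minimal TypeScript sketch of consuming this endpoint. The type names and the `roleName`, `listSpecs`, and `fetchClasses` helpers are illustrative and not part of the API. The field list is taken from the sample response above, and `fetch` is assumed to be available globally (browsers, Node 18+).

```typescript
// Types mirror the fields shown in the sample response above.
interface Spec {
  id: number;
  name: string;
  role: number; // 0 = Tank, 1 = Healer, 2 = DPS
  orderIndex: number;
  fileName: string;
  iconUrl: string;
}

interface GameClass {
  id: number;
  name: string;
  color: string;
  fileName: string;
  iconUrl: string;
  specs: Spec[];
}

interface ClassesResponse {
  classes: GameClass[];
}

const ROLE_NAMES: Record<number, string> = { 0: "Tank", 1: "Healer", 2: "DPS" };

// Hypothetical helper: map the numeric role to its label.
function roleName(role: number): string {
  return ROLE_NAMES[role] ?? "Unknown";
}

// Hypothetical helper: flatten the nested response into "Class / Spec (Role)" labels.
function listSpecs(response: ClassesResponse): string[] {
  return response.classes.flatMap((cls) =>
    cls.specs.map((spec) => `${cls.name} / ${spec.name} (${roleName(spec.role)})`),
  );
}

// Hypothetical helper: fetch the documented URL and fail loudly on non-2xx responses.
async function fetchClasses(): Promise<ClassesResponse> {
  const res = await fetch("https://api.wowlab.gg/functions/v1/data/classes");
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return (await res.json()) as ClassesResponse;
}
```

Calling `listSpecs(await fetchClasses())` would yield one label per spec, for example `Warrior / Arms (DPS)` for the entry shown in the sample response.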
### GET /data/items?ids=

Returns items by ID. Max 50 per request. Cached for 1 hour.

```
https://api.wowlab.gg/functions/v1/data/items?ids=19019,32837
```

Response:

```json
{
  "items": [
    {
      "id": 19019,
      "name": "Thunderfury, Blessed Blade of the Windseeker",
      "description": "",
      "fileName": "inv_sword_39",
      "iconUrl": "https://api.wowlab.gg/functions/v1/icons/large/inv_sword_39.jpg",
      "itemLevel": 29,
      "quality": 5,
      "requiredLevel": 25,
      "binding": 1,
      "classId": 2,
      "subclassId": 7,
      "inventoryType": 13,
      "stats": [{ "type": 3, "value": 2000 }],
      "effects": [{ "spellId": 21992, "triggerType": 2 }],
      "setInfo": null,
      "speed": 2600,
      "dmgVariance": 0.5
    }
  ]
}
```

Item quality: `0` Poor, `1` Common, `2` Uncommon, `3` Rare, `4` Epic, `5` Legendary, `6` Artifact.

## Icons

Every object includes an `iconUrl` pointing to a large icon. Swap the size segment in the URL for other sizes:

- `/icons/large/`: 56px (default)
- `/icons/medium/`: 36px
- `/icons/small/`: 18px

## Filtering

- Classes are filtered to IDs 1–13 (playable classes only, excludes pets/Adventurer/Traveler)
- Specs exclude "Initial" placeholder specs (`order_index = 4`)

## Implementation

Source: `supabase/functions/data/index.ts`

Queries the `game` schema via supabase-js with the anon key. Hydrates `file_name` fields into `iconUrl` and transforms snake_case DB columns to camelCase.

## MDX Components

Components available in MDX files. Use sparingly. Prefer native markdown.
## When to use components | Component | Use for | Not for | | ----------- | --------------------------- | ------------------ | | Alert | Warnings, breaking changes | General info | | Badge | Version numbers, status | Decoration | | Card | Grouped info with title | Single paragraphs | | CardGrid | 2-3 feature cards | More than 3 cards | | Chart | Inline data visualizations | Unstructured dumps | | Cite | Academic-style references | Inline links | | Collapsible | Optional deep-dive | Required reading | | Kbd | Keyboard shortcuts | Code or commands | | Mermaid | Process flows, diagrams | Simple lists | | Steps | Multi-step procedures | Single actions | | Tabs | Alternative implementations | Sequential content | | Term | Project terminology | Common words | ## Alert Warnings and important notices: ```tsx This API changed in v2.0. ``` This API changed in v2.0. ## Badge Status labels: ```tsx Stable Beta Deprecated ``` Stable Beta Deprecated Colors: `green`, `amber`, `red`, `gray` ## Card Grouped content: ```tsx Get up and running in 5 minutes. ``` Get up and running in 5 minutes. ## CardGrid Side-by-side cards (2-3 max): ```tsx For users For developers ``` For users For developers ## Cite Academic-style citations that link to the references page: ```tsx The WebSocket protocol enables real-time communication. The framing spec is in section 5.2. ``` The WebSocket protocol enables real-time communication. The framing spec is in section 5.2. ### Location props Cite specific sections, pages, or lines. Shown in hover card only: | Prop | Renders as | Example | | ------ | ------------ | ----------------------------------------- | | `s` | §4.1 | `...` | | `p` | p. 42 | `...` | | `line` | line 23 | `...` | | `loc` | (raw string) | `...` | Combine multiple: `...` renders as "§3.2, p. 15" ### Archived references References can have screenshots shown in hover cards and on the References page. 
Use `.withArchive()` in `src/content/references.ts` and place images in `src/content/docs/images/references/`: ```ts import { websocketImg } from "./docs/images/references"; websocket: ref("I. Fette, A. Melnikov", "The WebSocket Protocol", "RFC 6455, IETF", 2011) .url("https://datatracker.ietf.org/doc/html/rfc6455") .withArchive(websocketImg, "2024-12-15"), ``` Citations must have a corresponding entry in `src/content/references.ts`. ## Collapsible Optional detail: ```tsx Additional configuration details... ``` Additional configuration details here. ## Kbd Keyboard shortcuts only: ```tsx Press Ctrl + S to save. ``` Press Ctrl + S to save. ## Mermaid Diagrams with tabs showing both the rendered output and source code. Use standard markdown code blocks: ````tsx ```mermaid flowchart LR A[Input] --> B[Process] --> C[Output] ```; ```` ```mermaid flowchart LR A[Input] --> B[Process] --> C[Output] ``` Supported: flowchart, sequenceDiagram, stateDiagram-v2 ## Charts Charts can be rendered directly from MDX and wrapped in a numbered figure block: ```tsx
``` Use `type="line" | "bar" | "area" | "pie" | "distribution"` to select the chart renderer. `line` and `distribution` use the same analysis popover + overlay stats used in simulation result charts. ## Steps Multi-step procedures: ```tsx Run npm install. Edit the config file. Start the server. ``` Run npm install. Edit the config file. Start the server. ## Tabs Alternative implementations: ```tsx npm pnpm npm install wowlab pnpm add wowlab ``` npm pnpm `npm install wowlab` `pnpm add wowlab` ## Term Project-specific terminology with hover definitions: ```tsx Our Centrifugo instance is called . You can also use custom text: the Beacon server. ``` Our Centrifugo instance is called . You can also use custom text: the Beacon server. Terms must have a corresponding entry in `src/content/terms.ts`: ```ts export const terms: Record = { beacon: term( "beacon", "Beacon", "Our Centrifugo-based real-time messaging server...", "/dev/docs/networking/realtime-infrastructure", // optional docs link ), }; ``` ## Content Style Guidelines for writing MDX content. ## Writing style - **Be concise.** Short sentences. No filler. - **Be accurate.** Verify claims against code. - **Be direct.** Tell the reader what to do. - **Use present tense.** "The engine compiles" not "The engine will compile". ## Prefer native markdown | Element | When to use | | --------------- | ----------------------------------------------- | | `## Heading` | Major sections. 
H2 for main, H3 for subsections | | `**bold**` | Key terms on first use | | `` `code` `` | Function names, variables, file paths | | `- list` | 3+ related items | | `[link](/path)` | Navigation, references | | Tables | Comparing 3+ items | | Code blocks | Examples (always include language) | ## Code blocks Always specify language: ```rust fn main() {} ``` Supported: `rust`, `typescript`, `json`, `bash`, `yaml`, `jsx`, `sql` ## Links - Internal: `[text](/dev/docs/section/page)` - External: `[text](https://example.com)` - Anchor: `[text](#heading)` ## Citations Use the `` component for academic-style references: ```mdx The WebSocket protocol enables real-time communication. ``` Add reference entries to `src/content/references.ts` before using a citation ID. ## Do NOT - Add placeholder content ("TODO", "Coming soon") - Make claims without verifying against code - Use components for decoration - Nest components excessively - Write walls of text without structure - Use em dashes - Use AI-sounding phrases ("It's important to note", "Furthermore") ## File naming - Docs: `{order}-{slug}.mdx` (e.g., `00-quickstart.mdx`) - Blog: `{date}-{slug}.mdx` (e.g., `2025-12-hello.mdx`) ## Frontmatter Required and optional fields for documentation pages: ```yaml --- title: Page Title # Required description: Brief summary # Optional, used for SEO nextSteps: # Optional, array of doc paths - 01-overview/00-architecture - 02-engine/00-simulation-core --- ``` ## Review checklist All code compiles and runs correctly. All internal and external links resolve. Technical claims verified against source code. Concise, direct, present tense. No AI artifacts. ## Branding Everything you need to use the WoW Lab brand in your own work. ## The mark A gear with a lightning-bolt arrow striking out of it, set inside a circular cutout. The gear and bolt share the same amber-to-orange gradient. The arrow exits the top-right of the gear, breaking the circle. The lockup uses no wordmark, the icon stands alone. 
## Colors

The mark is a two-stop gradient. The dark surface used in the social images and app icons is the same near-black navy used as the cutout fill in the SVG.

| Token             | Hex       | Where it shows up                               |
| ----------------- | --------- | ----------------------------------------------- |
| Amber (gradient)  | `#FDC20B` | Top-left stop of the gear and bolt gradient     |
| Orange (gradient) | `#F46B03` | Bottom-right stop of the gear and bolt gradient |
| Surface           | `#020611` | Cutout fill in the mark, social image backdrop  |

The amber and orange map to the app's `--primary` token, which sits in the same hue range across light and dark themes.

## Typography

WoW Lab uses [Geist](https://fonts.google.com/specimen/Geist) by Vercel for body text and [Geist Mono](https://fonts.google.com/specimen/Geist+Mono) for code. Regular (400) and Bold (700) cover everything on the site. Grab the variable font if you want every weight in one file.

## Source files

The `/branding` folder at the repo root holds three subfolders.

| Folder     | Contents                                                       |
| ---------- | -------------------------------------------------------------- |
| `source/`  | Master logo in SVG, AI, EPS, PDF, PSD, and transparent PNG     |
| `favicon/` | Browser favicons, Apple touch icon, and PWA manifest icons     |
| `social/`  | Profile and cover images for social media, on the dark surface |

For most uses, grab `source/wowlab-logo.svg`. For a raster version on a transparent background, use `source/wowlab-icon-transparent.png`.

## Where to get it

The full [`/branding` folder](https://wowlab.gg/go/github/tree/main/branding) lives on GitHub. Clone the repo or download the folder directly.

# Bible

## Introduction

Unlike our [documentation](/dev/docs), this so aptly named Bible is meant to provide an in-depth overview of the whole project. If you are looking for general usage instructions on how to configure WoW Lab's features, look no further than the [introduction](/dev/docs/introduction) of the documentation.
If you are looking to contribute to WoW Lab or simply want to understand how its architecture or simulation internals work, you are in the right place.

## Prelude

Given the length of this bible, it's natural to skip content. However, I would recommend at least reading the introduction to understand how to find what you are interested in.

In contrast to the docs, which are written as a user-facing guide, this more closely follows the structure of a paper. While I in no way claim accuracy in all areas, the idea behind it is to create a sustainable and lasting resource on how to run combat simulations in World of Warcraft. This includes both the technical aspects behind the infrastructure powering WoW Lab and the relevant game mechanics, including parsing and interpreting all required data from the World of Warcraft DBC files.

It is worth highlighting that almost all of the algorithms used and problems solved apply to a wide range of discrete simulations. None of the problems we face are new. They are well documented in simulation literature and game engine design.

While I do believe that I have a solid grasp on most subjects, it would be foolish to think any one person knows best. I invite any reader to contribute to the bible on [GitHub](https://wowlab.gg/go/github/tree/main/packages/shared/src/content) and question how we do things and why.

If one were forced to summarize the idea behind the bible in one sentence, let it be this:

> 'Tis but a humble attempt to combine well-researched solutions to well-known problems.

It is my firm belief that knowledge belongs to everyone, and I hope most people recognize WoW Lab, and by extension this bible, for what it is meant to be: a way for the community to preserve this knowledge in a lasting and open way.

## Finding content

First and foremost, the entire bible is searchable. Just hit + K or use the search bar at the top of the navigation to the right.
If you are not reading this on the [WoW Lab website](http://wowlab.gg/dev/bible/introduction), I would highly recommend doing so, since most of the markdown components used are custom made.

As mentioned earlier, the bible roughly follows the structure of a scientific paper. It therefore starts with the motivation behind the project and follows up with the problems that it tries to solve. Both the infrastructure behind WoW Lab and the actual game-specific challenges have been given dedicated sections. Finally, it demonstrates my humble solutions to these problems and, most importantly, why I chose these solutions. The discrete simulation space is vast and for every problem there are multiple solutions. This makes the roads not taken as important as the ones we settled on.

I encourage all interested readers to study the [references](/dev/bible/references) for more in-depth knowledge. To reiterate: only a small minority of the things we do here are new, and I do believe every single cited source in the bible is worth a read.
Following are the most important sections:

| Section                                               | Description                                                                                      |
| ----------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
| [Overview](/dev/bible/overview/architecture)          | Motivation, related work, the system architecture, and the rotation language                     |
| [Game Data](/dev/bible/game-data/dbc-overview)        | DBC files, spell and talent data, item scaling, data resolution, and code generation             |
| [Engine](/dev/bible/engine/discrete-event-simulation) | Discrete-event simulation, the event system, the rotation compiler, the cast pipeline, mechanics |
| [Distribution](/dev/bible/distribution/orchestration) | Orchestration, the WASM boundary, realtime, hosted compute, deployment, and the database         |
| [Portal](/dev/bible/portal/architecture)              | The Next.js app: architecture, state, the rotation editor, the simulation UI, and the content    |
| [Compendium](/dev/bible/compendium/death-knight)      | Per-class and per-spec theorycraft research and implementation notes                             |

## Motivation

Why simulate WoW combat at all, what value does it provide to players, the gap between theorycrafting spreadsheets and full discrete simulation, history of WoW simulation tools (SimulationCraft, Raidbots), what WoW Lab aims to do differently (open source, community-driven, sustainable).

## Problem Statement

The core challenges: extracting and maintaining game data from undocumented DBC files across patches, accurately modeling combat mechanics with thousands of interacting spells/auras/procs, achieving statistical significance in reasonable time (performance), making simulation accessible to non-technical players, scaling compute between the browser (free tier) and a hosted pool (paid tier).
## Solution Approach

High-level overview of the solution: a Rust-based discrete event simulation engine compiled to both native (for the hosted pool) and WASM (for browser), a data pipeline from DBC files through transformation to Postgres, a Next.js portal for configuration and visualization, a sentinel-scheduled pool of trusted workers on Fly for paid users, real-time coordination through Centrifugo.
```mermaid
flowchart TB
    subgraph data [Game Data]
        DBC[WoW DBC Files] --> Parse[Rust Parser]
        Parse --> Transform[Transform Layer]
        Transform --> DB[(Supabase Postgres)]
    end
    subgraph engine [Simulation Engine]
        Engine[Rust Engine] --> WASM[WASM Build]
        Engine --> Native[Native Build]
    end
    subgraph portal [Portal]
        DB --> App[Next.js App]
        WASM --> App
        App --> Config[Simulation Config]
    end
    subgraph compute [Hosted Pool]
        Config --> Sentinel[Sentinel Scheduler]
        Sentinel --> Pool[Fly Worker Pool]
        Pool --> Native
        Pool --> Results[Results]
        Results --> App
    end
```
## Related Work

The history of WoW combat simulation, the two fundamentally different approaches that emerged, and why it matters for understanding the design decisions behind WoW Lab.

## The Two Schools

- Two fundamentally different approaches to answering "what gear should I wear": discrete event simulation and stat weighting
- Both try to solve the same problem but make very different trade-offs in accuracy, speed, and complexity
- Understanding the difference is essential context for why WoW Lab exists and the choices it makes

## Discrete Event Simulation

- Simulate combat second by second (or more precisely, event by event), tracking every spell cast, buff tick, proc trigger, and cooldown
- No shortcuts. Model the actual game loop. Roll the dice. Let mechanics interact naturally
- Inherently accurate when modeled correctly because it mirrors what actually happens in-game
- Downside: computationally expensive, requires thousands of iterations for statistical significance
- This is what SimulationCraft pioneered and what WoW Lab does

### SimulationCraft

- The gold standard for years. Open source C++ engine, community maintained
- Action Priority Lists (APL) for rotation logic, same concept WoW Lab uses
- Raidbots made it accessible by wrapping SimC in a web UI with cloud compute
- Limitations: single-threaded C++ codebase, difficult to extend, no browser execution, aging architecture
- SimC proved the approach works. The question was whether the tooling around it could be modernized

### Early Ask Mr. Robot

- Early versions used a discrete simulation approach similar to SimC
- Provided gear optimization on top of simulation results
- Later pivoted away from discrete simulation entirely (covered below)

## Stat Weights and Weighted Engines

- The alternative approach: instead of simulating combat, assign a numerical weight to each stat point
- "1 point of Crit is worth 0.8 DPS, 1 point of Haste is worth 0.95 DPS" and so on
- Score gear by multiplying each stat by its weight and summing. Higher score means better gear
- Fast. Trivially fast. No simulation needed at all once you have the weights
- The problem: weights are only accurate at the exact gear level they were computed for

### Where Stat Weights Break Down

- Stat interactions are non-linear. Haste makes Crit better because you cast more spells. Crit makes Haste better because each spell hits harder on crit
- At different gear levels the relative value of stats shifts, sometimes dramatically
- Stat weights are a linear approximation of a non-linear system. Works okay near the measurement point, gets worse the further you move from it
- Breakpoints, tier set interactions, trinket procs, and talent synergies make this even messier
- You end up needing to re-simulate to get new weights anyway, which defeats the purpose

### QE Live

- Stat weight based optimization tool for WoW
- Fast results, no waiting for simulation runs
- Trade-off: accuracy suffers in exactly the situations where players need the most help (comparing very different gear sets, evaluating tier pieces, trinkets with procs)

### Newer Ask Mr. Robot

- Pivoted to a stat weight and analytical model approach
- Faster than discrete simulation but inherits the fundamental accuracy limitations of the approach
- Made the deliberate trade-off of speed over simulation fidelity

## Why Discrete Simulation Wins

- When specs have 10+ interacting buffs, procs, and cooldowns, there is no closed-form solution
- The only way to know for sure is to simulate it and let the mechanics play out
- Stat weights can lie. Simulation results converge to truth given enough iterations
- The real challenge is not whether to simulate but how to make simulation fast and accessible enough that players don't need to settle for approximations
- That is the problem WoW Lab sets out to solve

## Architecture

A simulation is a pure function: gear and talents and a rotation go in, a damage distribution comes out. Everything on this page exists to run that function in two places, inside your browser tab, and across a fleet of machines other people lend us, and to make sure both places give you the same answer.

That single idea is the easiest way to hold the platform in your head. The browser is the free tier: it owns the engine as a WebAssembly module and runs it in a worker, no account required. The hosted tier is for jobs too large to wait on a laptop for. Those get split into pieces and farmed out to community compute. The only thing that differs between the two is where the engine runs and where the result lands, because both paths call the exact same Rust entry point:

```rust
pub async fn simulate_intent(
    sim_config: &str,
    chunk: &ChunkAssignment,
    seed_base: u64,
    resolver: &DynDataResolver<'_>,
    progress: &dyn ProgressSink,
) -> Result {
```

Everything else on this page is the plumbing that gets a TOML config and a resolver to that function and gets the result back.
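To make the "same answer in both places" property concrete, here is a toy TypeScript sketch. It is not the engine's code. The PRNG (mulberry32), the `chunkSeed` derivation, the integer damage roll, and every name in it are assumptions chosen for illustration. The real `seed_base` is a `u64`, and how the engine derives per-chunk seeds is not shown on this page. The sketch only demonstrates that when each chunk's output depends solely on its seed and index, the merged result does not depend on where or in what order the chunks ran.

```typescript
// Small seeded PRNG (mulberry32). A stand-in, not the engine's generator.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface ChunkResult {
  chunkIndex: number;
  iterations: number;
  totalDamage: number;
}

// Hypothetical seed derivation: one reproducible stream per chunk index.
function chunkSeed(seedBase: number, chunkIndex: number): number {
  return (seedBase + Math.imul(chunkIndex, 0x9e3779b1)) >>> 0;
}

// Toy "simulation": each iteration is one integer damage roll in [900, 1099].
// Integer totals keep the merge exact regardless of summation order.
function runChunk(seedBase: number, chunkIndex: number, iterations: number): ChunkResult {
  const rand = mulberry32(chunkSeed(seedBase, chunkIndex));
  let totalDamage = 0;
  for (let i = 0; i < iterations; i++) {
    totalDamage += 900 + Math.floor(rand() * 200);
  }
  return { chunkIndex, iterations, totalDamage };
}

// Merge chunk results into a single mean, in whatever order they arrive.
function mergeChunks(results: ChunkResult[]): { iterations: number; meanDamage: number } {
  let iterations = 0;
  let totalDamage = 0;
  for (const r of results) {
    iterations += r.iterations;
    totalDamage += r.totalDamage;
  }
  return { iterations, meanDamage: totalDamage / iterations };
}

const seedBase = 12345;
// "Browser": all chunks in order. "Fleet": the same chunks in arbitrary order.
const inOrder = [0, 1, 2, 3].map((i) => runChunk(seedBase, i, 1000));
const outOfOrder = [3, 1, 0, 2].map((i) => runChunk(seedBase, i, 1000));
console.log(mergeChunks(inOrder).meanDamage === mergeChunks(outOfOrder).meanDamage); // true
```

The sketch uses integer totals on purpose: floating-point sums can differ with summation order, so a real merge either needs exact accumulators or a fixed merge order.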
```mermaid
flowchart LR
    subgraph Browser["Browser"]
        Studio["studio (app.wowlab.gg)"]
        Engine["Engine (WASM)"]
    end
    subgraph Edge["Edge (Cloudflare)"]
        EdgeWorkers(["studio / landing workers"])
    end
    subgraph FlyLHR["Fly (lhr)"]
        Sentinel(["sentinel"])
        Beacon(["beacon (Centrifugo)"])
        Nodes(["Nodes"])
        Infra(["Infra: nats / redis / headscale / alloy"])
    end
    GameData[(Game Data - Supabase)]
    Studio -->|served by| EdgeWorkers
    Studio -->|run locally in worker| Engine
    Engine -->|PostgREST reads| GameData
    Studio -->|create job| GameData
    GameData -->|NOTIFY pending_job| Sentinel
    Sentinel -->|publish chunks / jobs| Beacon
    Beacon -->|WSS push| Nodes
    Beacon -->|WSS push progress| Studio
    Nodes -->|engine via simulate_intent| Engine
    Nodes -->|POST /chunks/complete| Sentinel
    Sentinel -->|finalize results| GameData
    Beacon -->|broker / presence| Infra
```
The boxes in that figure are the canonical units of this whole reference. Every later section zooms into exactly one of them: the Game Data box opens up in [data resolution](/dev/bible/game-data/data-resolution), the Engine box in [the simulation core](/dev/bible/engine/discrete-event-simulation), the beacon box in [realtime](/dev/bible/distribution/realtime), and the Nodes box in [hosted compute](/dev/bible/distribution/hosted-compute). I will name the parent box whenever a figure expands one, so you can always trace a detail back up to this map. The names are worth fixing now, because they recur verbatim: - **Browser**: the `studio` app at `app.wowlab.gg`, plus the **Engine** compiled to WebAssembly and run in web workers. - **Edge (Cloudflare)**: the Cloudflare Workers (OpenNext) that serve `studio` and the `landing` marketing site. - **Game Data (Supabase)**: the Postgres database. It holds both the read-only `game.*` schema (spells, items, specs) and the `public` operational tables (jobs, nodes). - **Engine**: the Rust simulation engine. The same code, whether compiled to WASM for the browser or to a native binary on a node. - **sentinel**: the scheduler. One Fly.io process that listens for new jobs, splits them into chunks, assigns them to nodes, ingests completions, and finalizes results. - **beacon (Centrifugo)**: the realtime pub/sub server. Everything that needs to push a message to a browser or a node goes through it. - **Nodes**: the community compute. Worker processes (or browser tabs) that subscribe to a channel, run the engine, and report back. - **Infra**: `nats` (the beacon's broker), `redis` (the beacon's presence store), `headscale` (the mesh control plane for burst nodes), and `alloy` (metrics shipping). ## The two execution paths The reason there are two paths is a deliberate trade-off, not an accident of growth. I wanted the tool to be genuinely free to use without an account, and a modern laptop can run a small simulation in seconds. 
So the browser owns a full copy of the engine. But a stat-weight scan or a gear tournament can be hundreds of thousands of iterations, and at that scale you want more cores than one tab has. Rather than rent a server farm, the platform lets the community contribute compute and pools it. The cost of this choice is real: there are now two runtimes to keep in lockstep, two telemetry-decode paths, and a whole distribution layer that the free path doesn't need. The mitigation is that both paths compile the _same_ engine and call the _same_ entry point, so the simulation result cannot diverge between them. ### Free / WASM path In the browser, the engine and the shared `wowlab-common` crate are compiled to WebAssembly with `wasm-pack` and loaded into web workers, wrapped with Comlink. The worker calls `runSimulation` / `runSimulationWithProgress`, thin `#[wasm_bindgen]` wrappers that build a `ChunkAssignment` and hand it to `simulate_intent`. Game data is fetched on demand through a JavaScript resolver, a three-layer cache of in-memory `Map`, IndexedDB, and Supabase PostgREST that the Rust side calls back into across the boundary. One constraint shapes this path: WASM has no LLVM, so the in-browser engine always uses the rotation interpreter rather than the JIT. The full boundary is the subject of [the WASM boundary](/dev/bible/distribution/wasm-boundary). ### Hosted node path A hosted job takes the long way around. The browser writes a job row to Supabase; a database trigger fires `NOTIFY pending_job`; the sentinel, which is listening, wakes up, splits the job into chunks, and publishes each chunk to a node over the beacon. The node fetches its work context, runs `simulate_intent` once per iteration, signs the protobuf result, and POSTs it back. The sentinel aggregates, and when the last chunk lands it writes the final result and pushes a "completed" message to the browser. 
The native node uses the LLVM JIT, the `SupabaseResolver` over a three-layer cache, and a tokio worker pool. The detail lives in [orchestration](/dev/bible/distribution/orchestration) and [hosted compute](/dev/bible/distribution/hosted-compute); the map is the next figure. ## The job lifecycle The clearest way to see the hosted path is to follow one job from the moment you click "simulate" to the moment results tick in. Every arrow below is a real call: an endpoint, a Postgres channel, or a Centrifugo channel, not an idealized sketch.
```mermaid
sequenceDiagram
    participant Studio as studio
    participant DB as Supabase
    participant Sentinel as sentinel
    participant Beacon as beacon
    participant Node as node
    Studio->>DB: rpc create_job (INSERT job, status pending)
    DB->>Sentinel: NOTIFY pending_job
    Sentinel->>DB: scheduler_fetch_pending_jobs.sql
    Sentinel->>Beacon: publish jobs:{id} (running)
    Beacon->>Studio: push jobs:{id}
    Sentinel->>Beacon: publish chunks:{nodeKey}
    Beacon->>Node: push chunks:{nodeKey}
    Node->>Sentinel: GET /jobs/{id}/work_context
    Sentinel->>Node: 200 base sim config and payload
    Node->>Node: simulate_intent (per iteration)
    Node->>Sentinel: POST /chunks/complete (protobuf)
    Sentinel->>DB: jobs_finalize.sql (last chunk)
    Sentinel->>Beacon: publish jobs:{id} and jobs:all (completed)
    Beacon->>Studio: push jobs:{id}, render results
```
A few things in that sequence are load-bearing and easy to miss. The Postgres channel is `pending_job`, singular. The unit of distribution is a chunk: the sentinel splits a job into `ChunkAssignment`s and publishes each to exactly one node's channel, `chunks:{nodePublicKey}`. The node never gets the full sim config over the channel. It gets a small assignment, then pulls the real work context separately, authorized against an in-memory claim. Completions come back as a protobuf `BatchChunkCompletion` to `POST /chunks/complete`, Ed25519-signed over the request body. And critically, the authority for in-flight jobs is an **in-memory store on the sentinel**, not the database. Only the final `result_pb` and `timeline_pb` are ever written to Postgres, via `jobs_finalize.sql`. That choice keeps the hot loop off the database, at the cost of losing all in-flight state if the sentinel restarts; on restart it deliberately fails any job still marked `running`. Browser tabs participate in this exact same loop. A tab that has joined as a worker subscribes to its own `chunks:{publicKey}` channel and POSTs signed completions back, identically to a native node. To the sentinel, a browser node and a Fly node are the same thing. ## The Rust workspace The Engine box in the system context is not one crate. It is about two dozen, and the split is the part most worth understanding, because it is the reason the same engine can run in a browser and on a server without conditional compilation creeping into the simulation logic.
```mermaid
flowchart LR
    centrifuge --> common
    common --> engine-macros
    engine["engine"] --> common
    engine --> engine-adapter-data
    engine --> engine-application
    engine --> engine-combat
    engine --> engine-content
    engine --> engine-domain
    engine --> engine-ports
    engine --> engine-sim
    engine --> manifest-schema
    engine-adapter-data --> engine-ports
    engine-adapter-data --> supabase
    engine-application --> engine-adapter-data
    engine-application --> engine-content
    engine-application --> engine-domain
    engine-application --> engine-ports
    engine-application --> engine-sim
    engine-combat --> engine-domain
    engine-combat --> engine-ports
    engine-combat --> engine-sim
    engine-content --> engine-combat
    engine-content --> engine-domain
    engine-content --> engine-ports
    engine-content --> engine-sim
    engine-domain --> buffer-contract
    engine-domain --> engine-ports
    engine-macros --> buffer-contract
    engine-ports --> common
    engine-ports --> supabase
    engine-sim --> engine-ports
    node["node"] --> centrifuge
    node --> engine-adapter-data
    node --> engine-application
    node --> engine-content
    node --> engine-ports
    sentinel --> centrifuge
    sentinel --> common
```
The figure above expands the **Engine** box of the system-context diagram. The layering reads bottom-up. `engine-ports` and `engine-domain` are the foundation: ports are the trait boundaries (the `DataResolver` for game data, the `SpecHandler` for combat logic, the `Event` and `SimState` types), and the domain holds the core simulation types and the rotation engine. `engine-sim` is the scheduler, the timing-wheel event queue and the run loop, and it depends on ports plus the shared `common` crate, but on no combat, content, or domain crate, so it knows nothing about combat formulas or spell data. `engine-combat` and `engine-content` layer the actual mechanics and per-spec handlers on top. `engine-application` is the orchestration layer that wires data resolution to handler construction to the run loop; it is what exposes `simulate_intent`. The host shells, the `node` worker, the `sentinel` scheduler, and the WASM `engine` cdylib, sit at the very top and contribute the I/O. The payoff of this is direct: the simulation core (`engine-sim`, `engine-domain`, `engine-ports`) depends on no I/O at all. Game data arrives through the `DataResolver` port, so whether the bytes came from Supabase, a local CSV file, or a JavaScript callback in a browser worker is invisible to the simulation. That is what lets the identical engine run in both execution paths. The cost is the usual one for clean architecture: more crates, more trait indirection, and a dependency graph you have to actually read to navigate. I think it has paid for itself; the [resolver layer](/dev/bible/game-data/data-resolution) is the clearest example of it doing so. 
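The port idea can be sketched in a few lines. This is a deliberately simplified illustration, not the real `DataResolver`: the trait `CooldownSource`, both backends, and `casts_in_fight` are hypothetical names, and the real port is asynchronous and far wider. What it shows is the shape of the argument above: the core function sees only the trait, so a preloaded table and a host-supplied callback are interchangeable.

```rust
use std::collections::HashMap;

/// Hypothetical, much-simplified stand-in for a data port.
pub trait CooldownSource {
    fn cooldown_ms(&self, spell_id: u32) -> Option<u32>;
}

/// Backend 1: a preloaded table, as a local CSV bundle might provide.
pub struct TableSource(pub HashMap<u32, u32>);

impl CooldownSource for TableSource {
    fn cooldown_ms(&self, spell_id: u32) -> Option<u32> {
        self.0.get(&spell_id).copied()
    }
}

/// Backend 2: a callback, as a host-provided resolver might look.
pub struct CallbackSource<F: Fn(u32) -> Option<u32>>(pub F);

impl<F: Fn(u32) -> Option<u32>> CooldownSource for CallbackSource<F> {
    fn cooldown_ms(&self, spell_id: u32) -> Option<u32> {
        (self.0)(spell_id)
    }
}

/// "Core" logic: how many casts start at or before the end of the fight if
/// the spell is used on cooldown from t = 0. It only ever sees the trait.
pub fn casts_in_fight(
    source: &dyn CooldownSource,
    spell_id: u32,
    fight_ms: u32,
) -> Option<u32> {
    let cooldown = source.cooldown_ms(spell_id)?;
    if cooldown == 0 {
        return None; // no cooldown: the count is not bounded by this rule
    }
    Some(fight_ms / cooldown + 1)
}
```

With a 120-second cooldown and a 300-second fight, both backends give three casts (at 0 s, 120 s, and 240 s); the core function cannot tell which backend it was handed.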
What runs where, and why, is the platform in one table:

| Component           | Where it runs                              | What it is                                    | Why there                                                              |
| ------------------- | ------------------------------------------ | --------------------------------------------- | ---------------------------------------------------------------------- |
| studio              | Cloudflare Workers (OpenNext), global edge | The auth app at `app.wowlab.gg`               | Edge for low-latency global delivery                                   |
| landing             | Cloudflare Workers, global edge            | The marketing site at `wowlab.gg`             | Same edge runtime as studio                                            |
| Engine              | Browser (WASM) + native nodes              | The Rust simulation engine                    | Same code both paths; WASM for the free tier, native for hosted        |
| sentinel            | Fly.io, `lhr`                              | Scheduler, HTTP API, Discord bot, cron, MCP   | Co-located with beacon, nats, and Supabase for the NOTIFY/publish loop |
| beacon (Centrifugo) | Fly.io, `lhr`                              | Realtime pub/sub server (Centrifugo v6)       | Near sentinel and nats for the realtime hot path                       |
| Nodes               | Fly.io `lhr`, Latitude.sh, or browsers     | Community compute that runs the engine        | Where spare cores are; browser tabs and rented metal both qualify      |
| nats                | Fly.io, `lhr`                              | The beacon's pub/sub broker                   | Must be co-located with beacon                                         |
| redis               | External                                   | The beacon's presence store                   | Online-node roster, separate from the broker                           |
| headscale           | Fly.io, `lhr`                              | Self-hosted mesh control plane                | Secure mesh for burst/external nodes                                   |
| alloy               | Fly.io, `lhr`                              | Metrics and log shipper to Grafana            | Scrapes the internal Fly DNS of sentinel and beacon                    |
| Game Data           | Supabase cloud                             | Postgres: `game.*` data + `public` jobs/nodes | Single source for both read-only game data and operational state       |

One detail in that table answers a question people ask: the beacon is backed by **both** NATS and Redis, for different jobs. NATS is the broker that fans out publications; Redis is the presence manager that tracks which nodes are online. They are not redundant. They back different subsystems.
## Where the simulation actually is Everything above is plumbing around a `for` loop. The engine is a [discrete-event simulation](/dev/bible/engine/discrete-event-simulation): a DES that pops time-ordered events from a queue, advances a virtual clock to each event's timestamp, and asks a per-spec handler what to do next. It is a pure scheduler. The run loop computes no damage; all combat logic lives in the handler. Each iteration is one Monte-Carlo combat run with its own deterministic RNG, and a chunk runs many of them sequentially, accumulating a distribution. Run enough iterations and the noise averages out into a stable DPS estimate with confidence intervals. That loop, the timing wheel that drives it, the rotation compiler that decides each action, and the combat formulas that resolve each cast are the subject of [the engine section](/dev/bible/engine/discrete-event-simulation). The data those formulas read, every spell coefficient and item scaling curve and talent, comes from the [game data layer](/dev/bible/game-data/dbc-overview), which is where the reference goes next. ## Language Evaluation Languages evaluated for the simulation engine and why each was considered and ultimately rejected or chosen. The key requirements: raw performance for tight simulation loops, WASM compilation for the browser-side engine, memory safety without garbage collection pauses, ergonomic type system for modeling game mechanics, ecosystem for async networking (the hosted pool). Why not C++ (memory safety, build system complexity, WASM story), why not Go (GC pauses in tight loops, no WASM at the time, less expressive type system), why not TypeScript/JavaScript (performance ceiling, no native compilation for nodes), why not Zig (ecosystem maturity, async story), why not Java/C# (GC, WASM story). 
Why Rust: zero-cost abstractions, trait system for spec polymorphism, fearless concurrency with Rayon, first-class WASM target via wasm-pack, LLVM as an embedded JIT compiler via inkwell, strong ecosystem (tokio, serde, prost), single language for engine + pool workers + sentinel + CLI, compile-time guarantees (no null pointers, no data races), performance parity with C/C++.
```mermaid
flowchart TB
    start[Language Selection] --> perf{Raw Performance?}
    perf -->|No| tsjs[TypeScript / JS]
    tsjs --> tsjs_fail["Fail: V8 overhead ceiling,\nno native compilation for pool workers"]
    perf -->|Yes| go[Go]
    perf -->|Yes| cpp[C++]
    perf -->|Yes| zig[Zig]
    perf -->|Yes| java[Java / C#]
    perf -->|Yes| rust[Rust]
    go --> gc{No GC Pauses?}
    gc -->|Fail| go_fail["Fail: GC pauses in tight\nsimulation loops"]
    java --> gc2{No GC Pauses?}
    gc2 -->|Fail| java_fail["Fail: GC pauses,\npoor WASM story"]
    cpp --> safety{Memory Safety?}
    safety -->|Fail| cpp_fail["Fail: manual memory mgmt,\nbuild system complexity"]
    zig --> eco{Mature Ecosystem?}
    eco -->|Fail| zig_fail["Fail: ecosystem immaturity,\nweak async story"]
    rust --> wasm{WASM Target?}
    wasm -->|wasm-pack| jit{Embedded JIT?}
    jit -->|LLVM via inkwell| types{Expressive Type System?}
    types -->|Traits, enums, generics| async_net{Async Networking?}
    async_net -->|tokio| concurrency{Safe Concurrency?}
    concurrency -->|Rayon, Send/Sync| chosen[Rust Chosen]
    classDef md-red fill:#ffdbdc,stroke:#e5484d
    classDef md-green fill:#e6f6eb,stroke:#30a46c
    class tsjs_fail,go_fail,java_fail,cpp_fail,zig_fail md-red
    class chosen md-green
```
The trade-offs accepted: steep learning curve, longer compilation times, borrow checker friction in graph-like game state, a smaller hiring pool than mainstream languages.

## Rotation Language

The rotation language problem: users need to express complex priority-based decision logic (if cooldown ready and buff active and resource > 50 then cast X). The language must be:

- evaluable millions of times per simulation
- authorable by non-programmers
- validatable before execution
- serializable for storage and sharing

Scripting languages tried and why they failed:

- Lua: embedding overhead, sandbox escapes, GC pressure in hot loop, poor WASM story
- Rhai: too slow for millions of evaluations per second, dynamic typing overhead
- Custom interpreted DSL: still too slow, interpretation overhead dominates at scale
- Expression trees with eval: better but still branch-heavy

Why APL (Action Priority List):

- Simple mental model (ordered list of conditions → actions)
- Natural fit for WoW combat (priority-based, not scripted)
- Established convention from SimulationCraft
- Flat structure maps well to UI (drag-drop reordering, condition editors)

The JIT solution:

- JSON AST → LLVM IR (via inkwell) → native machine code
- A few nanoseconds per evaluation vs ~300ns interpreted
- Removes interpreter dispatch and the branch mispredictions that come with it
- Compiles once per simulation run
- Schema validation catches errors before compilation
- Same JSON format stored in database and evaluated in engine
- WASM fallback for browser-side validation (no JIT, tree-walk interpreter)
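The APL mental model (an ordered list of conditions and actions, where the first entry whose condition holds wins) can be sketched as a small tree-walk evaluator. This is an illustration only: `State`, `Expr`, `Entry`, and `next_cast` are hypothetical names, and the real JSON AST has its own node types that are not reproduced here. It corresponds to the interpreted path, not to the JIT.

```rust
/// Hypothetical snapshot of the state an APL condition can read.
pub struct State {
    pub resource: f64,
    pub ready_cooldowns: Vec<u32>,
    pub active_buffs: Vec<u32>,
}

/// Hypothetical condition tree.
pub enum Expr {
    CooldownReady(u32),
    BuffActive(u32),
    ResourceAbove(f64),
    And(Box<Expr>, Box<Expr>),
    Always,
}

/// One priority-list entry: cast `cast` when `when` holds.
pub struct Entry {
    pub cast: u32,
    pub when: Expr,
}

/// Tree-walk evaluation: recurse over the condition, reading from `state`.
fn eval(expr: &Expr, state: &State) -> bool {
    match expr {
        Expr::CooldownReady(id) => state.ready_cooldowns.contains(id),
        Expr::BuffActive(id) => state.active_buffs.contains(id),
        Expr::ResourceAbove(min) => state.resource > *min,
        Expr::And(a, b) => eval(a, state) && eval(b, state),
        Expr::Always => true,
    }
}

/// The priority rule: the first entry whose condition holds is the action.
pub fn next_cast(apl: &[Entry], state: &State) -> Option<u32> {
    apl.iter().find(|e| eval(&e.when, state)).map(|e| e.cast)
}
```

Each `match` arm here is a branch taken at evaluation time, on every evaluation. Compiling the same tree to native code once per run is what moves that dispatch cost out of the hot loop.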
```mermaid
flowchart LR
    json["JSON APL\n(Supabase DB)"] --> parse[Rotation::from_json]
    parse --> ast["AST\n(Actions, Exprs,\nVariables, Lists)"]
    ast --> resolve[SpecResolver\nNames to IDs]
    resolve --> resolved[Resolved AST]
    resolved --> validate[validate_rotation\nSchema + Type Check]
    validate --> valid{Valid?}
    valid -->|No| errors[ValidationErrors\nReturned to UI]
    valid -->|Yes| target{Target?}
    target -->|Native| schema[SchemaBuilder\nContext Layout]
    schema --> llvmir[LLVM IR\nvia inkwell]
    llvmir --> opts["Optimize\n(OptimizationLevel::Aggressive)"]
    opts --> jitmod[ExecutionEngine\nFinalize]
    jitmod --> native["Native fn ptr\n~3ns / eval"]
    target -->|WASM| interp["Tree-Walk\nInterpreter"]
    interp --> wasmeval["Interpreted Eval\n~300ns / eval"]
    native --> simloop[Simulation Loop\npopulate_context\n+ call fn ptr]
    wasmeval --> browser[Browser-Side\nValidation]
    classDef md-green fill:#e6f6eb,stroke:#30a46c
    classDef md-yellow fill:#fff7c2,stroke:#e9c162
    class native md-green
    class wasmeval md-yellow
```
```mermaid
flowchart LR
    lua[Lua Embedding] -->|Rejected| lua_why["Embedding overhead,\nsandbox escapes,\nGC in hot loop,\npoor WASM story"]
    lua_why --> rhai
    rhai[Rhai Scripting] -->|Rejected| rhai_why["Too slow for M evals/sec,\ndynamic typing overhead"]
    rhai_why --> dsl
    dsl[Custom Interpreted DSL] -->|Rejected| dsl_why["Interpretation overhead\ndominates at scale"]
    dsl_why --> expr
    expr[Expression Trees + Eval] -->|Rejected| expr_why["Better but still\nbranch-heavy"]
    expr_why --> apl
    apl["JSON APL\n(Action Priority List)"] --> jit[LLVM JIT]
    jit --> result["~3ns per eval\nNative machine code\nSchema-validated\nSerializable JSON"]
    classDef md-red fill:#ffdbdc,stroke:#e5484d
    classDef md-neutral fill:#f9f9fb,stroke:#cdced6
    classDef md-green fill:#e6f6eb,stroke:#30a46c
    class lua,rhai,dsl,expr md-red
    class lua_why,rhai_why,dsl_why,expr_why md-neutral
    class apl,jit,result md-green
```
Roads not taken: full scripting language (too complex for users, security concerns), visual node graph (Unreal Blueprint-style, too heavyweight), SimC APL text format (parsing fragility, no structured editing). ## DBC Overview Everything the engine knows about a spell, an item, or a talent tree ultimately comes from World of Warcraft's own client database. The client ships its game data as a large set of tables, historically called DBC files, and every number I simulate (a cooldown, a coefficient, an aura duration) is a column in one of those tables. The simplest way to think about this whole section is: the game tells us the numbers, and the data layer's only job is to find them, reshape them, and hand them to the engine. I do not parse the binary client files directly. By the time data reaches this repository it has already been extracted to CSV, one file per table. The loader reads those CSVs into a single in-memory bundle, and the transforms turn that bundle into the flat types the engine consumes. This page covers that first hop: CSV to bundle. The pages that follow cover the reshaping, and [Data Resolution](/dev/bible/game-data/data-resolution) covers how the bundle, a Postgres mirror, and a browser cache all end up behind one trait. ## The bundle: `DbcData` The whole CSV side of the data layer collapses into one struct. `DbcData` is a flat record of roughly 130 lookup tables: `spell_name`, `spell`, `spell_misc`, `spell_effect`, `spell_power`, `chr_specialization`, `trait_node`, `item`, `item_sparse`, `item_bonus`, `curve`, `curve_point`, `rand_prop_points`, `power_type`, `expected_stat`, and so on. Each one is an `IntMap`, or a grouped `IntMap>` for one-to-many relations, keyed by the row's primary id or a foreign key. There is nothing clever about the bundle. It is deliberately a dumb container: load once, look up by id, never mutate. 
All of the interpretation (joining `spell` to `spell_misc` to `spell_effect`, deciding which effect carries the damage coefficient) happens later in the transform layer, never in the loader.

## Loading from CSV

`DbcData::load_all` reads each table from `{data_dir}/data/tables/{Table}.csv`, so `Spell.csv`, `SpellName.csv`, and `ItemSparse.csv` all live side by side under one directory tree:

```rust
pub fn load_all(data_dir: &Path) -> Result {
    let tables_dir = data_dir.join("data").join("tables");
```

The function is one long sequence of per-table load calls. There is no schema registry and no reflection driving it, just an explicit list.

The CSV directory is located through the `WOWLAB_DATA_DIR` environment variable. The engine CLI reads it and falls back to `{HOME}/Source/wowlab-data`, and `forge` does the same against its own `default_data_dir`. Rotations sit alongside the tables at `{data_dir}/rotations/{id}`.

One design choice worth naming, because it is easy to misread as a bug: a missing CSV file is **not** an error. `read_csv_bytes` returns `None` for a file that is not present, and the loader turns that into an empty `IntMap`:

```rust
fn read_csv_bytes(path: &Path, table_name: &str) -> Result<Option<Vec<u8>>, DbcError> {
    let file_path = path.join(format!("{}.csv", table_name));
    // A missing file is a valid state, not a failure.
    if !file_path.exists() {
        return Ok(None);
    }
    // Any other I/O problem is reported together with the offending path.
    let data = fs::read(&file_path).map_err(|e| DbcError::Io {
        path: file_path.display().to_string(),
        source: e,
    })?;
    Ok(Some(data))
}
```

So pointing the loader at an incomplete or empty data directory yields a bundle full of empty tables rather than a hard failure. I chose this because partial data sets are useful during development and because the transforms downstream already tolerate missing rows. The catch is that a typo in the data directory surfaces as "spell not found" much later, not as "directory missing" up front.
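The missing-file behaviour, and the catch that comes with it, can be reproduced in a small standalone sketch. All of it is assumption: `load_names` is a toy two-column loader (the real one deserializes typed rows into an `IntMap`), errors are plain `io::Error` rather than `DbcError`, and `has_tables_dir` is a hypothetical guard that the loader described above does not have. It is included only to show where a caller could check for a mistyped directory before loading.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Same contract as described above: an absent file is `Ok(None)`.
fn read_csv_bytes(dir: &Path, table: &str) -> io::Result<Option<Vec<u8>>> {
    let file = dir.join(format!("{table}.csv"));
    if !file.exists() {
        return Ok(None);
    }
    fs::read(&file).map(Some)
}

/// Toy loader for an `ID,Name` table, keyed by id (hypothetical).
pub fn load_names(dir: &Path, table: &str) -> io::Result<HashMap<u32, String>> {
    let Some(bytes) = read_csv_bytes(dir, table)? else {
        return Ok(HashMap::new()); // missing file becomes an empty table
    };
    let text = String::from_utf8_lossy(&bytes);
    let mut rows = HashMap::new();
    for line in text.lines().skip(1) {
        // Skip the header row; ignore lines that do not parse.
        if let Some((id, name)) = line.split_once(',') {
            if let Ok(id) = id.trim().parse::<u32>() {
                rows.insert(id, name.trim().to_string());
            }
        }
    }
    Ok(rows)
}

/// Hypothetical up-front guard: does `{data_dir}/data/tables` exist at all?
pub fn has_tables_dir(data_dir: &Path) -> bool {
    data_dir.join("data").join("tables").is_dir()
}
```

In this sketch a mistyped directory and a genuinely absent table both produce an empty map from `load_names`, which is the late-failure mode described above; a caller that wants an early error could consult something like `has_tables_dir` first, at the cost of giving up the "partial data is fine" convenience for that one case.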
Under the hood the loader uses three generic readers, parameterized by small marker traits derived on the row structs: | Loader | Keyed by | Shape | | ---------------- | -------------------- | ----------------------------------------------- | | `load_by_id` | primary `ID` column | `IntMap` | | `load_by_fk` | a foreign-key column | `IntMap>` (two-pass count + fill) | | `load_one_by_fk` | a foreign-key column | `IntMap` (first row wins) | The row structs themselves sit next to the loader, and their field names match the CSV column headers exactly, so deserialization is a straight serde mapping with no manual column indexing. ## Where the bundle goes `DbcData` is the input to the transform layer and, by extension, to the local resolver. `LocalCsvResolver` loads it lazily on first access and caches the `Arc`. The CSV path is the source of truth for the local resolver and, at snapshot time, for the rows that get written into Supabase. The remaining pages in this section follow that data forward: first the spell table, the largest and most important one, then talent trees, items and scaling, and the tooltip parser, before the two resolution pages tie the CSV path, the Postgres path, and the browser path together. ## Spell Data A spell in WoW is not one row. The client splits a single ability across a dozen tables: its name, its timing, its costs, its school, and a variable number of effects each with their own coefficients. The transform layer joins all of that back together into one struct, `SpellDataFlat`, so the rest of the system can treat a spell as a single value. The simplest mental model: `SpellDataFlat` is "everything the engine could ever want to know about one spell, flattened into one record." This is the largest and most consulted type in the data layer, so it is worth walking through its field groups rather than dumping the whole struct. 
Because `SpellDataFlat` derives serde snake_case, the exact same struct deserializes from a CSV-derived transform, a Supabase JSON row, or a JS-bridge value. That single-shape property is what lets three very different backends feed one engine; I return to it in [Data Resolution](/dev/bible/game-data/data-resolution). ## Field groups Rather than 80-odd fields in arbitrary order, the struct is organized into logical groups. These are the groups the engine actually reads during `resolve_game_data` (covered in [Codegen](/dev/bible/game-data/codegen)). | Group | Representative fields | | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | | Identity / text | `id`, `name`, `description`, `aura_description`, `is_passive`, `knowledge_source` | | Timing / cost | `cast_time`, `recovery_time`, `category_recovery_time`, `start_recovery_time` (GCD), `power_costs`, `charge_recovery_time`, `max_charges` | | Range / AoE | `range_max_0/1`, `range_min_0/1`, `cone_degrees`, `radius_max`, `radius_min` | | School / coefficient | `defense_type`, `school_mask`, `bonus_coefficient_from_ap`, `effect_bonus_coefficient`, `min_scaling_level`, `max_scaling_level` | | Interrupts | `interrupt_aura_0/1`, `interrupt_channel_0/1`, `interrupt_flags` | | Duration / empower | `duration`, `max_duration`, `can_empower`, `empower_stages` | | Vec columns | `attributes`, `effect_trigger_spell`, `implicit_target`, `learn_spells`, `effects` | | Aura props | `max_stacks`, `periodic_type`, `tick_period_ms`, `refresh_behavior`, `pandemic_refresh`, `tick_may_crit`, `tick_on_application` | | RPPM / labels | `rppm_base_rate`, `rppm_flags`, `rppm_mods`, `labels` | A couple of fields carry contracts that are not obvious from their names, and getting them wrong silently produces wrong cooldowns: - **Cooldown is two columns.** The engine must use `max(recovery_time, category_recovery_time)`. 
Picking either one alone is wrong for spells that put their cooldown on a shared category.
- **`start_recovery_time` is the GCD**, not a cooldown.
- **A non-zero `interrupt_channel_0` means the spell is a channel**, which changes how its "cast time" is interpreted. For channels the duration field is the channel length, not the cast bar.

The resolution pass spells the cooldown rule out, with one extra wrinkle: a single-charge spell can hide its real cooldown in `charge_recovery_time` instead:

```rust
let mut cooldown_ms = spell.recovery_time.max(spell.category_recovery_time);
if spell.max_charges <= 1 && spell.charge_recovery_time > cooldown_ms {
    cooldown_ms = spell.charge_recovery_time;
}
```

## Effects

The payload of a spell lives in its effects. `effects` is a `Vec<SpellEffect>`, and each `SpellEffect` is one DBC effect row carrying its own type, aura sub-type, base points, and the two coefficients that drive damage:

```rust
pub struct SpellEffect {
    /// 0-indexed; `$s1` etc. are 1-indexed in descriptions.
    pub index: i32,
    pub effect: i32,
    /// Aura type if this is an apply-aura effect.
    pub aura: i32,
    /// The `$s` value.
    pub base_points: f64,
    /// Aura tick period in ms - the `$t` value.
    pub period: i32,
    /// The `$x` value.
    pub chain_targets: i32,
    pub trigger_spell: i32,
    /// School, mechanic, etc. depending on effect type.
    pub misc_value_0: i32,
    pub misc_value_1: i32,
    /// For the `$a` value.
    pub radius_min: f32,
    /// For the `$a` value.
    pub radius_max: f32,
    pub coefficient: f32,
    pub variance: f32,
    /// Bonus coefficient from spell power.
    pub bonus_coefficient: f64,
    /// Bonus coefficient from attack power.
    pub bonus_coefficient_from_ap: f64,
    pub amplitude: f32,
    pub pvp_multiplier: f32,
}
```

A few of those fields carry conventions worth calling out. `index` is 0-based in the flat row even though the same effect is `$s1` (1-based) in descriptions. `effect` is the effect type id and `aura` is the aura sub-type id. `coefficient` and `variance` are the direct-damage roll, while `bonus_coefficient` is the spell-power coefficient and `bonus_coefficient_from_ap` is the attack-power coefficient.

The coefficient naming is the one trap here. When the resolution pass builds the engine's damage view it reads `bonus_coefficient` as the SP coefficient and `bonus_coefficient_from_ap` as the AP coefficient. An effect whose direct coefficients are both zero but which redirects to a trigger spell is followed down a bounded chain to find the real payload. I cover that trigger-chain walk in [Codegen](/dev/bible/game-data/codegen).

There is a subtle index mismatch to keep in your head. The flat `SpellEffect.index` is 0-based, but the resolver's `get_spell_effect(spell_id, effect_index)` takes a **1-based** index because that is the author-facing convention used in manifests and overrides. The local resolvers translate between the two via `validate_effect_index`, which subtracts one and treats index `0` as "not found":

```rust
pub(crate) fn validate_effect_index(
    spell_id: SpellId,
    effect_index: u8,
    effects: &[SpellEffect],
) -> Result<SpellEffect, ResolverError> {
    // Index 0 is never valid in the 1-based, author-facing convention.
    if effect_index == 0 {
        return Err(ResolverError::SpellEffectNotFound {
            spell_id,
            effect_index,
        });
    }
    // Convert to the 0-based position in the flat `effects` slice.
    effects
        .get((effect_index as usize) - 1)
        .cloned()
        .ok_or(ResolverError::SpellEffectNotFound {
            spell_id,
            effect_index,
        })
}
```

Mixing the two conventions is the single easiest way to read the wrong effect.

The remaining sub-types are small: `PowerCostEntry { power_type, cost, cost_pct, optional_cost }` describes one resource cost, `EmpowerStage { stage, duration_ms }` one empower tier, and `LearnSpell { learn_spell_id, overrides_spell_id }` the learn/override linkage. None of them need their own page; they exist only to keep the per-spell record self-contained.

## Talent Trees

A talent tree is a graph: nodes connected by edges, where each node grants one or more spells and may have several ranks.
A loadout, the string you paste from the game or from Wowhead, is just a compressed list of which nodes you bought and how many ranks. The data layer's job is two halves: describe the tree once, and decode a loadout against it into a concrete set of selections the engine can turn into spell ids. ## The tree: `TraitTreeFlat` `TraitTreeFlat` is the flattened description of one spec's entire trait graph. Like every other flat type it deserializes identically from CSV-transform output or a Supabase row: ```rust #[derive(Debug, Clone, Default, Serialize, Deserialize)] #[cfg_attr(feature = "schema", derive(schemars::JsonSchema))] pub struct TraitTreeFlat { pub spec_id: i32, pub spec_name: String, pub class_name: String, pub tree_id: i32, pub all_node_ids: Vec, pub nodes: Vec, pub edges: Vec, pub sub_trees: Vec, pub point_limits: PointLimits, } ``` The fields split into three groups. `spec_id`, `spec_name`, `class_name`, and `tree_id` are identity. `all_node_ids` is the full node membership, and `nodes` plus `edges` are the graph itself, with `sub_trees` holding the hero-talent subtrees. `point_limits` is the spend caps, defaulting to class 31, spec 30, and hero 10. A `TraitNode` carries its grid position, `max_ranks`, a `node_type`, and a `Vec`. The entries are the important part. Each one holds the data the engine ultimately cares about: ```rust #[derive(Debug, Clone, Default, Serialize, Deserialize)] #[cfg_attr(feature = "schema", derive(schemars::JsonSchema))] pub struct TraitNodeEntry { pub id: i32, pub definition_id: i32, pub spell_id: i32, pub name: String, pub description: String, pub icon_file_name: String, } ``` A normal node has one entry; a choice node has several, and the loadout records which one you picked. Edges are minimal, just `id`, `from_node_id`, `to_node_id`, and `visual_style`, and exist mostly for the planner UI. The engine cares about purchased nodes, not the topology connecting them. 
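The difference between a normal node and a choice node is easiest to see as a lookup. The sketch below uses trimmed-down, hypothetical mirrors of the types above (`Node` and `Entry` are not the real `TraitNode` and `TraitNodeEntry`), and `granted_spell` is an illustrative helper rather than a function from the codebase. Treating a missing choice as entry `0`, and rejecting a rank count above `max_ranks`, are assumptions made for the example.

```rust
/// Hypothetical, trimmed-down mirror of a node entry.
pub struct Entry {
    pub spell_id: i32,
}

/// Hypothetical, trimmed-down mirror of a trait node.
pub struct Node {
    pub id: i32,
    pub max_ranks: i32,
    /// One entry for a normal node, several for a choice node.
    pub entries: Vec<Entry>,
}

/// Which spell does this node grant for a given purchase?
/// Returns `None` when nothing was bought, when the rank count exceeds the
/// node's cap, or when the choice index points past the node's entries.
pub fn granted_spell(
    node: &Node,
    ranks_purchased: i32,
    choice_index: Option<usize>,
) -> Option<i32> {
    if ranks_purchased <= 0 || ranks_purchased > node.max_ranks {
        return None;
    }
    node.entries
        .get(choice_index.unwrap_or(0))
        .map(|entry| entry.spell_id)
}
```

For a choice node with entries for spells 200 and 201, `granted_spell(&node, 1, Some(1))` picks 201, while a normal node ignores the choice index in practice because it has only entry `0` to offer.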
## Decoding a loadout A loadout string is a base64-encoded bitstream in the same format the game client and Wowhead use. Decoding it is a pure function over the bytes, with no game data required. `decode_trait_loadout` reads a version byte, a 16-bit spec id, a 16-byte tree hash, and then a per-node run of selection bits: selected, purchased, partially-ranked, a 6-bit rank count, and for choice nodes a 2-bit choice index. The result is a `DecodedTraitLoadout`, the header plus a `Vec`, one entry per node in tree order. That decoded list is positional. It says "node 0 has 2 ranks, node 1 is unselected", so it only becomes meaningful once paired with the tree that defines what each position is. That pairing happens in `apply_decoded_traits`, which walks the tree's nodes alongside the decoded selections and produces a `TraitTreeWithSelections`: the tree flattened together with a `Vec`. Each `TraitSelection` is `{ node_id, selected, ranks_purchased, choice_index }`, the engine-facing form. The resolver wires these two functions together. `DataResolver::decode_traits` has a default body that fetches the tree with `get_trait_tree`, decodes the string with `decode_trait_loadout`, and applies it with `apply_decoded_traits`. Because the body is default, every backend gets loadout decoding for free as long as it can return a `TraitTreeFlat`. ## From selections to spell ids The engine does not simulate selections directly; it simulates the spells those selections grant. `get_trait_spell_ids` bridges the gap: ```rust fn get_trait_spell_ids( &self, spec_id: i32, trait_string: &str, ) -> impl core::future::Future, ResolverError>> { async move { let tree_with_selections = self.decode_traits(spec_id, trait_string).await?; ``` It decodes the loadout, then for every selection with `ranks_purchased > 0` it finds the matching node, reads the chosen entry (`choice_index` defaults to `0`), and collects `entry.spell_id`. The result is sorted and deduplicated. 
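The selection-to-spell-id step can be sketched in a few lines. This is only an illustration under stated assumptions: the structs are trimmed stand-ins for the real flat types, the `entries` field name and the field types are guesses, and `collect_spell_ids` is a hypothetical helper rather than the crate's own function.

```rust
/// Trimmed stand-in for `TraitNodeEntry`; only the field this sketch needs.
#[derive(Debug, Clone)]
pub struct TraitNodeEntry {
    pub spell_id: i32,
}

/// Trimmed stand-in for `TraitNode`; the `entries` name is an assumption.
#[derive(Debug, Clone)]
pub struct TraitNode {
    pub id: i32,
    pub entries: Vec<TraitNodeEntry>,
}

/// Trimmed stand-in for `TraitSelection`; field types are assumptions.
#[derive(Debug, Clone)]
pub struct TraitSelection {
    pub node_id: i32,
    pub ranks_purchased: i32,
    pub choice_index: Option<usize>,
}

/// Hypothetical helper: map purchased selections to the spell ids they grant.
pub fn collect_spell_ids(nodes: &[TraitNode], selections: &[TraitSelection]) -> Vec<i32> {
    let mut ids: Vec<i32> = selections
        .iter()
        // Only nodes with at least one purchased rank grant anything.
        .filter(|s| s.ranks_purchased > 0)
        .filter_map(|s| {
            let node = nodes.iter().find(|n| n.id == s.node_id)?;
            // A missing choice index falls back to the first entry.
            let entry = node.entries.get(s.choice_index.unwrap_or(0))?;
            Some(entry.spell_id)
        })
        .collect();
    // Two nodes may grant the same spell, so sort and deduplicate.
    ids.sort_unstable();
    ids.dedup();
    ids
}
```

Selections that point at an unknown node or an out-of-range choice are silently skipped here; whether the real function skips or errors in that case is not something this sketch claims to know.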
Those ids are exactly what bootstrap feeds into the resolution pass as extra spells to fetch, so a talented spell ends up in the engine's game-data view the same way a baseline spell does, a thread I pick up in [Codegen](/dev/bible/game-data/codegen). The honest limitation here is that the WASM resolver cannot decode loadouts at all: its `get_trait_tree` is a stub that returns `TraitTreeNotFound`, so the default `decode_traits` body fails before it begins. In the browser, talent decoding therefore happens JS-side before the engine is invoked, not across the WASM boundary. I cover that asymmetry in [Data Resolution](/dev/bible/game-data/data-resolution). ## Items and Scaling An item in WoW is rarely the item you actually equip. The base row gives you a template, a name, an inventory type, a set of stat slots, but the real values depend on the item level, and the item level depends on a list of _bonus IDs_ applied on top of the base. Resolving an item means taking the template, applying the bonuses, and scaling everything through a set of curves to arrive at one effective state. The data layer carries the template and the scaling tables; one function does the resolution. ## The template: `ItemDataFlat` `ItemDataFlat` is the base item record. It holds identity (`id`, `name`, `file_name`), classification (`item_level`, `quality`, `class_id`, `subclass_id`, `inventory_type`), and the variable parts: `stats: Vec`, `effects: Vec`, `sockets`, set membership, and drop sources. The two sub-types that matter most downstream are `ItemStat { stat_type, value }`, a single stat slot where `value` is an allocation budget rather than a final number, and `ItemEffect { spell_id, trigger_type, charges, cooldown, category_cooldown }`, which is how a trinket or weapon proc points at the spell it casts. 
The custom `Default` is worth noting because it encodes WoW conventions rather than zeroes: a missing item gets `file_name = "inv_misc_questionmark"`, `stackable = 1`, and `allowable_class = allowable_race = -1`, meaning "no restriction". So a default-constructed item behaves like a generic unrestricted item, not like an empty one. ## The scaling tables: `ItemScalingData` The numbers that turn a budget into a stat live in seven separate DBC tables, and `ItemScalingData` is the bundle that holds all of them as pre-grouped maps, each field an `IntMap` keyed for fast lookup: ```rust pub struct ItemScalingData { /// Item bonuses grouped by `parent_item_bonus_list_id`. pub bonuses: IntMap>, /// Curves by ID. pub curves: IntMap, /// Curve points grouped by `curve_id`, sorted by `order_index`. pub curve_points: IntMap>, /// Rand prop points by item level (id = item_level). pub rand_prop_points: IntMap, /// Midnight item-level scaling configs by id. pub item_scaling_configs: IntMap, /// Midnight per-config offset curves by id. pub item_offset_curves: IntMap, /// Per-expansion item-squish curves by id. pub item_squish_eras: IntMap, } ``` `bonuses` is item bonuses grouped by `parent_item_bonus_list_id`. `curves` and `curve_points` are the scaling curves and their sampled points, points sorted by `order_index`. `rand_prop_points` is the per-item-level stat budgets. The last three, `item_scaling_configs`, `item_offset_curves`, and `item_squish_eras`, are the configuration that maps an item onto a curve and handles stat squishes. What makes this bundle the convergence point of the whole data layer is its constructor. `ItemScalingData::from_flat` takes the seven raw `Vec<…Flat>` lists and does the grouping and sorting in one place: ```rust /// Build from flat collections (used by both CSV and Supabase resolvers). 
// #t(fn: large_fn_params) data-plumbing constructor mirrors all flat scaling collections verbatim pub fn from_flat( item_bonuses: Vec, curves: Vec, curve_points: Vec, rand_prop_points: Vec, item_scaling_configs: Vec, item_offset_curves: Vec, item_squish_eras: Vec, ) -> Self { ``` Both backends call exactly this function with exactly these seven arguments. The CSV path builds the seven vectors from `DbcData` via `transform_all_item_bonuses`, `transform_all_curves`, and friends; the Supabase path fetches the same seven tables over the network and feeds the rows into the identical call. One grouping function, two completely different sources, and that is the design that keeps the backends honest. | Field | Source table | Keyed by | | ---------------------- | --------------------------- | ---------------------------------- | | `bonuses` | `game.item_bonuses` | `parent_item_bonus_list_id` | | `curves` | `game.curves` | curve id | | `curve_points` | `game.curve_points` | curve id (sorted by `order_index`) | | `rand_prop_points` | `game.rand_prop_points` | item level | | `item_scaling_configs` | `game.item_scaling_configs` | config id | | `item_offset_curves` | `game.item_offset_curves` | id | | `item_squish_eras` | `game.item_squish_eras` | id | ## Resolution: `resolve_item` `resolve_item` is the single function that turns a template plus bonuses into an effective item: ```rust /// Resolve an item's full effective state from its bonus IDs (the `weapon` field is left for the sim). // #t(fn: cyclomatic_complexity) one exhaustive 21-variant match over every BonusType // #t(fn: max_fn_lines) the exhaustive parsed-mutation pass and assembly live in one function pub fn resolve_item( base: &ItemDataFlat, bonus_ids: &[i32], scaling_data: &ItemScalingData, player_level: Option, drop_level: Option, ) -> ResolvedItem { ``` It is the resolution shared by the sim and the tooltip. 
The same code path produces the numbers the engine simulates and the numbers a tooltip displays, so the two can never drift.

The flow is: collect the applicable bonuses for the given `bonus_ids`, resolve the effective item level from those bonuses, then walk every bonus through an exhaustive match over the bonus-type variants (quality overrides, socket additions, stat changes, and so on), mutating a working `ParsedItem` and recording an `AppliedBonus` diagnostic for each. The output is a `ResolvedItem`: the single effective state of the item after its bonuses are applied.

The one thing `resolve_item` deliberately does **not** compute is weapon damage. The `weapon` field is left for the sim, because weapon damage depends on the damage-scaling table and weapon speed, which the combat layer owns.

`resolve_item` is also exported across the WASM boundary so the browser can resolve and display items without a round-trip. It lives in `wowlab-common`, which the engine bundle re-exports. The resolver trait's `get_item` only ever returns the base `ItemDataFlat`; turning that into a `ResolvedItem` is a separate, pure step layered on top, which is why it can run identically on a node, in the CLI, and in a browser tab.

## Spell Descriptions

The text in a WoW tooltip is not plain text. A description like "Deals $s1 Fire damage over $d" is written in a small template language: `$s1` means "effect 1's value," `$d` means "duration," and there are conditionals, color codes, pluralization, and cross-spell references besides. Showing a readable tooltip means parsing that template and substituting the real numbers.

This is a self-contained sub-parser in the data layer, and I want to be upfront about its scope: it exists to render tooltips for the UI, not to feed the simulation. The engine reads coefficients straight from `SpellDataFlat`; it never goes through the description text.
The implementation lives under `crates/common/src/parsers/spell_desc/` and is a textbook three-stage pipeline: lex, parse, render, with a fourth analysis stage layered on top.

## Lex

`lex` and `tokenize` turn a description string into a stream of `Token` values. The lexer is built on the `logos` crate, so the token grammar is regex-driven rather than hand-rolled: literal text, `$`-variables, color codes, and the braces that open expression blocks each get their own token. A second lexer, `lex_expr`, handles the expression sub-language inside `${…}` blocks.

## Parse

`parse` consumes the token stream and produces a typed AST. It returns a `ParseResult` that carries the tree and a list of errors side by side:

```rust
pub struct ParseResult {
    pub ast: ParsedSpellDescription,
    pub errors: Vec,
}
```

Errors are collected rather than thrown, so a malformed description still yields a best-effort tree.

The AST itself is genuinely rich. The node enum covers plain text, color codes, and a whole family of variable nodes, plus pluralization, gender, conditionals, and a small arithmetic expression grammar:

```rust
pub enum SpellDescriptionNode {
    Text(TextNode),
    Variable(VariableNode),
    ExpressionBlock(ExpressionBlockNode),
    Conditional(ConditionalNode),
    Pluralization(PluralizationNode),
    Gender(GenderNode),
    ColorCode(ColorCodeNode),
}
```

Each variant fans out further. The variable node alone splits into effect values, spell-level values, player state, cross-spell references, and enchant variables, and the expression grammar has binary and unary operators and function calls. The breadth is dictated by the source format, not by ambition on my part: WoW's description strings really do use all of it.

## Render

Rendering walks the AST and substitutes values, but it does not know any values itself. Instead `render_with_resolver` takes a resolver and asks it for each piece of data it needs.
The resolver is split into three traits, `EffectValueResolver`, `PlayerStateResolver`, and `SpellTextResolver`, composed into one super-trait: ```rust pub trait SpellDescResolver: EffectValueResolver + PlayerStateResolver + SpellTextResolver {} ``` A blanket impl means anything implementing all three sub-traits qualifies for free. The crate ships a `NullResolver` that resolves nothing and a `TestResolver` for tests. This is the same dependency-inversion shape as the rest of the data layer: the renderer depends on a trait, and the caller supplies the backend that actually has the numbers. ## Analyze There is one extra entry point. `analyze_dependencies` walks a parsed description to find which other spells and effects it references, without rendering it. That is what lets a tooltip pre-fetch the cross-referenced spells it will need before it renders. The whole pipeline, tokenize and render and analyze, is also exported across the WASM boundary so the browser can render tooltips client-side. Those bindings are the `wasm_*` functions in the module's wasm submodule. I am keeping this page short on purpose. The description language is large and has many corner cases, but it sits to the side of the simulation data flow. If you are tracing how a number reaches the engine, the relevant path is the flat spell and item types of the previous pages, not the tooltip parser. The next page returns to that main line: how all four data sources end up behind one resolver trait. ## Data Resolution The engine never asks "where does this spell's data live?" It asks "give me spell 12345," and one trait answers, regardless of whether the bytes come from a CSV file, a Postgres mirror, a binary cache, or a JavaScript object in a browser tab. That trait is the resolver, and this page is the hinge of the whole data layer: it is where the four sources of the previous pages converge into one interface, and where that interface gets folded into the single read-only view the simulation runs on. 
The figure below expands the **Game Data** box of the system-context diagram in [Architecture](/dev/bible/overview/architecture). Read it left to right: sources on the left, the `DataResolver` trait in the middle erasing their differences, and the resolution pass on the right collapsing everything into `ResolvedGameData`.
```mermaid
flowchart LR
    subgraph sources [Sources]
        CSV[(data/tables/*.csv)]
        PG[(game.* tables - Supabase)]
        JS["JS object (getSpell/getItem/...)"]
    end
    CSV -->|DbcData::load_all + transforms| RCSV["DataResolver (LocalCsv)"]
    PG -->|get_json path, game| CACHE["GameDataCache (L1/L2/L3)"]
    CACHE -->|feeds| RSUPA["DataResolver (Supabase) wraps the cache"]
    JS -->|Reflect::get + JsFuture + serde_wasm_bindgen| RJS["DataResolver (JsResolver)"]
    RCSV --> OVERLAY["OverlayResolver (rotation/spell overrides)"]
    RSUPA --> OVERLAY
    RJS --> OVERLAY
    OVERLAY -->|resolve_game_data introspects spec, fetches every spell/aura| BUILDER["ResolvedGameDataBuilder"]
    BUILDER -->|.build| RGD["ResolvedGameData (Arc, read-only)"]
    RGD -->|Arc clone into engine| ENGINE["simulation / combat"]
```
## The port: `DataResolver` `DataResolver` is an async trait. Every backend implements the same handful of required methods, `get_spell`, `get_spell_effect`, `get_item`, `get_scaling_data`, `get_power_types`, `get_spec`, `get_trait_tree`, `get_rotation_script`, and inherits default bodies for the rest. `get_spells` batches over `get_spell`, `decode_traits` composes the loadout functions from [Talent Trees](/dev/bible/game-data/talent-trees), and so on. Making an async trait usable as a trait object is otherwise impossible, so two attribute macros do the work: ```rust #[cfg_attr(target_arch = "wasm32", allow(async_fn_in_trait))] #[cfg_attr(not(target_arch = "wasm32"), trait_variant::make(Send + Sync))] #[dynosaur::dynosaur(pub DynDataResolver = dyn(box) DataResolver, bridge(dyn))] pub trait DataResolver { ``` `trait_variant::make(Send + Sync)` produces a `Send + Sync` variant on native targets, while on wasm32 the futures are intentionally left `!Send` because the browser is single-threaded and demanding `Send` there buys nothing. Then `dynosaur::dynosaur` generates `DynDataResolver`, the boxed, dyn-compatible wrapper that the rest of the engine passes around. `DynDataResolver::new_box`, `new_arc`, and `from_ref` are the three ways a concrete resolver gets erased. Failures flow through one `#[non_exhaustive]` `ResolverError` enum with a variant for each kind of miss, `SpellNotFound`, `ItemNotFound`, `TraitTreeNotFound`, and so on, plus transport errors for IO, the JS bridge, and Supabase. Because it is non-exhaustive, adding a new backend's error mode does not break existing matches. ## The six backends There are six implementations, and the choice among them is entirely a function of _where the process runs_, not of any runtime configuration the user sees. 
| Impl | Source | Where it runs | File | | -------------------- | ----------------------------------------------- | ------------------------------- | -------------------- | | `SupabaseResolver` | `game.*` over PostgREST, via `GameDataCache` | nodes, server-side | `remote/supabase.rs` | | `LocalCsvResolver` | `DbcData` from CSV, transformed lazily | CLI, forge, dev | `local/csv.rs` | | `JsResolver` | a JS object's `getSpell`/`getItem`/… promises | browser (WASM) | `bridge/js.rs` | | `OverlayResolver` | decorates any base `R` with in-memory overrides | everywhere overrides are needed | `bridge/overlay.rs` | | `InMemoryResolver` | hand-built maps | tests | `in_memory.rs` | `OverlayResolver` is the one that is not a source at all. It is a decorator that wraps another resolver and shadows specific spells, effects, or rotation scripts from in-memory maps before delegating the rest to the base. That is how a custom rotation or a tweaked spell gets injected without touching the underlying data, which the CLI, forge, and the WASM path all rely on. Its whole state is the base plus three override maps: ```rust pub struct OverlayResolver { base: R, spell_overrides: IntMap, effect_overrides: HashMap<(i32, u8), SpellEffect>, rotation_overrides: FastMap, } ``` Two honest naming and behaviour notes, because the code does not match every casual description of it: - The Supabase backend is `SupabaseResolver`, not "RemoteSupabaseResolver". There is no type by the latter name. It is a thin wrapper whose only field is a `GameDataCache`. - Its rotation queries use a different schema from everything else. Every game-data fetch goes through `get_json(path, "game")`, setting an `Accept-Profile: game` header, but `get_rotation_script` and `get_assisted_rotation` use plain `client().get(path)` with no profile header, so they read the `rotations` table from the default (public) schema. 
The split is deliberate (rotations are user content, not game data), but it is easy to miss when reading the resolver as "all PostgREST."

The WASM backend is also deliberately partial. `JsResolver`'s `get_spec`, `get_trait_tree`, and `get_spell_effect` are stubs that return a not-found error rather than calling into JS, so spec- and trait-aware data is simply absent across the boundary. The browser side handles those concerns before invoking the engine; I trace the JS object protocol in [WASM Boundary](/dev/bible/distribution/wasm-boundary).

## The three-layer cache

`SupabaseResolver` is fast only because it almost never hits the network. Its `GameDataCache` is a read-through cache with three layers, and the state machine below expands the **GameDataCache** box of the resolution figure above.
```mermaid
stateDiagram-v2
    [*] --> CheckL1
    CheckL1 --> Done: "moka hit (L1)"
    CheckL1 --> CheckL2: "moka miss"
    CheckL2 --> PromoteL1: "disk hit (L2)"
    PromoteL1 --> Done: "insert into moka"
    CheckL2 --> Fetch: "disk miss"
    Fetch --> Persist: "PostgREST ok (L3)"
    Persist --> Done: "write disk + insert moka"
    Fetch --> Error: "SupabaseError"
    Done --> [*]
    state Init {
        [*] --> PatchCheck
        PatchCheck --> ClearDisk: "patch mismatch"
        PatchCheck --> Keep: "patch matches"
    }
```
Spells, traits, items, and specs all flow through one generic path, `get_cached(mem_cache, disk_category, key, fetch)`. The body is the whole three-layer story in eleven lines:

```rust
if let Some(v) = mem_cache.get(&key) { // L1: Moka hit
    return Ok(v);
}
if let Some(v) = self.read_disk::<V>(disk_category, key) { // L2: disk hit
    mem_cache.insert(key, v.clone()); // promote into L1
    return Ok(v);
}
let v: V = fetch(key).await?; // L3: PostgREST
self.write_disk(disk_category, key, &v)?;
mem_cache.insert(key, v.clone());
Ok(v)
```

The lookup tries the **L1 Moka** in-memory cache first, falls back to the **L2 disk JSON** file and promotes any hit back into L1, and only on a double miss issues the **L3 PostgREST** request, then writes the result to disk and L1. The Moka caches are capacity-bounded; spells cap at 50,000 entries.

Invalidation is by patch version. On construction the cache reads `{cache_dir}/patch_version`, and if it does not match the requested patch it clears the entire disk cache. A patch bump throws away everything stale in one step rather than trying to diff individual rows.

Two paths sidestep the generic machinery, and they are worth knowing:

- **Scaling data is fetched as seven parallel requests, not four.** `get_scaling_data` issues one `futures::try_join!` over all seven scaling tables, `item_bonuses`, `curves`, `curve_points`, `rand_prop_points`, `item_scaling_configs`, `item_offset_curves`, `item_squish_eras`, and folds them with the `ItemScalingData::from_flat` constructor from [Items and Scaling](/dev/bible/game-data/items-and-scaling). The crate's own CLAUDE notes say four; the code does seven. The result is cached behind an `RwLock` because it is a single large bundle rather than per-id rows.
- **Power types are not cached at all.** `get_power_types` issues a fresh request every call. There are few power types and they are read rarely, so the cache bookkeeping was not worth it, a small and deliberate exception.
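To make the read-through shape and the patch-version invalidation concrete, here is a small synchronous sketch. It is not the real `GameDataCache`: plain `HashMap`s stand in for Moka and for the on-disk JSON files, the fetch is a closure instead of an async PostgREST call, and every name in it (`LayeredCache`, `open`, the `String` payload) is hypothetical.

```rust
use std::collections::HashMap;

/// Toy read-through cache: `l1` stands in for Moka, `l2` for the disk JSON files.
pub struct LayeredCache {
    l1: HashMap<i32, String>,
    l2: HashMap<i32, String>,
    patch: String,
}

impl LayeredCache {
    /// Open the cache for `requested_patch`. If the patch stored alongside the
    /// disk layer differs, the whole disk layer is discarded in one step.
    pub fn open(stored_patch: &str, disk: HashMap<i32, String>, requested_patch: &str) -> Self {
        let l2 = if stored_patch == requested_patch {
            disk
        } else {
            HashMap::new()
        };
        Self {
            l1: HashMap::new(),
            l2,
            patch: requested_patch.to_string(),
        }
    }

    /// The patch this cache instance is valid for.
    pub fn patch(&self) -> &str {
        &self.patch
    }

    /// L1, then L2 (promoting a hit into L1), then the fetch closure as "L3".
    pub fn get_cached<E>(
        &mut self,
        key: i32,
        fetch: impl FnOnce(i32) -> Result<String, E>,
    ) -> Result<String, E> {
        if let Some(v) = self.l1.get(&key) {
            return Ok(v.clone());
        }
        if let Some(v) = self.l2.get(&key).cloned() {
            self.l1.insert(key, v.clone());
            return Ok(v);
        }
        // Double miss: fetch, then persist to both layers.
        let v = fetch(key)?;
        self.l2.insert(key, v.clone());
        self.l1.insert(key, v.clone());
        Ok(v)
    }
}
```

The point of the sketch is the ordering: a fetch error returns before anything is written, so a failed request never poisons either layer.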
## Folding it all into `ResolvedGameData`

The resolver answers per-spell questions, but the engine wants one consolidated, immutable view of every number for the spec it is about to simulate. `resolve_game_data` builds it.

The key idea is that it does not fetch a fixed list of spells. It asks the spec what it uses, then fetches exactly that. It introspects the spec handler to discover every spell, aura, and auto-attack id it references, unions in the talent and item ids that bootstrap collected, and then walks that set: for each spell it fetches `SpellDataFlat`, derives costs and gains and the corrected cooldown, and records the props; for each effect it records the AP/SP coefficients; for each aura it records duration, stacks, and pandemic behaviour.

The output is `ResolvedGameData`, an `Arc`-wrapped, cheap-to-clone, read-only value. Because the build is driven by what the spec actually needs, an unused spell never costs a fetch, a property that matters most over the network. The full pass, and how generated code reads back from this value, is the subject of [Codegen](/dev/bible/game-data/codegen).

## Codegen

A spec, say Arcane Mage, is defined in two places, and the split is the whole point. The numbers (a cooldown, a coefficient, an aura duration) live in the game data and reach the engine through `ResolvedGameData`. The _behaviour_ is hand-written in a small TOML manifest: which spells exist, which auras they apply, which talents matter. Code generation stitches the two together: it reads the manifest and emits Rust that, at `build_combat_system` time, pulls the omitted numbers out of `ResolvedGameData`. A manifest is therefore almost all structure and almost no numbers; anything it leaves out is resolved from data.

The figure below expands the **Engine** box of the system-context diagram in [Architecture](/dev/bible/overview/architecture), showing the build-time half of that picture: manifests in, generated Rust out.
```mermaid
flowchart LR
    TOML[(manifests/*.toml)]
    ITEMS[(manifests/items.toml)]
    TOML -->|toml::from_str::| MAN["manifest-schema::Manifest"]
    ITEMS -->|toml::from_str::| IMAN["manifest-schema::ItemsManifest"]
    MAN -->|generate_spec_file| SPECRS["specs/.rs"]
    IMAN -->|generate_item_file| ITEMRS["items/.rs"]
    SPECRS -->|write_build_fn| BUILD["build_combat_system + aura_*/spell_* fns"]
    SPECRS -->|write_descriptor| DESC["define_spec_descriptor! (inventory submit)"]
    BUILD -->|fs::write output| GEN[(engine-content/src/generated)]
    DESC -->|fs::write output| GEN
    ITEMRS -->|fs::write output| GEN
    GEN -->|reads at build_combat_system time| RGD["ResolvedGameData accessors"]
```
## The schema: `manifest-schema` `manifest-schema` is the canonical, serde-only definition of what a manifest may contain. It is shared by both the generator and forge's audit tooling, so there is exactly one description of the format. A per-spec manifest deserializes into `Manifest`; items deserialize into `ItemsManifest`. Every manifest carries a `schema_version`, checked against a `CURRENT_SCHEMA_VERSION` constant so a format change fails loudly rather than generating wrong code. The per-spec `Manifest` mirrors a spec's combat definition: ```rust #[derive(Debug, Deserialize)] pub struct Manifest { #[serde(default = "default_schema_version")] pub schema_version: u32, pub spec: SpecSection, pub resource: ResourceSection, #[serde(default)] pub secondary_resource: Option, #[serde(default)] pub auras: IndexMap, #[serde(default)] pub spells: IndexMap, #[serde(default)] pub auto_attacks: IndexMap, #[serde(default)] pub talents: BTreeMap, #[serde(default)] pub metric_keys: Option, #[serde(default)] pub rotation_schema: Option, #[serde(default)] pub hero_talents: IndexMap, #[serde(default)] pub spell_groups: Vec, } ``` Each section deserializes into its own struct. The recurring theme is that almost every field on a spell or aura is `Option`, and an absent value means "resolve from game data." 
| Section | Type | Holds | | --------------------------------- | ---------------------------------- | -------------------------------------------------------------- | | `spec` | `SpecSection` | WoW spec id, pet flag, custom handler, precombat/stealth auras | | `resource` | `ResourceSection` | name, max, regen, starts_at, type id | | `secondary_resource` | `Option` | name, max, type id only, no regen | | `auras` | `IndexMap` | aura defs; declaration order = `LocalAuraIdx` | | `spells` | `IndexMap` | spell defs; declaration order = `LocalSpellIdx` | | `auto_attacks` | `IndexMap` | swing timers and AP coefficients | | `talents` / `hero_talents` | `BTreeMap` / `IndexMap` | talent name-to-id maps | | `metric_keys` / `rotation_schema` | optional sections | telemetry keys and rotation field schema | A `SpellDef` carries `id` and then a long list of optional overrides: ```rust #[derive(Debug, Deserialize)] pub struct SpellDef { pub id: u32, #[serde(default)] pub cooldown: Option, #[serde(default)] pub cost: Option, #[serde(default)] pub gain: Option, #[serde(default)] pub damage: Option, #[serde(default)] pub off_gcd: bool, #[serde(default)] pub cast_time_ms: Option, #[serde(default)] pub gcd_ms: Option, #[serde(default)] pub charges: Option, #[serde(default)] pub charge_cd: Option, #[serde(default)] pub applies_aura: Option, #[serde(default)] pub reduces_cd: Option>, #[serde(default)] pub reduces_cd_chance: Option, #[serde(default)] pub resets_cd_while: Option, #[serde(default)] pub hook: Option, #[serde(default = "default_breaks_stealth")] pub breaks_stealth: bool, #[serde(default)] pub aoe: Option, #[serde(default)] pub channel: Option, #[serde(default)] pub requires_aura_id: Option, #[serde(default)] pub requires_aura_min_stacks: Option, #[serde(default)] pub cost_free_when_aura: Option, } ``` Cooldown, cost, gain, damage, cast time, GCD, charges, the applied aura, AoE and channel sub-sections, cooldown-reduction rules, and a `hook` for custom behaviour are all here, 
and all optional. In a real manifest most of them are absent: the Arcane Mage manifest declares ten spells and fourteen auras where almost every spell gives only its `id` and lets the generator pull cooldown, cost, cast time, and damage from data. The manifest says _what the spec does_; the data says _by how much_. ## The generator: `codegen-cli` The `codegen` binary is invoked as `cargo codegen` and writes into `crates/engine-content/src/generated/`. Its driver, `generate_all`, reads every `*.toml` in the manifests directory, treats `items.toml` specially, parses each spec file into a `Manifest`, version-checks it, and calls `generate_spec_file`. It then emits the items, the barrel `mod.rs` files, and finally writes every `(filename, source)` pair to disk. A `--check` mode regenerates in memory and diffs against the committed files, so CI can prove the generated code is in sync with the manifests without a writable tree. `generate_spec_file` emits, in order: the id constants (the `SPELL`/`AURA`/`TALENT` modules), the local-index constants, one `aura_*` and one `spell_*` builder function per declaration, the `build_combat_system` function, the handler factory, and the `define_spec_descriptor!` invocation. The descriptor registers the spec at link time through the `inventory` crate, which is how `find_descriptor` later finds a spec by id without any central registry to maintain. ## The build-time override versus run-time accessor loop The connective idea is worth stating precisely, because it is the reason the two halves stay consistent. When a manifest field is present, the generator emits a literal. When it is absent, the generator emits a _call into `ResolvedGameData`_ instead. The generated Arcane Mage code, for instance, fills a spell's cooldown with `data.cooldown_s(...)` and its damage by passing `data.damage_def(...)` into the builder's `s.damage_auto(...)`, both reading the very same `ResolvedGameData` that the resolution pass built. 
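The override-or-default rule can be illustrated with a toy version. One caveat up front: the real generator makes this choice when it emits code (a literal or an accessor call), whereas this sketch folds both branches into a single runtime function for brevity. The `ResolvedGameData` stand-in, the `cooldown_s` signature, the shape of `BuilderError::MissingSpellData`, and the helper `effective_cooldown_s` are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for the read-only resolved view; only cooldowns are modelled.
pub struct ResolvedGameData {
    cooldowns_s: HashMap<u32, f32>,
}

impl ResolvedGameData {
    pub fn new(cooldowns_s: HashMap<u32, f32>) -> Self {
        Self { cooldowns_s }
    }

    /// Accessor in the spirit of `cooldown_s`: `None` when the spell never resolved.
    /// The real signature is not shown in the text; this one is assumed.
    pub fn cooldown_s(&self, spell_id: u32) -> Option<f32> {
        self.cooldowns_s.get(&spell_id).copied()
    }
}

/// Assumed shape; the text only names the `MissingSpellData` variant.
#[derive(Debug, PartialEq)]
pub enum BuilderError {
    MissingSpellData { spell_id: u32 },
}

/// Hypothetical helper: a manifest literal wins, otherwise the data fills in,
/// and missing data is an error rather than a silent zero.
pub fn effective_cooldown_s(
    manifest_override: Option<f32>,
    data: &ResolvedGameData,
    spell_id: u32,
) -> Result<f32, BuilderError> {
    match manifest_override {
        Some(literal) => Ok(literal),
        None => data
            .cooldown_s(spell_id)
            .ok_or(BuilderError::MissingSpellData { spell_id }),
    }
}
```

The design point the sketch preserves is that there is no fallback value: a spell with neither a manifest literal nor resolved data stops the build of the combat system.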
`cooldown_s` and `damage_def` are `ResolvedGameData` accessors; `damage_auto` is the `CombatSystemBuilder` method that consumes the resolved `DamageDef`. Each accessor returns an `Option`, and the generated code converts a `None` into `BuilderError::MissingSpellData`, so a spell whose data never resolved fails the build of the combat system rather than silently simulating zeros. That closes the loop with [Data Resolution](/dev/bible/game-data/data-resolution). At build-combat-system time, the generated builder functions read coefficients, costs, cooldowns, and durations from `ResolvedGameData`; at resolution time, `resolve_game_data` populated that value by introspecting the spec the generated code defined. The manifest is the override layer on top, and the game data is the default underneath. Add a number to the manifest and it wins; omit it and the data fills in. And because both halves agree on the spell ids (the generated id constants are exactly the ids the resolver fetched), the override and the default always line up. This is the boundary between the data layer and the engine: from here on, every page assumes the engine is holding a fully resolved `ResolvedGameData` and a registered spec, and asks what it does with them. ## Discrete Event Simulation The engine never ticks. It keeps a queue of future events sorted by time, pops the earliest one, jumps the clock straight to that event's timestamp, and asks a handler what to do. Nothing happens between events because, by construction, nothing _can_ happen between events. That single sentence is the whole mental model. Everything below earns the detail. ## Why not a fixed time step The naive way to simulate combat is a fixed-step loop: advance a millisecond, check what's due, advance another millisecond. It is simple and wrong in two directions at once. 
Most milliseconds have nothing due, so you burn cycles on empty ticks; and anything finer than your step (a tick that lands at 1.5 ms past a step boundary) gets quantised to the grid. A combat sim spends almost all of its wall-clock time idle between a cast finishing and the next global cooldown expiring. For a fixed-step loop that idle time is pure overhead. Discrete event simulation inverts the loop. Instead of asking "what time is it now, and what is due?" it asks "what is the next thing that happens, and when?" Time advances in one jump to that event. A 300-second encounter that resolves in a few thousand events touches a few thousand timestamps, not 300,000 milliseconds. This is the standard event-driven formulation of simulation; none of it is novel here, and I lean on the simulation literature rather than reinventing it. The trade-off is that you give up the implicit ordering a tick loop gives you for free. With discrete events you have to keep the event set sorted yourself, and you have to be deliberate about ties (two events at the same millisecond). The engine solves ordering with a [timing wheel](/dev/bible/engine/event-system) and breaks ties FIFO by insertion sequence. ## The engine is a pure scheduler The second design commitment is that the loop knows nothing about combat. It understands eight [event variants](/dev/bible/engine/event-system) and a `SimState` clock, and it dispatches to a `SpecHandler` trait object. The fields of `SimEngine` say it plainly: a queue, a clock, telemetry, an RNG, the handler, a sink, and the encounter end. Nothing about damage, auras, resources, or cooldowns: ```rust pub struct SimEngine<'a> { queue: &'a mut EventQueue, state: SimState, telemetry: &'a mut TelemetryAccumulator, rng: SimRng, handler: &'a mut dyn SpecHandler, sink: &'a mut TelemetrySink, encounter_end_ms: u32, } ``` All the combat logic lives behind that `SpecHandler` trait in `engine-combat`. 
The loop's only jobs are: pop the next event, advance the clock, call the matching handler method, and drain whatever the handler scheduled back into the queue. This split is what lets the same loop drive any spec. The handler is the spec; the loop is the clock. It also keeps the loop trivially testable in isolation. A mock handler that schedules a few events exercises every branch without a single line of game data. `SimTime` is the unit of that clock: a `u32` of **milliseconds**, nothing more. ```rust pub struct SimTime(pub u32); ``` Not seconds, not floats. Integer milliseconds, so event timestamps are exact and the timing wheel can index them directly. The conversion to real seconds only happens when a rate like DPS or regen needs it, and it divides by a thousand right there at the read site. ## One iteration, end to end A single Monte-Carlo run is one combat from `t = 0` to the encounter end. The host (CLI, browser worker, compute node, or the `forge` profiler) does not call the loop directly. It calls `simulate_intent`, which parses the sim config, resolves all game data, builds the handler, and only then drives the loop once per iteration.
```mermaid
flowchart TB
    Host(["Host (CLI / WASM / node / forge)"]) -->|simulate_intent| Req["simulate_intent_request"]
    Req -->|bootstrap_chunk| Boot["bootstrap"]
    Boot -->|parse + validate v1| Cfg["validated SimConfigIntent"]
    Boot -->|find_descriptor| Desc["SpecDescriptor"]
    Boot -->|compute_stats_from_gear| Stats["CombatStats"]
    Boot -->|resolve_game_data| GD[(ResolvedGameData)]
    Desc --> Handler["Box dyn SpecHandler"]
    GD --> Handler
    Stats --> Handler
    Req -->|run_chunk loop| Loop["run_chunk_into_accumulator_impl"]
    Handler --> Loop
    Loop -->|per iteration| Engine["SimEngine.run"]
    Engine -->|events via| Queue[(EventQueue)]
    Queue --> Engine
    Engine -->|record_iteration| Tele["telemetry"]
    Tele -->|encode proto| Report["ChunkReport"]
```
This figure expands the **Engine** box of the system-context diagram (see [Architecture](/dev/bible/overview/architecture)). Reading it left-to-right and top-to-bottom: - `simulate_intent` is a thin convenience wrapper; it packs its arguments into a `SimRequest` with default overrides and forwards to `simulate_intent_request`. - `bootstrap` is the heavy lifting: parse the TOML, require `intent_version == "v1"`, resolve the spec, find its descriptor in the inventory registry, turn gear into stats, and resolve every spell and aura the spec touches into a `ResolvedGameData` snapshot. Game-data resolution is its own subject, covered under [Data Resolution](/dev/bible/game-data/data-resolution). - The descriptor's factory builds the handler from the resolved data and stats. This is where the [generated spec code](/dev/bible/engine/spec-handlers) meets the runtime. - `run_chunk` then runs the iteration loop, each pass invoking `SimEngine::run` against the shared `EventQueue`, accumulating into one `TelemetryAccumulator`, and finally encoding a `ChunkReport` of protobuf telemetry bytes. The unit of work here is the **chunk**: a `ChunkAssignment` declares some number of iterations, and the chunk runs them sequentially in one thread, reusing one handler, one queue, and one sink across all of them. The chunk is also the atomic unit of distribution. The [orchestration](/dev/bible/distribution/orchestration) and [hosted-compute](/dev/bible/distribution/hosted-compute) pages pick up the story from there. The next page opens the **SimEngine.run** box: the eight event variants, the dispatch loop, the timing wheel, and the deterministic RNG that makes a run reproducible from its seed. ## Event System The run loop is a `while` over a queue: pop the earliest event, advance the clock to it, call the handler method for its variant, drain whatever the handler scheduled, repeat. Every event carries a timestamp; the queue keeps them sorted; the loop never looks ahead. 
That is the entire executor.

This page expands the **SimEngine.run** box of the [sim-pipeline figure](/dev/bible/engine/discrete-event-simulation). I cover the events themselves, the dispatch state machine, the queue that orders them, and the RNG that makes a run reproducible.

## The eight events

`Event` is the central type the whole loop is built around. It is `Copy`, matched exhaustively in-workspace, and every variant carries a `t: SimTime` so the queue can sort on it without inspecting the payload.

| Variant         | Payload             | Meaning                                                           |
| --------------- | ------------------- | ----------------------------------------------------------------- |
| `PlayerReady`   | `t`                 | Rotation wake: ask the handler for the next action                |
| `OffGcdReady`   | `t`                 | Off-GCD rotation wake; dispatched identically to `PlayerReady`    |
| `CastStart`     | `t`, `spell_id`     | Cast begins. The loop computes cast time and schedules completion |
| `CastComplete`  | `t`, `spell_id`     | Cast lands. Damage, auras, cost, and cooldown all resolve here    |
| `AuraTick`      | `t`, `aura_id`      | Periodic DoT/HoT tick, or a channel tick (channels reuse the id)  |
| `AuraExpire`    | `t`, `aura_id`      | Aura expiry check (may be a no-op if the aura was refreshed)      |
| `CooldownReady` | `t`, `cooldown_key` | A cooldown or charge has recharged                                |
| `AutoAttack`    | `t`                 | A melee swing is due                                              |

Because `t` sits in every variant, the queue never has to know which variant it holds. `Event::timestamp` collapses all eight into one match and hands back the `t`:

```rust
pub fn timestamp(&self) -> SimTime {
    match self {
        Event::PlayerReady { t }
        | Event::OffGcdReady { t }
        | Event::CastStart { t, .. }
        | Event::CastComplete { t, .. }
        | Event::AuraTick { t, .. }
        | Event::AuraExpire { t, .. }
        | Event::CooldownReady { t, .. }
        | Event::AutoAttack { t } => *t,
    }
}
```

One detail is worth flagging because the code contradicts its own doc comment. The doc on `CastStart` says cooldown and resource cost are "paid here". They are not.
In the actual loop, the `CastStart` arm only asks the handler for the cast time and schedules a `CastComplete` at `t + cast_ms`: ```rust let cast_ms = self.handler.cast_time_ms(SpellIdx::from_raw(spell_id)); let complete_time = SimTime::from_millis(t.as_millis().saturating_add(cast_ms)); self.queue.push(Event::CastComplete { t: complete_time, spell_id, }); ``` All of the cost, cooldown, damage, and aura work runs at `CastComplete`, inside [`process_cast`](/dev/bible/engine/cast-pipeline). The doc comment is stale relative to the code, and the code wins. ## The dispatch loop The handler talks back to the loop through `SpecAction`, returned only from `on_player_ready`. It has two cases: ```rust pub enum SpecAction { Cast { spell_id: SpellIdx }, Wait { until_ms: SimTime }, } ``` The loop turns a `Cast` into a `CastStart` event and a `Wait` into a future `PlayerReady`; everything else the handler wants to schedule it pushes through `flush_scheduled`, which the loop drains after every arm.
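The dispatch rules above fit in a few dozen lines. The sketch below is an illustration under stated assumptions, not the engine's code: it uses a `BinaryHeap` ordered by `(time, seq)` as a stand-in for the timing wheel, models only three of the eight variants, and invents a small `Handler` trait, a `Queue` wrapper, a `FixedRotation` mock, and the `t + 1` clamp on `Wait`. None of those names or signatures come from the source.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

const MAX_EVENTS: u32 = 500_000;

// Three of the eight variants; the timestamp lives in the queue entry here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Event {
    PlayerReady,
    CastStart { spell_id: u32 },
    CastComplete { spell_id: u32 },
}

#[allow(dead_code)]
enum SpecAction {
    Cast { spell_id: u32 },
    Wait { until_ms: u32 },
}

// Hypothetical handler surface, much smaller than the real `SpecHandler`.
trait Handler {
    fn on_player_ready(&mut self, now_ms: u32) -> Option<SpecAction>;
    fn cast_time_ms(&self, spell_id: u32) -> u32;
    /// Returns the time of the next rotation wake (the GCD end).
    fn on_cast_complete(&mut self, now_ms: u32, spell_id: u32) -> u32;
}

// Min-heap on (time, insertion seq): ascending time, FIFO on ties.
struct Queue {
    heap: BinaryHeap<Reverse<(u32, u64, Event)>>,
    seq: u64,
}

impl Queue {
    fn push(&mut self, t_ms: u32, ev: Event) {
        self.heap.push(Reverse((t_ms, self.seq, ev)));
        self.seq += 1;
    }
    fn pop(&mut self) -> Option<(u32, Event)> {
        self.heap.pop().map(|Reverse((t, _, ev))| (t, ev))
    }
}

fn run(handler: &mut dyn Handler, encounter_end_ms: u32) -> Result<u32, &'static str> {
    let mut queue = Queue { heap: BinaryHeap::new(), seq: 0 };
    queue.push(0, Event::PlayerReady); // seed the rotation at t = 0
    let mut processed = 0u32;
    while let Some((t, ev)) = queue.pop() {
        if t >= encounter_end_ms {
            break; // normal terminal: the fight is over
        }
        processed += 1;
        if processed > MAX_EVENTS {
            return Err("EventBudgetExceeded"); // failure terminal
        }
        match ev {
            Event::PlayerReady => match handler.on_player_ready(t) {
                Some(SpecAction::Cast { spell_id }) => queue.push(t, Event::CastStart { spell_id }),
                // Assumed clamp: a wait always moves the clock forward.
                Some(SpecAction::Wait { until_ms }) => queue.push(until_ms.max(t + 1), Event::PlayerReady),
                None => {}
            },
            Event::CastStart { spell_id } => {
                let done = t.saturating_add(handler.cast_time_ms(spell_id));
                queue.push(done, Event::CastComplete { spell_id });
            }
            Event::CastComplete { spell_id } => {
                let next = handler.on_cast_complete(t, spell_id);
                queue.push(next.max(t), Event::PlayerReady);
            }
        }
    }
    Ok(processed)
}

// Mock spec: always casts spell 1, which takes 1.5 s and has no extra GCD.
struct FixedRotation {
    casts_landed: u32,
}

impl Handler for FixedRotation {
    fn on_player_ready(&mut self, _now_ms: u32) -> Option<SpecAction> {
        Some(SpecAction::Cast { spell_id: 1 })
    }
    fn cast_time_ms(&self, _spell_id: u32) -> u32 {
        1_500
    }
    fn on_cast_complete(&mut self, now_ms: u32, _spell_id: u32) -> u32 {
        self.casts_landed += 1;
        now_ms
    }
}
```

Calling `run(&mut FixedRotation { casts_landed: 0 }, 10_000)` lands six casts in a ten-second encounter. The seventh `CastComplete` would fall at 10,500 ms, so popping it trips the end-of-encounter check instead of being dispatched.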
```mermaid
stateDiagram-v2
    [*] --> PlayerReady: "on_sim_start; push PlayerReady(0)"
    PlayerReady --> CastStart: "Cast{spell}; push CastStart"
    PlayerReady --> PlayerReady: "Wait{until}; push PlayerReady(until)"
    PlayerReady --> Idle: "action None"
    CastStart --> CastComplete: "cast_time_ms; push CastComplete(t+cast)"
    CastComplete --> PlayerReady: "process_cast schedules ready(gcd_end)"
    CastComplete --> CooldownReady: "start_cooldown"
    CastComplete --> AuraTick: "apply_aura schedules tick"
    CastComplete --> AuraExpire: "apply_aura schedules expire"
    AuraTick --> AuraTick: "reschedule next tick before expiry"
    AuraExpire --> Removed: "expire_aura_by_id"
    CooldownReady --> Recharged: "check_recharge"
    AutoAttack --> AutoAttack: "reschedule next swing"
    CastComplete --> [*]: "t >= encounter_end"
    PlayerReady --> Budget: "event count > 500000"
    Budget --> [*]: "EventBudgetExceeded"
```
This figure expands the **SimEngine.run** box of the [sim-pipeline figure](/dev/bible/engine/discrete-event-simulation). The states are the event variants; the transitions are real scheduling edges:

- The loop seeds itself: after `on_sim_start`, it pushes `PlayerReady { t: 0 }` to kick off the rotation.
- `PlayerReady` (and the identical `OffGcdReady`) calls `on_player_ready`. A `Cast` becomes a `CastStart`; a `Wait` becomes a clamped future `PlayerReady`; `None` schedules nothing.
- `CastStart` schedules `CastComplete` at `t + cast_ms`.
- `CastComplete` runs `process_cast`, which is where the fan-out happens. It schedules the next `PlayerReady`, starts cooldowns (`CooldownReady`), and applies auras that schedule their own `AuraTick` and `AuraExpire` events.
- `AuraTick` reschedules itself against live haste until the next tick would land past expiry; `AutoAttack` reschedules the next swing.

There are two terminals. The normal one: a popped event whose `t >= encounter_end_ms` breaks the loop, because the fight is over. The failure one: a hard budget of `MAX_EVENTS = 500_000`. If a run processes that many events without finishing, the loop returns `SimRunError::EventBudgetExceeded`. That cap is a guard against a pathological rotation scheduling itself into a tight non-advancing loop; it does not normally fire.

## The timing wheel

The queue is the part that earns its keep. With discrete events you push and pop constantly, and both have to respect time order. The obvious structure is a binary heap, O(log n) push and pop, simple to reason about. The engine uses a **timing wheel** instead, trading the heap's clean asymptotics for O(1) amortised push and pop. The standard reference for the structure is Varghese and Lauck, "Hashed and Hierarchical Timing Wheels".

The shape is fixed: 32768 slots, each spanning 32 ms (`WHEEL_SHIFT = 5`, so `1 << 5 = 32`), for a wheel span of about 17.5 minutes.
Each slot is the head of an arena-allocated linked list, kept sorted by `(time_ms, seq)`. A 512-word bitmap (one bit per slot) lets the pop path skip empty slots with a `trailing_zeros` scan instead of walking them one at a time.
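The arithmetic behind those numbers is small enough to check directly. The sketch below reuses the constants quoted above (`WHEEL_SHIFT`, the slot count, the 512-word bitmap); the function names `slot_for`, `mark`, and `next_occupied`, and the flat `[u64; 512]` bitmap type, are assumptions made for illustration rather than the engine's identifiers.

```rust
const WHEEL_SHIFT: u32 = 5; // 1 << 5 = 32 ms per slot
const WHEEL_SLOTS: usize = 32_768;
const SLOT_MASK: u32 = WHEEL_SLOTS as u32 - 1;
// 32_768 slots * 32 ms = 1_048_576 ms, roughly 17.5 minutes.
const WHEEL_SPAN_MS: u32 = (WHEEL_SLOTS as u32) << WHEEL_SHIFT;
const BITMAP_WORDS: usize = WHEEL_SLOTS / 64; // 512 words, one bit per slot

/// Slot index for a timestamp: drop the low 5 bits, wrap at the wheel size.
fn slot_for(time_ms: u32) -> usize {
    ((time_ms >> WHEEL_SHIFT) & SLOT_MASK) as usize
}

/// Record that a slot holds at least one event.
fn mark(bitmap: &mut [u64; BITMAP_WORDS], slot: usize) {
    bitmap[slot / 64] |= 1u64 << (slot % 64);
}

/// First occupied slot at or after `from`, without wrapping around.
fn next_occupied(bitmap: &[u64; BITMAP_WORDS], from: usize) -> Option<usize> {
    let mut word = from / 64;
    if word >= BITMAP_WORDS {
        return None;
    }
    // In the first word, ignore bits that belong to slots before `from`.
    let mut bits = bitmap[word] & (!0u64 << (from % 64));
    loop {
        if bits != 0 {
            // trailing_zeros skips up to 64 empty slots in one instruction.
            return Some(word * 64 + bits.trailing_zeros() as usize);
        }
        word += 1;
        if word == BITMAP_WORDS {
            return None;
        }
        bits = bitmap[word];
    }
}
```

An event at 1,500 ms lands in slot 46, and a timestamp exactly one span later wraps back to slot 0, which is why anything at or beyond the span has to wait in overflow instead of being placed on the wheel.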
```mermaid
flowchart TB
    Push["push(event)"] -->|delta < span?| Decide{"in wheel span?"}
    Decide -->|yes| Slot["insert_into_wheel: slot = (time_ms >> 5) & mask"]
    Decide -->|no far future| Overflow[(overflow bucket)]
    Slot -->|alloc_node| Arena[(arena Vec + free list)]
    Slot -->|set bit| Bitmap["slot_bitmap (512 u64)"]
    Pop["pop()"] -->|head non-null?| Head{"current slot empty?"}
    Head -->|no| Emit["pop_from_slot; advance clock"]
    Head -->|yes| Scan["find_next_slot via bitmap scan"]
    Scan -->|found| Emit
    Scan -->|none + overflow non-empty| Rotate["rotate_wheel_base"]
    Rotate -->|reinsert in-span entries| Slot
    Rotate --> Overflow
    Scan -->|none + overflow empty| Done["return None"]
```
This figure expands the **EventQueue** box of the [sim-pipeline figure](/dev/bible/engine/discrete-event-simulation). The mechanics: - **push** computes the timestamp's slot. If the event lands beyond the wheel's span (`time_ms - wheel_base_ms >= WHEEL_SPAN_MS`), it goes into an overflow bucket instead. Otherwise it is inserted into the slot's linked list at the right sorted position, a fast tail append in the common case, a list walk otherwise. Nodes come from an arena with a free list, so steady-state pushing does not allocate. - **pop** reads the current slot's head; if empty it scans the bitmap for the next non-empty slot. If no slot has anything but overflow does, it rotates. - **rotate_wheel_base** advances the wheel base by one full span, resets the slot cursor, and drains the overflow bucket, reinserting every entry that now falls within the new span and leaving the rest in overflow. This is how a 30-minute DoT survives a 17.5-minute wheel: it sits in overflow until rotation brings it into range. Ordering is ascending `time_ms`, FIFO by an insertion `seq` on ties. The FIFO tiebreak is the only thing the wheel adds over a plain heap to make same-millisecond events deterministic, and it matters: two procs firing at the same instant must resolve in a fixed order or the run is not reproducible. The honest cost of this choice: the wheel is bigger and more code than a heap, and its O(1) is amortised, not worst-case. A burst of far-future events all landing in overflow, then a rotation, pays for itself across many ops rather than per-op. For a workload that is overwhelmingly near-future scheduling with the occasional long DoT, that trade is worth it. ## Deterministic RNG A simulation result has to be reproducible from its seed, or you cannot debug it and you cannot trust a regression. The engine uses `SimRng`, a `u64` xorshift64 generator (shifts 13/7/17), seeded by an FNV-1a hash over `seed_base || chunk_id || iteration_index`. 
The whole generator is one word of state: ```rust #[derive(Debug)] pub struct SimRng { state: u64, } ``` The seed pipeline is split deliberately. `seed_prefix(seed_base, chunk_id)` pre-hashes the per-chunk part once: ```rust pub fn seed_prefix(seed_base: u64, chunk_id: &str) -> u64 { let mut h = FNV_OFFSET; h = fnv1a_bytes(h, &seed_base.to_le_bytes()); h = fnv1a_bytes(h, chunk_id.as_bytes()); h } ``` and `from_prefix(prefix, iteration_index)` finishes it per iteration, so the chunk loop does not re-hash the whole key on every Monte-Carlo pass: ```rust let rng = SimRng::from_prefix(seed_prefix, i); ``` On top of the raw generator sit the stochastic primitives in `crates/engine-sim/src/stochastic.rs`: `proc_chance` (a flat roll), `roll_tier` (cumulative thresholds), `shuffle_pick` (partial Fisher-Yates), and `roll_rppm`, real-procs-per-minute with same-time guarding, a 3.5-second elapsed cap, and bad-luck protection that ramps the chance up the longer a proc has gone without firing. [Procs](/dev/bible/engine/procs) covers the RPPM model in depth. One subtlety the code clears up: `SimEngine` holds a `SimRng` field, but the loop only touches it with a `let _ = &mut self.rng` at `CastComplete`. The RNG that actually drives combat rolls is the handler's own `SimRng`, reseeded per iteration. The engine-held one is effectively vestigial. With the clock, the queue, and the RNG in place, the only remaining question is what the handler does when asked for the next action. That answer comes from the [rotation compiler](/dev/bible/engine/rotation-compiler). ## Rotation Compiler When the loop asks the handler "what now?", the answer comes from a rotation: a priority list of actions with conditions, authored as JSON and compiled once at bootstrap. At runtime the handler calls `evaluate(&mut buffer, now_secs)` and gets back a single `EvalResult`: cast this spell, wait this long, pool to this resource level. That is the contract. 
Everything below is how it is honoured fast and identically across two backends. This page expands the **Engine** box of the system-context diagram (see [Architecture](/dev/bible/overview/architecture)) along a different axis than the [event system](/dev/bible/engine/event-system): the rotation execution path rather than the clock. ## Two backends, one lowerer There are two ways to evaluate a rotation, and the engine ships both. A JIT backend compiles the rotation to native machine code through LLVM (via the `inkwell` crate). An interpreter backend runs the same logic without codegen. They are selected at compile time by the `jit` feature, with one runtime override: - Native builds with the `jit` feature use the JIT. The published figure for a JIT evaluation is on the order of 1.5 ns, because the rotation collapses to straight-line native code reading a flat buffer. - WASM has no LLVM, so browser builds always use the interpreter. This is not a fallback we are ashamed of. It is the only option in the sandbox, and it keeps the in-browser preview honest about the same rotation logic. - Attaching a decision-trace sink forces the interpreter even on native, because the JIT cannot record per-decision traces. `set_decision_trace` on a JIT engine recompiles it through the interpreter on the spot. The thing that makes two backends maintainable is that they are not two implementations. Both call the **same** `lower::lower_rotation`. The lowerer is generic over a `RotationBackend` trait whose associated types are the backend's notion of a boolean, integer, and float: ```rust pub trait RotationBackend { type Bool: Copy; type Int: Copy; type Float: Copy; ``` The trait's methods are the primitive operations the lowerer composes: load a field, compare, add, branch, return a cast. The JIT implements those primitives by emitting LLVM IR; the interpreter implements them by computing values directly. 
The priority-list logic, which condition gates which action and what counts as a terminator, is written exactly once. A cross-backend parity test asserts the two produce identical results. The interpreter is worth being precise about because it is not what the name suggests. It is **not** a separate AST walker. Its `evaluate` constructs an `InterpBackend` over the buffer and runs the very same `lower_rotation`: ```rust let mut backend = InterpBackend::new(buffer, now_secs); lower::lower_rotation( &mut backend, &self.rotation, &self.schema, &self.resolver, &self.table, ); backend.finish() ``` The only difference from the JIT is that the primitive ops compute instead of emit. "Interpreter" here means "the lowerer driven eagerly," not "a second engine."
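The "one lowerer, two backends" idea is easier to see at toy scale. The sketch below is a hypothetical, cut-down analogue and not the engine's trait: it has two associated types instead of three, four primitives, a single `Rule` shape, and an emitting backend that appends invented pseudo-IR strings where the real JIT would emit LLVM IR through `inkwell`. Every name in it (`Backend`, `Rule`, `lower`, `Interp`, `Emit`) is an assumption for illustration.

```rust
/// Hypothetical miniature of a backend trait: the lowerer only ever sees
/// these primitives and the backend's own value types.
trait Backend {
    type Bool: Copy;
    type Float: Copy;
    fn load(&mut self, field: usize) -> Self::Float;
    fn constant(&mut self, v: f64) -> Self::Float;
    fn less_than(&mut self, a: Self::Float, b: Self::Float) -> Self::Bool;
    fn cast_if(&mut self, cond: Self::Bool, spell_id: u32);
}

/// One priority-list entry: cast `spell_id` when `buffer[field] < threshold`.
struct Rule {
    field: usize,
    threshold: f64,
    spell_id: u32,
}

/// The shared lowerer: the priority-list logic is written exactly once.
fn lower<B: Backend>(backend: &mut B, rules: &[Rule]) {
    for r in rules {
        let lhs = backend.load(r.field);
        let rhs = backend.constant(r.threshold);
        let cond = backend.less_than(lhs, rhs);
        backend.cast_if(cond, r.spell_id);
    }
}

/// Eager backend: each primitive computes a value immediately.
struct Interp<'a> {
    buffer: &'a [f64],
    result: Option<u32>,
}

impl Backend for Interp<'_> {
    type Bool = bool;
    type Float = f64;
    fn load(&mut self, field: usize) -> f64 {
        self.buffer[field]
    }
    fn constant(&mut self, v: f64) -> f64 {
        v
    }
    fn less_than(&mut self, a: f64, b: f64) -> bool {
        a < b
    }
    fn cast_if(&mut self, cond: bool, spell_id: u32) {
        // First matching rule wins; later matches are ignored.
        if cond && self.result.is_none() {
            self.result = Some(spell_id);
        }
    }
}

/// Emitting backend: values are register numbers, primitives append text.
struct Emit {
    next_reg: u32,
    ir: Vec<String>,
}

impl Emit {
    fn fresh(&mut self) -> u32 {
        let r = self.next_reg;
        self.next_reg += 1;
        r
    }
}

impl Backend for Emit {
    type Bool = u32;
    type Float = u32;
    fn load(&mut self, field: usize) -> u32 {
        let r = self.fresh();
        self.ir.push(format!("%{r} = load f64 slot[{field}]"));
        r
    }
    fn constant(&mut self, v: f64) -> u32 {
        let r = self.fresh();
        self.ir.push(format!("%{r} = const {v}"));
        r
    }
    fn less_than(&mut self, a: u32, b: u32) -> u32 {
        let r = self.fresh();
        self.ir.push(format!("%{r} = flt %{a}, %{b}"));
        r
    }
    fn cast_if(&mut self, cond: u32, spell_id: u32) {
        self.ir.push(format!("br %{cond}, ret cast({spell_id})"));
    }
}
```

Running `lower` over the same rule list with `Interp` yields a decision, and with `Emit` yields a program that would make that decision later. Because neither backend contains any rule logic, a parity test only has to compare their outputs.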
```mermaid
flowchart TB
    JSON["rotation JSON"] -->|parse_and_validate| AST["Rotation AST"]
    AST -->|lower::prepare| Schema["ContextSchema + DescriptorTable"]
    Schema --> Decide{"jit feature? trace sink?"}
    Decide -->|jit, no trace| JitPath["JitBackend: build LLVM module"]
    Decide -->|wasm or trace sink| InterpPath["InterpBackend"]
    JitPath -->|lower_rotation SHARED| Lower["lower::lower_rotation"]
    InterpPath -->|lower_rotation SHARED| Lower
    Lower -->|JIT path| Verify["module.verify + JIT engine (Aggressive)"]
    Verify -->|raw fn ptr| JitEval["evaluate: call fn -> packed u64 -> decode"]
    Lower -->|interp path| InterpEval["evaluate: InterpBackend.finish()"]
    JitEval --> Result["EvalResult"]
    InterpEval --> Result
    Buffer[(DenseBuffer)] -.->|read at evaluate now| JitEval
    Buffer -.->|read at evaluate now| InterpEval
```
This figure expands the **Engine** box of the system-context diagram (see [Architecture](/dev/bible/overview/architecture)). The pipeline:

- `parse_and_validate` is serde plus a validation pass over the action tree.
- `lower::prepare` builds the `DescriptorTable`, registers every field the rotation reads and every user variable into a `SchemaBuilder`, validates that referenced resources exist, and returns the `ContextSchema` that fixes the buffer layout.
- The decision node picks a backend, but both arrows converge on the **same** `lower_rotation`. The JIT arm then verifies the module and creates an execution engine at `OptimizationLevel::Aggressive`, grabbing the raw function pointer; the interpreter arm just stores the rotation and table.
- At runtime, `evaluate(buffer, now)` either calls the native function (which returns a packed `u64` decoded into an `EvalResult`) or runs the interpreter to `finish()`.

A note on the road not taken: the obvious alternative JIT backend in the Rust ecosystem is Cranelift, which is simpler to embed and compiles faster. The engine uses LLVM through `inkwell` instead. It is slower to compile but better at optimising the kind of branch-heavy, read-only numeric code a rotation lowers to, and the rotation is compiled once per sim and then evaluated millions of times, so compile time is amortised to nothing.

## The EvalResult ABI

`EvalResult` is the value the rotation returns and the handler acts on. The in-memory form is a `#[repr(C)]` 12-byte triple (the `u8` is followed by three bytes of padding so the `u32` stays 4-byte-aligned), with a `const_assert_eq!` pinning the size at 12:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C)]
pub struct EvalResult {
    pub kind: u8,
    /// Spell ID for `KIND_CAST`, or gear-slot repr for `KIND_USE_ITEM`.
    pub spell_id: u32,
    /// Wait seconds for `KIND_WAIT`, pool target for `KIND_POOL`.
    pub wait_time: f32,
}
```

The `kind` byte is one of five constants: `KIND_NONE`, `KIND_CAST`, `KIND_WAIT`, `KIND_POOL`, `KIND_USE_ITEM`.
The `spell_id` and `wait_time` fields mean different things per kind, which is why their doc comments hedge: for a use-item result `spell_id` carries the `GearSlot` repr instead of a spell id, and for a pool result `wait_time` is the target level.

The JIT does not return that struct directly. A native function returns a scalar, so the rotation function returns a `u64` with the same three fields bit-packed: `[kind:8][spell_id:24][wait_time:32]`. `pack_eval_result` builds it; `decode_eval_result` is the inverse, called right after the JIT call:

```rust
pub fn decode_eval_result(packed: u64) -> (u8, u32, f32) {
    // #t(block: lossy_cast, magic_numbers) packed u64 bit extraction.
    let kind = (packed >> KIND_SHIFT) as u8;
    let spell_id = ((packed >> SPELL_ID_SHIFT) & SPELL_ID_MASK) as u32;
    let wait_time = f32::from_bits(packed as u32);
    (kind, spell_id, wait_time)
}
```

Note the packed spell id is 24 bits, narrower than the struct's `u32`, so only ids below 2^24 (16,777,216) survive the round trip. That is fine for live spell ids, and it is the only place the two representations differ. This packing lives in `buffer-contract`, the crate that holds the ABI both the lowerer and the backends agree on.

## The dense buffer

The rotation reads game state, and how that state is laid out is the difference between a 1.5 ns evaluation and a slow one. State lives in a `DenseBuffer`: one contiguous `Vec<SlotChunk>`, where `SlotChunk` is a `repr(align(8))` 8-byte newtype, so the buffer is 8-byte-aligned where a plain `Vec<u8>` would not be. The buffer is viewed as raw bytes and divided into slots, read and written through raw pointer casts. No hash lookups during evaluation, no boxing, no indirection. The rotation function is handed a `*mut u8` and a known set of byte offsets, and it loads `f64`s and `i32`s straight out.
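To make the layout concrete, here is a toy version of an 8-aligned byte buffer with offset-based reads and writes. It is a sketch under assumptions: the real `DenseBuffer` API is not shown in this document, so `with_bytes`, `read_f64`, `write_f64`, `read_i32`, and `write_i32` are invented names, and the offsets used in the usage note are illustrative only.

```rust
/// 8 bytes with 8-byte alignment, so a Vec of these starts on an 8-byte boundary.
#[derive(Clone, Copy)]
#[repr(align(8))]
struct SlotChunk([u8; 8]);

struct DenseBuffer {
    chunks: Vec<SlotChunk>,
}

impl DenseBuffer {
    /// Allocate a zeroed buffer of at least `bytes` bytes, rounded up to 8.
    fn with_bytes(bytes: usize) -> Self {
        Self { chunks: vec![SlotChunk([0; 8]); (bytes + 7) / 8] }
    }

    fn len_bytes(&self) -> usize {
        self.chunks.len() * 8
    }

    fn write_f64(&mut self, offset: usize, value: f64) {
        // Plain assert!, not debug_assert!: the checks must hold in release too.
        assert!(offset % 8 == 0 && offset + 8 <= self.len_bytes());
        let base = self.chunks.as_mut_ptr() as *mut u8;
        // SAFETY: offset is in bounds and 8-aligned relative to an 8-aligned base.
        unsafe { base.add(offset).cast::<f64>().write(value) }
    }

    fn read_f64(&self, offset: usize) -> f64 {
        assert!(offset % 8 == 0 && offset + 8 <= self.len_bytes());
        let base = self.chunks.as_ptr() as *const u8;
        // SAFETY: same bounds and alignment argument as write_f64.
        unsafe { base.add(offset).cast::<f64>().read() }
    }

    fn write_i32(&mut self, offset: usize, value: i32) {
        assert!(offset % 4 == 0 && offset + 4 <= self.len_bytes());
        let base = self.chunks.as_mut_ptr() as *mut u8;
        // SAFETY: offset is in bounds and 4-aligned relative to an 8-aligned base.
        unsafe { base.add(offset).cast::<i32>().write(value) }
    }

    fn read_i32(&self, offset: usize) -> i32 {
        assert!(offset % 4 == 0 && offset + 4 <= self.len_bytes());
        let base = self.chunks.as_ptr() as *const u8;
        // SAFETY: same bounds and alignment argument as write_i32.
        unsafe { base.add(offset).cast::<i32>().read() }
    }
}
```

A 24-byte slot could then hold an `f64` deadline at offset 0 and two `i32` counters at offsets 16 and 20, and a reader that knows those three offsets needs nothing else: no map lookup and no type tag.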
```mermaid
flowchart TB
    Buf[(DenseBuffer: Vec of SlotChunk, 8-aligned)] --> Offsets["BufferOffsets"]
    Offsets --> Singletons["singletons: player, combat, pet"]
    Offsets --> Keyed["keyed maps"]
    Offsets --> Standalone["standalone: user rotation variables"]
    Keyed --> Cd["cooldowns by SpellIdx"]
    Keyed --> Aura["auras by AuraKey"]
    Keyed --> Res["resources by ResourceType"]
    Keyed --> Spell["spells by SpellIdx"]
    Keyed --> Hist["history by SpellIdx"]
    Keyed --> Unit["units by role"]
    Keyed --> Swing["swings by hand"]
    Singletons -->|repr C 8-aligned slot| SlotShape["slot at byte offset, raw pointer cast"]
    Keyed -->|repr C 8-aligned slot| SlotShape
    SlotShape -->|inventory FieldDescriptor| FD["(domain, name) -> offset, EvalKind, FieldType"]
    FD -->|lower::prepare reads| Lowerer["lower_rotation field loads"]
```
This figure expands the **DenseBuffer** box of the [rotation-compile figure](/dev/bible/engine/rotation-compiler) above. The model has three families of slot:

- **Singletons**, one each: `player`, `combat`, `pet`. The player slot holds GCD end, cast/channel end, haste, crit, mastery, attack power, level, and the boolean state flags (moving, alive, in combat, stealthed).
- **Keyed maps**, many of each, indexed by an integer or string key: cooldowns and spells and history by `SpellIdx`, auras by `AuraKey`, resources by `ResourceType`, units by role string, swings by hand. Each key maps to a byte offset into the same buffer.
- **Standalone**, the user's own rotation variables, defaulted on reset.

Every slot is a `repr(C)` struct generated by a `define_slot!` macro that also emits its field offsets, its size, and a set of `FieldDescriptor`s collected through the `inventory` crate. A `FieldDescriptor` records `(domain, name) -> (field_offset, eval_kind, field_type)`. That descriptor table is exactly what `lower::prepare` consults to turn a rotation's `Read { field: "cooldown.fireball.remaining" }` into a concrete byte load with the right `EvalKind`.

`EvalKind` is where the buffer's expressiveness lives. One stored field can expose several named rotation expressions with different evaluation semantics:

```rust
pub enum EvalKind {
    Direct,
    TimestampReady,
    TimestampRemaining,
    TimestampActive,
    TimestampElapsed,
    TimestampInactive,
    /// Field `ready_at` (f64 +0); reads `current_charges` (i32 +16), `max_charges` (i32 +20).
    CooldownReady,
    /// Field `expires_at` (f64 +0); reads `base_duration` (f64 +8); active iff `remaining < 0.3 * base_duration`.
    AuraRefreshable,
    PositiveFloat,
    /// Field `current` (f64 +0), `max` (+8).
    ResourceDeficit,
    /// Field `current` (f64 +0), `max` (+8).
    ResourcePct,
    /// Field `current` (f64 +0), `max` (+8).
    ResourceDeficitPct,
    /// Field `current` (+0), `max` (+8), `regen` (+16).
    ResourceTimeToMax,
    /// Field `health` (+0), `max_health` (+8).
    UnitHealthPct,
    /// Field `health` (+0), `max_health` (+8).
    UnitHealthDeficit,
    /// Cross-slot spell usability; extra offsets resolved at JIT compile time via a side-table.
    SpellUsable,
}
```

An aura slot stores a single `expires_at` deadline, but the rotation can ask for `is_active` (`TimestampActive`), `remaining` (`TimestampRemaining`), `elapsed` (`TimestampElapsed`), or `is_refreshable` (`AuraRefreshable`, true when `remaining < 0.3 * base_duration`). A resource slot stores `current`, `max`, and `regen`, and the rotation reads `deficit`, `pct`, `deficit_pct`, or `time_to_max` off them. The deadline-and-now arithmetic happens at evaluation; the buffer stores only the raw state.

Two correctness guards keep this honest. At compile time, every slot's declared layout is checked against its actual `repr(C)` layout by `assert_repr_c_layout`, which recomputes offsets, size, and alignment from the field list, asserts they match, and rejects any field whose alignment exceeds the slot's 8-byte alignment. At runtime, the slot accessor macros assert pointer alignment before the cast. A misaligned pointer cast is undefined behaviour, and `debug_assert!` is compiled out of release builds by default, so the macro uses a plain `assert!` that fires in release builds too. The buffer is fast because it is flat and unsafe; it is correct because the contract is verified at the boundary.

The buffer is what the combat system reads and writes during a fight. The next page, the [cast pipeline](/dev/bible/engine/cast-pipeline), is what actually mutates those slots when a spell lands.

## Cast Pipeline

When a `CastComplete` event pops, the handler runs `process_cast`. This is the function that turns "the cast finished" into all of its consequences: the resource is spent, the cooldown starts, the damage is dealt, the aura is applied, post-cast hooks fire. It runs once per landed cast, top to bottom, no surprises.

The very first thing it does is refuse to trust its input.
An unknown spell id is logged and dropped, never a panic: ```rust let Some((spell_local, spell_ref)) = ctx.state.spell_data(spell_id) else { tracing::warn!(spell_id, "UNKNOWN_SPELL_CAST"); return; }; ``` Despite the doc comment on `CastStart` claiming cost and cooldown are "paid" at cast start, none of that happens until here at `CastComplete`. The `CastStart` arm of the loop only schedules this completion; this is where the spell actually does anything. ## The thirteen steps The pipeline is deliberately linear, a sequence of steps, not a graph, which makes it readable and makes the per-step attribution exact.
```mermaid
flowchart TB
    Start["process_cast(spell_id)"] --> Lookup{"spell_data found?"}
    Lookup -->|no| Warn["warn UNKNOWN_SPELL_CAST; return"]
    Lookup -->|yes| Emit["emit_cast telemetry (gcd_ms)"]
    Emit --> Res["process_resources: spend + gain"]
    Res --> Chan{"is_channel?"}
    Chan -->|yes| Channel["process_channel_cast: schedule N AuraTicks + PlayerReady; return"]
    Chan -->|no| Ready["schedule PlayerReady(max(gcd_end, now))"]
    Ready --> Cd{"has_cooldown?"}
    Cd -->|yes| StartCd["start_cooldown + emit_cooldown_start"]
    Cd -->|no| Dmg
    StartCd --> Dmg["match damage: None / Flat / Ap / Sp -> deal_damage"]
    Dmg --> Aura{"applies_aura?"}
    Aura -->|yes| ApplyAura["apply_aura"]
    Aura -->|no| Cdr
    ApplyAura --> Cdr["process_cdr_effects"]
    Cdr --> Hook["fire_cast_hook"]
    Hook --> PlayerHook["fire_player_cast_hooks"]
    PlayerHook --> Stealth{"breaks_stealth?"}
    Stealth -->|yes| Break["break_stealth_if_active"]
    Stealth -->|no| Hist
    Break --> Hist["record last_used + update_history"]
```
This figure expands the **SimEngine.run** box of the [sim-pipeline figure](/dev/bible/engine/discrete-event-simulation). Specifically, it is the `on_cast_complete` callback the loop makes there. Step by step, with call sites: 1. **Lookup.** Fetch the spell's static `SpellData` by id. If it is missing, warn `UNKNOWN_SPELL_CAST` and return: a cast for an unknown spell is a no-op, not a panic. 2. **Cast telemetry.** Compute the effective GCD and emit a cast event carrying it. This is the GCD's only role in `process_cast`. It is reported, not enforced here, because the GCD gate already happened in `on_player_ready` before the cast was returned. 3. **Resources.** `process_resources` spends the primary and secondary cost and applies any energise gain, detailed below. 4. **Channel branch.** If the spell is a channel, `process_channel_cast` schedules the channel's ticks and the final rotation wake, then returns early. Channels do not run the rest of this pipeline the same way. 5. **Schedule the next wake.** Push a `PlayerReady` at `max(gcd_end, now)` so the rotation is asked for its next action when the GCD clears. 6. **Cooldown.** If the spell has a cooldown, start it and emit a cooldown-start event. 7. **Damage.** Match on the spell's `DamageDef`: `None` does nothing; `Flat` emits a fixed-amount damage event and fires impact procs; `ApCoefficient` and `SpCoefficient` route to `deal_damage_ap` / `deal_damage_sp`, which run the full [damage formula](/dev/bible/engine/combat-formulas). 8. **Aura.** If the spell applies an aura, `apply_aura` runs the [aura state machine](/dev/bible/engine/auras): fresh apply, pandemic refresh, or snapshot. 9. **Cooldown reduction.** `process_cdr_effects` walks the spell's CDR effects, resolving each condition (`Always`, `ProcChance`, `WhileAuraActive`, `ResetWhileAura`) and reducing or resetting a target cooldown. 10. **Cast hook.** `fire_cast_hook` runs the spell-specific post-cast hook, if any. 11. 
**Player cast hooks.** `fire_player_cast_hooks` runs every registered global cast hook. 12. **Break stealth.** If the spell breaks stealth, expire the stealth aura. 13. **Record.** Stamp the spell's `last_used` and update the history slot's prev-GCD flags so the rotation can reason about what was cast last. The order is not arbitrary. Resources spend before damage so a starved cast still pays its cost. The cooldown starts before damage so a cooldown-reducing impact proc cannot reduce a cooldown that has not begun. Hooks fire after damage and auras so they observe the post-cast state. The history update is last so it reflects a completed cast. ## Resource accounting The resource step has more nuance than "subtract the cost." `process_resources` calls `process_single_resource` twice, once for the primary resource and once for the secondary. For each, if there is a cost it is spent and a `Spend` event emitted. If there is a gain it is granted and a `Gain { wasted }` event emitted, where `wasted` is the overflow past the resource cap. Two exceptions change the primary cost before the spend, not after. A cost-bypass aura zeroes the cost while it is active, the mechanic behind "your next spell is free" procs. And a channel pays per tick rather than up front, so its per-cast primary cost is zero. Both checks live at the top of `process_resources`: ```rust fn process_resources(ctx: &mut CombatCtx<'_>, spell: &crate::state::SpellData, now: SimTime) { let bypass = spell.cost_bypass_aura_id != 0 && ctx .state .is_named_aura_active(ctx.buf, spell.cost_bypass_aura_id); let primary_cost = if bypass || (spell.is_channel && spell.channel_tick_cost > 0.0) { 0.0 } else { spell.resource_cost }; process_single_resource(ctx, primary_cost, spell.resource_gain, false, now); process_single_resource( ctx, spell.secondary_resource_cost, spell.secondary_resource_gain, true, now, ); } ``` The handler also keeps the resource current with the clock. 
Before evaluating the rotation, `on_player_ready` calls `sync_resource`, which regenerates the primary resource up to `now`, scaling regen by haste for resources that haste affects. Resource regeneration is continuous in the game but the sim only needs the value at decision points, so it lazily catches up the resource at each wake instead of scheduling a tick for every point of energy. This is the same idea as the discrete-event loop itself: compute state when it is read, not on a fixed grid. The remaining mechanics, the [damage multiplier chain](/dev/bible/engine/combat-formulas), the [aura lifecycle](/dev/bible/engine/auras), [procs](/dev/bible/engine/procs), and [resources](/dev/bible/engine/resources), are the subject of the following pages. What ties them together is covered under [spec handlers](/dev/bible/engine/spec-handlers): how a generated spec becomes the `SpecHandler` this pipeline lives inside. ## Combat Formulas Every damage number in the engine comes out of one function: `DamageCalc::calculate`. It takes the spell's base amount and a handful of stat inputs and walks them through a fixed multiplier chain: weapon roll, raw damage, crit, versatility, armor, then the situational multipliers. The order matters, and the engine commits to one. This is the [cast pipeline](/dev/bible/engine/cast-pipeline)'s damage step, zoomed in. The figure below expands the **SimEngine.run** box of the simulation pipeline (the Zoom-1 `sim-pipeline` figure); concretely it is what `process_cast` reaches when it hits the damage branch.
```mermaid
flowchart TB
    Start["DamageCalc::calculate(rng)"] --> Weapon["weapon_roll = min + rng()*(max-min)"]
    Weapon -->|rng call 1 only if weapon_max>0| Raw["raw = base + weapon_roll*weapon_mult + coef*attack_power"]
    Raw --> Crit["is_crit = rng() < crit_chance"]
    Crit -->|rng call 2| AfterCrit["after_crit = raw * (is_crit ? crit_mult : 1.0)"]
    AfterCrit --> Vers["after_vers = after_crit * (1 + versatility/100)"]
    Vers --> Armor["after_armor = after_vers * armor_mitigation(armor, K)"]
    Armor --> Mult["final = (after_armor * damage_mult * mastery_mult).max(0)"]
    Mult --> Result["DamageResult#123;raw, is_crit, final_amount#125;"]
```
## The chain, step by step

`DamageCalc` is a plain struct of inputs. Everything the formula needs is a field on it:

```rust
#[derive(Debug, Clone)]
pub struct DamageCalc {
    pub base: f64,
    pub coefficient: f64,
    pub attack_power: f64,
    pub crit_chance: f64,
    pub crit_multiplier: f64,
    pub versatility: f64,
    pub target_armor: f64,
    pub armor_k: f64,
    pub damage_multiplier: f64,
    pub mastery_mult: f64,
    pub weapon_min: f64,
    pub weapon_max: f64,
    pub weapon_multiplier: f64,
    pub school: DamageSchool,
}
```

`calculate(rng)` consumes them in this exact order, drawing from the RNG at most twice:

1. **Weapon roll.** `weapon_roll = weapon_min + rng()*(weapon_max - weapon_min)`, but only when `weapon_max > 0`. For a spell with no weapon component the roll is skipped entirely, including the RNG call, so the first random draw belongs to crit instead.
2. **Raw.** `raw = base + weapon_roll*weapon_multiplier + coefficient*attack_power`. This folds the flat base, the rolled weapon contribution, and the attack-power-scaled portion into one number before any multiplier touches it.
3. **Crit.** `is_crit = rng() < crit_chance.clamp(0,1)`; the factor is `crit_multiplier` on a crit, else `1.0`. The default crit multiplier is `2.0`.
4. **Crit applied.** `after_crit = raw * crit_factor`.
5. **Versatility.** `after_vers = after_crit * (1 + versatility/100)`.
6. **Armor.** `after_armor = after_vers * armor_mitigation(target_armor, armor_k)`.
7. **Multipliers.** `final_amount = (after_armor * damage_multiplier * mastery_mult).max(0)`, where the floor at zero is the only clamp on the result.

The output is a `DamageResult { raw, is_crit, final_amount }`. Note that crit, versatility, and the late multipliers are all multiplicative against the same `raw`; there is no additive bucketing here. That is a simplification, since real WoW splits modifiers into additive and multiplicative buckets, but for the spells the engine models it keeps the formula auditable.
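To make the order of the chain concrete, here is a minimal sketch of the seven steps above. It is not the engine's `DamageCalc`: `HitInputs`, `HitResult`, and the free function `calculate` are hypothetical names for illustration, and the armor factor is assumed to be precomputed and passed in as a plain number to keep the sketch short.

```rust
/// Hypothetical, trimmed-down stand-in for the `DamageCalc` inputs.
#[derive(Debug, Clone)]
pub struct HitInputs {
    pub base: f64,
    pub coefficient: f64,
    pub attack_power: f64,
    pub crit_chance: f64,
    pub crit_multiplier: f64,
    /// Versatility in percent, e.g. 10.0 for 10%.
    pub versatility: f64,
    /// Assumption: the armor factor is precomputed and passed in (1.0 = no mitigation).
    pub armor_factor: f64,
    pub damage_multiplier: f64,
    pub mastery_mult: f64,
    pub weapon_min: f64,
    pub weapon_max: f64,
    pub weapon_multiplier: f64,
}

/// Hypothetical result type mirroring the three reported fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HitResult {
    pub raw: f64,
    pub is_crit: bool,
    pub final_amount: f64,
}

pub fn calculate(c: &HitInputs, rng: &mut dyn FnMut() -> f64) -> HitResult {
    // 1. Weapon roll: the RNG is only consulted when there is a weapon component.
    let weapon_roll = if c.weapon_max > 0.0 {
        c.weapon_min + rng() * (c.weapon_max - c.weapon_min)
    } else {
        0.0
    };
    // 2. Raw: flat base + weapon contribution + attack-power scaling.
    let raw = c.base + weapon_roll * c.weapon_multiplier + c.coefficient * c.attack_power;
    // 3-4. Crit roll and crit factor.
    let is_crit = rng() < c.crit_chance.clamp(0.0, 1.0);
    let after_crit = raw * if is_crit { c.crit_multiplier } else { 1.0 };
    // 5. Versatility.
    let after_vers = after_crit * (1.0 + c.versatility / 100.0);
    // 6. Armor (precomputed factor in this sketch).
    let after_armor = after_vers * c.armor_factor;
    // 7. Late multipliers, floored at zero.
    let final_amount = (after_armor * c.damage_multiplier * c.mastery_mult).max(0.0);
    HitResult { raw, is_crit, final_amount }
}
```

With `base = 100`, `coefficient = 0.5`, `attack_power = 1000`, no weapon, a crit, 10% versatility, an armor factor of `0.7`, and a damage multiplier of `1.2`, the chain gives `600 * 2 * 1.1 * 0.7 * 1.2 = 1108.8`, and only one RNG draw is consumed because the weapon roll is skipped.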
## Armor mitigation Armor only applies to physical damage, and it uses the standard ratio: ``` armor_mitigation(armor, K) = 1 - armor / (armor + K) ``` In code that ratio is clamped to `[0, 1]` and short-circuits to `1.0`, meaning no mitigation, when armor is non-positive: ```rust pub fn armor_mitigation(armor: f64, armor_k: f64) -> f64 { if armor <= 0.0 { return 1.0; } let mitigated = armor / (armor + armor_k); (1.0 - mitigated.clamp(0.0, 1.0)).max(0.0) } ``` The constant `K` is the armor coefficient for the target's level, supplied as `armor_k = game_data.armor_k()`, which is `armor_constant * armor_constant_mod` resolved from the expected-stats table. Non-physical schools skip the armor term: `prepare_damage_setup` only reads target armor for physical hits. ## Where the inputs come from `DamageCalc` is assembled in `prepare_damage_setup`, which reads the player's crit and versatility, the target armor for physical hits, the spell's base points, and then two aggregates: the buff totals and the mastery multiplier. ### Buff totals Active auras contribute their stat and damage modifiers through `accumulate_buffs`, which sums every active aura's `BuffEffect`s, each scaled by its current stack count, into a single `BuffTotals`. 
The `BuffEffect` enum is the vocabulary of what an aura can change:

| Variant                                  | Effect                                    |
| ---------------------------------------- | ----------------------------------------- |
| `Haste(f64)`                             | additive haste percent                    |
| `Crit(f64)`                              | additive crit percent                     |
| `Mastery(f64)`                           | additive mastery percent                  |
| `Versatility(f64)`                       | additive versatility percent              |
| `PrimaryStat(f64)`                       | additive primary stat                     |
| `DamageMult(f64)`                        | flat damage multiplier on all damage      |
| `DamageMultSchool(f64, DamageSchool)`    | damage multiplier scoped to one school    |
| `DamageMultSpells(f64, &[u32])`          | damage multiplier scoped to a spell list  |
| `Cleave(f64, u8)`                        | extra cleave hits at a fraction of damage |
| `DamageMultStacking{initial, per_stack}` | multiplier that grows per stack           |

`BuffEffect` is `#[non_exhaustive]`, and its `scaled(stacks)` method is where the stack count actually applies. Additive stats scale linearly; the `DamageMult*` family compounds via `powf(stacks)` instead:

```rust
#[inline]
pub fn scaled(self, stacks: f64) -> Self {
    match self {
        Self::Haste(v) => Self::Haste(v * stacks),
        Self::Crit(v) => Self::Crit(v * stacks),
        Self::Mastery(v) => Self::Mastery(v * stacks),
        Self::Versatility(v) => Self::Versatility(v * stacks),
        Self::PrimaryStat(v) => Self::PrimaryStat(v * stacks),
        Self::DamageMult(v) => Self::DamageMult(v.powf(stacks)),
        Self::DamageMultSchool(v, school) => Self::DamageMultSchool(v.powf(stacks), school),
        Self::DamageMultSpells(v, spells) => Self::DamageMultSpells(v.powf(stacks), spells),
        Self::Cleave(frac, targets) => Self::Cleave(frac, targets),
        Self::DamageMultStacking { initial, per_stack } => {
            Self::DamageMult(1.0 + initial + per_stack * (stacks - 1.0))
        }
    }
}
```

`BuffTotals` also keeps a per-school multiplier array `school_damage_mult: [f64; 8]` so school-scoped buffs land on the right hits.

### Mastery

Mastery is not one formula.
It is per-spec, so the engine carries a `mastery_category` on `CombatState` and applies it in two shapes: - `MasteryCategory::UniformMult`: mastery multiplies all of the spec's damage equally. - `MasteryCategory::SchoolMult`: mastery multiplies only a specific school. The category is set at build time from the manifest (`mastery_category(params.mastery.category)`), and the resolved mastery percent comes from the stat recompute, scaled by the spec's mastery coefficient. This is deliberately coarse: most specs in WoW have a bespoke mastery, and the engine only models the two that fit a multiplier. Specs whose mastery does something structurally different (e.g. adds a proc, changes resource generation) need a hook, not a category. ## Dealing the hit `DamageCalc::calculate` is the arithmetic; `deal_damage` is the orchestration around it. For a single-target cast it builds one `DamageCalc`, calls `.calculate(rng)`, adds the result to `state.total_damage`, emits a `DamageEvent` to the [telemetry sink](/dev/bible/engine/metrics), and fires impact procs via `fire_impact_procs`. When the spell is flagged AoE it loops `run_single_hit` per target with a per-target multiplier (split, chain, or square-root falloff), and a single-target physical hit can still cleave extra hits when a `Cleave` buff is active. Snapshot DoTs are the exception: `deal_damage_with_snapshot` feeds `DamageCalc` the AP/SP/crit/vers/mastery captured when the DoT was applied instead of the live stats. That mechanism is the subject of the [auras](/dev/bible/engine/auras) page. ## Auras An aura is a timed effect on the player or target: a buff, a debuff, or a damage-over-time. Applying one is more than setting a flag. The engine has to decide whether this is a fresh application or a refresh, carry over the right amount of remaining duration, schedule periodic ticks, and queue an expiry. All of that lives in one function, `apply_aura`. 
This figure expands the **SimEngine.run** box of the [simulation pipeline](/dev/bible/engine/discrete-event-simulation) (the Zoom-1 `sim-pipeline` figure): it is the state an aura moves through when `process_cast` reaches its aura step, and on every later `AuraTick`/`AuraExpire` event the loop delivers.
```mermaid
stateDiagram-v2
    [*] --> Inactive
    Inactive --> Active: "apply_aura fresh, sets expires_at and stacks, emits Apply"
    Active --> Active: "apply_aura refresh, carries up to 30 percent of base, bumps stacks, emits Refresh"
    Active --> Active: "AuraTick runs apply_periodic_effect, reschedules next tick before expiry"
    Active --> Snapshotted: "is_snapshot captures AP, SP, crit, vers, mastery, damage mult"
    Snapshotted --> Snapshotted: "refresh keeps the higher damage mult, a rolling DoT"
    Snapshotted --> Active: "tick runs deal_damage_with_snapshot"
    Active --> Inactive: "AuraExpire removes the AuraSlot, emits Expire"
    Snapshotted --> Inactive: "AuraExpire"
```
## What an aura is, statically

An aura's immutable definition is `AuraData`, a `Copy` struct: an `aura_id`, where it lands (`on: AuraOn`, player or target), a `base_duration_ms`, a `max_stacks`, a `pandemic` flag, up to four `BuffEffect`s, an optional `PeriodicData`, an optional spell-group membership, and an `is_snapshot` flag.

```rust
#[derive(Copy, Clone, Debug)]
pub struct AuraData {
    pub aura_id: u32,
    pub name_idx: u16,
    pub on: AuraOn,
    pub base_duration_ms: u32,
    pub max_stacks: u8,
    pub pandemic: bool,
    pub effects: [Option<BuffEffect>; MAX_AURA_BUFF_EFFECTS],
    pub periodic: Option<PeriodicData>,
    pub spell_group: Option<(u8, SpellGroupRule)>,
    pub is_snapshot: bool,
}
```

The runtime side (`expires_at`, current `stacks`, next tick time, and captured snapshot stats) lives in the `AuraSlot` of the [DenseBuffer](/dev/bible/engine/rotation-compiler), keyed by aura. The split is deliberate: the static `AuraData` is shared and never mutated; only the per-slot buffer state changes during a sim.

The numeric properties (duration, max stacks, tick period, pandemic eligibility) are resolved from game data into `AuraProps` at bootstrap and folded into the static defs by the generated builder code, so the manifest only needs to override what differs from the DBC.

## Fresh application vs refresh

`apply_aura` is the whole state machine. The first branch is spell-group exclusivity (below); after that it splits on whether the aura is already active. A permanent aura, `base_duration_ms == 0`, gets a sentinel far-future expiry from `PERMANENT_AURA_EXPIRES_AT_S` and never schedules an expire event. Otherwise:

- **Fresh** (slot inactive): set `expires_at = now + base_duration`, `stacks = 1`, and emit an `Apply` event to telemetry.
- **Refresh** (slot already active): this is where pandemic applies.

### Pandemic

When you refresh a DoT that still has time left, WoW lets you carry over a slice of the remaining duration instead of clipping it.
The engine models this as a 30% carry:

```
new_expires_at = now + base_duration + min(remaining, base_duration * 0.30)
```

In the refresh branch that is `PANDEMIC_CARRY_PERCENT = 30`, clamped against the time actually remaining:

```rust
let carry_ms = if aura.pandemic {
    remaining_ms.min(base_dur_ms * PANDEMIC_CARRY_PERCENT / HUNDRED_U32)
} else {
    0
};
let new_expires = now
    .saturating_add(base_dur)
    .saturating_add(SimTime::from_millis(carry_ms));
a.expires_at = new_expires.as_secs_f64();
```

On refresh the engine also bumps the stack count (up to `max_stacks`) and emits a `Refresh` event. The same 0.3 threshold is what the rotation compiler exposes to scripts as the `is_refreshable` field via `AURA_PANDEMIC_THRESHOLD`, so a rotation can ask "is this DoT inside its pandemic window" and get an answer consistent with how the engine will actually carry duration.

## Periodic ticks

If the aura carries a `PeriodicData`, `apply_aura` schedules the first `AuraTick` event, with the first interval scaled by current haste and quantized onto the millisecond grid `SimTime` uses. Each tick is handled by `process_single_aura_tick`, which applies the periodic effect and then reschedules the next tick re-scaled against live haste, but only if the next tick lands before the aura expires. The effect itself is one of:

| Variant                        | Per-tick effect            |
| ------------------------------ | -------------------------- |
| `DamageAp{coef, is_physical}`  | attack-power-scaled damage |
| `DamageSp{coef, is_physical}`  | spell-power-scaled damage  |
| `ResourceGain{amount}`         | grant resource             |
| `ResourceDrain{amount}`        | drain resource             |
| `ApplyAura{target_aura_local}` | apply another aura         |

`PeriodicData` pairs one of those `PeriodicKind` variants with a `tick_interval_ms`. Re-reading haste every tick means a haste buff gained mid-DoT speeds up the remaining ticks, which matches how hasted periodic effects behave in game.
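Returning to the pandemic carry rule from the refresh section, the arithmetic is easy to check with small numbers. The sketch below works in plain milliseconds instead of `SimTime`; `refreshed_expiry_ms` and `is_refreshable` are hypothetical helper names, and the `is_refreshable` condition (remaining time at or below 30% of the base duration) is an assumption about what the rotation field means, not something confirmed by the engine source.

```rust
const PANDEMIC_CARRY_PERCENT: u64 = 30;
const HUNDRED: u64 = 100;

/// New expiry (ms) after refreshing an aura at `now_ms`.
/// Mirrors: now + base_duration + min(remaining, base_duration * 0.30).
pub fn refreshed_expiry_ms(now_ms: u64, old_expires_ms: u64, base_dur_ms: u64, pandemic: bool) -> u64 {
    let remaining_ms = old_expires_ms.saturating_sub(now_ms);
    let carry_ms = if pandemic {
        remaining_ms.min(base_dur_ms * PANDEMIC_CARRY_PERCENT / HUNDRED)
    } else {
        0
    };
    now_ms.saturating_add(base_dur_ms).saturating_add(carry_ms)
}

/// Assumed meaning of the rotation-facing check: the aura is inside its
/// pandemic window when no remaining time would be lost by refreshing now.
pub fn is_refreshable(now_ms: u64, old_expires_ms: u64, base_dur_ms: u64) -> bool {
    let remaining_ms = old_expires_ms.saturating_sub(now_ms);
    remaining_ms <= base_dur_ms * PANDEMIC_CARRY_PERCENT / HUNDRED
}
```

For an 18 s DoT the carry cap is 5.4 s. Refreshing with 4 s left carries all 4 s (new expiry at 14 + 18 + 4 = 36 s), while refreshing with 8 s left carries only 5.4 s and loses the rest (new expiry at 10 + 18 + 5.4 = 33.4 s).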
## Snapshot DoTs Some DoTs lock in the stats present when they were applied and use those for every tick, ignoring later stat changes. For an aura with `is_snapshot` set, `apply_aura` calls `capture_snapshot` and stores the current AP, SP, crit, versatility, mastery, and damage multiplier into the aura slot. Ticks then route through `deal_damage_with_snapshot`, which feeds those captured values to `DamageCalc` instead of the live ones. See [combat formulas](/dev/bible/engine/combat-formulas). Refreshing a snapshot DoT does not blindly overwrite: the engine keeps the **higher** of the old and new damage multiplier, modelling the "rolling periodic" rule where you don't want to downgrade a strong snapshot by refreshing during a weaker window. ## Spell-group exclusivity A handful of auras are mutually exclusive: applying one must remove the others in its group. Before doing anything else, `apply_aura` runs `expire_conflicting_auras` and `sync_exclusive_group_enabled`. The group rule type, `SpellGroupRule`, currently has exactly one variant, the only grouping behaviour the engine needs so far: ```rust #[derive(Copy, Clone, Debug)] #[non_exhaustive] pub enum SpellGroupRule { Exclusive, } ``` It is left as an enum so other rules can be added without reworking call sites. ## Expiry Applying a (non-permanent) aura schedules an `AuraExpire` event at `expires_at`. Because a later refresh can push `expires_at` out, the handler re-checks the deadline when the expire event fires: if the aura was refreshed past the original expiry, the stale event is ignored and the real one is already queued. When it does expire, `expire_aura_by_id` calls `AuraSlot::remove()` and emits an `Expire` event. This refresh-aware expiry check is the reason the engine can keep stale expire events in the queue cheaply rather than hunting them down and cancelling them. ## Procs A proc is a random effect that fires off some trigger: a cast, a damage impact, a tick. 
The engine models two flavours: flat-chance rolls (a fixed probability per trigger) and RPPM (real procs per minute), where the chance scales with how long it has been since the last attempt so that, on average, the proc fires a target number of times per minute regardless of attack speed.

All of the random sampling lives in one file, `crates/engine-sim/src/stochastic.rs`, layered on the deterministic [`SimRng`](/dev/bible/engine/event-system). Every primitive takes the RNG as `rng: &mut dyn FnMut() -> f64` so combat code can pass its own seeded generator.

## Flat-chance procs

The simplest case is `proc_chance(rng, chance)`: one draw, `rng() < chance`. There is also `roll_tier(rng, thresholds)` for tiered outcomes, which returns the index of the first ascending cumulative threshold the roll falls under, and `shuffle_pick`, a partial Fisher-Yates used to pick N random items. These are the building blocks. The interesting one is RPPM.

## RPPM

Each RPPM source has an `RppmTracker` holding its rate (`rppm`), the time of the last attempt and last successful proc, an accumulator for bad-luck protection (`accumulated_blp`), and two flags: `haste_scales` and `blp_enabled`:

```rust
#[derive(Copy, Clone, Debug)]
pub struct RppmTracker {
    pub rppm: f64,
    pub last_attempt_time: f64,
    pub last_proc_time: f64,
    pub accumulated_blp: f64,
    pub haste_scales: bool,
    pub blp_enabled: bool,
}
```

`roll_rppm(tracker, now, haste_pct, rng)` does the work, and it is worth reading in order because each piece corrects for a real failure mode:

1. **Same-time guard.** If this attempt is within `SAME_TIME_TOLERANCE_S = 0.001` of the last one, it returns `false` without rolling. Two events landing at the same instant must not double-roll the same proc.
2. **Elapsed, capped.** `elapsed = (now - last_attempt).max(0)`, then capped at `MAX_INTERVAL_S = 3.5`. The cap stops a long gap (the pull, say, or a movement break) from handing out a near-guaranteed proc on the next attempt.
3. **Haste scaling.** When `haste_scales` is set, `haste_factor = 1 + haste_pct/100`, otherwise `1.0`. This is what makes "per minute" hold as attack speed rises. Faster attacks mean more attempts, so each attempt's chance is scaled up by haste to keep the rate constant.
4. **Base chance.** `base_chance = rppm * haste_factor * (elapsed / 60)`, the rate per minute converted to a probability for this interval.
5. **Bad Luck Protection.** When enabled and the effective rate is positive, the longer you go without a proc, the higher the chance climbs. With `expected_interval = 60 / real_ppm` and `accumulated = min(accumulated_blp, MAX_BAD_LUCK_PROT_S)`:

   ```
   factor = max(1, 1 + (accumulated/expected_interval - 1.5) * 3)
   chance = clamp(base_chance * factor, 0, 1)
   ```

   The `1.5` and `3.0` constants match the established SimulationCraft BLP factor, per the in-code note. The BLP cap is `MAX_BAD_LUCK_PROT_S = 1000`.
6. **Roll and reset.** `success = rng() < chance`, `last_attempt_time` always advances, and on success `last_proc_time = now` and `accumulated_blp` resets to zero. The accumulator only grows between procs, so BLP ramps and then snaps back.

A few candid notes. There is no explicit internal-cooldown (ICD) field on `RppmTracker`. The same-time guard plus the `MAX_INTERVAL_S` cap are the only time-based limiters, so an ICD'd proc would need to be modelled separately. And BLP here is the standard SimC formula, not Blizzard's exact (undocumented) implementation; it is a faithful reproduction of community-reverse-engineered behaviour, which is the best available reference.

RPPM trackers are registered at build time.
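The six roll steps listed above can be sketched as one function. This is a reading of the prose, not the code in `stochastic.rs`: `Tracker` and `roll` are hypothetical names, `real_ppm` is assumed to be `rppm * haste_factor`, and the way `accumulated_blp` grows (by the capped elapsed time on each failed attempt) is an assumption, since the text only says it grows between procs.

```rust
const SAME_TIME_TOLERANCE_S: f64 = 0.001;
const MAX_INTERVAL_S: f64 = 3.5;
const MAX_BAD_LUCK_PROT_S: f64 = 1000.0;

/// Hypothetical stand-in with the same fields as `RppmTracker`.
#[derive(Copy, Clone, Debug)]
pub struct Tracker {
    pub rppm: f64,
    pub last_attempt_time: f64,
    pub last_proc_time: f64,
    pub accumulated_blp: f64,
    pub haste_scales: bool,
    pub blp_enabled: bool,
}

pub fn roll(t: &mut Tracker, now: f64, haste_pct: f64, rng: &mut dyn FnMut() -> f64) -> bool {
    // 1. Same-time guard: no roll, no RNG draw.
    if (now - t.last_attempt_time).abs() < SAME_TIME_TOLERANCE_S {
        return false;
    }
    // 2. Elapsed time since the last attempt, capped.
    let elapsed = (now - t.last_attempt_time).max(0.0).min(MAX_INTERVAL_S);
    // 3. Haste scaling.
    let haste_factor = if t.haste_scales { 1.0 + haste_pct / 100.0 } else { 1.0 };
    // 4. Base chance for this interval. Assumption: real_ppm = rppm * haste_factor.
    let real_ppm = t.rppm * haste_factor;
    let base_chance = real_ppm * (elapsed / 60.0);
    // 5. Bad luck protection.
    let factor = if t.blp_enabled && real_ppm > 0.0 {
        let expected_interval = 60.0 / real_ppm;
        let accumulated = t.accumulated_blp.min(MAX_BAD_LUCK_PROT_S);
        (1.0 + (accumulated / expected_interval - 1.5) * 3.0).max(1.0)
    } else {
        1.0
    };
    let chance = (base_chance * factor).clamp(0.0, 1.0);
    // 6. Roll and reset.
    let success = rng() < chance;
    t.last_attempt_time = now;
    if success {
        t.last_proc_time = now;
        t.accumulated_blp = 0.0;
    } else {
        // Assumption: the accumulator grows by the capped elapsed time.
        t.accumulated_blp += elapsed;
    }
    success
}
```

As a sanity check on the numbers: a 2 RPPM proc at 50% haste, attempted 3 s after the previous attempt, has a base chance of `2 * 1.5 * 3 / 60 = 0.15`. After a 100 s gap the cap holds the interval at 3.5 s, so the chance is `0.175` rather than a guaranteed proc.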
Item procs use `register_item_rppm`, which delegates to `rppm` to register the tracker, then indexes it by item id so the generated item code can look it up:

```rust
pub fn register_item_rppm(mut self, item_id: u32, rppm: f64, haste_scales: bool) -> Self {
    let idx = self.rppm(rppm, haste_scales);
    self.item_rppm_indices.insert(item_id, idx);
    self
}
```

## Impact procs

Many procs trigger on a damage impact rather than a cast. Those go through `fire_impact_procs`, called at the end of every `deal_damage` and every periodic tick. An `ImpactProc` carries a `chance`, the function to run, and a set of filters that decide whether this particular hit is eligible:

| Field           | Meaning                                                            |
| --------------- | ------------------------------------------------------------------ |
| `chance`        | flat per-eligible-impact proc probability                          |
| `fire`          | the `ImpactProcFn` run on a successful roll                        |
| `spell_filter`  | optional `fn(u32) -> bool` restricting which spells can trigger it |
| `periodic_only` | only periodic (DoT/HoT) impacts are eligible                       |
| `skip_periodic` | periodic impacts are ignored                                       |
| `crit_only`     | only critical hits are eligible                                    |

Before doing any work, `fire_impact_procs` short-circuits on the cases that can never proc: no registered procs, a pet hit, or zero damage.

```rust
if state.impact_procs.is_empty() {
    return;
}
if is_pet {
    return;
}
if amount <= 0.0 {
    return;
}
```

Pet damage explicitly does not trigger player impact procs, see [pets](/dev/bible/engine/pets).

Past the guards it iterates the registered procs, applies each proc's filters, and rolls. To avoid cloning the proc vector on every damage event (the hot path), it uses a `mem::take` and restore against a reusable scratch buffer, the same allocation-avoidance pattern the cast hooks use.
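The take-and-restore idea can be illustrated with a small sketch. It simplifies what is described above: it takes the proc list itself out of the state rather than using a separate scratch buffer, it models only the `crit_only` filter, and `Proc`, `State`, `bump`, and `fire_procs` are hypothetical names, not the engine's types.

```rust
use std::mem;

/// Hypothetical, simplified impact proc: a flat chance, one filter, one effect.
pub struct Proc {
    pub chance: f64,
    pub crit_only: bool,
    pub fire: fn(&mut State),
}

/// Hypothetical combat state holding the registered procs.
pub struct State {
    pub impact_procs: Vec<Proc>,
    pub procs_fired: u32,
}

/// Example proc effect: just count that it fired.
pub fn bump(state: &mut State) {
    state.procs_fired += 1;
}

pub fn fire_procs(
    state: &mut State,
    amount: f64,
    is_pet: bool,
    is_crit: bool,
    rng: &mut dyn FnMut() -> f64,
) {
    // Guards: nothing registered, pet hit, or zero damage can never proc.
    if state.impact_procs.is_empty() || is_pet || amount <= 0.0 {
        return;
    }
    // Move the Vec out of `state`, leaving an empty Vec behind. No clone and
    // no allocation happen, and `state` is free to be borrowed mutably by
    // each proc's effect while we iterate.
    let procs = mem::take(&mut state.impact_procs);
    for p in &procs {
        if p.crit_only && !is_crit {
            continue;
        }
        if rng() < p.chance {
            (p.fire)(state);
        }
    }
    // Restore the list so the next impact sees the same procs.
    state.impact_procs = procs;
}
```

The point of the pattern is the borrow: iterating `state.impact_procs` directly would hold a shared borrow of `state` while each effect needs `&mut State`. One consequence of this simplified version is that a proc effect which registered a new proc during iteration would have that registration overwritten by the restore.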
The proc function itself receives a `HookCtx`, the constrained post-event context that can apply or consume an aura, gain a resource, reduce or reset a cooldown, deal damage, schedule events, and roll RPPM, but cannot reach the raw event queue directly. That constraint is what keeps proc effects composable: a proc can only do things the engine knows how to schedule. ## Resources A resource is a pool with a current value, a maximum, and a regeneration rate: mana, energy, rage, combo points, and so on. The engine models each spec's primary and (optional) secondary resource as a `ResourceSlot` in the [DenseBuffer](/dev/bible/engine/rotation-compiler), and a cast spends from and gains into those slots as part of the [cast pipeline](/dev/bible/engine/cast-pipeline). ## The slot A `ResourceSlot` is three numbers, `current`, `max`, and `regen_per_sec`, with two mutating operations: - `spend(amount) -> bool`: if `current < amount` it returns `false` and changes nothing; otherwise it subtracts and returns `true`. Spending is all-or-nothing, never partial. - `gain(amount) -> f64`: adds, clamps to `max`, and returns the wasted overflow, the amount that would have pushed `current` past `max`. That `gain` return value is the whole reason the engine can report wasted resource generation. It hands back what it could not fit instead of silently clamping: ```rust pub fn gain(&mut self, amount: f64) -> f64 { let new = (self.current + amount).min(self.max); let actual = new - self.current; self.current = new; amount - actual } ``` The slot also exposes derived reads for rotations, `deficit`, `pct`, `deficit_pct`, and `time_to_max`, so a script can ask "am I close to capping" or "how long until full" without the engine recomputing anything. ## Spending and gaining a cast When `process_cast` runs at `CastComplete`, it spends and gains resources through `process_resources`, called right after the cast telemetry is emitted. 
That handles the primary and secondary resource in one pass via `process_single_resource`, which does the same thing for each: - **Cost**: if `cost > 0`, spend it and emit a `Spend` event to the [telemetry sink](/dev/bible/engine/metrics). - **Gain**: if `gain > 0`, gain it and emit a `Gain { wasted }` event carrying the overflow from `ResourceSlot::gain`. The primary cost has two exceptions, both decided at the top of `process_resources`. The cost is zeroed when a cost-bypass aura is active, the mechanic behind "your next cast is free" buffs, or when the spell is a channel whose cost is paid per tick rather than up front. Otherwise it is the spell's flat `resource_cost`: ```rust fn process_resources(ctx: &mut CombatCtx<'_>, spell: &crate::state::SpellData, now: SimTime) { let bypass = spell.cost_bypass_aura_id != 0 && ctx .state .is_named_aura_active(ctx.buf, spell.cost_bypass_aura_id); let primary_cost = if bypass || (spell.is_channel && spell.channel_tick_cost > 0.0) { 0.0 } else { spell.resource_cost }; process_single_resource(ctx, primary_cost, spell.resource_gain, false, now); process_single_resource( ctx, spell.secondary_resource_cost, spell.secondary_resource_gain, true, now, ); } ``` Whether a cast is even allowed to start is a separate, earlier check. `can_cast` runs at `on_player_ready` and includes a resource-cost gate. If the resource is short, the rotation evaluator computes a `resource_wait`, the time until enough primary resource regenerates, and the handler waits instead of casting. So `process_resources` at cast completion is the bookkeeping; the affordability decision already happened. ## Regeneration Energy-style resources regenerate continuously, and the engine does this lazily rather than on a tick. `sync_resource` is called at the start of `on_player_ready` and brings the resource up to the current time in one step. 
It opens with a guard that makes the call idempotent within a timestamp: if `now <= last_sync` it returns immediately, so repeated reads at the same instant don't double-regenerate. ```rust pub fn sync_resource( state: &CombatState, buf: &mut DenseBuffer, now: SimTime, last_sync: &mut SimTime, ) { if now <= *last_sync { return; } ``` Past the guard the math is a single catch-up step: ``` elapsed_s = (now - last_sync) / 1000 regen = regen_per_sec * haste_mult * elapsed_s current = min(current + regen, max) ``` Computing regeneration on demand, only when the rotation is about to make a decision, instead of scheduling a stream of tiny regen events keeps the [event queue](/dev/bible/engine/event-system) small. There is no benefit to ticking energy 10 times a second when nothing reads it in between. The `haste_mult` term is the resource-system equivalent of hasted attack speed. When the spec's resource is haste-scaled (`state.haste_regen`), the engine reads the live haste, the player's base haste plus any aura contributions from `accumulate_buffs`, and scales regeneration by `1 + haste_pct/100`. It also writes the effective per-second rate back into the slot so the rotation's `regen` and `time_to_max` reads reflect current haste. ## Telemetry Every spend and gain pushes a `ResourceEvent` carrying the kind (`Spend` or `Gain { wasted }`), the resource type id, the amount, and the post-operation current and max. The telemetry accumulator folds those per-iteration events into per-resource totals, gained, spent, and wasted, which is how the results UI can show, for example, how much energy a rotation threw away by gaining at cap. The `wasted` figure is only meaningful because `ResourceSlot::gain` returns the overflow rather than silently clamping. ## Stats A character's gear gives ratings, a flat number of crit rating, haste rating, and so on. Combat math wants percentages. 
Converting one to the other is not a constant: WoW applies diminishing returns to secondary stats, so the hundredth percent of crit costs more rating than the first. The engine does not approximate that curve; it reads the game's own diminishing-returns curves out of the DBC data and interpolates them. ## Rating to percent The conversion is `rating_to_percent`: ```rust pub fn rating_to_percent(rating: f64, rating_type: RatingType, curves: &ResolvedCurves) -> f64 { let raw_pct = rating / BASE_RATING_80; let curve_id = dr_curve_id(rating_type); curves.interpolate(curve_id, raw_pct).unwrap_or_else(|| { tracing::warn!( target: "wowlab::stats", curve_id, rating_type = ?rating_type, "DR curve missing; falling back to raw rating percent" ); raw_pct }) } ``` The shape is two steps. The raw percent is `rating / BASE_RATING_80`, where `BASE_RATING_80` is 180.0, the rating-per-percent at level 80 before diminishing returns, the linear baseline. The `interpolate` step then bends that raw percent through the relevant DBC curve to get the effective percent. The curve is keyed by rating type: - **Secondary** stats (crit, haste, mastery, versatility) use `SECONDARY_DR_CURVE_ID`, curve 21024. - **Tertiary** stats (leech, speed, avoidance) use `TERTIARY_DR_CURVE_ID`, curve 21025. `dr_curve_id` does that mapping. The curves themselves are `ResolvedCurves`, piecewise-linear point sets resolved from the game data scaling tables at bootstrap and clamped at both endpoints. This matters for honesty about the model: the engine is **not** applying a flat "30% cap" or any hand-tuned diminishing-returns approximation. It interpolates the same curve the game uses, so the DR behaviour is correct by construction as long as the curve data is current. There is one fallback, visible above in the `unwrap_or_else`. If the requested curve is missing from the resolved data, `rating_to_percent` logs a warning and returns the raw, un-diminished percent. 
That is a degraded mode, it means DR is effectively off for that stat, and the warning is there so it is never silent. ## Recompute `recompute` turns the raw primary stats and ratings into the combat-ready numbers the [damage formula](/dev/bible/engine/combat-formulas) reads: ```rust pub fn recompute( primary: &PrimaryStats, ratings: &Ratings, spec: SpecId, mastery_coeff: f64, curves: &ResolvedCurves, ) -> CombatStats { let primary_attr = primary_stat_for_spec(spec); let primary_value = primary.get(primary_attr); ``` What it produces: - **Attack power and spell power** are both set to the spec's primary attribute value. The spec's primary attribute, strength, agility, or intellect, comes from `primary_stat_for_spec`. - **Crit** = `BASE_CRIT + crit_pct/100`, where `BASE_CRIT` is the innate 5% and `crit_pct` is the rating-derived percent. - **Haste** = the rating-derived haste percent. - **Mastery** = `mastery_pct * mastery_coeff`. The per-spec coefficient is what translates "mastery percent" into the spec's actual mastery effect; the manifest supplies it. - **Versatility** = the rating-derived versatility percent. Every secondary above goes through `rating_to_percent`, so the DR curve is applied uniformly. Attack power and spell power being identical to the primary attribute is a simplification. It skips weapon DPS and the various AP-per-stat conversions, but the weapon contribution to physical hits is added separately in the damage chain via the weapon roll, so the AP value here is the stat-scaling portion only. `CombatStats` is the output struct, re-exported from `engine-ports`. Those resolved stats are what the damage chain reads when it computes a hit. There is also a `default_stats()` used for introspection and quick smoke tests, AP/SP 15000, crit 25, haste 15, mastery 40, versatility 5, so a spec can be built and introspected without any gear resolved. 
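The two-step conversion described under "Rating to percent" (linear baseline, then a clamped piecewise-linear curve, with a raw fallback when the curve is missing) can be sketched as follows. `Curve` and this version of `rating_to_percent` are hypothetical stand-ins for `ResolvedCurves` and the real function, and any curve points used with it are made up for illustration, not the DBC data.

```rust
const BASE_RATING_80: f64 = 180.0;

/// Hypothetical piecewise-linear curve; `points` must be sorted by x.
pub struct Curve {
    pub points: Vec<(f64, f64)>,
}

impl Curve {
    /// Linear interpolation between neighbouring points, clamped at both
    /// endpoints. Returns `None` only for an empty curve.
    pub fn interpolate(&self, x: f64) -> Option<f64> {
        let first = *self.points.first()?;
        let last = *self.points.last()?;
        if x <= first.0 {
            return Some(first.1);
        }
        if x >= last.0 {
            return Some(last.1);
        }
        for w in self.points.windows(2) {
            let (x0, y0) = w[0];
            let (x1, y1) = w[1];
            if x <= x1 {
                let t = (x - x0) / (x1 - x0);
                return Some(y0 + t * (y1 - y0));
            }
        }
        None
    }
}

/// Rating -> raw percent -> diminished percent. Falls back to the raw
/// percent when no curve is available (the degraded mode in the text).
pub fn rating_to_percent(rating: f64, curve: Option<&Curve>) -> f64 {
    let raw_pct = rating / BASE_RATING_80;
    curve
        .and_then(|c| c.interpolate(raw_pct))
        .unwrap_or(raw_pct)
}
```

With an invented curve of `(0, 0)`, `(30, 30)`, `(60, 50)`, a rating of 5400 is a raw 30% and stays 30%, a rating of 8100 is a raw 45% that the curve bends down to 40%, and anything past a raw 60% clamps at 50%. Without a curve, a rating of 1800 simply returns the raw 10%.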
## Where the curves come from The DR curves are part of [game data](/dev/bible/game-data/data-resolution): the `curve_points` scaling table is fetched alongside item scaling, folded into `ResolvedCurves` via `from_scaling` (which sorts each curve's points by x), and carried into the sim as part of the resolved data. Because the same curve data drives both gear scaling and stat conversion, the rating-to-percent numbers stay consistent with how the game would scale the gear that produced those ratings. ## Spec Handlers The loop drives a `SpecHandler` trait object and knows nothing else. The thing on the other side of that trait, for every spec, is one type: `CombatHandler`. This page is about how a declarative spec definition becomes a `CombatHandler` the loop can run. The path runs from generated code, through the builder, into the registry the bootstrap looks up. ## One handler, every spec `SpecHandler` is the contract between the clock and the combat logic. Its methods mirror the [event variants](/dev/bible/engine/event-system): `on_player_ready` returns the next `SpecAction`, `on_cast_complete` runs the [cast pipeline](/dev/bible/engine/cast-pipeline), and `on_aura_tick`, `on_aura_expire`, `on_auto_attack`, `on_cooldown_ready` handle the rest. Around those are lifecycle hooks like `on_sim_start` and `reset`, the `flush_scheduled` drain the loop calls after every dispatch, and the `total_damage` read-out the loop needs after a run. There is exactly one production implementation, and it bundles a `CombatState` (the static spell/aura definitions plus mutable non-buffer state), a `DenseBuffer` (the runtime state the rotation reads), a `RotationEngine` (the compiled rotation), and its own `SimRng`: ```rust pub struct CombatHandler { state: CombatState, buf: DenseBuffer, engine: RotationEngine, rng: SimRng, names: Vec, last_resource_sync: SimTime, } ``` Its trait methods map cleanly onto combat functions. 
`on_player_ready` syncs resources and target health, evaluates the rotation, and GCD-gates the result. `on_cast_complete` builds a `CombatCtx` and calls `process_cast`. `flush_scheduled` drains the handler's pending events into the loop's queue. So "a spec" is not a type. It is a configuration of spells, auras, resources, and a rotation, fed into the one `CombatHandler`. ## From manifest to handler That configuration is authored as a TOML manifest and turned into Rust at build time by the code generator (covered under [Codegen](/dev/bible/game-data/codegen)). The generated code for each spec is three things wired together: a `build_combat_system` function, a `new_handler` factory, and a `define_spec_descriptor!` registration. `build_combat_system` is a long fluent chain on `CombatSystemBuilder`. It declares the resource, the secondary resource if any, every aura and spell and auto-attack, threads in the `ResolvedGameData` snapshot, sets fight duration and enemy count, registers equipped items, and finally calls `build(rotation_json)`. The builder methods take closures so each spell and aura can be configured inline, `spell(name, id, |s| ...)` and `aura(name, id, |a| ...)`, and the generated builder functions inside those closures read values straight out of `ResolvedGameData` wherever the manifest left a field unspecified. The manifest is overrides; the game data is the default. `build` is where compilation happens. 
It compiles the rotation JSON into a `RotationEngine` (JIT or interpreter, per the [rotation compiler](/dev/bible/engine/rotation-compiler)), assembles the `Vec`, populates the initial dense buffer, and returns the four pieces a handler needs: ```rust pub struct BuiltCombatSystem { pub state: CombatState, pub rotation: RotationEngine, pub buffer: DenseBuffer, pub names: Vec, } ``` The generated `new_handler` then wraps that built system into a `CombatHandler` boxed as a `Box`, by default through `default_new_handler`, or through a spec-specific custom handler when the manifest declares one. ## The inventory registry The bootstrap does not import any spec by name. It looks one up by id at runtime, and the lookup works because every generated spec registers itself at link time through the `inventory` crate. The `define_spec_descriptor!` macro expands to an `inventory::submit!` of a `SpecDescriptor`, carrying the spec id, class id, display name, talent list, hero-talent trees, and the `handler_factory` function pointer: ```rust inventory::submit! { wowlab_engine_ports::SpecDescriptor { spec_id: $spec_id, class_id: ($spec_id).class(), display_name: $name, talents: $talents, rotation_field_schema: $schema, metrics_plugin_id: $plugin_id, hero_talent_trees: $hero, handler_factory: |params| ($factory)(params), } } ``` At bootstrap, `find_descriptor(spec_id)` iterates the inventory and returns the matching descriptor, or a `SpecConstruction` error if none is registered. Then `bootstrap` calls that descriptor's `handler_factory` with a `HandlerParams` carrying the resolved game data, rotation JSON, stats, talents, gear, buffs, and race, and gets back the boxed handler. That is the join: the [orchestration layer](/dev/bible/distribution/orchestration) finds a descriptor it never linked against by name, and the descriptor knows how to build its own handler. 
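As a rough illustration of that join, the sketch below looks up a descriptor by id and then calls its factory pointer. It is not the engine's code: a plain static slice stands in for the `inventory` collection, and `DemoHandler`, `BootstrapError`, the one-method `SpecHandler` trait, the trimmed `HandlerParams`, and the spec ids 1 and 2 are all hypothetical.

```rust
// Hypothetical stand-ins: the real `HandlerParams` and `SpecDescriptor`
// carry many more fields than shown here.
pub struct HandlerParams {
    pub rotation_json: String,
}

// Reduced to one method for the sketch; the real trait is much larger.
pub trait SpecHandler {
    fn name(&self) -> &str;
}

struct DemoHandler {
    name: &'static str,
}

impl SpecHandler for DemoHandler {
    fn name(&self) -> &str {
        self.name
    }
}

pub struct SpecDescriptor {
    pub spec_id: u32,
    pub display_name: &'static str,
    pub handler_factory: fn(&HandlerParams) -> Box<dyn SpecHandler>,
}

fn new_demo_a(_params: &HandlerParams) -> Box<dyn SpecHandler> {
    Box::new(DemoHandler { name: "demo-a" })
}

fn new_demo_b(_params: &HandlerParams) -> Box<dyn SpecHandler> {
    Box::new(DemoHandler { name: "demo-b" })
}

// In the engine this list is collected at link time by `inventory`;
// a static slice stands in for it here.
static REGISTRY: &[SpecDescriptor] = &[
    SpecDescriptor { spec_id: 1, display_name: "Demo A", handler_factory: new_demo_a },
    SpecDescriptor { spec_id: 2, display_name: "Demo B", handler_factory: new_demo_b },
];

#[derive(Debug, PartialEq)]
pub enum BootstrapError {
    SpecConstruction(u32),
}

/// Find the descriptor registered for `spec_id`, or report that none exists.
pub fn find_descriptor(spec_id: u32) -> Result<&'static SpecDescriptor, BootstrapError> {
    REGISTRY
        .iter()
        .find(|d| d.spec_id == spec_id)
        .ok_or(BootstrapError::SpecConstruction(spec_id))
}

/// Look the spec up by id, then let the descriptor build its own handler.
pub fn bootstrap(
    spec_id: u32,
    params: &HandlerParams,
) -> Result<Box<dyn SpecHandler>, BootstrapError> {
    let descriptor = find_descriptor(spec_id)?;
    Ok((descriptor.handler_factory)(params))
}
```

The caller never names a spec type. It holds only an id, and the function pointer stored in the descriptor is the sole route to the concrete handler.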
One subtlety the registry depends on: inventory registrations can be dead-code-eliminated if nothing references the generated module. The generated `specs/mod.rs` defends against this with a `force_link_generated_specs` function that touches each `build_combat_system` pointer through `black_box`, so the linker keeps the `submit!` calls. Without it, a release build could silently drop specs from the registry. ## The introspection trick There is a chicken-and-egg problem in this pipeline worth calling out, because it shapes how `ResolvedGameData` is built. To resolve a spec's game data, the bootstrap needs to know which spell and aura ids the spec uses. But the only thing that knows those ids is the handler, which cannot be built without the resolved game data. The resolver breaks the cycle by building a throwaway handler against empty default game data, solely to call `introspect()` and read back the spell, aura, and auto-attack ids the spec declares. Those ids drive the real resolution pass, and the real handler is built afterward against the populated data. The `SpecHandler` trait carries `introspect` precisely so this discovery step exists. The accessors on `ResolvedGameData` are written to tolerate the empty case. They return defaults when the data map is empty (the introspection path) but `None` when the map is populated yet missing a spell, which the generated code turns into a hard `MissingSpellData` error. The full mechanics are under [Data Resolution](/dev/bible/game-data/data-resolution). With the handler built, registered, and driven by the loop, the only thing left is reading the result back out. The [metrics](/dev/bible/engine/metrics) page covers how a run's events become the telemetry the rest of the platform consumes. ## Metrics A single run produces a DPS number, but a useful sim produces a distribution: a mean and its error, per-spell breakdowns, buff uptimes, resource accounting, and a representative timeline to look at. 
The engine collects all of that incrementally as it runs, never holding more than one iteration's worth of fine-grained events in memory at a time. The shape is two layers. Each iteration fills a `TelemetrySink` with raw events; after the iteration, those events are folded into a `TelemetryAccumulator` that carries running aggregates across iterations. The sink is cleared and reused; the accumulator grows. At the end, the accumulator encodes itself into protobuf bytes: the `ChunkTelemetry` that travels back to the rest of the platform. ## The two layers `TelemetrySink` is the per-iteration collector. It is a bundle of pre-allocated vectors, one per event category, that the combat functions emit into during a run: ```rust pub struct TelemetrySink { pub damage: Vec, pub auras: Vec, pub resources: Vec, pub cooldowns: Vec, pub casts: Vec, } ``` It is allocated once per chunk and cleared at the start of each iteration, so the hot path appends without allocating. `TelemetryAccumulator` is the cross-iteration aggregate. It holds the DPS running sums (count, sum, sum-of-squares, min, max, and the full value vector for the histogram), per-spell aggregates, aura uptimes, resource totals, per-second damage buckets, the direct/periodic/pet damage split, and the representative iteration's captured timeline: ```rust pub struct TelemetryAccumulator { iteration_count: u32, dps_sum: f64, dps_sum_sq: f64, dps_min: f64, dps_max: f64, spell_totals: IntMap, dps_values: Vec, representative_iteration: Option<(usize, f64)>, aura_uptimes: IntMap, resource_totals: IntMap, cooldown_totals: IntMap, total_duration_ms: f64, gcd_locked_ms_total: f64, representative_sink: Option, pending_capture: bool, direct_damage_total: f64, periodic_damage_total: f64, pet_damage_total: f64, bucket_sums: Vec, trace_extras_enabled: bool, } ``` The handoff happens in the loop's post-iteration tail. 
After a run finishes, the loop computes the iteration's DPS and calls three accumulator methods in order: `record_sink_events` to fold this iteration's sink into the running totals, `record_iteration` to update the DPS statistics, and `maybe_capture_representative` to possibly snapshot this iteration's timeline.
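The two-layer handoff can be sketched in a few lines. This is a simplified model under stated assumptions, not the engine's types: the sink has a single damage vector, the accumulator keeps only a handful of fields, `BUCKET_MS` mirrors the one-second timeline resolution, and `run_iterations` is a hypothetical driver that replays pre-recorded events in place of real combat functions.

```rust
/// Hypothetical, trimmed-down event; the real sink has one vector per category.
#[derive(Clone, Copy)]
pub struct DamageEvent {
    pub time_ms: u32,
    pub amount: f64,
    pub is_pet: bool,
}

/// Per-iteration collector: cleared and reused, never reallocated.
#[derive(Default)]
pub struct Sink {
    pub damage: Vec<DamageEvent>,
}

impl Sink {
    pub fn clear(&mut self) {
        self.damage.clear(); // keeps the allocated capacity
    }
}

/// One-second buckets, matching the timeline resolution described in the text.
const BUCKET_MS: u32 = 1_000;

/// Cross-iteration aggregate: grows as iterations are folded in.
#[derive(Default)]
pub struct Accumulator {
    pub iteration_count: u32,
    pub dps_sum: f64,
    pub dps_sum_sq: f64,
    pub pet_damage_total: f64,
    pub bucket_sums: Vec<f64>,
}

impl Accumulator {
    /// Fold one iteration's raw events into the running totals.
    pub fn record_sink_events(&mut self, sink: &Sink) {
        for ev in &sink.damage {
            let bucket = (ev.time_ms / BUCKET_MS) as usize;
            if bucket >= self.bucket_sums.len() {
                self.bucket_sums.resize(bucket + 1, 0.0);
            }
            self.bucket_sums[bucket] += ev.amount;
            if ev.is_pet {
                self.pet_damage_total += ev.amount;
            }
        }
    }

    /// Update the DPS running sums for one finished iteration.
    pub fn record_iteration(&mut self, dps: f64) {
        self.iteration_count += 1;
        self.dps_sum += dps;
        self.dps_sum_sq += dps * dps;
    }
}

/// Hypothetical driver: each inner Vec plays the role of one iteration's output.
pub fn run_iterations(per_iteration: &[Vec<DamageEvent>], duration_s: f64) -> Accumulator {
    let mut sink = Sink::default();
    let mut acc = Accumulator::default();
    for events in per_iteration {
        sink.clear();
        sink.damage.extend_from_slice(events); // stands in for the emit_* calls
        let total: f64 = sink.damage.iter().map(|e| e.amount).sum();
        acc.record_sink_events(&sink);
        acc.record_iteration(total / duration_s);
    }
    acc
}
```

Only the sink holds fine-grained events, and only for one iteration at a time. Everything that survives the iteration is a sum or a bucket in the accumulator.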
```mermaid
flowchart TB
    Iter["iteration: combat functions emit"] -->|emit_damage / emit_aura / ...| Sink[(TelemetrySink - per iteration)]
    Sink -->|record_sink_events| Acc["TelemetryAccumulator"]
    Iter -->|DPS| RecIter["record_iteration: running sums + representative pick"]
    RecIter --> Acc
    Sink -->|maybe_capture_representative| Rep["representative timeline snapshot"]
    Rep --> Acc
    Acc -->|merge parallel chunks| Merge["combined accumulator (sum-of-squares + SIMD bucket sums)"]
    Acc -->|encode| Encode["proto ChunkTelemetry"]
    Merge -->|encode| Encode
    Encode -->|build_histogram| Hist["HDR histogram"]
    Encode -->|encode_to_vec| Bytes["telemetry_bytes"]
```
This figure expands the **telemetry** box of the [sim-pipeline figure](/dev/bible/engine/discrete-event-simulation). The flow:

- During an iteration the combat functions push into the sink (`emit_damage`, `emit_aura`, `emit_resource`, and so on).
- `record_sink_events` folds the iteration's events into the accumulator's running totals, and buckets damage into per-second slots. The timeline's resolution is one second (`TIMELINE_BUCKET_MS`).
- `record_iteration` updates the DPS sums and re-selects the representative.
- `merge` combines two accumulators when chunks run in parallel.
- `encode` produces the protobuf, building the HDR histogram and packing every aggregate.

## The representative iteration

A run does thousands of iterations, but you can only show one timeline. Which one? The accumulator picks the iteration whose DPS is **closest to the running mean**. After each iteration it compares that iteration's distance from the current mean against the stored representative's distance, and keeps whichever is closer:

```rust
let running_mean = self.dps_sum / self.iteration_count as f64;
let distance = (dps - running_mean).abs();
let is_new_representative = match &self.representative_iteration {
    Some((_, rep_dps)) => distance < (*rep_dps - running_mean).abs(),
    None => true,
};
if is_new_representative {
    self.representative_iteration = Some((iter_idx, dps));
    self.pending_capture = true;
}
```

When a new representative is chosen, the next `maybe_capture_representative` snapshots its timeline (cast markers, damage markers, aura windows, cooldown windows) into the accumulator.

Picking the iteration nearest the mean rather than the best or worst is a deliberate choice: a timeline you show a user should be typical, not a lucky outlier. The cost is that the captured timeline is whichever iteration happened to be closest at the moment it was chosen, and that choice can shift as more iterations arrive. Because selection tracks the running mean, it converges to a genuinely representative run.
## Statistics and convergence

The DPS statistics accumulate incrementally: each iteration adds to `dps_sum` and `dps_sum_sq`, from which the mean and standard deviation fall out cheaply without a second pass. The same step tracks a running mean to pick the representative iteration.

The standard deviation feeds the adaptive early-exit in the [chunk loop](/dev/bible/distribution/orchestration): the loop periodically computes the relative standard error of the mean and stops once it drops below the requested `target_error`. A chunk therefore stops at the first periodic check where the requested precision is met, instead of always running its full iteration count.

When chunks run in parallel (the CLI's multi-threaded runner), each thread builds its own accumulator and they are combined with `merge`. Merging adds the counts, the DPS sum, and the sum-of-squares directly. Because the mean and standard deviation are recovered from those sums, this combines the two distributions exactly. Merging also sums the per-spell and resource aggregates, adds the per-second bucket sums with a SIMD helper, and re-selects the representative against the combined mean. This is what lets a sim split across cores and still report one coherent distribution.

## The protobuf encoding

The final step is `encode`, which consumes the accumulator and produces the `ChunkTelemetry` protobuf bytes. It builds an HDR histogram of the DPS values via the `hdrhistogram` crate, emits the per-spell action rows, the aura, resource, and cooldown rows, the execution and damage-profile data, the per-second bucket sums, and the representative timeline snapshot, then serialises the whole thing with prost's `encode_to_vec`.

Two encoding details matter for fidelity. DPS and damage values are scaled by ten (`PROTO_DPS_SCALE`) and resource values by a hundred (`PROTO_RESOURCE_SCALE`) before being rounded into integers, so one decimal place of precision (two for resources) survives the integer wire format.
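A minimal sketch of that fixed-point scaling follows. The constant names and factors come from the text; the `u64` wire type and the `encode_scaled`/`decode_scaled` helpers are assumptions made for illustration, not the engine's actual functions.

```rust
/// Scale factors as described in the text.
pub const PROTO_DPS_SCALE: f64 = 10.0;
pub const PROTO_RESOURCE_SCALE: f64 = 100.0;

/// Scale, then round to the nearest integer for the wire.
/// The u64 wire type is an assumption for this sketch.
pub fn encode_scaled(value: f64, scale: f64) -> u64 {
    (value * scale).round() as u64
}

/// Reverse the scaling on the decoding side.
pub fn decode_scaled(wire: u64, scale: f64) -> f64 {
    wire as f64 / scale
}
```

With these helpers, a DPS of `12345.67` travels as `123457` and decodes to `12345.7`, while a resource value of `87.456` travels as `8746` and decodes to `87.46`.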
And the histogram is HDR rather than a fixed-bin histogram, so it records the DPS distribution across its full range at consistent relative precision without committing to bucket boundaries up front. Those bytes are the `telemetry_bytes` of a `ChunkReport`. From here the data leaves the engine entirely, decoded in the [portal](/dev/bible/portal/simulation-ui) for charts, or merged across chunks by the [orchestration](/dev/bible/distribution/orchestration) and [hosted-compute](/dev/bible/distribution/hosted-compute) layers for a full-job result. ## Pets I want to be candid here: pets are not a first-class actor in the engine. There is no separate pet event loop, no pet handler, no independent pet rotation. What exists is scaffolding: a handful of touch-points that let pet damage be attributed and let a rotation know a pet is present, sitting on top of the player's own [event loop](/dev/bible/engine/event-system). This page documents exactly that scaffolding and nothing more. ## What's actually there **A pet flag on the manifest.** A spec can declare `has_pet` in its `[spec]` section. That flows through the build into `BaseStats.has_pet`. **A pet buffer slot.** The [DenseBuffer](/dev/bible/engine/rotation-compiler) carries a singleton `PetSlot` with three fields: `is_active`, `count`, and `expires_at`, exposed to rotations as `is_active`, `count`, and `remaining`. At buffer initialization a spec flagged `has_pet` starts with the pet summoned, so `pet.is_active` reads true from `t = 0`: ```rust if state.base_stats.has_pet { *buf.pet_mut() = PetSlot { is_active: 1, count: 0, expires_at: 0.0, }; } ``` That is the whole lifecycle. A rotation can branch on whether a pet is up, but the engine does not itself summon, dismiss, or time a pet. Those fields are seeded once and otherwise driven only by whatever a spec's hooks choose to write. **A pet auto-attack.** `AutoAttackData` has an `is_pet` flag. 
In `phase_auto_attacks`, a pet auto-attack is the branch that gets no weapon slot. Pets have no weapon item, so it deals AP-only damage on its own swing timer. This is the one place a pet generates damage on its own schedule, and it does so by riding the same `AutoAttack` event the player uses. **A pet damage tag.** Every damage instance carries an `is_pet` flag, through `HitFlags::PET` and on the `DamageEvent` itself. The telemetry accumulator keeps a separate `pet_damage_total`, summed whenever a tagged event arrives, and reports it as its own slice of the damage profile alongside direct and periodic damage. That is why the results UI can show a pet's share of total damage. **Pet hits don't trigger player procs.** `fire_impact_procs` returns immediately on a pet hit, and the AoE path treats pet hits as single-target. Pet damage is accounted for but kept out of the player's [proc](/dev/bible/engine/procs) and cleave machinery. ## What this is not Taken together, the scaffolding lets a spec model a pet as a tagged source of auto-attack and ability damage that shares the player's clock and stats. What it does not provide: - No independent pet rotation or AI. A pet does not decide its own casts. - No pet stat sheet. Pet damage scales off the player's attack power via the auto-attack `ap_coef`, not a separate pet paperdoll. - No summon/despawn lifecycle beyond the seeded `PetSlot` fields; the `expires_at`/`count` fields exist but are not driven by a general pet-management system. - No pet-specific resources, cooldowns, or auras as first-class buffer domains. A proper pet implementation would mean a second actor: its own handler, its own slots in the buffer keyed per-pet, its own scheduled events, and a way to relate its stats to the owner's. That is a meaningful amount of structure, and the engine does not pretend to have it. 
The pet specs that exist today are approximated by folding pet output into the player's timeline as tagged damage, accurate enough for total throughput, but not a simulation of the pet as an entity. I would rather state that plainly than imply more. ## Orchestration Every caller that wants a simulation result goes through one function: `simulate_intent`. That means the browser, a hosted compute node, the CLI, and the forge comparison tool. You hand it a TOML config, a chunk description, a seed, a data resolver, and a progress sink; it hands back protobuf telemetry bytes. Everything between is `bootstrap` (turn config + game data into a ready handler) followed by `run_chunk` (drive that handler through N iterations). `simulate_intent` lives in the `engine-application` crate, the layer that sits between the host shells and the pure simulation core. Its job is orchestration, not simulation: parse and validate the intent, resolve all game data through the `DataResolver` port, build the per-spec `SpecHandler`, then loop the core engine and accumulate telemetry. ## The pipeline The figure below expands the **Engine** box of the system-context diagram. It is the Zoom-1 `sim-pipeline` figure; this page owns the Zoom-2 view of its **run_chunk loop** box, the [chunk lifecycle](#the-chunk-as-a-state-machine) state machine at the end. There are four public entry functions, and they all funnel into the same `bootstrap_chunk` → `run_chunk*` spine: | Function | Purpose | Drives | | ---------------------------- | --------------------------------------------------------------------------------------------------------------- | ---------------------- | | `simulate_intent` | Convenience entry. Wraps args into a `SimRequest` with default overrides, then calls `simulate_intent_request`. | `run_chunk` | | `simulate_intent_request` | Canonical entry taking a `SimRequest<'_>` envelope. 
| `run_chunk` | | `simulate_intent_with_trace` | Trace-mode entry for the in-browser preview pane; attaches a `DecisionTraceSink`. | `run_chunk_with_trace` | | `build_handler` | Builds a handler without running iterations (forge `paperdoll`). Runs `on_sim_start` once. | none | The signature of the convenience entry is the contract every host meets: ```rust pub async fn simulate_intent( sim_config: &str, chunk: &ChunkAssignment, seed_base: u64, resolver: &DynDataResolver<'_>, progress: &dyn ProgressSink, ) -> Result { ``` `simulate_intent` wraps its arguments into a `SimRequest` with `IntentOverrides::default()` and delegates to `simulate_intent_request`, whose body is the whole layer in two lines: `bootstrap_chunk` then `run_chunk`. ## bootstrap: config and game data become a handler `bootstrap` is where the cost lives. It is async because game-data resolution is async. In order, it: 1. Parses the TOML into a `SimConfigIntent` and validates `intent_version == "v1"`, the spec id, a non-empty `rotation_id`, and the fight `duration_s` against `1.0..=MAX_DURATION_S` (1800 s). 2. Resolves the buff toggles, bloodlust, flask (defaults on), augment rune (defaults on), tempered pre-pot, and race, all suppressed when `overrides.no_buffs` is set. 3. Looks up the spec via `find_descriptor(spec)`, an inventory-registry lookup over the link-time-collected `SpecDescriptor`s. 4. Fetches the rotation JSON through the resolver. 5. Turns gear into stats, either `compute_stats_from_gear` (the real pipeline) or `bench_gear_result` (default stats), both yielding a `GearResult`. 6. Assembles the full list of extra spell ids to resolve, item-effect spells, set-bonus auras, the on-toggle buff spells, the racial, and the codegen-emitted item resolve ids, then decodes talents. 7. 
Calls `resolve_game_data`, the big async pass that introspects the spec to discover its spell/aura ids and resolves every one into a `ResolvedGameData` ([data resolution](/dev/bible/game-data/data-resolution) covers this in detail). 8. Builds the handler by calling the descriptor's `handler_factory` with a 15-field `HandlerParams`. The output is a single `Box`. `bootstrap_chunk` wraps it together with the parsed duration and the effective seed, `seed_base.wrapping_add(chunk.seed_offset)`, into a `ChunkBootstrap`. The seed offset is how distinct chunks of the same job produce uncorrelated Monte-Carlo streams from one master seed. ## run_chunk: one handler, N iterations A chunk is the atomic unit of distribution. It declares how many iterations to run and, optionally, an early-exit error target. `run_chunk` builds the handler exactly once, allocates one `EventQueue` and one `TelemetrySink`, and reuses them across every iteration. Only `handler.reset()` and `queue.clear()` run between iterations: ```rust for i in 0..max_iters { handler.reset(); queue.clear(); ``` That is a deliberate trade. It makes a chunk cheap to run a thousand times but means the handler must scrub all its per-iteration state in `reset`. Each iteration `i`: 1. Builds a deterministic RNG from `SimRng::from_prefix(seed_prefix, i)`, where `seed_prefix` is an FNV-1a hash of `seed_base` and the chunk id, computed once. 2. Constructs a `SimEngine` over the shared refs and calls `engine.run()`, one full combat run. 3. Records the iteration's DPS into the one `TelemetryAccumulator`. Every `CHECK_INTERVAL = 100` iterations (and on the last), the loop computes a running relative standard error and, if the chunk carries a `target_error` and has passed `min_iterations`, breaks early once the error falls below the target. This adaptive early-exit is why a chunk's declared `iterations` is a ceiling, not a fixed count. 
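The early-exit check can be sketched as follows. This is a simplified model, not the engine's loop: `DpsStats` and the closure-driven `run_chunk` are hypothetical, and the use of the sample variance (dividing by `n - 1`) is an assumption, since the text does not say which estimator the engine uses.

```rust
/// Running sums from which mean and spread are recovered without a second pass.
#[derive(Default)]
pub struct DpsStats {
    pub count: u32,
    pub sum: f64,
    pub sum_sq: f64,
}

impl DpsStats {
    pub fn record(&mut self, dps: f64) {
        self.count += 1;
        self.sum += dps;
        self.sum_sq += dps * dps;
    }

    pub fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }

    /// Sample variance from the sums (assumed estimator), clamped at zero
    /// to absorb floating-point cancellation.
    pub fn variance(&self) -> f64 {
        if self.count < 2 {
            return 0.0;
        }
        let n = self.count as f64;
        let mean = self.mean();
        ((self.sum_sq - n * mean * mean) / (n - 1.0)).max(0.0)
    }

    /// Standard error of the mean, relative to the mean.
    pub fn relative_std_error(&self) -> f64 {
        if self.count < 2 || self.mean() == 0.0 {
            return f64::INFINITY;
        }
        (self.variance() / self.count as f64).sqrt() / self.mean()
    }

    /// Combining two accumulators is plain addition of the sums.
    pub fn merge(&mut self, other: &DpsStats) {
        self.count += other.count;
        self.sum += other.sum;
        self.sum_sq += other.sum_sq;
    }
}

pub const CHECK_INTERVAL: u32 = 100;

/// Run up to `max_iters` iterations, checking convergence every
/// `CHECK_INTERVAL` iterations and on the last one.
pub fn run_chunk(
    max_iters: u32,
    min_iters: u32,
    target_error: Option<f64>,
    mut run_one: impl FnMut(u32) -> f64,
) -> DpsStats {
    let mut stats = DpsStats::default();
    for i in 0..max_iters {
        stats.record(run_one(i));
        let completed = i + 1;
        let at_check = completed % CHECK_INTERVAL == 0 || completed == max_iters;
        if at_check && completed >= min_iters {
            if let Some(target) = target_error {
                if stats.relative_std_error() < target {
                    break; // converged early: `max_iters` was only a ceiling
                }
            }
        }
    }
    stats
}
```

A zero-variance run with a 1% target stops at the first check, after 100 iterations. Raising `min_iterations` to 250 delays the exit to the 300-iteration check, and a chunk with no `target_error` always runs to its ceiling.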
When the loop ends, the accumulator encodes into a protobuf `ChunkTelemetry`, which becomes the `telemetry_bytes` of the returned `ChunkReport`. `ChunkAssignment` and `ChunkReport` are defined in `engine-ports`, not here. The application layer only consumes them. The assignment is what the coordinator hands a node, and it carries everything the loop needs: ```rust #[derive(Debug)] #[non_exhaustive] pub struct ChunkAssignment { pub job_id: String, pub chunk_id: String, pub chunk_index: u32, pub chunk_count: u32, pub iterations: u32, pub seed_offset: u64, pub permutation_index: Option, pub min_iterations: Option, pub max_iterations: Option, pub target_error: Option, } ``` The `permutation_index` is set for tournament and factorial work and `None` for a uniform split; the `min_iterations`/`max_iterations`/`target_error` triple drives the adaptive early-exit above. `ChunkReport` is the small envelope back: `job_id`, `chunk_id`, `chunk_index`, and `telemetry_bytes`. ## The chunk as a state machine
```mermaid
stateDiagram-v2
    [*] --> Assigned
    Assigned --> Bootstrapped: "bootstrap_chunk OK (handler, duration, seed)"
    Bootstrapped --> Running: "progress.on_start, build handler once"
    Running --> Running: "iteration engine.run() OK (completed++)"
    Running --> ConvergedEarly: "target_error met and completed >= min_iterations"
    Running --> Exhausted: "completed == max_iterations"
    Running --> Failed: "engine.run() Err EventBudgetExceeded"
    ConvergedEarly --> Reported: "encode ChunkTelemetry, on_complete"
    Exhausted --> Reported: "encode ChunkTelemetry, on_complete"
    Reported --> [*]
    Failed --> [*]
```
The `Failed` edge is the one failure the executor surfaces directly. If any iteration's event loop exceeds `MAX_EVENTS = 500_000`, `engine.run()` returns `SimRunError::EventBudgetExceeded`, which `run_chunk` maps to `EngineError::SimulationRuntime` and the whole chunk fails. There is no per-iteration recovery. One runaway iteration kills the chunk, and on the distributed path the chunk is later reclaimed and re-enqueued. ## Native callers vs the WASM caller The contract is the same everywhere, but how the resolver is built differs. **Native hosts** construct a Rust `DynDataResolver` and call `simulate_intent` directly. The worker node's `SimRunner::run` builds a `ChunkAssignment::single_indexed("", chunk_index, iterations)` and calls `simulate_intent` over a `SupabaseResolver`. The CLI single-thread path does the same with a `ConsoleProgress` sink. The CLI multi-thread path is the one exception that does _not_ call `simulate_intent`: it bootstraps once and then builds a per-rayon-thread handler factory around `run_chunk_into_accumulator`, merging accumulators across threads. Forge uses `simulate_intent` for its SimC comparison provider and `build_handler` for `paperdoll`. **The WASM host** cannot pass a Rust resolver. Instead the browser hands in a JS object, which `JsResolver` adapts into a `DataResolver`, and the WASM export calls the _same_ `simulate_intent`. The preview pane is the only caller of `simulate_intent_with_trace`, because attaching a decision-trace sink forces the interpreter rather than the JIT. That JS resolver and the worker plumbing around it are the subject of the next page, the [WASM boundary](/dev/bible/distribution/wasm-boundary). ## WASM Boundary The same Rust engine that runs on a hosted node also runs in your browser tab. 
Two crates, `wowlab-engine` and `wowlab-common`, compile to WebAssembly, and the simulation runs in a [web worker](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API) rather than on the main thread, so a long sim never freezes the UI. Game data the engine needs is fetched back across the boundary through a JavaScript callback object. This page expands the **Browser** box of the system-context diagram. The figure below is the Zoom-1 `wasm-boundary` figure; this page also owns the Zoom-2 `worker-pool-state` view of its **web worker** box.
```mermaid
flowchart TB
    subgraph rust["Rust (compiled to WASM)"]
        Engine["wowlab-engine cdylib"]
        Common["wowlab-common cdylib"]
    end
    Engine -->|wasm-pack target web| Exports["wasm-bindgen exports (runSimulation, runIterationTrace, getImplementedSpecs, ...)"]
    Common -->|merged into engine bundle| Exports
    Exports -->|initSync in worker, Comlink.wrap| Worker["web worker (sim-worker.ts)"]
    Worker -->|runSimulationWithProgress resolver| JsResolver["JsResolver (Rust bridge)"]
    JsResolver -->|Reflect.get + call + JsFuture| JsImpl["JS resolver (Map -> IndexedDB -> Supabase)"]
    JsImpl -->|getSpell/getItem/getScalingData/getPowerTypes/getRotationScript| JsResolver
```
## What compiles, and what it exports The engine crate is built `crate-type = ["rlib", "cdylib"]`; the `cdylib` is what produces the `.wasm`. Its WASM module is gated behind both `target_arch = "wasm32"` and a `wasm` feature, and it force-links the generated spec content so the inventory registry survives dead-stripping. There is no LLVM in the browser, so the WASM build always uses the rotation interpreter, never the JIT. See the [rotation compiler](/dev/bible/engine/rotation-compiler). The engine's WASM module has five submodules, `init`, `metadata`, `rotation`, `simulation`, `error`, and exports these functions across the boundary: | Export (js_name) | Module | Async? | What it does | | --------------------------------------- | ---------- | ------ | ------------------------------------------------------------------------- | | `runSimulation` | simulation | yes | Runs a chunk, returns protobuf `ChunkTelemetry` bytes | | `runSimulationWithProgress` | simulation | yes | Same, plus a per-tick JS progress callback; returns `{bytes, chunkIndex}` | | `runIterationTrace` | simulation | yes | One traced iteration for the preview pane (forces the interpreter) | | `getImplementedSpecs` | metadata | no | Array of implemented specs (id, class, slug, counts) | | `getSpecIntrospection` | metadata | no | Spell/aura names with zeroed values (empty game data) | | `getSpecIntrospectionResolved` | metadata | yes | Introspection with real resolved values | | `validateRotation` | rotation | no | Structural rotation validation | | `validateRotationForSpec` | rotation | no | Spec-aware validation (resolves spell/aura/talent names) | | `getFieldDescriptors` | rotation | no | The rotation field schema for a spec | | `getEngineVersion` / `getEngineGitHash` | metadata | no | Build identity | The parse, build, and decode helpers a host also needs, `parseSimc`, `buildSimConfig`, `decodeJobResult`, `decodeAndDerive`, `resolveItem`, are **not** in the engine crate. 
They live in `wowlab-common`. Because `wowlab-engine` depends on `wowlab-common` with the `wasm` feature, the engine bundle's `.d.ts` re-exports all of them too, so the engine package is a superset. Studio nonetheless imports parse/decode/build from the `wowlab-common` package and sim/trace/metadata from `wowlab-engine`. The [content system](/dev/bible/portal/content-system) and [simulation UI](/dev/bible/portal/simulation-ui) consume these on the main thread. When the module instantiates, `init_wasm_runtime` runs once: it calls the WASM constructors, force-links the generated specs, installs a panic hook, and panics if the spec registry is empty: ```rust let descriptor_count = inventory::iter::().count(); if descriptor_count == 0 { // #t(panic) boot invariant: spec registry must not be empty after wasm initialization panic!( "engine spec registry is empty in wasm runtime; check inventory ctor wiring (__wasm_call_ctors) and generated force-link registrations" ); } ``` That panic is a boot invariant. An engine that links but registers no specs is broken, and failing loudly at instantiate is better than returning empty results later. The installed panic hook also posts engine panics back to the worker host via `self.postMessage({ type: "wowlab:engine-panic" })`, so a crash inside the WASM module surfaces as a message on the main thread instead of a silent wedge. ## The build pipeline `pnpm build` runs `buildCommon` then `buildEngineWasm`, each calling `buildWasmPkg`. The pipeline is: 1. `wasm-pack build --target web` produces an ESM module with an async `default()` init and a synchronous `initSync({module})`. 2. Stamp the package version to `{baseVersion}-{sha256(bg.wasm).slice(0,12)}`, a content hash so a changed engine busts caches. 3. `pnpm pack` the tarball into `packages/archives`, then rewrite `apps/studio/package.json`'s `file:` dependency to point at it. 
Studio's `predev`/`prebuild` then run `sync:wasm`, which copies both `.wasm` blobs into `apps/studio/public/wasm/` under content-hashed names and writes a `manifest.json` mapping `engineWasm`/`commonWasm` to their hashed URLs. The browser reads that manifest to fetch the right bytes. ## Loading: default() on main, initSync() in workers The main thread loads each module lazily as a singleton: `await import("wowlab-engine")` then `await m.default()`, which fetches and instantiates the `.wasm`. The `WasmIsland` provider gates this behind a mount check and a `WebAssembly.Module` support probe, then exposes `useCommon()`/`useEngine()` to consumers for synchronous main-thread calls like parse and decode. Workers take a different path. The pool fetches the `.wasm` bytes once on the main thread and `Comlink.transfer`s the `ArrayBuffer`s into each worker zero-copy. Each worker then instantiates each module synchronously from the bytes it was handed, guarded so a repeat `init()` is a no-op. Both modules follow the same shape: ```typescript if (!commonReady) { commonMod.initSync({ module: commonWasm }); commonReady = true; } ``` So the main thread uses `default()`, which is async and fetches, and workers use `initSync()`, which is synchronous and reads transferred bytes. The split avoids re-fetching the same `.wasm` per worker. ## The JS resolver protocol The engine never reaches the network itself. Inside `run_simulation` it wraps the JS object the host passed in with `JsResolver::new`, then erases it to a `DynDataResolver`. 
`JsResolver` implements the Rust `DataResolver` trait by reflecting on the JS object: every call does `Reflect::get`, casts to a `js_sys::Function`, calls it, casts the result to a `Promise`, and awaits via `JsFuture`: ```rust async fn call_method(&self, method: &str, args: &[&JsValue]) -> Result { let func = js_sys::Reflect::get(&self.inner, &JsValue::from_str(method)).map_err(|e| { ResolverError::JsBridge { method: method.to_string(), message: format!("missing method: {e:?}"), } })?; let func: js_sys::Function = func.dyn_into().map_err(|e| ResolverError::JsBridge { method: method.to_string(), message: format!("not a function: {e:?}"), })?; let promise = match args { [] => func.call0(&self.inner), [a] => func.call1(&self.inner, a), _ => unreachable!("call_method supports 0 or 1 args"), } .map_err(|e| ResolverError::JsBridge { method: method.to_string(), message: format!("call failed: {e:?}"), })?; let promise: js_sys::Promise = promise.dyn_into().map_err(|e| ResolverError::JsBridge { method: method.to_string(), message: format!("did not return a Promise: {e:?}"), })?; JsFuture::from(promise) .await .map_err(|e| ResolverError::JsBridge { method: method.to_string(), message: format!("rejected: {e:?}"), }) } ``` All deserialization is `serde_wasm_bindgen`, no JSON roundtrip. The trait methods are thin wrappers over this one helper. The JS object must implement five methods: `getSpell(id)` returns `SpellDataFlat`, `getItem(id)` returns `ItemDataFlat`, `getScalingData()` returns `ItemScalingData`, `getPowerTypes()` returns the power-type list, and `getRotationScript(id)` returns a string. The studio implementation backs each method with the same 3-layer read-through cache the resolver chapter describes: an in-memory `Map`, then IndexedDB keyed under `patch-v1:`, then Supabase PostgREST. A spell or item resolved by one chunk is cheap for the next. 
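The read-through behaviour of that cache can be illustrated with a small model. The studio implementation lives in TypeScript and is asynchronous. The synchronous Rust sketch below only mirrors the lookup order, and everything in it is an assumption: `ReadThrough`, its `HashMap` stand-in for IndexedDB, the `remote` closure standing in for Supabase, and the `remote_calls` counter.

```rust
use std::collections::HashMap;

/// Key prefix for the persistent layer, as described in the text.
const KEY_PREFIX: &str = "patch-v1:";

/// Three layers, consulted in order: memory, persistent store, remote.
pub struct ReadThrough<F: FnMut(u32) -> Option<String>> {
    memory: HashMap<u32, String>,        // layer 1: in-memory map
    persistent: HashMap<String, String>, // layer 2: stand-in for IndexedDB
    remote: F,                           // layer 3: stand-in for the remote fetch
    pub remote_calls: u32,
}

impl<F: FnMut(u32) -> Option<String>> ReadThrough<F> {
    pub fn new(remote: F) -> Self {
        Self {
            memory: HashMap::new(),
            persistent: HashMap::new(),
            remote,
            remote_calls: 0,
        }
    }

    /// Simulate a fresh worker: memory is gone, the persistent layer survives.
    pub fn drop_memory(&mut self) {
        self.memory.clear();
    }

    pub fn get(&mut self, id: u32) -> Option<String> {
        // Layer 1: cheapest hit.
        if let Some(v) = self.memory.get(&id) {
            return Some(v.clone());
        }
        // Layer 2: on a hit, promote the value into memory.
        let key = format!("{}{}", KEY_PREFIX, id);
        if let Some(v) = self.persistent.get(&key).cloned() {
            self.memory.insert(id, v.clone());
            return Some(v);
        }
        // Layer 3: fetch, then fill both faster layers on the way back.
        self.remote_calls += 1;
        let v = (self.remote)(id)?;
        self.persistent.insert(key, v.clone());
        self.memory.insert(id, v.clone());
        Some(v)
    }
}
```

Each miss fills every faster layer on the way back, which is why a value fetched for one chunk costs at most a map lookup for the next.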
## The worker pool The sim path runs in a pool of workers, each created with `new Worker(new URL("sim-worker.ts", ...), { type: "module" })` and wrapped with `Comlink.wrap`. The `WorkerManager` defaults are a watchdog of `120_000` ms, `maxWorkers` 8, and `poolSize` 4; the auto worker count is `min(navigator.hardwareConcurrency, maxWorkers)`. A single slot's life looks like this:
```mermaid
stateDiagram-v2
    [*] --> Initializing: "spawn, initSync engine+common"
    Initializing --> Idle: "ready"
    Idle --> Busy: "runChunk(chunk)"
    state Busy {
        [*] --> Fetching: "createResolver"
        Fetching --> Simulating: "runSimulationWithProgress"
        Simulating --> Signing: "buildSignMessageBytes + signMessage"
        Signing --> Submitting: "POST /chunks/complete"
        Submitting --> [*]
    }
    Busy --> Idle: "done"
    Busy --> Error: "watchdog timeout"
    Error --> Initializing: "restart (kill + respawn)"
    Idle --> [*]: "kill (releaseProxy, terminate)"
```
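The outer states in the diagram, together with the `120_000` ms watchdog and the `min(hardwareConcurrency, maxWorkers)` rule, can be modelled as a small state machine. The real manager is TypeScript built on `setTimeout` and Comlink; the sketch below is a hedged Rust analogue in which `Slot`, `SlotState`, `tick`, and the explicit `now_ms` clock are hypothetical names chosen for illustration.

```rust
/// Default watchdog and worker cap quoted in the text.
pub const WATCHDOG_MS: u64 = 120_000;
pub const MAX_WORKERS: usize = 8;

/// Auto worker count: min(hardware concurrency, maxWorkers).
pub fn auto_worker_count(hardware_concurrency: usize) -> usize {
    hardware_concurrency.min(MAX_WORKERS)
}

/// Hypothetical mirror of the outer slot states in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlotState {
    Initializing,
    Idle,
    Busy,
    Error,
}

pub struct Slot {
    pub state: SlotState,
    /// Watchdog deadline; only set while the slot is busy.
    deadline_ms: Option<u64>,
}

impl Slot {
    pub fn new() -> Self {
        Slot { state: SlotState::Initializing, deadline_ms: None }
    }

    /// Initializing -> Idle once the engine is loaded.
    pub fn ready(&mut self) {
        if self.state == SlotState::Initializing {
            self.state = SlotState::Idle;
        }
    }

    /// Idle -> Busy; arms the watchdog.
    pub fn start_chunk(&mut self, now_ms: u64) {
        if self.state == SlotState::Idle {
            self.state = SlotState::Busy;
            self.deadline_ms = Some(now_ms + WATCHDOG_MS);
        }
    }

    /// Every progress beat re-arms the watchdog.
    pub fn progress(&mut self, now_ms: u64) {
        if self.state == SlotState::Busy {
            self.deadline_ms = Some(now_ms + WATCHDOG_MS);
        }
    }

    /// Busy -> Idle when the chunk is done.
    pub fn finish(&mut self) {
        if self.state == SlotState::Busy {
            self.state = SlotState::Idle;
            self.deadline_ms = None;
        }
    }

    /// Returns true if the watchdog fired and the slot moved to Error.
    pub fn tick(&mut self, now_ms: u64) -> bool {
        match (self.state, self.deadline_ms) {
            (SlotState::Busy, Some(deadline)) if now_ms >= deadline => {
                self.state = SlotState::Error;
                self.deadline_ms = None;
                true
            }
            _ => false,
        }
    }

    /// Error -> Initializing (kill + respawn in the real manager).
    pub fn restart(&mut self) {
        if self.state == SlotState::Error {
            self.state = SlotState::Initializing;
        }
    }
}
```

Passing the clock in explicitly keeps the sketch deterministic; a timer-driven version would call `tick` from the timeout callback instead.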
Inside `runChunk`, the worker builds a resolver, calls `runSimulationWithProgress` (whose progress callback updates live counters and emits the per-phase metrics `fetching`, `simulating`, `signing`, and `submitting`), normalizes the returned `{bytes, chunkIndex}` to a `Uint8Array`, signs the body with the node keypair via `buildSignMessageBytes` + `signMessage`, and POSTs the protobuf to the sentinel's `/chunks/complete` with the `X-Node-Key`/`X-Node-Sig`/`X-Node-Ts` headers. That Ed25519 signing and the sentinel side of the handshake are covered under [hosted compute](/dev/bible/distribution/hosted-compute); the browser is just one more signing node.

The watchdog is the resilience mechanism. Each slot arms a `setTimeout`; every progress beat re-arms it. If a slot goes silent past the timeout (a wedged sim, an engine panic, or a hung fetch), the manager marks it errored and restarts the whole pool: it rejects pending work, releases the Comlink proxy, and terminates the worker. This is coarse (one stuck slot restarts all of them), but it keeps a single bad chunk from silently stalling the contribution loop, and the chunk it dropped is reclaimed by the sentinel for another node.

## Realtime

Three parties never talk to each other directly: the user's browser, the sentinel scheduler, and the compute nodes. They communicate through a pub/sub message bus. That bus is beacon, a [Centrifugo](https://centrifugal.dev) v6 server that nodes and browsers connect to over WebSocket and that the sentinel publishes through. Work assignments, live job progress, and node-online state all ride channels on this one server.

This page expands the **beacon** box of the system-context diagram. The figure below is the Zoom-1 `realtime-topology` figure.
```mermaid
flowchart TB
    Studio["studio (browser)"]
    Route["studio server route /api/realtime/job-token"]
    Sentinel["sentinel"]
    Node["node / browser node"]
    Beacon["beacon (Centrifugo)"]
    Nats["nats (broker)"]
    Redis[(Redis - presence)]
    Studio -->|GET job-token| Route
    Route -->|HS256 conn + sub JWT jobs:id| Studio
    Studio -->|WSS subscribe jobs:id| Beacon
    Node -->|WSS subscribe chunks:pk, nodes:pk, nodes:online| Beacon
    Sentinel -->|WSS publish jobs / chunks / nodes channels| Beacon
    Sentinel -->|POST /api/presence X-API-Key nodes:online| Beacon
    Beacon -->|broker pub/sub| Nats
    Beacon -->|presence state| Redis
```
## Two backends, two jobs: NATS and Redis

Centrifugo is the front door, but two other services back it, and they do different things. The broker, the thing that fans a publication out to every subscriber, is **NATS**: beacon's config sets the broker `type` to `nats` with url `nats://wowlab-nats.internal:4222`. The presence manager, the thing that tracks who is currently subscribed to a channel, is **Redis**: the same config sets the presence-manager `type` to `redis`, with the address injected from `CENTRIFUGO_PRESENCE_MANAGER_REDIS_ADDRESS`.

So both NATS and Redis back beacon, for different subsystems. NATS is the message broker on the realtime hot path; it is its own Fly app co-located with beacon, pinned to region `lhr`. Redis is the presence store; it is external to the repo's deploy directory, so only the env var name lives here. This split is why the question of what backs beacon has two answers: ask "what fans out messages?" and it is NATS; ask "who is online?" and it is Redis.

## The channel namespace

Centrifugo declares three namespaces, `nodes`, `jobs`, and `chunks`, and every channel is one of these prefixes plus an id.
The full set, with who publishes and who listens:

| Channel                  | Direction        | Publisher                                   | Subscriber                             |
| ------------------------ | ---------------- | ------------------------------------------- | -------------------------------------- |
| `chunks:{nodePublicKey}` | server → node    | sentinel scheduler                          | node; browser node                     |
| `nodes:{nodePublicKey}`  | server → node    | sentinel presence                           | node                                   |
| `nodes:all`              | server → clients | sentinel presence                           | studio fleet UI                        |
| `nodes:online`           | presence         | (no publish; join/leave)                    | node + studio; sentinel reads via HTTP |
| `jobs:{jobId}`           | server → browser | sentinel scheduler and chunk-complete route | studio                                 |
| `jobs:all`               | server → clients | sentinel chunk-complete route               | admin overview UI                      |

`chunks:{nodePublicKey}` is the work pipe: the sentinel publishes a `RuntimeChunkPayload` to a specific node's channel and only that node is subscribed to it. `jobs:{jobId}` is the progress pipe back to the browser.

`nodes:online` is special. Nobody publishes to it; a node _appears_ in it by subscribing with `join_leave(true)`, and that subscription is what makes the node show up in Centrifugo presence. The sentinel does not subscribe to `nodes:online` over WebSocket; it reads the roster over the HTTP API (see below).
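The channel table above lends itself to a typed representation, so that channel names are built and parsed in one place rather than formatted ad hoc. The sketch below is an illustration only: the `Channel` enum and its `parse` helper are hypothetical and are not taken from the sentinel or node crates.

```rust
use std::fmt;

/// Hypothetical typed view of the six channels in the table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Channel {
    /// `chunks:{nodePublicKey}`: work pipe to one node.
    Chunks(String),
    /// `nodes:{nodePublicKey}`: per-node config updates.
    Node(String),
    /// `nodes:all`: fleet-wide status updates.
    NodesAll,
    /// `nodes:online`: presence only, nobody publishes.
    NodesOnline,
    /// `jobs:{jobId}`: progress pipe back to the browser.
    Job(String),
    /// `jobs:all`: admin overview.
    JobsAll,
}

impl fmt::Display for Channel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Channel::Chunks(pk) => write!(f, "chunks:{pk}"),
            Channel::Node(pk) => write!(f, "nodes:{pk}"),
            Channel::NodesAll => write!(f, "nodes:all"),
            Channel::NodesOnline => write!(f, "nodes:online"),
            Channel::Job(id) => write!(f, "jobs:{id}"),
            Channel::JobsAll => write!(f, "jobs:all"),
        }
    }
}

impl Channel {
    /// Parses `namespace:id`; unknown namespaces and empty ids are rejected.
    pub fn parse(raw: &str) -> Option<Channel> {
        let (namespace, id) = raw.split_once(':')?;
        if id.is_empty() {
            return None;
        }
        match (namespace, id) {
            ("chunks", pk) => Some(Channel::Chunks(pk.to_string())),
            ("nodes", "all") => Some(Channel::NodesAll),
            ("nodes", "online") => Some(Channel::NodesOnline),
            ("nodes", pk) => Some(Channel::Node(pk.to_string())),
            ("jobs", "all") => Some(Channel::JobsAll),
            ("jobs", job_id) => Some(Channel::Job(job_id.to_string())),
            _ => None,
        }
    }
}
```

Matching the reserved ids `all` and `online` before the catch-all arm keeps them from being mistaken for a node key or job id.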
All publication funnels through one method: `ServerState::publish(channel, payload)` serializes to JSON, then calls `centrifuge.publish` with up to `PUBLISH_RETRIES = 3` attempts, backing off only on temporary errors:

```rust
for attempt in 0..PUBLISH_RETRIES {
    // #t(clone_in_loop) data cloned for retry attempts, bounded by PUBLISH_RETRIES
    match self.centrifuge.publish(channel, data.clone()).await {
        Ok(()) => return,
        Err(e) if e.is_temporary() && attempt + 1 < PUBLISH_RETRIES => {
            tracing::debug!(
                error = %e,
                channel,
                attempt = attempt + 1,
                "Publish failed, retrying"
            );
            tokio::time::sleep(PUBLISH_RETRY_DELAY).await;
        }
        Err(e) => {
            tracing::warn!(error = %e, channel, "Failed to publish");
            return;
        }
    }
}
```

Every `jobs:*`, `chunks:*`, and `nodes:*` message the sentinel sends goes through that one funnel.

## How presence is actually read

The sentinel keeps the database `nodes.status` in sync with reality on a 30-second cron. The presence job asks Centrifugo who is online, but over the HTTP API, not a subscription. It sends `POST /api/presence` with the `X-API-Key` header for channel `nodes:online` and parses the returned user set into node public keys:

```rust
pub async fn get_online(&self, channel: &str) -> Result, Error> {
    let resp = get_http_client()
        .post(format!("{}/api/presence", self.url))
        .header("X-API-Key", self.api_key.expose())
        .json(&json!({ "channel": channel }))
        .send()
        .await
        .map_err(|e| Error::Http(e.to_string()))?;
```

It then diffs that set against the DB online set and writes the difference back. Nodes that newly appear get marked online; nodes that vanished get marked offline, and each change publishes a `nodes:all` and `nodes:{pk}` update plus a Discord notification.

The HTTP API key here is a third secret, distinct from the JWT secret below: `CENTRIFUGO_HTTP_API_KEY` on beacon, configured as `SENTINEL_CENTRIFUGO_KEY` on the sentinel.

## Two token systems

Connecting to beacon and subscribing to a channel both require a JWT.
There are two distinct minting paths.

**Connection and subscription JWTs are HS256**, signed with a single shared HMAC secret, `CENTRIFUGO_CLIENT_TOKEN_HMAC_SECRET_KEY` on beacon. The sentinel mints its own connection token for subject `"sentinel"` and the node beacon tokens it hands out on register and refresh, all through `token::generate(subject, secret)` with a one-day TTL.

The browser path is stricter because a user must not be able to watch another user's job. The studio server route `GET /api/realtime/job-token?jobId=` first requires an authenticated Supabase user, then verifies that user _owns_ the requested job by checking `jobs.user_id`, and only then mints **two** HS256 tokens: a connection token and a per-channel subscription token whose `channel` claim scopes the browser to exactly one `jobs:{id}` channel.

```typescript
const [connectionToken, subscriptionToken] = await Promise.all([
  mintConnectionToken(userId),
  mintSubscriptionToken(userId, channel),
]);
```

These browser tokens carry a 10-minute TTL, far shorter than the node tokens' day. The Centrifugo namespace config enforces the gate from the other side: the `jobs` namespace allows publish for clients but requires a subscription JWT to subscribe, so a forged or missing subscription token cannot join a `jobs:` channel. This is the trade chosen over a chattier model: rather than have the sentinel authorize every subscription, ownership is checked once at token-mint time and the short TTL bounds how long a leaked token is useful.

**Node request signing is Ed25519, and it is a separate system.** It secures the sentinel's HTTP API, not the realtime bus. That belongs to [hosted compute](/dev/bible/distribution/hosted-compute), where the node protocol lives. The two systems never overlap: HS256 JWTs let you onto beacon; Ed25519 signatures let you call the sentinel.
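The ownership gate on the browser path described above can be sketched as a pure function that decides which claims may be minted. The real route is TypeScript and signs the result with HS256; signing is deliberately left out here because it needs a crypto crate. `job_token_claims`, `TokenError`, and both claim structs are hypothetical names, and the `sub`/`channel`/`exp` fields are assumed from the claim names the text mentions.

```rust
/// Browser tokens live for ten minutes, per the text.
pub const BROWSER_TOKEN_TTL_SECS: u64 = 10 * 60;

/// Hypothetical claim set for the connection token.
#[derive(Debug, PartialEq, Eq)]
pub struct ConnectionClaims {
    pub sub: String,
    pub exp: u64,
}

/// Hypothetical claim set for the per-channel subscription token.
#[derive(Debug, PartialEq, Eq)]
pub struct SubscriptionClaims {
    pub sub: String,
    pub channel: String,
    pub exp: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TokenError {
    /// No authenticated user on the request.
    Unauthenticated,
    /// The user exists but `jobs.user_id` does not match.
    NotOwner,
}

/// Builds the two claim sets only after both checks pass.
/// The caller would then sign each set with the shared HMAC secret.
pub fn job_token_claims(
    user_id: Option<&str>,
    job_owner_id: &str,
    job_id: &str,
    now_secs: u64,
) -> Result<(ConnectionClaims, SubscriptionClaims), TokenError> {
    let user = user_id.ok_or(TokenError::Unauthenticated)?;
    if user != job_owner_id {
        return Err(TokenError::NotOwner);
    }
    let exp = now_secs + BROWSER_TOKEN_TTL_SECS;
    Ok((
        ConnectionClaims { sub: user.to_string(), exp },
        SubscriptionClaims {
            sub: user.to_string(),
            // Scopes the browser to exactly one job channel.
            channel: format!("jobs:{job_id}"),
            exp,
        },
    ))
}
```

Keeping the decision separate from the signing step makes the authorization rule testable without a secret in scope.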
## The Rust client

Both the sentinel and the node speak to beacon through `crates/centrifuge`, a Rust port of centrifuge-js over WebSocket with the protobuf protocol. Its `Client` spawns a self-reconnecting task with full-jitter backoff and refreshes the connection JWT through a `get_token` callback both at connect and mid-session before the TTL expires. On the node side, `get_token` is wired to `sentinel.refresh_token()`, so a node's beacon token auto-refreshes through the sentinel without the node holding the HMAC secret.

Once connected, the node opens its three subscriptions in one place: its config channel, its work channel, and the presence channel with `join_leave(true)`:

```rust
let mut node_sub = client
    .subscribe(SubscriptionConfig::new(format!("nodes:{public_key}")))
    .await?;
let mut chunks_sub = client
    .subscribe(SubscriptionConfig::new(format!("chunks:{public_key}")))
    .await?;
let mut presence_sub = client
    .subscribe(SubscriptionConfig::new("nodes:online").join_leave(true))
    .await?;
```

The browser side is the mirror of this. It passes its `getToken` callbacks straight to the `Centrifuge` JS client, which fetches a fresh connection and subscription token from the ownership-checked route whenever it needs one.

## Hosted Compute

A simulation job is too big to run in one place, so it is split into chunks and farmed out to whatever compute is online: a hosted Fly node, a contributor's desktop, or a browser tab. The sentinel is the scheduler that does the splitting and assigning; a node is anything that can run the engine and sign its results. This page is the full hosted loop: how the sentinel picks up a job, hands a chunk to a node, verifies what comes back, and recovers when a node disappears.

This page expands the **Nodes** box of the system-context diagram. The figure below is the Zoom-1 `hosted-compute-flow` figure; this page also owns the Zoom-2 `node-state` view of its **node** box and the Zoom-2 `chunk-claim-lifecycle` view of its **claim** box.
```mermaid
flowchart TB
    Scheduler["sentinel scheduler (LISTEN pending_job)"]
    Assign["assign_chunks: claim batch, build RuntimeChunkPayload"]
    Node["node subscribe chunks:pk"]
    Fetch["GET /jobs/id/work_context (signed)"]
    Pool["WorkerPool simulate_intent per item"]
    Sign["sign body (Ed25519)"]
    Complete["POST /chunks/complete (protobuf)"]
    Runtime["runtime.complete -> CompletionOutcome"]
    Scheduler --> Assign
    Assign -->|publish chunks:pk| Node
    Node --> Fetch
    Fetch -->|baseSimConfig, tournamentPayload, sentinelConfig| Pool
    Pool --> Sign
    Sign --> Complete
    Complete --> Runtime
```
## The scheduler: from NOTIFY to assignment

The sentinel runs four long-lived tasks in one process: the Discord bot, the scheduler, the cron runner, and the HTTP server, multiplexed with `tokio::select!`. The scheduler is the one that turns jobs into work.

It picks up jobs through Postgres `LISTEN/NOTIFY`. `listen_and_assign` connects a `PgListener` and listens on the channel `pending_job` (singular). When a NOTIFY arrives it debounces 50 ms and runs `process_pending`; it also runs `process_pending` on a 30-second timeout as a safety net so a missed NOTIFY never strands a job.

`process_pending` fetches pending jobs and hands them to `assign_chunks`, which is where a job becomes chunks:

1. For each fetched job it parses the sentinel config, builds an in-memory `JobRuntime`, inserts it into the runtime store, and flips the DB row `pending → running`. The split strategy depends on the config: `"single"` uniform splits at `DEFAULT_TARGET_CHUNK_ITERATIONS = 50_000`, `"tournament"` builds laddered phases, `"stat_weights"` builds a baseline plus four perturbation runs.
2. It fetches the online nodes (`status = 'online'`, joined to Discord identities) and computes each node's capacity as `min(max_parallel, total_cores)`.
3. It sorts jobs by priority and, per job, picks the best eligible node by `(eligibility priority, available capacity)`. The `Priority` ladder is `Public = 1 < Friends = 2 < Discord = 3 < Own = 4`, higher wins.
4. It claims a batch from the runtime with `runtime.claim_batch(node_key, max_items, now)` and publishes a `RuntimeChunkPayload` to that node's `chunks:{public_key}` channel. If the job was newly running, it also publishes a `jobs:{id}` "running" progress event so the browser sees it start.

## The node protocol

A node is event-driven and polled every 100 ms by its host binary. Its lifecycle:
```mermaid
stateDiagram-v2
    [*] --> Setup
    Setup --> Registering: "NODE_CLAIM_TOKEN present"
    Setup --> Verifying: "no claim token"
    Verifying --> Running: "refresh_token OK"
    Verifying --> NotFound: "not found / not claimed"
    Verifying --> Unavailable: "other error (backoff)"
    Registering --> Running: "register OK, start_realtime"
    Registering --> Unavailable: "register error (backoff)"
    Registering --> Setup: "register fail (invalid token / 401)"
    Unavailable --> Verifying: "retry (already registered)"
    Unavailable --> Registering: "retry (not yet registered)"
    Running --> [*]
```
A node with a `NODE_CLAIM_TOKEN` goes straight to `Registering`; one without it goes to `Verifying` to check it was already claimed. The state machine is a flat enum, matched exhaustively by both host binaries:

```rust
#[derive(Clone, Debug)]
// #t(non_exhaustive_on_public) internal state enum matched exhaustively in node-headless and node-gui
pub enum NodeState {
    Setup,
    Verifying,
    Registering,
    Running,
    NotFound,
    Unavailable,
}
```

On reaching `Running`, `start_realtime` fetches a beacon token and subscribes to **three** channels: `chunks:{pk}` for work, `nodes:{pk}` for config updates, and `nodes:online` with join/leave for presence.

When a chunk arrives on `chunks:{pk}`, `process_chunk` does two things. First it fetches the work context, the heavy shared part of the job (base sim config, tournament payload, sentinel config), through a signed `GET /jobs/{id}/work_context?hash=&claim_token=`, cached per chunk so the same job's later chunks reuse it. Then it builds a `WorkBatch` and hands it to the `WorkerPool`. The pool is a tokio `Semaphore` bounded by the node's enabled core count; each work item runs `SimRunner::run_item`, which derives the per-item config and calls `simulate_intent` over a `SupabaseResolver`. This is the same [orchestration entry](/dev/bible/distribution/orchestration) the browser uses, just driven by a Rust resolver instead of a JS one.

## Signing and completion

Every node-to-sentinel HTTP request is Ed25519-signed; the result POST is no exception. The node's `RequestSigner` signs with its persisted 32-byte keypair over a canonical message that both sides build the same way:

```rust
pub fn build_sign_message(
    timestamp: u64,
    method: &str,
    host: &str,
    path: &str,
    body: &[u8],
) -> String {
    let body_hash = sha256_hex(body);
    format!("{timestamp}\0{method}\0{host}\0{path}\0{body_hash}")
}
```

The signature, public key, and timestamp travel in the `X-Node-Sig`, `X-Node-Key`, and `X-Node-Ts` headers.
The sentinel's `verify_node` middleware rebuilds the same message, rejects a clock skew over 300 s, checks the 64-byte signature against the public key, and on success attaches a `VerifiedNode` extension. It runs over _all_ node-API routes, not just completion, so there is no chunk-specific check.

`POST /chunks/complete` carries a protobuf `BatchChunkCompletion` body, not JSON. The body itself is not separately signed; integrity comes from the Ed25519 signature over the body plus the `claim_token`/`work_context_hash` matched against the live in-memory claim. The handler decodes the protobuf, validates the 32-byte hash, maps results, and calls `runtime.complete(...)`, which returns a `CompletionOutcome`:

| `CompletionOutcome`         | Meaning                                                             | HTTP |
| --------------------------- | ------------------------------------------------------------------- | ---- |
| `Accepted { job_complete }` | Results recorded; `job_complete` true triggers finalize             | 200  |
| `Idempotent`                | Duplicate of an already-accepted completion (same token, same hash) | 200  |
| `Conflict(msg)`             | Token matched a prior completion but with different results         | 409  |
| `Stale(msg)`                | No matching active claim (already reclaimed, expired, or unknown)   | 410  |

`Accepted { job_complete: false }` publishes a progress tick to `jobs:{id}`; `job_complete: true` runs `finalize_job`, which writes `result_pb`/`timeline_pb` to Postgres via `jobs_finalize.sql`, removes the runtime from the store, and publishes a "completed" event to both `jobs:{id}` and `jobs:all`. The `Conflict` and `Stale` outcomes are the idempotency guard: a node that retries after a timeout, or one that completes a chunk already reclaimed to someone else, gets a clean 409/410 instead of corrupting the aggregate.

## The claim lifecycle and reclaim

A claim is the sentinel's record that a specific node holds a specific chunk. It is created at assignment, resolved at completion, and swept by a cron if it goes stale:
```mermaid
stateDiagram-v2
    [*] --> Claimed: "claim_batch (node, hash, items)"
    Claimed --> Accepted: "complete --> Accepted"
    Claimed --> Idempotent: "complete --> Idempotent / Conflict / Stale"
    Claimed --> Reclaimed: "reclaim_stale (timeout exceeded)"
    Reclaimed --> Claimed: "re-enqueue (under retry limit)"
    Reclaimed --> Failed: "max attempts exceeded"
    Accepted --> [*]
    Idempotent --> [*]
    Failed --> [*]
```
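The `Reclaimed` edges in the diagram above boil down to two comparisons per claim: is it past the timeout, and is it still under the retry limit. The sketch below shows that decision on a trimmed-down, hypothetical `Claim` struct. `decide`, `sweep`, and `ReclaimDecision` are illustrative names, and the exact boundary (`reclaim_count < max_attempts`) is an assumption rather than the sentinel's confirmed rule.

```rust
/// Hypothetical, trimmed-down claim used only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Claim {
    pub chunk_id: u64,
    pub claimed_at_ms: u64,
    pub reclaim_count: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReclaimDecision {
    /// Still within the timeout; leave it in flight.
    Keep,
    /// Timed out but under the retry limit; hand it to another node.
    Requeue,
    /// Timed out and out of attempts.
    Fail,
}

/// Assumption: a claim may be re-enqueued while `reclaim_count < max_attempts`.
pub fn decide(claim: &Claim, now_ms: u64, timeout_ms: u64, max_attempts: u32) -> ReclaimDecision {
    let age_ms = now_ms.saturating_sub(claim.claimed_at_ms);
    if age_ms <= timeout_ms {
        ReclaimDecision::Keep
    } else if claim.reclaim_count < max_attempts {
        ReclaimDecision::Requeue
    } else {
        ReclaimDecision::Fail
    }
}

/// Sweeps the in-flight set once. Stale claims are removed from `in_flight`
/// and returned either as re-enqueued claims (with their counter bumped) or
/// as permanently failed chunk ids.
pub fn sweep(
    in_flight: &mut Vec<Claim>,
    now_ms: u64,
    timeout_ms: u64,
    max_attempts: u32,
) -> (Vec<Claim>, Vec<u64>) {
    let mut requeued = Vec::new();
    let mut failed = Vec::new();
    in_flight.retain(|claim| match decide(claim, now_ms, timeout_ms, max_attempts) {
        ReclaimDecision::Keep => true,
        ReclaimDecision::Requeue => {
            requeued.push(Claim {
                reclaim_count: claim.reclaim_count + 1,
                ..claim.clone()
            });
            false
        }
        ReclaimDecision::Fail => {
            failed.push(claim.chunk_id);
            false
        }
    });
    (requeued, failed)
}
```

Because the whole sweep is a pass over an in-memory collection, it involves no database round trip, in line with the in-memory runtime design.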
The `claim_batch`, `complete`, and `reclaim_stale` methods all live on the in-memory `JobRuntime`. Each in-flight chunk is one `ClaimRecord`:

```rust
#[derive(Debug, Clone)]
pub struct ClaimRecord {
    pub chunk_id: RuntimeChunkId,
    pub node_public_key: String,
    pub work_context_hash: WorkContextHash,
    pub work_items: Vec,
    pub claimed_at_ms: u64,
    pub reclaim_count: u32,
}
```

The reclaim cron runs once a minute: it sweeps every runtime job and calls `runtime.reclaim_stale(now, timeout, max_attempts)` with a default 5-minute timeout and 3 attempts. A claim past its timeout but under the retry limit is re-enqueued for another node; one past `max_attempts` logs a failure event; a job whose claims have all permanently failed is marked `Failed` in the DB and publishes a "failed" event to `jobs:{id}` and `jobs:all`. This is the recovery path for the `Failed` chunk edge in [orchestration](/dev/bible/distribution/orchestration): a node that crashes mid-chunk simply stops reporting, its claim ages out, and the work moves on.

## Burst compute

Online nodes are not always enough. The sentinel can rent bare-metal capacity from [Latitude.sh](https://www.latitude.sh) when the queue backs up. A `queue_depth` cron sums the in-flight chunk count across runtime jobs into a `pending_chunks_gauge`, and the `BurstScheduler` cron reads that gauge to decide how many burst nodes to run. It computes a target from pending depth, floor capacity, and per-node throughput, clamps it to `burst_max_nodes` (default 5), and reconciles: scale up provisions a server via the Latitude REST API; scale down kills only nodes that are below target _and_ older than 55 minutes, to avoid paying for a fresh hourly server it just started. A provisioned burst node boots from a cloud-init template that joins the headscale tailnet and runs the node container, picking up the same claim-token registration path as any other node.
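The text names the inputs to the burst target (pending depth, floor capacity, per-node throughput, and the `burst_max_nodes` clamp) but not the formula, so the sketch below is one plausible reading and not the `BurstScheduler` implementation. It assumes the target is the overflow above floor capacity divided by per-node throughput, rounded up, and that scale-down removes only surplus nodes that have also passed the 55-minute mark. `burst_target` and `nodes_to_kill` are hypothetical names.

```rust
/// Default clamp quoted in the text.
pub const BURST_MAX_NODES: u32 = 5;
/// Nodes younger than this are never scaled down.
pub const MIN_KILL_AGE_SECS: u64 = 55 * 60;

/// Assumed formula: ceil((pending - floor) / per_node), clamped to `max_nodes`.
pub fn burst_target(
    pending_chunks: u32,
    floor_capacity: u32,
    per_node_throughput: u32,
    max_nodes: u32,
) -> u32 {
    if per_node_throughput == 0 {
        return 0;
    }
    // Work the standing fleet cannot absorb.
    let overflow = u64::from(pending_chunks.saturating_sub(floor_capacity));
    let per_node = u64::from(per_node_throughput);
    // Ceiling division in u64 to avoid overflow near u32::MAX.
    let needed = (overflow + per_node - 1) / per_node;
    needed.min(u64::from(max_nodes)) as u32
}

/// Returns the indices of burst nodes to terminate: at most the surplus over
/// `target`, and only nodes that are at least `MIN_KILL_AGE_SECS` old.
pub fn nodes_to_kill(node_ages_secs: &[u64], target: u32) -> Vec<usize> {
    let surplus = node_ages_secs.len().saturating_sub(target as usize);
    node_ages_secs
        .iter()
        .enumerate()
        .filter(|(_, age)| **age >= MIN_KILL_AGE_SECS)
        .map(|(index, _)| index)
        .take(surplus)
        .collect()
}
```

With a floor capacity of 10 and a throughput of 4 chunks per node, 19 pending chunks give a target of 3 under this formula, and a very deep queue is capped at `BURST_MAX_NODES`.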
The infrastructure side of this (Fly, Cloudflare, headscale, and Latitude) is the next page, [deployment](/dev/bible/distribution/deployment).

## Why the runtime store is in memory

The whole in-flight picture (which chunks exist, who claims them, partial results) lives only in the sentinel's process memory; only the final `result_pb`/`timeline_pb` ever reach Postgres. The cost is real: if the sentinel restarts, that state is gone, so on boot it fails any job left `running` and logs it as dropped. The benefit is that the hottest loop in the system (claim, complete, reclaim, every few seconds across many jobs) never touches the database. The [database chapter](/dev/bible/distribution/database) examines this trade-off and the schema split it produces.

## Deployment

The platform runs on three hosting platforms. The two Next.js apps run on Cloudflare's edge. The backend services all run on Fly.io in one region: scheduler, realtime bus, broker, log shipper, mesh control plane, and the standing compute fleet. The database is Supabase. Everything else hangs off those.

This page expands the **Infra** box of the system-context diagram. The figure below is the Zoom-1 `fly-topology` figure.
```mermaid
flowchart TB
    subgraph fly["Fly.io (region lhr)"]
        Sentinel["wowlab-sentinel"]
        Beacon["wowlab-beacon"]
        Nats["wowlab-nats"]
        Headscale["wowlab-headscale"]
        Alloy["wowlab-alloy"]
        FlyNode["wowlab-node"]
    end
    subgraph cf["Cloudflare Workers"]
        Studio["wowlab-studio (app.wowlab.gg)"]
        Landing["wowlab-landing (wowlab.gg)"]
    end
    Supabase[(Supabase Postgres)]
    Redis[(Redis - external)]
    Burst["Latitude burst node"]
    Grafana["Grafana Cloud"]
    Beacon -->|nats internal:4222| Nats
    Beacon -->|presence| Redis
    Sentinel -->|SENTINEL_DATABASE_URL| Supabase
    Sentinel -->|WSS publish| Beacon
    FlyNode -->|wss beacon.wowlab.gg| Beacon
    FlyNode -->|https sentinel.wowlab.gg| Sentinel
    FlyNode -->|https api.wowlab.gg| Supabase
    Burst -->|tailscale login-server fleet.wowlab.gg| Headscale
    Alloy -->|scrape /metrics| Sentinel
    Alloy -->|scrape /metrics| Beacon
    Alloy -->|remote_write| Grafana
```
## The six Fly apps

Every backend service is its own Fly app, and all of them are pinned to `lhr` (London) so the realtime hot path (scheduler, broker, and bus) stays intra-region:

| Fly app            | Image                             | Internal port | Role                                        |
| ------------------ | --------------------------------- | ------------- | ------------------------------------------- |
| `wowlab-sentinel`  | `ghcr.io/legacy3/wowlab-sentinel` | 8080          | Scheduler, HTTP API, Discord bot, MCP, cron |
| `wowlab-beacon`    | `centrifugo/centrifugo:v6`        | 8000          | Centrifugo realtime bus                     |
| `wowlab-nats`      | `nats:2.12-scratch`               | 4222          | Message broker for beacon                   |
| `wowlab-headscale` | `headscale/headscale:stable`      | 8080          | Tailscale control plane (`fleet.wowlab.gg`) |
| `wowlab-alloy`     | `grafana/alloy:latest`            | n/a           | Metrics/log shipper                         |
| `wowlab-node`      | `ghcr.io/legacy3/wowlab-node`     | n/a           | Standing compute fleet                      |

The app names and region come straight from each app's `fly.toml`. The realtime three (sentinel, beacon, and nats) all set `min_machines_running = 1` and turn auto-stop off, because a scheduler or bus that scaled to zero would drop the live loop. beacon and nats are co-located deliberately: nats is the broker beacon fans messages through, so the latency between them is on the critical path. The sentinel image is a distroless `cc-debian13:nonroot` carrying a prebuilt binary and nothing else in the container.

Redis, the fourth backing service from the [realtime chapter](/dev/bible/distribution/realtime), is _not_ a Fly app. It is external, referenced only by the `CENTRIFUGO_PRESENCE_MANAGER_REDIS_ADDRESS` env var on beacon. So the realtime tier is five Fly apps plus one external Redis.

## The edge: Cloudflare and Supabase

The user-facing apps do not run on Fly. `studio` (the authenticated app) and `landing` (the marketing site) are Next.js apps deployed to Cloudflare Workers via OpenNext, on custom domains `app.wowlab.gg` and `wowlab.gg`.
Both use `nodejs_compat`, an R2 bucket for the OpenNext cache, and Durable Objects for queue and tag-cache handling. A third worker, `og`, generates Open Graph images. The studio app is the only one that holds the realtime job-token HMAC secret, since it is what mints browser subscription tokens. The [realtime chapter](/dev/bible/distribution/realtime) covers that minting.

The database is Supabase Postgres, reached two ways. The sentinel connects directly via `SENTINEL_DATABASE_URL`, with `sslmode=require` forced on, for the `LISTEN/NOTIFY` loop and SQL. Nodes use Supabase PostgREST at `https://api.wowlab.gg` for read-only game data. The [database chapter](/dev/bible/distribution/database) covers the schema.

## The mesh: headscale and burst

Standing Fly nodes reach the sentinel, beacon, and Supabase over the public internet with signed requests. Burst nodes and externally-provisioned boxes get a private mesh instead. `wowlab-headscale` is a self-hosted Tailscale control plane serving `fleet.wowlab.gg`, backed by a persistent volume for its sqlite database.

The whole isolation model lives in the ACL policy. The `admin` group reaches everything and can SSH as root into any `tag:node` machine, but nodes get no peer ACL at all, so they cannot reach each other or the admin box over the tailnet:

```jsonc
"acls": [
  // Admins reach everything. Nodes have no peer ACL, so they can't
  // reach each other or the admin box over the tailnet.
  {
    "action": "accept",
    "src": ["group:admin"],
    "dst": ["*:*"],
  },
],
```

That is the point: a contributor's burst box can be reached for admin and can reach out to the sentinel, but it is not a lateral path to anything else.
A burst node provisioned by the sentinel's `BurstScheduler` (see [hosted compute](/dev/bible/distribution/hosted-compute)) boots from a cloud-init template that writes its claim token and node name, installs Docker and Tailscale, joins the tailnet at `https://fleet.wowlab.gg` with `--advertise-tags=tag:node`, runs the node container as a systemd service, and locks down SSH and the firewall to tailscale-only ingress. The template ships with `__CLAIM_TOKEN__` and `__TS_AUTHKEY__` placeholders that `render_user_data` fills at provision time before base64-encoding the result for cloud-init:

```rust
fn render_user_data(claim_token: &str, ts_authkey: &str) -> String {
    use base64::Engine;
    let template = include_str!("../../../../deploy/node/latitude-userdata.yaml.tmpl");
    let rendered = template
        .replace("__CLAIM_TOKEN__", claim_token)
        .replace("__TS_AUTHKEY__", ts_authkey);
    base64::engine::general_purpose::STANDARD.encode(rendered.as_bytes())
}
```

A one-time manual provisioning script, `ubuntu-setup.sh`, does the same for a box you set up by hand.

## Observability

`wowlab-alloy` runs Grafana Alloy and scrapes the sentinel and beacon `/metrics` endpoints over internal Fly DNS every 30 seconds, then `remote_write`s to Grafana Cloud. Both the sentinel and beacon expose Prometheus metrics. The sentinel installs its recorder at startup and beacon has Prometheus enabled in its config. Keeping alloy in `lhr` is what lets it reach those services over `*.internal` DNS rather than the public network.

## Database

There is one Postgres database, on Supabase, with two very different jobs. The `game` schema is the read-only reference data the engine consumes: spells, items, specs, scaling curves. The `public` schema is the operational state of the platform: who submitted which job, which nodes are online, and the final results. The first is read heavily and written rarely; the second is the system of record for everything except the part of the job lifecycle that is hottest.
## Two schemas

The `game` schema holds the parsed DBC data. Nodes and browsers read it through Supabase PostgREST behind the resolver's [3-layer cache](/dev/bible/game-data/data-resolution); the sentinel does not touch it on the hot path. Its tables are `game.spells`, `game.items`, `game.specs`, `game.specs_traits`, `game.power_types`, and the scaling tables. How they map to the engine's flat types is the subject of the [game data section](/dev/bible/game-data/dbc-overview). For distribution, what matters is that game data is effectively static between content patches, so it caches well and never bottlenecks the scheduler.

The `public` schema is the operational state. The sentinel connects to it directly over `SENTINEL_DATABASE_URL` for its `LISTEN/NOTIFY` loop and its SQL queries. Two tables carry the distribution system.

### public.jobs

A job is one simulation request. Its columns, as read and written by the sentinel's queries, are `id`, `user_id`, `sim_config`, `sentinel_config`, `meta` (jsonb), `status`, `result_pb`, and `timeline_pb`. The lifecycle in the DB is narrow:

- The studio app inserts a row at `status = 'pending'`; a database trigger fires `NOTIFY pending_job`, which is what wakes the [scheduler](/dev/bible/distribution/hosted-compute).
- The scheduler flips `pending → running` via `scheduler_mark_running.sql` when it builds the in-memory runtime.
- `finalize_job` writes the terminal state through `jobs_finalize.sql`: `status = 'completed'`, the `meta` jsonb, and the protobuf `result_pb`/`timeline_pb`. `timeline_pb` is written with `COALESCE` so it is only overwritten when present.
- On failure the reclaim cron sets `status = 'failed'` via `reclaim_mark_failed.sql`.

That is the entire DB footprint of a job: pending, running, then completed or failed. The browser reads the finished row back.
`result_pb` and `timeline_pb` are hex-encoded bytea that the common WASM `decodeJobResult` turns into view structs for the [results UI](/dev/bible/portal/simulation-ui).

### public.nodes

A node is one registered compute contributor. Its columns are `public_key`, `user_id`, `name`, `total_cores`, `max_parallel`, `platform`, `version`, and `status`. The `public_key` is the node's base64url Ed25519 identity, the same key it signs requests with. `total_cores` and `max_parallel` are what the scheduler turns into capacity (`min(max_parallel, total_cores)`), and `status` is what the [presence cron](/dev/bible/distribution/realtime) reconciles against Centrifugo presence: a node listed online in the bus but offline in this table, or vice versa, gets written to match. The scheduler reads online nodes by joining this table to `auth.identities` for the contributor's Discord id.

## The chunk that is never in the database

Notice what is missing: there is no `public.chunks` table. A job is split into chunks, chunks are claimed by nodes, partial results accumulate, claims go stale and get re-enqueued, and **none of that touches Postgres**. The entire in-flight picture lives in the sentinel's process memory, one `JobRuntime` per active job:

```rust
#[derive(Debug)]
pub struct JobRuntime {
    pub job_id: Uuid,
    pub user_id: Uuid,
    pub strategy_name: String,
    pub base_sim_config: String,
    pub sentinel_config: String,
    /// Encoded `TournamentPayload`, empty for non-tournament jobs.
    pub payload_bytes: Vec,
    pub work_context_hash: WorkContextHash,
    pub status: JobRuntimeStatus,
    /// Scheduling priority, higher first.
    pub priority: i32,
    strategy: RuntimeStrategyState,
    in_flight: HashMap,
    pending_reclaims: VecDeque,
    recent_completions: CompletionLog,
    failed_items: u32,
    next_chunk_id: u64,
    next_item_id: u64,
}
```

Every field there is volatile process state. The `in_flight` claims, the `pending_reclaims` queue, the strategy's partial aggregates: all of it lives in this struct and nowhere else.
The database sees only the two endpoints of a job, the `pending` insert and the `completed`/`failed` finalize. Everything between, `claim_batch`, `complete`, `reclaim_stale`, and the partial Welford aggregates, runs entirely in memory against this struct.

This is the in-memory-runtime-store-vs-DB-of-record trade, taken deliberately. The hottest loop in the platform is the claim/complete/reclaim cycle: many jobs, each with many chunks, completing every few seconds. Routing that through Postgres would mean a write per claim, a write per completion, and a sweep query per minute, all on the connection that also carries `LISTEN/NOTIFY`. Keeping it in memory makes each of those operations a hash-map mutation instead of a round trip.

The cost is durability. The runtime store is single-process and volatile: if the sentinel restarts, every in-flight job's chunk state is lost. The system handles this honestly rather than pretending otherwise. On boot the sentinel fails every job still marked `running` from the prior process and logs each as `restart_dropped`:

```rust
match sqlx::query_file!("queries/scheduler_fail_running_on_restart.sql")
    .fetch_all(&state.db)
    .await
{
    Ok(rows) => {
        for row in &rows {
            if let Err(e) = sqlx::query_file!(
                "queries/chunk_event_log_insert.sql",
                row.id,
                "restart_dropped",
                Option::::None,
                Option::::None,
                Some("sentinel restart dropped in-flight runtime state"),
            )
            .execute(&state.db)
            .await
            {
                tracing::error!(job_id = %row.id, error = %e, "Failed to log restart_dropped");
            }
        }
        if !rows.is_empty() {
            tracing::warn!(count = rows.len(), "Failed running jobs on restart");
        }
    }
    Err(e) => tracing::error!(error = %e, "Failed to fail running jobs on restart"),
}
```

A job interrupted by a restart is not silently abandoned and is not magically resumed. It is failed cleanly, and the user can resubmit.
That is the right shape for the trade: the database is the source of truth for what a job _was_ and what it _finally produced_, and the in-memory store is the source of truth only for what is happening _right now_. The two never disagree about a completed job, because a job only becomes `completed` in Postgres after the runtime has finalized it.

A second consequence is that the sentinel is, for in-flight work, a single point of failure with no horizontal scaling. There is one runtime store, in one process. For the current scale that is acceptable, and it is why the sentinel's Fly app keeps `min_machines_running = 1` with auto-stop off (see [deployment](/dev/bible/distribution/deployment)). If that ceiling is ever a problem, the honest fix is to move the runtime store to a shared backing store, not to add a second sentinel against the same in-memory state.

## Portal Architecture

The portal is the studio app at `app.wowlab.gg`. It is a Next.js App Router project that renders mostly static, server-rendered shells and then mounts a small number of islands, focused `"use client"` providers, only where interactive features (state, queries, the WASM engine, the compute node) actually need them. Everything else stays a server component.

I lean on this split because the heavy machinery is expensive to boot and pointless on a marketing or docs page: the engine compiled to WebAssembly, the React Query cache, the Redux store. Wrapping it in islands keeps that cost off the pages that do not need it.

## App Router and locale routing

Routes live under `apps/studio/src/app/[locale]`. The `[locale]` segment is the first dynamic param; `generateStaticParams` enumerates the configured locales and the layout rejects anything outside that set with `notFound()`. Copy is wired through intlayer: the layout wraps children in `IntlayerServerProvider` and `IntlayerClientProvider`.
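
As a rough illustration of the locale gate described above, here is a minimal sketch. The `LOCALES` list and the `isConfiguredLocale` helper are hypothetical names chosen for the example, and the real layout reads its locale set from the intlayer configuration rather than a hard-coded array.

```typescript
// Hypothetical locale list; the real set comes from the app's i18n config.
const LOCALES = ["en", "de", "fr"] as const;
type Locale = (typeof LOCALES)[number];

// Shape expected by Next.js for a `[locale]` segment: one params object per locale.
export function generateStaticParams(): { locale: Locale }[] {
  return LOCALES.map((locale) => ({ locale }));
}

// Type guard the layout could use before rendering.
export function isConfiguredLocale(value: string): value is Locale {
  return (LOCALES as readonly string[]).includes(value);
}

// In the layout (not executed here), the rejection would look roughly like:
//   if (!isConfiguredLocale(locale)) notFound(); // notFound from "next/navigation"
```

Keeping the check as a type guard means everything below the layout can treat `locale` as one of the known values instead of an arbitrary string.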
Under the locale, route groups separate concerns without adding URL segments:

- `(shell)`: the authenticated app shell (sidebar, header, the WASM and node islands).
- `(shell)/(core)`: the user-facing features: `simulate`, `rotations`, `rankings`, `journal`, `plan`.
- `(shell)/dev`: the docs and this bible, rendered from MDX.
- `preview/`: standalone, `force-static` pages embeddable in an iframe.

## PPR is off, deliberately

Partial Prerendering / `cacheComponents` is disabled. The config keeps it commented out with the reason inline:

```typescript
// Keep cacheComponents/PPR off: it breaks on workerd (setTimeout trap) AND is
// mutually exclusive with enableCacheInterception, which we enable in
// open-next.config.ts.
// cacheComponents: true,
```

It breaks on the Cloudflare `workerd` runtime, a `setTimeout` trap, and it is mutually exclusive with the OpenNext cache interception the app relies on. So the honest statement is: the portal does not use PPR. It does use the React Compiler and OpenNext to deploy to Cloudflare Workers.

What it leans on instead of PPR is `generateStaticParams` for the MDX routes, so the bible and docs prerender every slug, plus per-page `Cache-Control` headers: immutable for the hashed `.wasm` files, short-lived for `manifest.json`, `stale-while-revalidate` for the `preview` pages.

## The island providers

Two layers compose the providers. The locale layout mounts the always-on ones; the shell layout adds the heavy ones.

```mermaid
flowchart TB
    Locale["[locale]/layout.tsx"]
    Theme["ThemeIsland (next-themes)"]
    Redux["ReduxIsland (Redux store)"]
    Query["QueryIsland (React Query)"]
    Auth["AuthEventsIsland (Supabase auth)"]
    Shell["(shell)/layout.tsx"]
    Search["SearchIsland"]
    Wasm["WasmIsland (engine + common WASM)"]
    Node["NodeIsland (worker pool)"]
    AppShell["AppShell + page"]
    Locale --> Theme --> Redux --> Query --> Auth --> Shell
    Shell --> Search --> Wasm --> Node --> AppShell
```

The locale layout nests `ThemeIsland` → `ReduxIsland` → `QueryIsland` → `AuthEventsIsland` around its children. Each is a thin `"use client"` wrapper: `QueryIsland` is a `QueryClientProvider` over a lazily-built singleton client; `ReduxIsland` is `react-redux`'s `Provider` over the simulator store; `ThemeIsland` wraps `next-themes`.

The shell layout adds the expensive ones, nested `SearchIsland` → `WasmIsland` → `NodeIsland`. This placement matters: only the authenticated shell pays for the WASM engine and the worker pool, and `NodeIsland` lives inside `WasmIsland` because the node connection needs the loaded modules.

A handful of islands are scoped even more tightly. `NuqsIsland`, the `nuqs` URL-state adapter, is not mounted globally. It is wrapped locally around the components that read query params, such as `UrlTabs`.

### WasmIsland gating

`WasmIsland` is the gate for everything that calls into the engine. It does three things before exposing the modules. First it waits for client mount via `useSyncExternalStore` so the server never tries to load WASM. Then it feature-detects WebAssembly by instantiating a minimal module, the eight magic bytes of an empty `.wasm`:

```typescript
const wasmModule = new WebAssembly.Module(
  Uint8Array.of(0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00),
);
```

Only then does it load both bundles under a `Suspense` boundary, running `useSuspenseQuery` for `getEngine` and `getCommon`. The loaded `{ common, engine }` pair is published on a context read by `useEngine()` and `useCommon()`, both of which throw if used outside the island.

The crossing into WebAssembly is the subject of its own section in [the WASM boundary](/dev/bible/distribution/wasm-boundary); here it is just one more provider.
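
The feature-detection step in the gating sequence can be wrapped in a small predicate. This is a sketch under assumptions: `supportsWebAssembly` is a hypothetical name, and the island's real code may report the failure differently. It only relies on the standard `WebAssembly.Module` constructor, which compiles synchronously and throws on invalid bytes.

```typescript
// Minimal structural type so the sketch compiles without DOM typings.
type WasmNamespace = { Module: new (bytes: Uint8Array) => unknown };

// Returns true when the runtime can compile the smallest valid module:
// the "\0asm" magic followed by version 1 and no sections.
export function supportsWebAssembly(): boolean {
  const wasm = (globalThis as unknown as { WebAssembly?: WasmNamespace })
    .WebAssembly;
  if (!wasm || typeof wasm.Module !== "function") {
    return false; // No WebAssembly global at all.
  }
  try {
    new wasm.Module(
      Uint8Array.of(0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00),
    );
    return true;
  } catch {
    return false; // Present but blocked, for example by a restrictive policy.
  }
}
```

Compiling a real module rather than checking `typeof WebAssembly` also catches environments where the global exists but compilation is disabled.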
This is the parent of the per-feature views the rest of this section covers: [state management](/dev/bible/portal/state-management), [the rotation editor](/dev/bible/portal/rotation-editor), [the simulation UI](/dev/bible/portal/simulation-ui), and [the content system](/dev/bible/portal/content-system) that renders this very page.

## State Management

The portal does not have one state store. It has several, each chosen for a different job: React Query owns server data, Zustand owns client state (both the persisted simulator form and the rotation editor's local document), RxDB + TanStack DB back the offline game-data cache, and Supabase is the data-access client underneath the queries. The rule of thumb is simple: server data goes through React Query, local client state goes through Zustand, and the bulk immutable game data lives in the RxDB/TanStack DB cache.

I want to be upfront that this is more than one library would be in a smaller app. The split exists because these kinds of state really do have different lifetimes and access patterns; folding them into one store would mean fighting that.

## React Query: server data

Every read of database-backed data goes through `@tanstack/react-query`. The client is a lazily-constructed singleton, configured once with caching turned all the way up:

```typescript
queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      gcTime: Infinity,
      retry: false,
      staleTime: Infinity,
    },
  },
});
```

`staleTime` and `gcTime` are both `Infinity` and `retry` is off. Caches are kept forever and never silently refetched, because the underlying game data is effectively immutable per patch and a failed query should surface, not loop.

Query services live under `apps/studio/src/lib/query/services/`, grouped by domain: `jobs`, `rankings`, `billing`, `fleet`, and a `game/` subtree for specs, items, spells, and rotations.

The one place that deviates is the live job poll.
`useJob` sets a `refetchInterval` of 3000 ms while the job's status is `pending` or `running`, and `false` otherwise. That polling is a fallback; the realtime channel (covered in [the simulation UI](/dev/bible/portal/simulation-ui)) is the primary live path.

## Zustand: the simulator form

The cross-page simulator form is a single Zustand store, `useSimulatorStore` (`apps/studio/src/lib/state/simulator-store.ts`), built with `create()` and the `persist` middleware. It holds exactly the fields that must survive navigation between the import step, the configure step, and the results page: `simcInput`, `specId`, `rotationId`, `iterations`, `settingOverrides`, and a capped list of `recentSimcProfiles`. Iterations are clamped to `[1000, 1_000_000]` in `setIterations` so a hand-edited value can never blow up a run, and `addRecentSimcProfile` de-duplicates and caps the list at eight entries.

| Field                | Type             | Purpose                                        |
| -------------------- | ---------------- | ---------------------------------------------- |
| `simcInput`          | `string`         | Raw pasted SimulationCraft profile             |
| `specId`             | `number \| null` | Spec detected from the talent loadout          |
| `rotationId`         | `string \| null` | Selected rotation for the run                  |
| `iterations`         | `number`         | Clamped to `[1000, 1_000_000]`                 |
| `settingOverrides`   | `Record`         | Per-setting overrides (bool / number / string) |
| `recentSimcProfiles` | `string[]`       | Last 8 pasted profiles                         |

Persistence is the `persist` middleware, not a hand-rolled scheme. It writes a versioned envelope `{ state, version }` to `localStorage` under `wowlab:portal:simulator`, `partialize` selects the six fields above, and `version` guards the stored shape so a future schema change can migrate or discard stale data. Components read with selectors (`useSimulatorStore((s) => s.iterations)`) and call actions off the store; there is no provider, because Zustand stores are module singletons.
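
The two guards on the simulator form, the iteration clamp and the capped recent-profile list, can be sketched as pure functions. This is an illustration rather than the store's actual code: the function names are hypothetical, and rounding, the fallback for non-finite input, and newest-first ordering are assumptions the source does not state.

```typescript
const MIN_ITERATIONS = 1000;
const MAX_ITERATIONS = 1_000_000;
const MAX_RECENT_PROFILES = 8;

// Clamp a requested iteration count into [1000, 1_000_000].
// Assumption: non-finite input falls back to the minimum.
export function clampIterations(value: number): number {
  if (!Number.isFinite(value)) return MIN_ITERATIONS;
  return Math.min(MAX_ITERATIONS, Math.max(MIN_ITERATIONS, Math.round(value)));
}

// Put the profile first, drop any earlier copy, and cap the list at eight.
// Assumption: the list is ordered newest first.
export function addRecentProfile(
  recent: readonly string[],
  profile: string,
): string[] {
  const withoutDuplicate = recent.filter((entry) => entry !== profile);
  return [profile, ...withoutDuplicate].slice(0, MAX_RECENT_PROFILES);
}
```

Because both helpers are pure, they can sit inside a store action or be reused when migrating a persisted value that predates the current limits.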
## Zustand: the rotation editor document

The rotation editor also uses Zustand, but instantiates it differently, and the reason is lifetime. The editor's state is the in-progress rotation document plus undo/redo history, dialog state, the live trace, and validation results, all of which should reset when you leave the editor, not persist across the app. So instead of a module singleton it is a **context-provided, per-mount** store: `EditorStoreProvider` creates the stores with `createStore` and tears them down on unmount, so navigating away discards the document and its history for free.

It is split into two stores. The **document store** holds the undoable state (the `EditorScript` plus its `metadata` draft) and is wrapped in the [`zustand-travel`](https://github.com/mutativejs/zustand-travel) middleware, which owns undo/redo as a patch history (`maxHistory` of 50, manual `archive()` on each discrete edit and on metadata commit, `reset()` back to the loaded baseline). The transient **UI store** holds dialog, selection, trace, validation, and the new-rotation setup flow: state that must not be undoable, and so is kept out of the tracked store.

The document at the center of all this is the `EditorScript`:

```typescript
export type EditorScript = {
  version: number;
  name: string;
  variables: Record;
  actions: ActionEntry[];
  lists: Record;
};
```

The editor itself is the subject of [the next page](/dev/bible/portal/rotation-editor).

## RxDB + TanStack DB: the game-data cache

Bulk game reference data (spells, items, specs, talent trees, scaling tables) is not fetched per-view through React Query. It is synced once into an offline-first store under `apps/studio/src/lib/game-data/`: RxDB (backed by Dexie/IndexedDB) is the persistence layer, and TanStack DB collections wrap it to expose reactive live queries to components.
Because this data is immutable per patch, it is loaded ahead of time and read locally, which is why the React Query client above can keep its caches forever without refetching.

## Supabase: data access

All of the React Query services that hit the database use the browser Supabase client. It is created with `createBrowserClient` from `@supabase/ssr`:

```typescript
let client: ReturnType> | undefined;

export function createClient() {
  client ??= createBrowserClient(
    env.SUPABASE_URL,
    env.SUPABASE_PUBLISHABLE_KEY,
  );
  return client;
}
```

It pulls the public env (`SUPABASE_URL`, `SUPABASE_PUBLISHABLE_KEY`); auth cookies are host-only, scoped to the studio app itself.

Reads are plain PostgREST calls. `useJob`, for instance, selects a row from `jobs` by id. Writes that need server logic go through RPCs, such as `supabase.rpc("create_job", …)` for job submission. The database schema behind these calls is documented in [the database section](/dev/bible/distribution/database).

One thing the portal does **not** use is a higher-level CRUD framework. The data layer is React Query services calling the Supabase client directly; there is no admin/resource abstraction layer on top of it.

## Rotation Editor

The rotation editor lets you build a spell-priority rotation without writing the JSON by hand. You arrange action lists, each a drag-sortable list of actions (cast a spell, call another list, set a variable, wait), and each action carries a condition tree that decides when it fires. The whole document lives in a context-provided pair of Zustand stores; as you edit, the engine validates it and runs a live preview, both over the WebAssembly boundary.

## The document store

The editor's state is split across two Zustand stores, created per-mount by `EditorStoreProvider` and reached through the `useEditorDocument`, `useEditorUi`, and `useEditorHistory` hooks.
The **document store** holds the undoable document: the editable `script` (an `EditorScript` with `actions`, named `lists`, and `variables`) and the `metadata` draft. The transient **UI store** holds everything that must not be undoable: dialog state, the current selection, the latest `trace`, the latest `validation`, and the new-rotation setup flow.

Undo/redo is not hand-rolled. The document store is wrapped in the [`zustand-travel`](https://github.com/mutativejs/zustand-travel) middleware in manual-archive mode (`maxHistory` of 50): each discrete edit calls `archive()` to commit one history frame, a metadata draft is committed as its own frame on blur, and `reset()` returns the document to the baseline it was seeded with. `useEditorHistory` exposes `undo`/`redo`/`canUndo`/`canRedo` off the travel controls.

A detail worth naming: every action in the editor carries a client-side `id` from `crypto.randomUUID()` so React keys and drag-and-drop are stable. The engine's `Rotation` type has no such id, so the store exposes `dehydrate()`, which strips the ids back out before the script crosses into WASM:

```typescript
export function dehydrateScript(script: EditorScript): Rotation {
  return {
    ...script,
    actions: stripActionIds(script.actions),
    lists: Object.fromEntries(
      Object.entries(script.lists).map(([name, actions]) => [
        name,
        stripActionIds(actions),
      ]),
    ),
  };
}
```

Hydrate-in, dehydrate-out is the contract between the editor's representation and the engine's.

The page component seeds the stores from the route: it loads the rotation via React Query when editing an existing one, otherwise starts a new draft, and passes the resulting seed to `EditorStoreProvider`. The provider is keyed by rotation id, so switching rotations remounts it with fresh state and an empty history.

## Action lists and the condition tree

The left pane is a list switcher plus the action list for the active list.
The action list itself is drag-sortable via `@dnd-kit`: a `DndContext` with a `SortableContext` keyed on the stable action ids, and `arrayMove` on drag-end.

The action types you can add map one-to-one onto the engine's `Action` variants, and the option list is typed against `Action["type"]` so a new engine variant cannot be left out:

```typescript
export const ACTION_TYPES: { value: Action["type"]; label: string }[] = [
  { label: "Cast Spell", value: "cast" },
  { label: "Call List", value: "call" },
  { label: "Run List", value: "run" },
  { label: "Set Variable", value: "set_var" },
  { label: "Modify Variable", value: "modify_var" },
  { label: "Wait", value: "wait" },
  { label: "Wait Until", value: "wait_until" },
  { label: "Pool", value: "pool" },
  { label: "Use Trinket", value: "use_trinket" },
  { label: "Use Item", value: "use_item" },
];
```

Each action's `condition` is a tree of `Condition` nodes, edited node-by-node. The node kinds the editor knows how to render and convert between are the engine's condition variants: `and` / `or` / `not`, `compare`, `arith`, `min_max`, `unary_math`, `if_then_else`, `read` (a field read like `player.haste`), `var`, and the literals `bool` / `int` / `float`. The comparison, arithmetic, and variable operators offered in the UI are the engine's `CompareOp` / `ArithOp` / `VarOp` enums, imported straight from `wowlab-engine`. Keeping these lists derived from the engine's own types is intentional: the editor cannot offer an operator the engine cannot evaluate.

## Validation over the WASM boundary

As you edit, the editor asks the engine whether the rotation is valid for the chosen spec. `useRotationValidation` debounces 250 ms, dehydrates the script to JSON, and calls `engine.validateRotationForSpec(json, specId)`.
That export is **spec-aware**: beyond structural parsing it resolves every spell, aura, and talent name against the spec's introspection and reports any that do not exist, so a typo'd spell slug shows up as an error rather than silently failing at sim time:

```rust
#[wasm_bindgen(js_name = validateRotationForSpec)]
pub fn validate_rotation_for_spec(json: &str, spec_id: u32) -> Result {
    let descriptor = find_spec_descriptor(spec_id)?;
    let intro = wowlab_engine_application::introspect_spec(descriptor.spec_id)
        .map_err(|e| WasmEngineError::Simulation(e.to_string()))?;
    let rotation: Rotation = WasmEngineError::parse_json(json)?;
```

It looks up the spec descriptor, introspects it, parses the rotation, runs the structural pass, then checks every extracted name against a resolver built from the spec's own spells and auras. The result's `errors` and `warnings` drive the validation strip and per-action badges, and a thrown WASM panic is caught and surfaced as a single error rather than crashing the page.

This contradicts an older note that validation over the boundary is structural-only. The structural-only export (`validateRotation`) still exists, but the editor uses the spec-aware one.

## The live trace preview

The right pane runs a single deterministic iteration and visualizes it: a timeline, DPS and resource graphs, a "why not?" panel explaining which conditions blocked each action.

`useLiveTrace` debounces 250 ms and runs the trace in a dedicated web worker. It builds a sim config from a representative paperdoll profile plus the preview controls (fight style, duration, target count, seed), serializes the dehydrated rotation to JSON, and calls `api.runTrace(simConfig, rotationJson, seed, workerEnv)` over Comlink.
The worker delegates to the engine's `runIterationTrace`, which forces the preview rotation through the engine and returns an `IterationTrace`:

```rust
#[wasm_bindgen(js_name = runIterationTrace)]
pub async fn run_iteration_trace(
    sim_config: &str,
    rotation_json: &str,
    seed: u64,
    resolver: JsValue,
) -> Result {
    let mut config = parse_sim_config(sim_config).map_err(WasmEngineError::Parse)?;
    config.rotation_id = PREVIEW_ROTATION_ID.to_string();
```

It pins the config's `rotation_id` to the preview slot, overlays the editor's rotation JSON onto the JS resolver, attaches a decision-trace sink, and runs one seeded iteration through `simulate_intent_with_trace`.

Two design choices stand out. The trace runs in a worker, not on the main thread, so a slow or panicking iteration never freezes the editor, and an out-of-date result is discarded via a per-run token guard. And the preview always uses the engine's interpreter, not the JIT: attaching a decision-trace sink forces the interpreted backend, because the "why not?" data only exists when the engine walks decisions one at a time. The interpreter-vs-JIT trade-off itself lives in [the rotation compiler](/dev/bible/engine/rotation-compiler).

## Simulation UI

The simulate section has three modes, all starting from the same input, a pasted SimulationCraft profile, and all ending at the same results page. The difference is what they search over: **quick** runs one fixed gear set, **bags** (best-in-bags) runs a tournament over the items in your bags, and **drops** ranks unobtained loot. From the browser's point of view the flow is the same: parse the profile, build a sim config, create a job, watch it run, then decode and chart the result.

The simulate index just links the three modes. The interesting path is the data flow underneath them.

```mermaid
sequenceDiagram
    participant U as User
    participant P as useSimcParser
    participant C as wowlab-common
    participant S as Supabase
    participant SE as sentinel/nodes
    participant R as ResultsContent
    U->>P: paste SimC text
    P->>C: parseSimc(input)
    C-->>P: Profile
    P->>C: extractSpecIdFromLoadout(talents)
    C-->>P: specId
    U->>P: submit
    P->>C: buildSimConfig(IntentInput)
    C-->>P: sim_config TOML
    P->>S: rpc create_job(sim_config, sentinel_config)
    S-->>P: jobId
    SE->>SE: assign chunks, simulate, finalize
    SE->>S: write result_pb / timeline_pb
    R->>S: select jobs row by id
    S-->>R: job (result_pb, timeline_pb)
    R->>C: decodeJobResult(result_pb, timeline_pb)
    C-->>R: JobResultView
    R->>R: render charts and tables
```

This figure expands the **Browser** box of the system-context diagram (in [the overview](/dev/bible/overview/architecture)) for the simulation use case specifically. The `sentinel + nodes` step is shown collapsed here; its internals are [hosted compute](/dev/bible/distribution/hosted-compute).

## SimC paste to spec detection

Parsing happens on the main thread, synchronously, against the loaded `wowlab-common` module. `useSimcParser` debounces 300 ms, then calls `parseSimcProfile(common, input)` to get a `Profile` and `extractSpecIdFromLoadout(common, profile.talents.encoded)` to recover the spec id from the talent string.

The detected spec is gated against the engine's implemented specs, built once from `parseImplementedSpecs(common, engine.getImplementedSpecs())`, so an unsupported spec produces a clear error instead of a failed sim. On a successful, supported parse it dispatches the spec id and a rotation reset into the Redux simulator slice.

## Building the sim config

Before submission the profile becomes a TOML sim config. `buildProfileSimConfig` maps the profile's equipment into the engine's `IntentInput` shape, packs in the talent loadout, the chosen `rotation_id`, the spec id, and any settings, then calls `buildSimConfig`, which lives in `wowlab-common` and returns the TOML string:

```typescript
export function buildProfileSimConfig(
  common: CommonModule,
  { profile, rotationId, settings, specId }: BuildProfileSimConfigArgs,
): string {
  return buildSimConfig(common, {
    equipment: mapEquipment(profile.equipment),
    loadout: profile.talents.encoded || undefined,
    rotation_id: rotationId,
    settings,
    spec_id: specId,
  });
}
```

The equipment mapping is the only fiddly part: each item carries its `bonus_ids`, `gem_ids`, `enchant_id`, `crafted_stats`, `crafting_quality`, `drop_level`, `slot`, and `id`, which is everything the engine needs to resolve the item back to stats on its side.
## Job submission

Both quick and bags submit through `useJobSubmission`, which routes to `useSubmitJob` when no slot is contested and `useSubmitBibJob` when there are tournament slots. Either path builds the sim config, builds a sentinel config with `buildSentinelConfig`, calls the `create_job` RPC on Supabase, and on success navigates to the results page for the returned job id.

The difference is in the sentinel config. Quick runs a flat set of iterations at a `target_error` of `0.05`. Bags tightens that to `0.005` and adds a staged tournament: each phase runs more iterations against a shrinking pool of survivors, so cheap early rounds cull the obvious losers before the expensive rounds decide the winner.

```typescript
const BIB_TOURNAMENT_PHASES = [
  { iterations: 100, keep_fraction_x100: 10 },
  { iterations: 500, keep_fraction_x100: 50 },
  { iterations: 1000, keep_fraction_x100: 100 },
] as const;
```

There is a second execution path worth mentioning: the browser can itself be a compute node. The same worker pool that runs distributed chunks is described in [the WASM boundary](/dev/bible/distribution/wasm-boundary); this page concerns the path where the job is submitted and executed by the hosted pool.

## Live progress

While a job is `pending` or `running`, the results page shows live progress from the realtime channel. `ResultsProgressSection` calls `useJobProgress(jobId, { isEnabled })`, which subscribes the browser to the `jobs:{jobId}` channel on beacon (Centrifugo). The browser never holds the signing secret: it fetches two short-lived HMAC tokens, a connection token and a per-channel subscription token, from an ownership-checked server route, `GET /api/realtime/job-token`.

Each publication carries chunk counts, phase, and for tournaments a live top-K ranking that the section renders as it ticks in. If the channel errors, `isFailed` flips and the caller falls back to the 3000 ms `useJob` poll.
The realtime topology is detailed in [the realtime section](/dev/bible/distribution/realtime).

## Decoding and rendering results

When the job has finished, `result_pb` and `timeline_pb` are present on the `jobs` row. `ResultsContent` loads the row with `useJob` and decodes it with `decodeJobResult(common, job)`, memoized so it only re-runs when the job or module changes. That helper hands the protobuf payloads, hex-encoded `bytea` straight out of Postgres, to `wowlab-common`, which decodes them on the Rust side and returns a `JobResultView`.

The view is a tagged union, discriminated on `kind`:

```rust
pub enum JobResultView {
    Single {
        analytics: AnalyticsView,
    },
    Tournament {
        analytics: Option,
        tournament: TournamentView,
    },
}
```

A `single` job carries one `AnalyticsView`; a `tournament` job carries the tournament ranking plus an optional baseline `AnalyticsView`. `ResultsContent` branches on `jobResult.kind` and renders accordingly: the throughput chart, the DPS stats card from `analytics.core`, the spell-breakdown table from `analytics.actions`, the sim-config card, and for tournaments the ranking view.

Every view struct (`AnalyticsView`, `CoreView`, `DistributionView`, the per-spell `ActionView`) is produced by the common WASM decode, so the UI does no statistics itself. It only renders the decoded view.

## Content System

This page documents the system that renders this page. The bible and the docs are MDX files in `packages/shared/src/content`, compiled at build time by [velite](https://velite.js.org) into a typed content collection, then rendered through a set of custom MDX components: the `<Figure>`, `<BibleTable>`, `<Term>`, and `<Cite>` you see throughout. The content is shared source: both the studio and landing apps point velite at the same tree.

The mental model is three stages: velite parses the MDX and validates frontmatter; a content-collection helper turns the flat list of entries into a navigable, ordered tree; and the App Router renders each entry with the custom components.

## velite: parse and validate

The studio's `velite.config.ts` declares two collections, `bible` and `docs`, both reading from the same shared content root. The `bible` collection's schema requires `title` and `description`, optionally `nextSteps`, and derives `body` (compiled MDX), `toc`, a git-commit `updatedAt`, and a `sortKey` from the file path:

```typescript
export const bible = defineCollection({
  name: "BibleEntry",
  pattern: "bible/**/*.mdx",
  schema: s.object({
    ...baseSchema,
    nextSteps: s.array(s.string()).optional(),
    sortKey: s.path(),
  }),
});
```

The `updatedAt` field is computed by shelling out to `git log -1 --format=%cd` per file, so the "last updated" you see is the real commit date, not the build date. Code blocks are highlighted at build time with `rehype-pretty-code` and Shiki, and a small rehype plugin stashes the raw source on each code block so the copy button has something to copy.

## The number-prefix rule

Files and folders are numbered, e.g. `05-portal/03-simulation-ui.mdx`. That prefix is load-bearing, not cosmetic. The content-collection builder parses the leading `NN-` to derive ordering, and it refuses anything without it:




```typescript
if (!match) {
  throw new Error(
    `${collectionName} path segment missing number prefix: ${segment}`,
  );
}
```




The prefix is then stripped to form the public slug, so the URL is `/dev/bible/portal/simulation-ui`, not `.../05-portal/03-...`. This is why every cross-link in this bible uses the stripped path while `nextSteps` frontmatter uses the numbered one. They address the same page through two layers of the pipeline.
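
To make the two layers concrete, here is a small sketch of how a numbered path could be turned into an ordering key and a public slug. It is not the actual `createContentCollection` code: `parseSegment` and `toPublicSlug` are hypothetical names, and the exact prefix pattern (one or more digits followed by a hyphen) is an assumption. Only the error message mirrors the excerpt above.

```typescript
// Assumed prefix shape: digits, a hyphen, then the rest of the segment.
const NUMBER_PREFIX = /^(\d+)-(.+)$/;

// Split one path segment into its ordering number and its public name.
export function parseSegment(
  collectionName: string,
  segment: string,
): { order: number; slug: string } {
  const match = NUMBER_PREFIX.exec(segment);
  if (!match) {
    throw new Error(
      `${collectionName} path segment missing number prefix: ${segment}`,
    );
  }
  return { order: Number(match[1]), slug: match[2] };
}

// Strip the extension and every numeric prefix to get the URL slug.
export function toPublicSlug(collectionName: string, path: string): string {
  return path
    .replace(/\.mdx$/, "")
    .split("/")
    .map((segment) => parseSegment(collectionName, segment).slug)
    .join("/");
}
```

With these helpers, `toPublicSlug("bible", "05-portal/03-simulation-ui.mdx")` yields `portal/simulation-ui`, which is the part that follows `/dev/bible/` in the cross-links.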

`createContentCollection` also builds the section index, resolves adjacent prev/next pages, and resolves the `nextSteps` numbered paths back to navigable items. The studio binds the velite output into this helper in `apps/studio/src/lib/content/bible.ts`.

## Rendering

The bible route is a dynamic catch-all, `[...slug]`, that prerenders every slug via `generateStaticParams` over `bible.slugs`. It fetches the page data and hands the compiled `body` to a shared `ArticleLayout` together with the custom MDX component map, `studioMdxComponents`.

That map is where the bible-specific components are registered:




```tsx
export const studioMdxComponents = {
  BibleTable: MdBibleTable,
  BibleTables: MdBibleTables,
  Bibliography: MdBibliography,
  Cite: MdCite,
  Figure: MdFigure,
  Figures: MdFigures,
  GithubEmbed: MdGithubEmbed,
  Glossary: MdGlossary,
  RoadmapPlanner: MdRoadmap,
  Term: MdTerm,
};
```




Each tag maps to the component that renders it:



| MDX tag                                                         | Component       | What it renders                                        |
| --------------------------------------------------------------- | --------------- | ------------------------------------------------------ |
| `<Figure>`                                                      | `MdFigure`      | A numbered, captioned figure (mermaid / table / image) |
| `<BibleTable>`                                                  | `MdBibleTable`  | A numbered, captioned GFM table                        |
| `<Term>`                                                        | `MdTerm`        | A glossary hover-card link                             |
| `<Cite>`                                                        | `MdCite`        | A numbered citation with a reference hover-card        |
| `<Figures>` / `<BibleTables>` / `<Glossary>` / `<Bibliography>` | list components | The index pages that enumerate all of each kind        |

## The figure / table / term / reference registries

A deliberate choice: the metadata for figures, tables, glossary terms, and references is **not** in the MDX. It lives in four central TypeScript files, `apps/studio/src/content/{figures,tables,terms,references}.ts`, keyed by id. The MDX only references an id; the component looks the rest up.

- `<Figure id="x">` wraps a mermaid fence (or table/image). `MdFigure` looks `x` up in `figures` to get its number and caption. Captions are registered centrally, so the MDX does not pass one.
- `<BibleTable id="x">` works the same way against `tables`.
- `<Term id="x">label</Term>` renders a hover card with the term's name, expansion, and description from `terms`, linking to the glossary anchor; an unknown id renders a visible `(?)` rather than crashing.
- `<Cite id="x" />` renders a numbered link with a reference hover-card and an optional locator; references support plain URLs, DOIs, and archived snapshots.

The reason for the central registries is consistency: numbering, captions, and the index pages (`/dev/bible/figures`, `/dev/bible/glossary`, `/dev/bible/references`) all derive from one source, so a figure cannot have two different captions and the numbering cannot drift from the prose. The cost is that adding a figure touches two files, the MDX and the registry, which is a fair trade for never having a mislabeled or dangling reference.

Mermaid diagrams are rendered client-side; velite does not validate them, so a malformed diagram fails in the browser, not at build. That is why every diagram in this bible is wrapped in `<Figure>` with the fence isolated by blank lines.

## Death Knight

Work in progress.

## Demon Hunter

Work in progress.

## Druid

Work in progress.

## Evoker

Work in progress.

## Hunter

Work in progress.

## Mage

Work in progress.

## Monk

Work in progress.

## Paladin

Work in progress.

## Priest

Work in progress.

## Rogue

Work in progress.

## Shaman

Work in progress.

## Warlock

Work in progress.

## Warrior

Work in progress.

## Figures

_Dynamic page: content is rendered live at https://app.wowlab.gg/dev/bible/figures and is not included here._

## Tables

_Dynamic page: content is rendered live at https://app.wowlab.gg/dev/bible/tables and is not included here._

## Glossary

## References

We include screenshots of all website sources, as well as Wayback Machine links to fight link rot. Please consider visiting the original websites if they are still reachable and still have the relevant sources.

# Blog

## Hello!

Welcome to WoW Lab. This is our first blog post.

## What is this?

WoW Lab is a combat simulator for World of Warcraft. Build rotations in the browser, run sims right there too. The engine runs in WebAssembly, so free sims never leave your machine. If you want hosted pool speed, there's a paid plan.

This is **version 0.1.0**, very much a work in progress. Things will break, features are incomplete, and we're iterating fast. Building in public.

## What's next?

More news soon. This blog covers updates, guides, and announcements as we go.

In the meantime:

- Check out the [About](/about) page to see what WoW Lab is and how it works
- See which specs are ready to sim on the [Spec Coverage](/dev/docs/guides/01-spec-coverage) page
- Join our [Discord](/go/discord) to chat, ask questions, or share feedback

Thanks for stopping by.