Nimbus v1.0.0

Multi-Node & Load Balancer

Distribute Nimbus services across multiple machines with cluster mode, agent nodes, placement strategies, TCP load balancing, and automatic failover.

Nimbus can distribute services across multiple machines using cluster mode. A central controller coordinates remote agent nodes that run services locally — scaling, placement, and monitoring all happen automatically. An optional TCP load balancer sits in front of your Velocity proxies so players connect to a single address.

When do I need this?

If your network runs on one machine, you don't: single-node mode handles everything. Multi-node becomes valuable when you need more RAM or CPU than one machine provides, or when you want geographic distribution across dedicated servers.

Architecture

                        Players
                           │
                 ┌─────────▼──────────┐
                 │   Load Balancer    │  :25565 (optional)
                 └─────────┬──────────┘
                           │
              ┌────────────▼────────────┐
              │    Nimbus Controller    │  nimbus-core
              │ Groups · Scaling · API  │
              └───┬──────────────┬──────┘
                  │  WebSocket   │
         ┌────────▼───┐    ┌─────▼───────┐
         │ Agent 1    │    │   Agent 2   │  nimbus-agent
         │ Proxy-1    │    │ BedWars-1   │
         │ Lobby-1    │    │ BedWars-2   │
         │ Lobby-2    │    │ BedWars-3   │
         └────────────┘    └─────────────┘

How it works:

  1. The controller (nimbus-core) manages all state — groups, scaling decisions, player routing
  2. Agent nodes (nimbus-agent) connect via WebSocket and run Java server processes locally
  3. When the scaling engine starts a service, it picks the best node using a placement strategy
  4. Agents auto-download templates from the controller, launch processes, and stream state back
  5. If a node disconnects, the controller detects it via heartbeats and reschedules services
  6. If no remote node is available, dynamic services fall back to running locally on the controller

Service Placement Rules

Not all service types are distributed equally. Static services have persistent data (worlds, configs) stored in services/static/ and run on the controller by default — but since v0.7.0, they can be placed on remote agent nodes with state sync enabled.

| Service Type | Where it runs | Reason |
| --- | --- | --- |
| Static (e.g., Lobby, Survival) | Controller (default), or any node with sync enabled | Defaults to local; remote placement requires [group.sync] for data persistence |
| Dynamic (e.g., BedWars, SkyWars) | Any node (controller or agent) | Stateless; rebuilt from template on every start |
| Proxy (Velocity) | Any node (controller or agent) | No persistent state |

By default, static services run on the controller. For remote placement, enable state sync — see State Sync.
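
The placement rules above can be read as a single eligibility check. The following is a minimal sketch, not Nimbus code: `ServiceType` and `can_run_on_agent` are hypothetical names, and `sync_enabled` stands in for whether the group has a [group.sync] section.

```python
from enum import Enum


class ServiceType(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"
    PROXY = "proxy"


def can_run_on_agent(service_type: ServiceType, sync_enabled: bool = False) -> bool:
    """Return True if a service of this type may be placed on a remote agent node."""
    if service_type is ServiceType.STATIC:
        # Static services keep persistent data, so remote placement needs state sync.
        return sync_enabled
    # Dynamic and proxy services carry no persistent state and may run on any node.
    return True
```

With these assumptions, `can_run_on_agent(ServiceType.STATIC)` is false until sync is enabled, while dynamic and proxy services are always eligible.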

Quick Start

1. Enable cluster mode on the controller

Nimbus
cluster enable

This generates an auth token and saves the config. Restart Nimbus to activate.

Or edit config/nimbus.toml manually:

config/nimbus.toml
[cluster]
enabled = true
token = "your-secret-token"        # shared with all agents
agent_port = 8443                   # WebSocket port for agents
bind = "0.0.0.0"
heartbeat_interval = 5000           # ms between heartbeats
node_timeout = 15000                # ms before node is considered dead
placement_strategy = "least-services"

TLS is automatic

Cluster TLS is enabled by default. On first startup Nimbus generates a self-signed certificate and the agent setup wizard automatically pins its SHA-256 fingerprint via the /api/cluster/bootstrap endpoint — no manual keytool/truststore work. See Cluster TLS & Security for the threat model, cert rotation, and advanced options (custom CA, extra SANs).
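
To make the pinning step concrete, here is a rough sketch of how a SHA-256 certificate fingerprint can be computed and compared against a pinned value. It assumes the fingerprint is the SHA-256 digest of the DER-encoded certificate, written as colon-separated uppercase hex (the shape of trusted_fingerprint in agent.toml). The function names are illustrative and are not part of Nimbus.

```python
import hashlib
import hmac


def sha256_fingerprint(der_cert: bytes) -> str:
    """Format the SHA-256 digest of a DER-encoded certificate as AA:BB:CC:..."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_pin(der_cert: bytes, pinned: str) -> bool:
    """Compare the presented certificate against a pinned fingerprint."""
    presented = sha256_fingerprint(der_cert)
    # Normalise case and whitespace, then compare in constant time.
    return hmac.compare_digest(presented.encode(), pinned.strip().upper().encode())
```

In Python, the DER bytes of a peer certificate can be obtained from an established TLS socket with `getpeercert(binary_form=True)`.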

2. Set up an agent node

Install the agent on each worker machine with a single command:

Terminal
curl -fsSL https://raw.githubusercontent.com/NimbusPowered/Nimbus/main/install-agent.sh | bash

PowerShell
irm https://raw.githubusercontent.com/NimbusPowered/Nimbus/main/install-agent.ps1 | iex

The installer handles Java 21 and downloads the latest agent release. The setup wizard runs on first start and walks you through trust + config.

Before running the wizard, grab the bootstrap details from the controller:

Nimbus (controller)
cluster bootstrap-url

This prints the REST URL and cluster token you need. Then start the agent — the wizard will:

  1. Ask for the Controller REST URL (e.g. http://10.0.0.1:8080)
  2. Ask for the Auth Token (the cluster token from the controller)
  3. Call /api/cluster/bootstrap to fetch the controller's TLS fingerprint
  4. Show you the fingerprint and expiry — you confirm with Y
  5. Ask for node name, memory, and max services (defaults are auto-detected)

The resulting agent.toml:

agent.toml
[agent]
controller = "wss://10.0.0.1:8443/cluster"
token = "your-cluster-token"
node_name = "worker-1"
max_memory = "16G"
max_services = 10
public_host = ""  # Leave blank to auto-detect; set to your public IP if behind NAT
trusted_fingerprint = "AA:BB:CC:DD:..."  # pinned by the wizard
tls_verify = true
truststore_path = ""
truststore_password = ""

# Optional: specify paths to Java installations.
# Leave empty for auto-detection / auto-download from Adoptium.
[java]
java_16 = ""
java_17 = ""
java_21 = ""

Re-running the wizard

After cert rotation or if you want to re-pin the controller, run java -jar nimbus-agent.jar --setup to force the wizard even when agent.toml already exists.

JDK Auto-Resolution

Agent nodes automatically detect installed JDKs and download missing versions from Adoptium when needed. You only need to configure the [java] section if you want to override the auto-detected paths.

3. Verify the connection

On the controller:

Nimbus — Cluster Nodes
nimbus » nodes
── Cluster Nodes ──────────────────────────────────
NODE        HOST          STATUS    CPU    MEMORY         SERVICES
──────────────────────────────────────────────────────────────────
worker-1    10.0.0.2      online    12%    3241/16384MB   2/10
worker-2    10.0.0.3      online    8%     1024/8192MB    1/5
2/2 online

Placement Strategies

When starting a new service, the controller picks a node using one of these strategies:

| Strategy | Description | Best for |
| --- | --- | --- |
| least-services | Node running the fewest services (default) | Even distribution |
| least-memory | Node with the most free memory | Memory-heavy servers (modded, large worlds) |
| round-robin | Rotate through nodes sequentially | Predictable, simple |

config/nimbus.toml
[cluster]
placement_strategy = "least-services"

Nodes that are offline, at max services, or out of memory are automatically skipped.
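
The three strategies and the skip rule can be sketched in a few lines. This is an illustration under assumptions, not the controller's implementation: the `Node` fields, `pick_node`, and the caller-supplied `rr_index` rotation counter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    name: str
    online: bool
    services: int
    max_services: int
    used_memory_mb: int
    max_memory_mb: int

    @property
    def free_memory_mb(self) -> int:
        return self.max_memory_mb - self.used_memory_mb


def pick_node(nodes: Sequence[Node], strategy: str, required_memory_mb: int,
              rr_index: int = 0) -> Optional[Node]:
    """Pick a node for a new service, or None if no node qualifies."""
    # Skip nodes that are offline, at max services, or short on memory.
    candidates = [
        n for n in nodes
        if n.online
        and n.services < n.max_services
        and n.free_memory_mb >= required_memory_mb
    ]
    if not candidates:
        return None  # the caller may fall back to running on the controller
    if strategy == "least-services":
        return min(candidates, key=lambda n: n.services)
    if strategy == "least-memory":
        return max(candidates, key=lambda n: n.free_memory_mb)
    if strategy == "round-robin":
        # rr_index is a counter the caller increments after each placement.
        return candidates[rr_index % len(candidates)]
    raise ValueError(f"unknown placement strategy: {strategy}")
```

Using the two nodes from the `nodes` output shown earlier, least-services would choose worker-2 (1 service against 2), while least-memory would choose worker-1 (about 13 GB free against 7 GB).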

TCP Load Balancer

When running multiple Velocity proxy instances across nodes, enable the load balancer so all players connect to one address (play.yourserver.com:25565):

Nimbus
lb enable

Or in config/nimbus.toml:

config/nimbus.toml
[loadbalancer]
enabled = true
bind = "0.0.0.0"
port = 25565                    # players connect here
strategy = "least-players"      # or "round-robin"
proxy_protocol = false          # enable for real client IPs
connection_timeout = 5000
buffer_size = 16384

The load balancer is a Layer-4 TCP proxy: it forwards raw Minecraft protocol bytes without inspecting them. This keeps overhead minimal and makes it compatible with any Minecraft version.
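
For readers unfamiliar with Layer-4 forwarding, the idea can be sketched with Python's asyncio streams. This is a simplified illustration with a single fixed backend, not the Nimbus load balancer. The function names are made up, and the defaults mirror buffer_size and connection_timeout from the config above (the timeout is expressed in seconds here).

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter,
               buffer_size: int) -> None:
    """Copy bytes in one direction until the source closes, without parsing them."""
    try:
        while chunk := await reader.read(buffer_size):
            writer.write(chunk)
            await writer.drain()
    except ConnectionError:
        pass  # the peer went away; fall through and close our side
    finally:
        writer.close()


async def start_forwarder(listen_host: str, listen_port: int,
                          backend_host: str, backend_port: int,
                          connection_timeout: float = 5.0,
                          buffer_size: int = 16384) -> asyncio.AbstractServer:
    """Listen on one address and relay every connection to a single backend."""

    async def handle(client_reader, client_writer):
        try:
            backend_reader, backend_writer = await asyncio.wait_for(
                asyncio.open_connection(backend_host, backend_port),
                timeout=connection_timeout,
            )
        except (OSError, asyncio.TimeoutError):
            client_writer.close()  # backend unreachable: drop the client
            return
        # Relay both directions concurrently; the payload is never inspected.
        await asyncio.gather(
            pipe(client_reader, backend_writer, buffer_size),
            pipe(backend_reader, client_writer, buffer_size),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

Because the relay only moves bytes, it does not need to know which Minecraft protocol version the client speaks.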

Load balancer strategies

| Strategy | Description |
| --- | --- |
| least-players | Route to the proxy with the fewest players (default) |
| round-robin | Distribute evenly across all proxies |

PROXY protocol (real client IPs)

By default, Velocity sees all connections coming from the load balancer's IP. Enable PROXY protocol v2 to preserve real client IPs:

config/nimbus.toml
[loadbalancer]
proxy_protocol = true

Nimbus automatically patches haproxy-protocol = true in Velocity's config when this is enabled.
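
For reference, a PROXY protocol v2 header is a small binary prefix sent before the forwarded bytes. The sketch below builds one for a TCP-over-IPv4 connection following the public PROXY protocol specification. It is illustrative only; the function name is made up and nothing here is taken from the Nimbus source.

```python
import socket
import struct

# Fixed 12-byte signature that opens every PROXY protocol v2 header.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def proxy_v2_header_ipv4(src_ip: str, src_port: int,
                         dst_ip: str, dst_port: int) -> bytes:
    """Build a PROXY v2 header for a TCP connection over IPv4."""
    ver_cmd = 0x21    # high nibble: version 2, low nibble: PROXY command
    fam_proto = 0x11  # high nibble: AF_INET, low nibble: STREAM (TCP)
    addresses = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    # The 2-byte length field counts only the bytes after the 16-byte fixed part.
    return PP2_SIGNATURE + struct.pack("!BBH", ver_cmd, fam_proto, len(addresses)) + addresses
```

For IPv4 the header is 28 bytes long: the 16-byte fixed part followed by 12 bytes of addresses and ports. The receiving proxy reads the real client address from it instead of using the load balancer's IP.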

Monitoring

Nimbus — Load Balancer
nimbus » lb
── Load Balancer ───────────────────────────────
  Total Connections:   1,247
  Active Connections:  89

BACKEND       HOST          PORT    PLAYERS   STATE
──────────────────────────────────────────────────────
Proxy-1       10.0.0.2      30000   45        ● READY
Proxy-2       10.0.0.3      30000   44        ● READY

CLI Commands

Manage cluster mode and load balancer from the console without editing config files:

Nimbus
# Cluster (multi-node)
cluster status                  # Show cluster status
cluster enable                  # Enable cluster mode (generates token)
cluster disable                 # Disable cluster mode
cluster token                   # Show auth token
cluster token regenerate        # Generate new token (update all agents!)
cluster cert                    # Show TLS cert fingerprint, expiry, SANs
cluster cert regenerate         # Delete cluster.jks (agents must re-run setup)
cluster bootstrap-url           # Print REST URL + token for the agent wizard

# Load Balancer (independent of cluster mode)
lb                              # Show LB status + backend proxies
lb enable                       # Enable load balancer
lb disable                      # Disable load balancer
lb strategy <name>              # Set strategy (least-players, round-robin)

Changes are saved to nimbus.toml and take effect after restart.

The load balancer is independent of cluster mode. You can use it with multiple local Velocity proxies without enabling cluster mode.

Node Failure & Recovery

Nimbus handles node failures automatically:

| Scenario | What happens |
| --- | --- |
| Agent disconnects briefly | Services keep running. Agent reconnects within node_timeout and resumes. |
| Agent doesn't reconnect | After 2 × node_timeout, services are marked CRASHED. Scaling engine restarts them on other nodes. |
| Controller restarts | Agents reconnect automatically (5-second retry loop). Running services are preserved. |
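
One way to picture the timing in the first two rows is as a function of the time since the last heartbeat. This is a rough sketch under assumptions: the state names are hypothetical, the default mirrors node_timeout = 15000 from the example config, and the behaviour between 1× and 2× node_timeout is a guess at how the two rows fit together.

```python
def node_state(ms_since_last_heartbeat: int, node_timeout_ms: int = 15000) -> str:
    """Classify a node by how long ago its last heartbeat arrived."""
    if ms_since_last_heartbeat < node_timeout_ms:
        return "online"
    if ms_since_last_heartbeat < 2 * node_timeout_ms:
        # Assumed grace period: the node is unreachable but its services are kept.
        return "unreachable"
    # Past 2 x node_timeout the node's services would be marked CRASHED
    # and rescheduled onto other nodes.
    return "services-crashed"
```

With the default 15-second timeout, a node silent for 20 seconds would still be in the grace period, and one silent for 30 seconds would have its services rescheduled.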

When the controller restarts, it waits for agents to reconnect during a configurable reconciliation window (reconciliation_delay, default 10 seconds). During this window, agents report any services that are still running on their nodes. Only after the window expires does startMinimumInstances() run. Because the recovered services are already registered by then, the controller starts only the missing instances, which prevents duplicates from being launched.
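
The "only starts the difference" step amounts to a per-group subtraction. A minimal sketch, assuming each group has a configured minimum instance count and the agents have reported how many instances survived the restart (`instances_to_start` and both dictionaries are hypothetical, not the actual startMinimumInstances() code):

```python
def instances_to_start(min_instances: dict[str, int],
                       recovered: dict[str, int]) -> dict[str, int]:
    """Return how many new instances each group still needs after reconciliation."""
    missing = {}
    for group, minimum in min_instances.items():
        # Instances reported by agents during the window count toward the minimum.
        shortfall = minimum - recovered.get(group, 0)
        if shortfall > 0:
            missing[group] = shortfall
    return missing
```

For example, if Lobby needs 2 instances and one was recovered, while all 3 BedWars instances were recovered, only a single Lobby instance would be started.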

Template Distribution

Agents automatically download templates from the controller when starting a service:

  1. Controller sends StartService with template name + SHA-256 hash
  2. Agent checks its local cache — if hash matches, uses cached version
  3. If not, downloads the template as a ZIP from GET /api/templates/{name}/download
  4. Extracts to local templates/ directory and starts the service

Templates are cached on agents — they only re-download when the template changes on the controller.
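
The cache check in steps 2 and 3 can be sketched as follows. The sketch assumes the SHA-256 hash is computed over the template ZIP itself, which the steps above do not specify. `ensure_template` and the injected `download` callable are hypothetical, and extraction into templates/ is left out.

```python
import hashlib
from pathlib import Path
from typing import Callable


def ensure_template(name: str, expected_sha256: str, cache_dir: Path,
                    download: Callable[[str], bytes]) -> tuple[Path, bool]:
    """Return (path to the cached ZIP, whether a download was needed)."""
    expected = expected_sha256.lower()
    zip_path = cache_dir / f"{name}.zip"

    # Cache hit: the stored ZIP already has the hash the controller announced.
    if zip_path.exists():
        if hashlib.sha256(zip_path.read_bytes()).hexdigest() == expected:
            return zip_path, False

    # Cache miss or stale copy: fetch again and verify before storing.
    data = download(name)
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"template {name!r} does not match the expected hash")
    cache_dir.mkdir(parents=True, exist_ok=True)
    zip_path.write_bytes(data)
    return zip_path, True
```

In a real agent, `download` would wrap the GET /api/templates/{name}/download request. Passing it in as a callable keeps the caching logic testable without a network.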

Velocity forwarding configuration is auto-patched on agent nodes, just like on the controller. Agents apply the same modern or legacy forwarding settings so proxy services on remote nodes work seamlessly.

Shutdown Order

Controller shutdown (cluster mode)

  1. Cancel scaling engine and load balancer jobs
  2. Send ShutdownAgent to all connected nodes
  3. Wait ~500 ms for messages to flush over the WebSocket
  4. Stop the cluster WebSocket server
  5. Stop the REST API server and flush audit collectors
  6. Stop all local services in order: game servers → lobbies → proxies
  7. Cancel the coroutine scope and exit

Agent shutdown (Ctrl+C or SIGTERM)

  1. Stop all local services (game → lobby → proxy). For sync-enabled services, the graceful stop triggers a working-directory delta push back to the controller's canonical store before the process exits.
  2. Close WebSocket connection
  3. Exit
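
The game → lobby → proxy order used by both shutdown sequences can be expressed as a stable sort. This sketch assumes each service carries a simple kind label; how Nimbus actually classifies services is not shown here, and the names are illustrative.

```python
# Lower number means the service is stopped earlier.
STOP_ORDER = {"game": 0, "lobby": 1, "proxy": 2}


def shutdown_sequence(services: list[tuple[str, str]]) -> list[str]:
    """Order (name, kind) pairs as game servers, then lobbies, then proxies."""
    # sorted() is stable, so services of the same kind keep their original order.
    ordered = sorted(services, key=lambda service: STOP_ORDER[service[1]])
    return [name for name, _kind in ordered]
```

For the services in the architecture diagram, this would stop the BedWars instances first, then Lobby-1 and Lobby-2, and Proxy-1 last.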

Configuration Reference

See nimbus.toml — Cluster and nimbus.toml — Load Balancer for all configuration options.
