Nimbus — Minecraft Cloud System

Distribute Nimbus services across multiple machines with cluster mode, agent nodes, placement strategies, TCP load balancing, and automatic failover.

Nimbus can distribute services across multiple machines using cluster mode. A central controller coordinates remote agent nodes that run services locally — scaling, placement, and monitoring all happen automatically. An optional TCP load balancer sits in front of your Velocity proxies so players connect to a single address.

When do I need this?

If your network runs on one machine, you don't need it: single-node mode handles everything. Multi-node becomes valuable when you need more RAM or CPU than one machine provides, or when you want to spread services geographically across dedicated servers.

Architecture

                        Players
                           │
                 ┌─────────▼──────────┐
                 │   Load Balancer    │  :25565 (optional)
                 └─────────┬──────────┘
                           │
              ┌────────────▼────────────┐
              │    Nimbus Controller      │  nimbus-core
              │ Groups · Scaling · API    │
              └───┬──────────────┬─────┘
                  │  WebSocket   │
         ┌────────▼───┐    ┌────▼────────┐
         │ Agent 1    │    │   Agent 2   │  nimbus-agent
         │ Proxy-1    │    │ BedWars-1   │
         │ Lobby-1    │    │ BedWars-2   │
         │ Lobby-2    │    │ BedWars-3   │
         └────────────┘    └─────────────┘

How it works:

The controller (nimbus-core) manages all state — groups, scaling decisions, player routing
Agent nodes (nimbus-agent) connect via WebSocket and run Java server processes locally
When the scaling engine starts a service, it picks the best node using a placement strategy
Agents auto-download templates from the controller, launch processes, and stream state back
If a node disconnects, the controller detects it via heartbeats and reschedules services
If no remote node is available, dynamic services fall back to running locally on the controller

Service Placement Rules

Not all service types are distributed equally. Static services have persistent data (worlds, configs) stored in services/static/ and run on the controller by default — but since v0.7.0, they can be placed on remote agent nodes with state sync enabled.

Service Type	Where it runs	Reason
Static (e.g., Lobby, Survival)	Controller (default), or any node with sync enabled	Defaults to local; remote placement requires `[group.sync]` for data persistence
Dynamic (e.g., BedWars, SkyWars)	Any node (controller or agent)	Stateless — rebuilt from template every start
Proxy (Velocity)	Any node (controller or agent)	No persistent state

By default, static services run on the controller. For remote placement, enable state sync — see State Sync.

Quick Start

1. Enable cluster mode on the controller

Nimbus

cluster enable

This generates an auth token and saves the config. Restart Nimbus to activate.

Or edit config/nimbus.toml manually:

config/nimbus.toml

[cluster]
enabled = true
token = "your-secret-token"        # shared with all agents
agent_port = 8443                   # WebSocket port for agents
bind = "0.0.0.0"
heartbeat_interval = 5000           # ms between heartbeats
node_timeout = 15000                # ms before node is considered dead
placement_strategy = "least-services"

TLS is automatic

Cluster TLS is enabled by default. On first startup Nimbus generates a self-signed certificate and the agent setup wizard automatically pins its SHA-256 fingerprint via the /api/cluster/bootstrap endpoint — no manual keytool/truststore work. See Cluster TLS & Security for the threat model, cert rotation, and advanced options (custom CA, extra SANs).
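
To make the pinning step concrete, here is a minimal sketch of how a SHA-256 fingerprint in the colon-separated form used by trusted_fingerprint can be computed and checked. It assumes the fingerprint is taken over the DER-encoded certificate, which is the usual convention but is not confirmed on this page. The function names are illustrative and are not Nimbus APIs.

```python
import hashlib
import hmac


def sha256_fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 digest of a DER-encoded certificate as AA:BB:... hex."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_pin(der_cert: bytes, pinned: str) -> bool:
    """Compare a presented certificate against a pinned fingerprint (case-insensitive)."""
    presented = sha256_fingerprint(der_cert)
    # Assumption: the pinned value is plain ASCII hex with colons.
    return hmac.compare_digest(presented, pinned.strip().upper())
```

With pinning, the agent trusts exactly one certificate rather than a certificate authority, which is why a regenerated controller certificate requires re-running the setup wizard.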

2. Set up an agent node

Install the agent on each worker machine with a single command:

Terminal

curl -fsSL https://raw.githubusercontent.com/NimbusPowered/Nimbus/main/install-agent.sh | bash

PowerShell

irm https://raw.githubusercontent.com/NimbusPowered/Nimbus/main/install-agent.ps1 | iex

The installer handles Java 21 and downloads the latest agent release. The setup wizard runs on first start and walks you through trust and configuration.

Before running the wizard, grab the bootstrap details from the controller:

Nimbus (controller)

cluster bootstrap-url

This prints the REST URL and cluster token you need. Then start the agent — the wizard will:

Ask for the Controller REST URL (e.g. http://10.0.0.1:8080)
Ask for the Auth Token (the cluster token from the controller)
Call /api/cluster/bootstrap to fetch the controller's TLS fingerprint
Show you the fingerprint and expiry — you confirm with Y
Ask for node name, memory, and max services (defaults are auto-detected)

The resulting agent.toml:

agent.toml

[agent]
controller = "wss://10.0.0.1:8443/cluster"
token = "your-cluster-token"
node_name = "worker-1"
max_memory = "16G"
max_services = 10
public_host = ""  # Leave blank to auto-detect; set to your public IP if behind NAT
trusted_fingerprint = "AA:BB:CC:DD:..."  # pinned by the wizard
tls_verify = true
truststore_path = ""
truststore_password = ""

# Optional: specify paths to Java installations.
# Leave empty for auto-detection / auto-download from Adoptium.
[java]
java_16 = ""
java_17 = ""
java_21 = ""

Re-running the wizard

After cert rotation or if you want to re-pin the controller, run java -jar nimbus-agent.jar --setup to force the wizard even when agent.toml already exists.

JDK Auto-Resolution

Agent nodes automatically detect installed JDKs and download missing versions from Adoptium when needed. You only need to configure the [java] section if you want to override the auto-detected paths.

3. Verify the connection

On the controller:

Nimbus — Cluster Nodes

nimbus » nodes
── Cluster Nodes ──────────────────────────────────
NODE        HOST          STATUS    CPU    MEMORY         SERVICES
──────────────────────────────────────────────────────────────────
worker-1    10.0.0.2      online    12%    3241/16384MB   2/10
worker-2    10.0.0.3      online    8%     1024/8192MB    1/5
2/2 online

Placement Strategies

When starting a new service, the controller picks a node using one of these strategies:

Strategy	Description	Best for
`least-services`	Node running the fewest services (default)	Even distribution
`least-memory`	Node with the most free memory	Memory-heavy servers (modded, large worlds)
`round-robin`	Rotate through nodes sequentially	Predictable, simple

config/nimbus.toml

[cluster]
placement_strategy = "least-services"

Nodes that are offline, at max services, or out of memory are automatically skipped.
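
The three strategies and the skip rules can be sketched as follows. This is an illustrative model, not the controller's actual code: the Node fields, the needed_mb parameter, and the pick_node function are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Node:
    name: str
    online: bool
    services: int
    max_services: int
    used_mb: int
    max_mb: int

    @property
    def free_mb(self) -> int:
        return self.max_mb - self.used_mb


_rr_counter = count()  # shared cursor for round-robin


def pick_node(nodes: list[Node], strategy: str, needed_mb: int) -> Optional[Node]:
    """Return the node a new service should start on, or None if none is eligible."""
    # Skip nodes that are offline, at max services, or short on memory.
    eligible = [
        n for n in nodes
        if n.online and n.services < n.max_services and n.free_mb >= needed_mb
    ]
    if not eligible:
        return None  # the caller may fall back to running on the controller
    if strategy == "least-services":
        return min(eligible, key=lambda n: n.services)
    if strategy == "least-memory":
        return max(eligible, key=lambda n: n.free_mb)
    if strategy == "round-robin":
        return eligible[next(_rr_counter) % len(eligible)]
    raise ValueError(f"unknown placement strategy: {strategy}")
```

Using the two workers from the nodes output above, least-services would choose worker-2 (1 service versus 2), while least-memory would choose worker-1 (about 13 GB free versus 7 GB).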

TCP Load Balancer

When running multiple Velocity proxy instances across nodes, enable the load balancer so all players connect to one address (play.yourserver.com:25565):

Nimbus

lb enable

Or in config/nimbus.toml:

config/nimbus.toml

[loadbalancer]
enabled = true
bind = "0.0.0.0"
port = 25565                    # players connect here
strategy = "least-players"      # or "round-robin"
proxy_protocol = false          # enable for real client IPs
connection_timeout = 5000
buffer_size = 16384

The load balancer is a Layer-4 TCP proxy: it forwards raw Minecraft protocol bytes without inspecting them. This keeps overhead minimal and makes it compatible with any Minecraft version.

Load balancer strategies

Strategy	Description
`least-players`	Route to the proxy with the fewest players (default)
`round-robin`	Rotate new connections through all proxies in turn
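
A minimal sketch of the two strategies, under the assumption that only backends in the READY state are candidates. The Backend class and both function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    name: str
    players: int
    ready: bool = True


def least_players(backends):
    """Pick the ready proxy that currently has the fewest players."""
    ready = [b for b in backends if b.ready]
    return min(ready, key=lambda b: b.players) if ready else None


def round_robin(backends):
    """Return an iterator over ready proxies in rotation; take one per new connection."""
    return cycle([b for b in backends if b.ready])
```

least-players reacts to the actual load on each proxy, whereas round-robin ignores player counts and simply alternates. Note that this round-robin sketch takes a snapshot of the backend list, so it would need to be rebuilt whenever proxies start or stop.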

PROXY protocol (real client IPs)

By default, Velocity sees all connections coming from the load balancer's IP. Enable PROXY protocol v2 to preserve real client IPs:

config/nimbus.toml

[loadbalancer]
proxy_protocol = true

Nimbus automatically patches haproxy-protocol = true in Velocity's config when this is enabled.
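
For reference, a PROXY protocol v2 header is a small binary prefix sent before the first byte of the client's traffic. The sketch below builds one for a TCP-over-IPv4 connection following the public PROXY protocol specification. It is not taken from the Nimbus source, and the function name is illustrative.

```python
import ipaddress
import struct

PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"  # fixed 12-byte v2 signature


def proxy_v2_header_ipv4(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build a PROXY protocol v2 header for a TCP-over-IPv4 connection."""
    # Address block: source IP, destination IP, source port, destination port.
    addresses = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )
    version_command = 0x21  # high nibble: version 2, low nibble: PROXY command
    family_protocol = 0x11  # high nibble: AF_INET, low nibble: STREAM (TCP)
    fixed = struct.pack("!BBH", version_command, family_protocol, len(addresses))
    return PP2_SIGNATURE + fixed + addresses
```

For IPv4 the header is always 28 bytes: 12 for the signature, 4 for the version, family, and length fields, and 12 for the addresses and ports. The backend must expect this prefix, which is why the Velocity setting has to be enabled on the receiving side.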

Monitoring

Nimbus — Load Balancer

nimbus » lb
── Load Balancer ───────────────────────────────
  Total Connections:   1,247
  Active Connections:  89

BACKEND       HOST          PORT    PLAYERS   STATE
──────────────────────────────────────────────────────
Proxy-1       10.0.0.2      30000   45        ● READY
Proxy-2       10.0.0.3      30000   44        ● READY

CLI Commands

Manage cluster mode and load balancer from the console without editing config files:

Nimbus

# Cluster (multi-node)
cluster status                  # Show cluster status
cluster enable                  # Enable cluster mode (generates token)
cluster disable                 # Disable cluster mode
cluster token                   # Show auth token
cluster token regenerate        # Generate new token (update all agents!)
cluster cert                    # Show TLS cert fingerprint, expiry, SANs
cluster cert regenerate         # Delete cluster.jks (agents must re-run setup)
cluster bootstrap-url           # Print REST URL + token for the agent wizard

# Load Balancer (independent of cluster mode)
lb                              # Show LB status + backend proxies
lb enable                       # Enable load balancer
lb disable                      # Disable load balancer
lb strategy <name>              # Set strategy (least-players, round-robin)

Changes are saved to config/nimbus.toml and take effect after a restart.

The load balancer is independent of cluster mode. You can use it with multiple local Velocity proxies without enabling cluster mode.

Node Failure & Recovery

Nimbus handles node failures automatically:

Scenario	What happens
Agent disconnects briefly	Services keep running. Agent reconnects within `node_timeout` and resumes.
Agent doesn't reconnect	After `2 × node_timeout`, services are marked CRASHED. Scaling engine restarts them on other nodes.
Controller restarts	Agents reconnect automatically (5-second retry loop). Running services are preserved.
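
The timing in the table can be modelled as a simple classification on the time since the last heartbeat. This is a sketch under assumptions: the state names are invented for the example, and the exact boundary behaviour (whether a node at precisely node_timeout counts as timed out) is not specified on this page.

```python
def node_state(ms_since_heartbeat: int, node_timeout_ms: int = 15000) -> str:
    """Classify a node by how long ago its last heartbeat arrived."""
    if ms_since_heartbeat < node_timeout_ms:
        return "online"            # brief drop: services keep running
    if ms_since_heartbeat < 2 * node_timeout_ms:
        return "timed-out"         # node considered dead, services not yet rescheduled
    return "services-crashed"      # services marked CRASHED and restarted elsewhere
```

With the example values from the quick start (heartbeat_interval = 5000, node_timeout = 15000), a node times out after three missed heartbeats, and its services are rescheduled after 30 seconds of silence.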

When the controller restarts, it waits for agents to reconnect during a configurable reconciliation window (reconciliation_delay, default 10 seconds). During this window, agents report any services that are still running on their nodes. startMinimumInstances() runs only after the reconciliation delay expires. Because the recovered services are already registered at that point, the controller starts only the difference, which prevents duplicate instances from being launched.
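
The "start only the difference" step amounts to a per-group subtraction. The sketch below illustrates the arithmetic; the dictionaries and the instances_to_start function are hypothetical and do not reflect the real signature of startMinimumInstances().

```python
def instances_to_start(min_instances: dict[str, int], recovered: dict[str, int]) -> dict[str, int]:
    """Per group, how many services still need starting after reconciliation."""
    return {
        # Never go below zero if more instances were recovered than the minimum.
        group: max(0, minimum - recovered.get(group, 0))
        for group, minimum in min_instances.items()
    }
```

For example, if BedWars requires three instances and agents report one still running, only two new instances are started. Without the reconciliation window, the controller would see zero and launch three.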

Template Distribution

Agents automatically download templates from the controller when starting a service:

Controller sends StartService with template name + SHA-256 hash
Agent checks its local cache — if hash matches, uses cached version
If not, downloads the template as a ZIP from GET /api/templates/{name}/download
Extracts to local templates/ directory and starts the service

Templates are cached on agents — they only re-download when the template changes on the controller.
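
The cache check in the steps above can be sketched like this. It assumes the SHA-256 hash is computed over the template's ZIP archive as stored in the agent's cache; the page does not say exactly which bytes are hashed. Both function names are illustrative.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large template archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(cached_zip: Path, expected_sha256: str) -> bool:
    """True when there is no cached archive or its hash differs from the controller's."""
    if not cached_zip.is_file():
        return True
    return file_sha256(cached_zip) != expected_sha256.lower()
```

Comparing content hashes instead of timestamps means an unchanged template is never downloaded again, even if it was re-saved on the controller.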

Velocity forwarding configuration is auto-patched on agent nodes, just like on the controller. Agents apply the same modern or legacy forwarding settings so proxy services on remote nodes work seamlessly.

Shutdown Order

Controller shutdown (cluster mode)

Cancel scaling engine and load balancer jobs
Send ShutdownAgent to all connected nodes
Wait ~500 ms for messages to flush over the WebSocket
Stop the cluster WebSocket server
Stop the REST API server and flush audit collectors
Stop all local services in order: game servers → lobbies → proxies
Cancel the coroutine scope and exit

Agent shutdown (Ctrl+C or SIGTERM)

Stop all local services (game → lobby → proxy). For sync-enabled services, the graceful stop triggers a working-directory delta push back to the controller's canonical store before the process exits.
Close WebSocket connection
Exit
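
The stop order used by both the controller and the agents (game servers, then lobbies, then proxies) can be expressed as a sort on a role ranking. The role labels and the shutdown_sequence function are assumptions for illustration; how Nimbus classifies a service internally is not described on this page.

```python
STOP_ORDER = {"game": 0, "lobby": 1, "proxy": 2}  # hypothetical role labels


def shutdown_sequence(services: list[tuple[str, str]]) -> list[str]:
    """Order (name, role) pairs so game servers stop first and proxies last."""
    # sorted() is stable, so services with the same role keep their input order.
    ordered = sorted(services, key=lambda item: STOP_ORDER[item[1]])
    return [name for name, _ in ordered]
```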

Configuration Reference

See nimbus.toml — Cluster and nimbus.toml — Load Balancer for all configuration options.

Next Steps

Scaling Guide — How auto-scaling works with multi-node
Commands Reference — nodes, lb, cluster commands
REST API — /api/nodes, /api/loadbalancer endpoints
Architecture — Technical deep-dive
