Cluster TLS & Security
How Nimbus secures agent-to-controller traffic with TLS fingerprint pinning, why self-signed certs are fine, and how to rotate or customize the cluster cert.
Nimbus cluster mode uses a dedicated TLS-encrypted WebSocket between the controller and each agent. This page explains the trust model, how to set it up, and how to rotate the certificate.
Threat Model
The cluster channel carries:
- The cluster token (shared secret for agent authentication)
- Service start/stop commands and configuration
- File transfers (templates, server JARs, config patches)
- Live stdout and state updates from agent-managed processes
An attacker on the network between controller and agent could — if the channel were unencrypted or unauthenticated — inject service commands, steal the cluster token, or replace template files with malicious ones. TLS with certificate pinning prevents this.
What's protected: the confidentiality, integrity, and authentication of the agent ↔ controller channel. What's not: distribution of the cluster token, which you must handle securely yourself (don't paste it into public chat). The bootstrap endpoint delivers the cert over plain HTTP, gated by the cluster token. If the token leaks, an attacker can fetch the cert material, but cert material is public by design.
Why Self-Signed Certs Are Fine
A common misconception: "self-signed = insecure." That's wrong. Self-signed certs encrypt just as well as CA-issued ones. The only thing they lack is automatic trust via a well-known Certificate Authority.
Nimbus solves this with fingerprint pinning: the agent is configured with the SHA-256 fingerprint of exactly one controller cert, and refuses to connect to anything else. This is:
- ✅ Immune to MITM: an attacker would need the controller's private key, which never leaves config/cluster.jks.
- ✅ Immune to CA compromise: no public CA is trusted, so a rogue CA can't issue a cert for your controller.
- ✅ Zero ops: no renewal schedule, no Let's Encrypt ACME dance, no 90-day churn.
This is the same trust model SSH uses when you accept a host key on first connection.
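The pin check itself is small. The following is a minimal Python sketch, not Nimbus's own code. It assumes the fingerprint is the SHA-256 digest of the certificate's DER encoding rendered as hex. The exact string format Nimbus stores (case, colon separators) is not stated on this page, so the hypothetical helpers below normalize the pinned value before comparing.

```python
import hashlib
import hmac


def sha256_fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 digest of a DER-encoded certificate as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def normalize(fingerprint: str) -> str:
    """Drop colons and spaces and lowercase, so 'AB:CD' and 'abcd' compare equal."""
    return fingerprint.replace(":", "").replace(" ", "").strip().lower()


def matches_pin(der_cert: bytes, pinned: str) -> bool:
    """True only if the certificate hashes to exactly the pinned fingerprint."""
    actual = sha256_fingerprint(der_cert)
    # compare_digest runs in constant time, so it does not leak how many
    # leading characters of the fingerprint matched.
    return hmac.compare_digest(actual.encode(), normalize(pinned).encode())
```

Because the comparison is against one exact digest, any other certificate fails the check, including one issued by a publicly trusted CA.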
Trust-On-First-Use Flow
- Controller starts → TlsHelper.ensureKeyStore generates config/cluster.jks (RSA-4096, self-signed, 10-year validity) if it doesn't already exist.
- User runs cluster bootstrap-url on the controller → gets the REST URL and cluster token.
- User starts the agent for the first time → SetupWizard prompts for those two values.
- Agent calls GET http://<controller>:8080/api/cluster/bootstrap with Authorization: Bearer <cluster-token>.
- Controller responds with { fingerprint, certPem, wsUrl, validUntil, sans }.
- Agent displays the fingerprint + expiry → user confirms with Y.
- Wizard saves trusted_fingerprint and controller (the wsUrl from the response) into agent.toml.
- Agent connects to the controller via wss://. The pinned trust manager validates the server's leaf certificate SHA-256 against the stored fingerprint. No CA chain, no hostname verification: the fingerprint is the only trust anchor.
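The last step of the flow can be illustrated with a Python analogue of a pinned trust manager. This is a sketch under stated assumptions, not the agent's actual implementation: the function names are hypothetical, and the pin is assumed to be plain hex without separators. It turns off CA-chain and hostname checks, then accepts the connection only if the leaf certificate hashes to the stored fingerprint.

```python
import hashlib
import hmac
import socket
import ssl


def pinned_context() -> ssl.SSLContext:
    """TLS context with CA-chain and hostname checks off; the pin replaces them."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # no CA chain validation
    return ctx


def check_pin(der_cert: bytes, pinned_hex: str) -> None:
    """Raise ssl.SSLError unless the leaf cert hashes to the pinned value."""
    actual = hashlib.sha256(der_cert).hexdigest()
    if not hmac.compare_digest(actual.encode(), pinned_hex.lower().encode()):
        raise ssl.SSLError(f"Controller cert fingerprint mismatch: got {actual}")


def connect_pinned(host: str, port: int, pinned_hex: str) -> ssl.SSLSocket:
    """Open a TLS socket and verify the pin before any application data is sent."""
    raw = socket.create_connection((host, port), timeout=10)
    tls = pinned_context().wrap_socket(raw, server_hostname=host)
    try:
        der = tls.getpeercert(binary_form=True)  # leaf cert as DER bytes
        if der is None:
            raise ssl.SSLError("Controller presented no certificate")
        check_pin(der, pinned_hex)
    except ssl.SSLError:
        tls.close()
        raise
    return tls
```

Disabling verification is only safe here because the pin check runs immediately after the handshake and closes the socket on any mismatch. Without that check, the same context would trust every certificate.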
The bootstrap endpoint lives on the plaintext REST API port (default 8080). This is intentional: if it required TLS, the agent would need to trust the cert before it knows what to trust — chicken-and-egg. The cluster token is what gates access.
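For illustration, the bootstrap call can be sketched in Python with only the standard library. The URL path, header, and response field names come from the flow above. The helper names, the timeout, and the assumption that the response body is a flat JSON object are hypothetical.

```python
import json
import urllib.request

REQUIRED_FIELDS = {"fingerprint", "certPem", "wsUrl", "validUntil", "sans"}


def build_bootstrap_request(controller_host: str, token: str,
                            port: int = 8080) -> urllib.request.Request:
    """Build the plaintext GET request, gated by the cluster token."""
    url = f"http://{controller_host}:{port}/api/cluster/bootstrap"
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def parse_bootstrap(body: bytes) -> dict:
    """Decode the JSON response and make sure every expected field is present."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("bootstrap response is not a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"bootstrap response missing fields: {sorted(missing)}")
    return data


def fetch_bootstrap(controller_host: str, token: str) -> dict:
    """Perform the request; the caller still has to show the fingerprint to a human."""
    request = build_bootstrap_request(controller_host, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_bootstrap(response.read())
```

Nothing returned by this call should be trusted automatically. The fingerprint only becomes a trust anchor once the user has confirmed it.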
Cert Rotation
When you want to rotate the controller cert (new hostname, old one expired, compromise suspicion):
cluster cert regenerate
# confirm the warning prompt
# restart Nimbus: a fresh self-signed cert is generated
On every agent:
java -jar nimbus-agent.jar --setup
# re-run the wizard, confirm the new fingerprint
That's it. The agent's old trusted_fingerprint is overwritten, and the next wss:// connection validates against the new cert.
Why manual re-trust?
Automatic re-trust would defeat the entire point of pinning — an attacker who can MITM the bootstrap request could substitute their own cert. Requiring a human confirmation on each agent is the whole security benefit.
Advanced: Custom CA / Existing Certs
If you already have a CA-issued cert (e.g., from an internal PKI or Let's Encrypt on a private domain), you can use it instead of the auto-generated self-signed one:
Option 1 — provide the keystore:
[cluster]
tls_enabled = true
keystore_path = "/etc/nimbus/controller.p12"
keystore_password = "..."
On the agent, configure a JKS/PKCS12 truststore that contains the CA cert:
[agent]
trusted_fingerprint = "" # leave empty to use truststore instead
truststore_path = "/etc/nimbus/ca-trust.jks"
truststore_password = "..."
Precedence on the agent side: trusted_fingerprint > truststore_path > system CAs > tls_verify = false (trust all).
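One way to read that precedence chain is sketched below in Python. This is an interpretation, not the agent's actual logic: it assumes an empty string counts as "not set", that tls_verify defaults to true, and that tls_verify = false only takes effect when neither a fingerprint nor a truststore is configured. The function name and the returned labels are hypothetical.

```python
def select_trust_mode(agent_cfg: dict) -> str:
    """Pick the trust mechanism from the [agent] table, highest precedence first."""
    if agent_cfg.get("trusted_fingerprint"):
        return "fingerprint"   # pinned SHA-256, ignores everything else
    if agent_cfg.get("truststore_path"):
        return "truststore"    # custom CA bundle (JKS/PKCS12)
    if agent_cfg.get("tls_verify", True):
        return "system-cas"    # default JVM/OS trust store
    return "trust-all"         # tls_verify = false, dev only
```

With the example configuration above (empty trusted_fingerprint, truststore_path set), this sketch selects the truststore.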
Option 2 — add SANs to the auto-generated cert:
If you want to keep the auto-generated cert but have the agent connect via a hostname that isn't 127.0.0.1 or the local hostname, add the extra SANs to nimbus.toml before the first start (the cert is only generated if config/cluster.jks doesn't exist yet):
[cluster]
extra_sans = ["controller.example.com", "10.0.0.5"]
If the file already exists, delete it (cluster cert regenerate) and restart. Strings that look like dotted IPv4 are added as IP SANs; everything else becomes a DNS SAN.
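The IP-versus-DNS rule can be sketched in Python as follows. How strictly Nimbus decides that a string "looks like" dotted IPv4 is not specified here. This sketch assumes strict parsing, so an out-of-range value such as 999.1.1.1 falls through to the DNS list, and so does any IPv6 literal. The function name is hypothetical.

```python
import ipaddress


def classify_sans(extra_sans: list) -> tuple:
    """Split extra_sans entries into (dns_names, ip_addresses)."""
    dns_names, ip_addresses = [], []
    for entry in extra_sans:
        try:
            ipaddress.IPv4Address(entry)   # raises ValueError if not dotted IPv4
        except ValueError:
            dns_names.append(entry)        # everything else becomes a DNS SAN
        else:
            ip_addresses.append(entry)     # valid dotted IPv4 becomes an IP SAN
    return dns_names, ip_addresses
```

For the example above, this yields one DNS SAN (controller.example.com) and one IP SAN (10.0.0.5).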
Public Host for Bootstrap URL
When the controller is behind NAT or has multiple interfaces, the wsUrl returned by /api/cluster/bootstrap may point to the wrong address. Override it explicitly:
[cluster]
public_host = "controller.example.com"
The bootstrap endpoint will then return wss://controller.example.com:8443/cluster regardless of what bind is set to.
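The effect of the override can be sketched in a few lines of Python. The port 8443 and the /cluster path are taken from the example URL above. The fallback to the bind address when public_host is unset is an assumption, and the function name is hypothetical. IPv6 literals, which would need brackets, are not handled.

```python
from typing import Optional


def bootstrap_ws_url(bind_host: str, public_host: Optional[str] = None,
                     port: int = 8443, path: str = "/cluster") -> str:
    """Build the wsUrl for the bootstrap response, preferring public_host."""
    host = public_host or bind_host  # an explicit override wins over bind
    return f"wss://{host}:{port}{path}"
```

With bind set to 0.0.0.0 and no override, this sketch would produce an unusable wss://0.0.0.0:8443/cluster, which is the kind of wrong address public_host exists to avoid.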
Dev Escape Hatch
For local-only testing (e.g., both controller and agent on the same dev machine), you can skip trust entirely:
tls_verify = false
Never use tls_verify = false on a production agent. It disables all certificate validation and makes the connection trivially MITM-able. The only reason to use it is local debugging.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| TLS handshake failed: not an SSL/TLS record in controller log | Agent is using ws:// against the TLS port | Set controller = "wss://..." in agent.toml |
| unable to find valid certification path on agent | No trust material configured | Run --setup to pin the fingerprint |
| Controller cert fingerprint mismatch | Cert was rotated | Run --setup on each agent to re-pin |
| 0.0.0.0 in connect URL | User put a bind address as connect address | Use the real hostname/IP or 127.0.0.1 for local |
| No subject alternative names matching ... (truststore mode only) | Cert SAN list doesn't include your hostname | Add to extra_sans, regenerate cert |
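Two of the rows above (wrong scheme, bind address used as connect address) can be caught before the agent ever connects. The following Python sketch is a hypothetical pre-flight check for the controller value in agent.toml. It is not part of Nimbus, and the messages are illustrative.

```python
from urllib.parse import urlsplit


def check_controller_url(url: str) -> list:
    """Return a list of human-readable problems; an empty list means the URL looks sane."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "wss":
        problems.append(f"scheme is {parts.scheme!r}, expected 'wss' for the TLS port")
    if not parts.hostname:
        problems.append("URL has no host")
    elif parts.hostname in ("0.0.0.0", "::"):
        problems.append("host is a bind address; use the real hostname/IP or 127.0.0.1")
    return problems
```

For example, check_controller_url("ws://controller.example.com:8443/cluster") reports the scheme problem only, while a wss:// URL with a real hostname returns an empty list.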
Related
- Multi-Node Setup — cluster mode overview
- nimbus.toml — cluster — all TLS config options
- Agent config — agent.toml reference including all TLS fields