Nimbus v1.0.0

WebSocket API

Live event stream, per-service console, Remote CLI multiplex, and the cluster agent channel.

Nimbus runs WebSocket endpoints on two different engines:

  • Main REST API (Ktor CIO) on [api].port (default 8080) hosts /api/events, /api/services/{name}/console, and /api/console/stream. Auth matches the REST layer — bearer token via Authorization header (preferred) or ?token= query param.
  • Cluster server (Ktor Netty) on [cluster].agent_port (default 8443) hosts /cluster. This is a separate server with its own thread pool, TLS keystore, and auth ([cluster].token shared secret plus TLS fingerprint pinning). It is not part of the REST API.

The cluster WebSocket is not on the REST port. Older docs listed ws://controller:8080/cluster — that was always wrong. Use wss://controller:8443/cluster (default, TLS enabled) or ws://controller:8443/cluster when [cluster] tls_enabled = false.
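A small helper makes the port and scheme split explicit. This is a sketch: `clusterUrl` and its `tlsEnabled` / `agentPort` options are hypothetical names that mirror [cluster].tls_enabled and [cluster].agent_port; they are not part of any Nimbus client library.

```javascript
// Hypothetical helper: builds the agent-channel URL from [cluster] settings.
// Option names mirror [cluster].tls_enabled and [cluster].agent_port only for illustration.
function clusterUrl(host, { tlsEnabled = true, agentPort = 8443 } = {}) {
  // TLS is enabled by default, so the scheme is wss:// unless explicitly disabled.
  const scheme = tlsEnabled ? 'wss' : 'ws';
  // The cluster channel never lives on the REST port ([api].port, default 8080).
  return `${scheme}://${host}:${agentPort}/cluster`;
}

console.log(clusterUrl('controller'));                       // wss://controller:8443/cluster
console.log(clusterUrl('controller', { tlsEnabled: false })); // ws://controller:8443/cluster
```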

Connecting

Auth header (preferred)

GET /api/events HTTP/1.1
Host: controller:8080
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: <random base64 nonce>
Authorization: Bearer <master-token-or-JWT>

Query parameter (fallback)

wss://controller:8080/api/events?token=<master-token>

Browser WebSocket clients can't set custom headers, so the dashboard uses the query-param form. The controller accepts both on every authenticated WS endpoint.
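For clients that cannot set headers, building the query-param URL with the standard `URL` API avoids hand-concatenating the token. The helper name `authenticatedWsUrl` is illustrative; the only detail taken from this page is the `token` parameter name.

```javascript
// Hypothetical helper: appends ?token=<value> using URLSearchParams, which
// percent-encodes characters such as '+' or '/' so they are not misread
// by the server's query-string parser.
function authenticatedWsUrl(base, path, token) {
  const url = new URL(path, base);       // ws:// and wss:// are valid URL schemes
  url.searchParams.set('token', token);  // adds the percent-encoded token
  return url.toString();
}

console.log(authenticatedWsUrl('wss://controller:8080', '/api/events', 'abc123'));
// wss://controller:8080/api/events?token=abc123
```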

Tokens can be:

  • The master API token ([api].token / NIMBUS_API_TOKEN).
  • The derived service token (HMAC-SHA256 of the master token with message nimbus-service-token), for /api/events only.
  • A JWT with the admin scope (for admin-only endpoints) or any SERVICE_SCOPES entry (for service-level endpoints).
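The derived service token can be reproduced with Node's `crypto` module. This sketch reads the description above as HMAC-SHA256 keyed with the master token over the literal message `nimbus-service-token`, and it assumes hex output. Both the key/message orientation and the encoding are assumptions to verify against the controller before relying on them; `deriveServiceToken` is a hypothetical name.

```javascript
const { createHmac } = require('node:crypto');

// Hypothetical helper. ASSUMPTIONS: key = master token, message = the literal
// string "nimbus-service-token", output = lowercase hex.
function deriveServiceToken(masterToken) {
  return createHmac('sha256', masterToken)
    .update('nimbus-service-token', 'utf8')
    .digest('hex');
}

const serviceToken = deriveServiceToken(process.env.NIMBUS_API_TOKEN ?? 'example-master-token');
console.log(serviceToken.length); // 64 (32-byte digest as hex)
```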

Unauthenticated connections are closed immediately with close code 1008 VIOLATED_POLICY and reason "Authentication required...".


/api/events — Live event stream

Path: GET /api/events (WebSocket upgrade)
Auth: service (master or derived service token)
Direction: server → client only.

The controller pushes every event emitted on the internal EventBus as one JSON text frame per event. Frame schema:

{
  "type": "SERVICE_READY",
  "timestamp": "2026-04-15T12:34:56.123Z",
  "data": { "service": "Lobby-1", "group": "Lobby" }
}

type is a stable uppercase string. data is a flat { string: string } map — numbers and booleans are stringified so consumers don't have to guess the value type per key. See Events reference for every type.
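Consumers that want to fail fast on malformed frames can validate the three documented fields. A minimal sketch: `parseEventFrame` is a hypothetical helper, and the type pattern `[A-Z][A-Z0-9_]*` is an assumption inferred from the example type names.

```javascript
// Hypothetical validator for one /api/events text frame: { type, timestamp, data }.
function parseEventFrame(text) {
  const evt = JSON.parse(text);
  if (evt === null || typeof evt !== 'object') throw new Error('frame is not an object');
  // type: stable uppercase identifier such as SERVICE_READY (pattern is an assumption).
  if (typeof evt.type !== 'string' || !/^[A-Z][A-Z0-9_]*$/.test(evt.type)) {
    throw new Error(`unexpected event type: ${evt.type}`);
  }
  // timestamp: ISO-8601; Date.parse returns NaN for unparseable input.
  if (typeof evt.timestamp !== 'string' || Number.isNaN(Date.parse(evt.timestamp))) {
    throw new Error(`bad timestamp: ${evt.timestamp}`);
  }
  // data: flat string-to-string map; numbers and booleans arrive stringified.
  const data = evt.data ?? {};
  if (typeof data !== 'object' || Array.isArray(data)) throw new Error('data must be a flat object');
  for (const [key, value] of Object.entries(data)) {
    if (typeof value !== 'string') throw new Error(`non-string value for "${key}"`);
  }
  return { type: evt.type, timestamp: new Date(evt.timestamp), data };
}

const evt = parseEventFrame(
  '{"type":"SERVICE_READY","timestamp":"2026-04-15T12:34:56.123Z","data":{"service":"Lobby-1","group":"Lobby"}}'
);
console.log(evt.type, evt.data.service); // SERVICE_READY Lobby-1
```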

The server does not send ping frames specific to this endpoint; the Ktor WebSocket plugin auto-pings every 15 s with a 30 s timeout. Clients only need to answer those pings with pong frames, which standard WebSocket implementations (browsers, the Node ws package) do automatically.

Frame size

maxFrameSize is 64 KiB on the controller. Every event payload today fits inside one frame; an event larger than this would make the controller close the connection, so keep event data flat and compact.
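Because the limit is measured in bytes, a string that looks short can still exceed it once encoded. The check below is a sketch that assumes the limit applies to the UTF-8 payload of a single text frame; `fitsInFrame` is a hypothetical name. It is also handy before sending a large paste to one of the console endpoints.

```javascript
// Hypothetical pre-flight check against the controller's 64 KiB maxFrameSize.
// Measures UTF-8 bytes, not JavaScript string length (non-ASCII chars take 2-4 bytes).
const MAX_FRAME_BYTES = 64 * 1024;

function fitsInFrame(payload) {
  const text = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return Buffer.byteLength(text, 'utf8') <= MAX_FRAME_BYTES;
}

console.log(fitsInFrame({ type: 'SERVICE_READY', data: { service: 'Lobby-1' } })); // true
console.log(fitsInFrame('\u00e9'.repeat(40000))); // false: 40000 chars, 80000 bytes
```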


/api/services/{name}/console — Per-service console

Path: GET /api/services/{name}/console
Auth: admin only (master token / JWT admin). Service tokens are rejected.
Direction: bidirectional.

Opens a bidirectional stream for a single running service:

  • Outbound (server → client): every line of the service process's stdout, as plain text frames.
  • Inbound (client → server): each text frame is .trim()-ed and, if non-empty, written as a command to the process stdin.

Close codes

Code                   Reason
1003 CANNOT_ACCEPT     Service '<name>' not found (the service isn't in the registry).
1003 CANNOT_ACCEPT     No process handle for '<name>' (the service is registered but not running: STOPPED / CRASHED).
1008 VIOLATED_POLICY   Auth failed.

There is no command framing: each text frame is trimmed and written to stdin as-is, so a frame containing newlines (a paste buffer, for example) injects several commands at once. If you need command/output correlation, use /api/console/stream instead.
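A client that wants exactly one command per frame can split paste buffers itself before sending. This sketch (`pasteToFrames` is a hypothetical name) applies the same trim-and-skip-empty rule the controller uses, so a UI can show or confirm each line before it reaches stdin.

```javascript
// Hypothetical client-side splitter: one trimmed, non-empty line per frame.
function pasteToFrames(buffer) {
  return buffer
    .split(/\r?\n/)                 // handle both LF and CRLF line endings
    .map((line) => line.trim())     // same normalization the controller applies
    .filter((line) => line.length > 0);
}

const frames = pasteToFrames('say hi\r\n\n  list  \n');
console.log(frames); // [ 'say hi', 'list' ]
// for (const f of frames) socket.send(f); // socket: an open console WebSocket
```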


/api/console/stream — Remote CLI multiplex

Path: GET /api/console/stream
Auth: admin only (master token).
Direction: bidirectional, multiplexed JSON.

This is the channel used by the Remote CLI (nimbus-cli) and the dashboard console. Every frame is a JSON object with a type field.

Inbound message types

// Hello — first message after connect, carries client info for the session tracker.
{ "type": "hello",
  "text": "{\"username\":\"jonas\",\"hostname\":\"laptop\",\"os\":\"Linux\"}" }

// Execute a command. id correlates output frames; echoed back in output_end.
{ "type": "execute", "id": "cmd-7", "input": "services list" }

// Ask for tab completions of a partial buffer.
{ "type": "complete", "id": "c-3", "input": "ser" }

// Attach to a service process's stdout screen (one active per session).
{ "type": "screen_attach", "service": "Lobby-1" }

// Send a single line to the attached service's stdin.
{ "type": "screen_input", "service": "Lobby-1", "text": "say hi" }

// Detach from the current screen session.
{ "type": "screen_detach" }
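The double encoding in hello (its text field is a JSON string, not an object) is easy to get wrong. These hypothetical builder functions reproduce the exact wire shape of the hello and execute frames listed above.

```javascript
// Hypothetical builders for /api/console/stream inbound frames.
function helloFrame({ username, hostname, os }) {
  // Client info is serialized once into "text", then the whole frame again.
  return JSON.stringify({ type: 'hello', text: JSON.stringify({ username, hostname, os }) });
}

function executeFrame(id, input) {
  // id is chosen by the client and echoed back on output / output_end frames.
  return JSON.stringify({ type: 'execute', id, input });
}

console.log(helloFrame({ username: 'jonas', hostname: 'laptop', os: 'Linux' }));
// {"type":"hello","text":"{\"username\":\"jonas\",\"hostname\":\"laptop\",\"os\":\"Linux\"}"}
console.log(executeFrame('cmd-7', 'services list'));
// {"type":"execute","id":"cmd-7","input":"services list"}
```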

Outbound message types

// Command output line tagged by kind (header, info, success, error, item, text).
{ "type": "output", "id": "cmd-7", "kind": "success", "text": "Started Lobby-1" }

// Sent once after the dispatcher finishes a command — acts as a barrier for clients.
{ "type": "output_end", "id": "cmd-7" }

// Tab-completion response.
{ "type": "completions", "id": "c-3", "candidates": ["services", "service"] }

// Live event envelope — same payload as /api/events but inside this channel.
{ "type": "event", "event": { "type": "SERVICE_READY", "data": { ... } } }

// Screen lifecycle.
{ "type": "screen_attached", "text": "Lobby-1" }
{ "type": "screen_line", "text": "[12:34:56] Player joined" }
{ "type": "screen_detached" }
{ "type": "screen_error", "text": "Service 'X' not found" }

High-frequency events (NODE_HEARTBEAT, STRESS_TEST_UPDATED) are suppressed on this channel to keep the CLI readable. Subscribe to /api/events directly if you need them.

Session tracking

On the first hello, the controller emits CLI_SESSION_CONNECTED with a GeoIP-resolved location. On disconnect it emits CLI_SESSION_DISCONNECTED with durationSeconds and commandCount. Both events are visible on /api/events and recorded in the cli_sessions database table.

Command output ordering

The controller launches a drain loop for each execute so command output can't block the Ktor I/O thread pool. Output lines are delivered in order of emission, and output_end always follows the last output frame of its command.
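Because output_end is a reliable barrier, a client can buffer output per command id and resolve once the barrier arrives. A sketch only: `CommandTracker` is a hypothetical class, not part of nimbus-cli.

```javascript
// Hypothetical correlator: buffers output frames by id, resolves on output_end.
class CommandTracker {
  constructor() {
    this.pending = new Map(); // id -> { lines, resolve }
  }

  // Register an id before sending its execute frame; returns a promise of lines.
  track(id) {
    return new Promise((resolve) => {
      this.pending.set(id, { lines: [], resolve });
    });
  }

  // Feed every parsed frame here; returns false for frames it does not own.
  handle(msg) {
    const entry = this.pending.get(msg.id);
    if (!entry) return false; // events, screen frames, untracked ids
    if (msg.type === 'output') {
      entry.lines.push({ kind: msg.kind, text: msg.text });
    } else if (msg.type === 'output_end') {
      this.pending.delete(msg.id); // barrier: no more output for this id
      entry.resolve(entry.lines);
    }
    return true;
  }
}

// Usage:
// const tracker = new CommandTracker();
// tracker.track('cmd-7').then((lines) => console.log(lines));
// ws.on('message', (buf) => tracker.handle(JSON.parse(buf.toString())));
```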


/cluster — Agent control channel (separate Netty server)

Path: GET /cluster
Port: [cluster].agent_port (default 8443), not the REST API port
Auth: cluster token (AUTH_REQUEST message) + TLS fingerprint pinning
Transport: wss:// by default ([cluster].tls_enabled = true), ws:// only when TLS is explicitly disabled.

This endpoint lives on the ClusterServer Ktor Netty server, which runs with its own thread pool and its own TLS keystore (cluster.jks — auto-generated self-signed if none is configured). Agents connect with:

  1. Bootstrap — if no cert is pinned, the agent calls GET /api/cluster/bootstrap on the REST port first, pins the returned fingerprint, then opens the TLS WebSocket.
  2. AUTH_REQUEST — first message after the handshake. Presents nodeId, agentVersion, and token (must match [cluster].token).
  3. AUTH_RESPONSE — the controller either accepts or closes the connection.
  4. Heartbeat loop — agent periodically reports node stats, service states, resident memory; controller replies with commands (start/stop service, template hash, etc.).

Message framing is the sealed ClusterMessage hierarchy in nimbus-protocol. The schema lives with the protocol module and evolves with the cluster wire format — read nimbus-protocol/src/main/kotlin/ for the current types.

The controller cannot reject individual messages mid-stream — a misbehaving agent is disconnected entirely. Agents are expected to reconnect with exponential backoff. The TLS certificate is pinned by SHA-256 fingerprint on first contact; changes require the agent to explicitly accept a new cert, matching the SSH-style known_hosts model.
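The known_hosts model can be sketched as a small pin store. This assumes the fingerprint is SHA-256 over the certificate's DER encoding, formatted as colon-separated uppercase hex (the format Node uses for `fingerprint256`); `PinStore` and its return values are hypothetical. In Node, the DER bytes of a server certificate are available as `tlsSocket.getPeerCertificate().raw`.

```javascript
const { createHash } = require('node:crypto');

// ASSUMPTION: fingerprint = SHA-256 of the DER bytes, colon-separated uppercase hex.
function sha256Fingerprint(der) {
  const hex = createHash('sha256').update(der).digest('hex').toUpperCase();
  return hex.match(/../g).join(':');
}

// Hypothetical trust-on-first-use store, in the spirit of SSH known_hosts.
class PinStore {
  constructor() {
    this.pins = new Map(); // host -> fingerprint
  }

  // 'pinned' on first contact, 'match' if unchanged, 'mismatch' otherwise
  // (on 'mismatch' the caller should abort the connection).
  check(host, der) {
    const fp = sha256Fingerprint(der);
    const pinned = this.pins.get(host);
    if (pinned === undefined) {
      this.pins.set(host, fp);
      return 'pinned';
    }
    return pinned === fp ? 'match' : 'mismatch';
  }

  // Explicit operator action after a legitimate certificate change.
  accept(host, der) {
    this.pins.set(host, sha256Fingerprint(der));
  }
}
```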

Why a separate server?

  • The cluster channel needs TLS with a keystore the controller generates itself — mixing it with the public REST API (which is typically fronted by a reverse proxy) would force the operator to manage the REST certificate's SANs for agent hostnames too.
  • agent_port is usually firewalled to internal networks. The REST port is often exposed to the operator's office / dashboard via a different path (VPN, bastion, reverse proxy).
  • Throughput: cluster traffic (heartbeats, state sync notifications) uses Netty's connection-per-agent model efficiently without competing with REST request handling.

Keepalives, reconnect, and backpressure

  • All authenticated endpoints inherit Ktor's install-wide WebSocket config: pingPeriod = 15 s, timeout = 30 s, maxFrameSize = 64 KiB.
  • There is no server-side reconnect — clients must retry. Exponential backoff (1 s → 30 s) is recommended.
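  • /api/events uses an unbounded internal buffer per subscriber; a slow client makes the server accumulate memory until the connection drops. Read frames promptly, keep slow processing off the receive path, and apply explicit flow control on the client side. On /api/console/stream, waiting for output_end before sending the next execute keeps in-flight output bounded.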
  • The cluster channel uses bounded queues per agent; sustained stalls trigger a disconnect so the agent re-registers cleanly.
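The recommended 1 s to 30 s schedule reduces to a small delay function. The optional jitter, which randomizes each delay between 1 s and the current ceiling so that many clients dropped at once don't reconnect in lockstep, is an addition of this sketch and not a Nimbus requirement; `backoffDelay` is a hypothetical name.

```javascript
// Hypothetical reconnect-delay helper for exponential backoff (1 s -> 30 s).
const BASE_MS = 1000;
const MAX_MS = 30000;

function backoffDelay(attempt, { jitter = false, random = Math.random } = {}) {
  // attempt 0 -> 1 s, 1 -> 2 s, 2 -> 4 s, ... capped at 30 s.
  const ceiling = Math.min(MAX_MS, BASE_MS * 2 ** attempt);
  if (!jitter) return ceiling;
  // Optional jitter: pick a delay in [1 s, ceiling) so it never drops below 1 s.
  return BASE_MS + Math.floor(random() * (ceiling - BASE_MS));
}

// Usage: reset attempt to 0 once a connection has stayed open for a while.
// setTimeout(connect, backoffDelay(attempt++, { jitter: true }));
```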

Example — live event subscriber (Node.js)

import WebSocket from 'ws';

const ws = new WebSocket('ws://127.0.0.1:8080/api/events', {
  headers: { Authorization: `Bearer ${process.env.NIMBUS_API_TOKEN}` }
});

ws.on('message', (buf) => {
  const evt = JSON.parse(buf.toString());
  console.log(evt.timestamp, evt.type, evt.data);
});

ws.on('close', (code, reason) => {
  console.error('closed', code, reason.toString());
  // retry with backoff...
});

Example — executing a command on the multiplex channel

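import WebSocket from 'ws';

const ws = new WebSocket('ws://127.0.0.1:8080/api/console/stream', {
  headers: { Authorization: `Bearer ${process.env.NIMBUS_API_TOKEN}` }
});

ws.on('message', (buf) => {
  const m = JSON.parse(buf.toString());
  if (m.id !== 'r-1') return; // events and other commands carry no or another id
  if (m.type === 'output') console.log(m.kind, m.text);
  if (m.type === 'output_end') { console.log('done.'); ws.close(); }
});

// ws.send() throws while the socket is still connecting, so wait for 'open'.
ws.on('open', () => {
  // hello goes first so the session tracker records this client.
  ws.send(JSON.stringify({ type: 'hello',
    text: JSON.stringify({ username: 'alice', hostname: 'workstation', os: 'macOS' })
  }));
  ws.send(JSON.stringify({ type: 'execute', id: 'r-1', input: 'services list' }));
});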

For the event schema pushed through /api/events and /api/console/stream, see Events reference.