Setting up replication
Run a leader, point one or more followers at it, and they stream the leader’s changes by sequence number — catching up automatically after a disconnect. dfdb serve can stand up a leader or follower directly, no custom code: pass the role via flag, env var, or a replication block in node.json, and the node brings up both the HTTP API and the replication wire when it starts.
Replication runs over a dedicated TCP port, separate from the HTTP API. For the underlying model — sequence numbers, catchup, failover trade-offs — see the replication concept page.
One shared secret on every node. Whatever value you set for replicationSecret on the leader must match on every follower or the handshake is rejected.
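One way to get a suitable value is to generate it once from a cryptographic random source and hand the same string to every node. A minimal sketch in Python, assuming any sufficiently random string is acceptable as a secret (this page does not state length or character-set limits); the `repl_` prefix and the `make_replication_secret` helper are illustrative, not part of dfdb:

```python
import secrets


def make_replication_secret(num_bytes: int = 32) -> str:
    """Return a random URL-safe secret (hypothetical helper, not part of dfdb)."""
    # token_urlsafe reads from the OS CSPRNG; 32 bytes encode to 43 characters.
    return "repl_" + secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Generate once, then give the same value to the leader and every follower,
    # for example via --replication-secret or DFDB_REPLICATION_SECRET.
    print(make_replication_secret())
```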
Start a leader
The HTTP API listens on :5000; the replication wire listens on :5500. Followers connect to the replication port, not the HTTP port.
dfdb serve --port 5000 --data-dir ./data/leader \
--replication-role leader \
--replication-port 5500 \
--replication-secret repl_shared_xyz
Attach a follower
The follower opens its own HTTP API on :5010 (so your app can read from it), and connects outbound to leader-host:5500 for the replication stream.
dfdb serve --port 5010 --data-dir ./data/replica \
--replication-role follower \
--leader-host localhost --leader-port 5500 \
--replication-secret repl_shared_xyz
On startup, the banner shows the active role:
dfdb serve
node: replica-1
data dir: /var/lib/dfdb/replica
listening: http://localhost:5010
security: API key OFF | TLS OFF | replication-secret ON
replication: FOLLOWER following leader.internal:5500 | auto-failover after 10s (new port :5500)
Verify catchup
Query /stats on both nodes — document counts should match once catchup completes. Write to the leader and confirm the row appears on the follower:
curl http://localhost:5000/stats # leader
curl http://localhost:5010/stats # follower: counts converge after catchup
Test failover
Bring up the follower with auto-failover enabled. If the leader goes silent for 10 seconds, this follower promotes itself and begins accepting writes on the leader’s replication port:
dfdb serve --port 5010 --data-dir ./data/replica \
--replication-role follower \
--leader-host localhost --leader-port 5500 \
--replication-secret repl_shared_xyz \
--auto-failover-seconds 10
Stop the leader process. After the silence window elapses, the follower promotes itself; confirm by writing to it. Add --auto-failover-new-port N to take over on a different port.
Auto-failover accepts a small data-loss window and has no built-in split-brain protection — enable it on one follower only. For maintenance moves, prefer planned handover (see below), which guarantees zero data loss.
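The one-follower rule is easy to check mechanically before a rollout. A sketch in Python, assuming each node's configuration is loaded as a dict shaped like the file in the All-in-one via node.json section (`nodeName`, `replication.role`, `replication.autoFailover`); `followers_with_auto_failover` and `check_single_failover` are hypothetical helpers, not part of dfdb:

```python
from typing import Any


def followers_with_auto_failover(configs: list[dict[str, Any]]) -> list[str]:
    """Names of follower nodes whose config carries an autoFailover block."""
    names = []
    for cfg in configs:
        repl = cfg.get("replication", {})
        if repl.get("role") == "follower" and repl.get("autoFailover"):
            names.append(cfg.get("nodeName", "<unnamed>"))
    return names


def check_single_failover(configs: list[dict[str, Any]]) -> list[str]:
    """Raise if more than one follower could promote itself."""
    enabled = followers_with_auto_failover(configs)
    if len(enabled) > 1:
        raise ValueError(f"auto-failover enabled on several followers: {enabled}")
    return enabled


if __name__ == "__main__":
    replica_1 = {"nodeName": "replica-1",
                 "replication": {"role": "follower",
                                 "autoFailover": {"silenceSeconds": 10}}}
    replica_2 = {"nodeName": "replica-2",
                 "replication": {"role": "follower"}}
    print(check_single_failover([replica_1, replica_2]))
```

Running a check like this in CI catches the case where a copied config accidentally leaves auto-failover on for a second follower.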
All-in-one via node.json
For production, put everything in a file. The replication block is honored by dfdb serve out of the box:
{
"nodeName": "replica-1",
"port": 5010,
"dataDir": "/var/lib/dfdb/replica",
"security": {
"replicationSecret": "repl_shared_xyz"
},
"replication": {
"role": "follower",
"leaderHost": "leader.internal",
"leaderPort": 5500,
"autoFailover": {
"silenceSeconds": 10,
"newLeaderPort": 5500
}
}
}
dfdb serve --config ./node.json
Env-var equivalents
Every flag has an env var, handy in containers:
| Flag | Env var |
|---|---|
| --replication-role | DFDB_REPLICATION_ROLE |
| --replication-port | DFDB_REPLICATION_PORT |
| --leader-host | DFDB_LEADER_HOST |
| --leader-port | DFDB_LEADER_PORT |
| --auto-failover-seconds | DFDB_AUTO_FAILOVER_SECONDS |
| --replication-secret | DFDB_REPLICATION_SECRET |
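In a container setup it can be convenient to derive the environment from the same settings you would otherwise pass as flags. A sketch in Python that uses only the six pairs listed in the table; `FLAG_TO_ENV` and `env_for` are hypothetical names, not part of dfdb:

```python
# Flag-to-env-var pairs copied from the table above.
FLAG_TO_ENV = {
    "--replication-role": "DFDB_REPLICATION_ROLE",
    "--replication-port": "DFDB_REPLICATION_PORT",
    "--leader-host": "DFDB_LEADER_HOST",
    "--leader-port": "DFDB_LEADER_PORT",
    "--auto-failover-seconds": "DFDB_AUTO_FAILOVER_SECONDS",
    "--replication-secret": "DFDB_REPLICATION_SECRET",
}


def env_for(flags: dict[str, object]) -> dict[str, str]:
    """Translate flag/value pairs into environment variables."""
    env = {}
    for flag, value in flags.items():
        if flag not in FLAG_TO_ENV:
            # Refuse to guess a name for flags the table does not list.
            raise KeyError(f"no env var listed for {flag}")
        env[FLAG_TO_ENV[flag]] = str(value)
    return env


if __name__ == "__main__":
    follower = {
        "--replication-role": "follower",
        "--leader-host": "leader.internal",
        "--leader-port": 5500,
    }
    # Print in KEY=value form, suitable for an env file.
    for name, value in env_for(follower).items():
        print(f"{name}={value}")
```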
Runtime control over HTTP
Everything above is also exposed via REST, so a management tool can drive it. See the CLI / REST reference for the full list:
GET /replication/status
POST /replication/start-leader { "port": 5500 }
POST /replication/start-follower { "host": "leader.internal", "port": 5500 }
POST /replication/read-only # planned handover step 1
POST /replication/promote { "port": 5500 } # step 2
POST /replication/auto-failover/enable { "silenceSeconds": 10, "newLeaderPort": 5500 }
POST /replication/auto-failover/disable
Direct C# API
The CLI flags and HTTP endpoints map 1:1 to the C# methods, which remain available if you’re embedding DocumentForge in your own process instead of running dfdb serve:
db.StartLogicalReplicationServer(port: 5500, sharedSecret: "...");
db.StartLogicalReplicationFollower("leader-host", 5500, sharedSecret: "...");
db.EnableAutoFailover(newLeaderPort: 5500, silenceTimeout: TimeSpan.FromSeconds(10));
Observability properties:
leader.LeaderCurrentSeq // latest seq assigned
leader.GetLogicalFollowerCount() // how many followers are connected
follower.FollowerLastSeq // last applied op
follower.LogicallyReplicatedOps // running total
follower.GapsDetected // non-zero means lost ops: investigate!
Planned handover (datacenter move)
For maintenance or a datacenter move, planned handover guarantees zero data loss — the new leader catches up fully before the old one releases the role. A concrete recipe for moving a production database from Datacenter A to Datacenter B:
- Week before: provision the new DocumentForge instance in DC-B. Have it run as a logical follower of the DC-A leader. Verify FollowerLastSeq matches LeaderCurrentSeq (or is within seconds).
- Maintenance window (15 min): inform clients of brief write unavailability.
- T+0: call oldLeader.BeginPlannedHandover(...). Writes on DC-A now fail with a clear error.
- T+seconds: the call returns the finalSeq once DC-B is fully caught up.
- T+immediately: call newLeader.PromoteToLeader(port). DC-B is now the writable leader.
- T+now: flip your load-balancer / DNS / connection string to point at DC-B.
- Optional: reconfigure DC-A as a follower of DC-B for ongoing disaster recovery.
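The same ordering can be driven through the REST endpoints listed under Runtime control over HTTP (/replication/read-only, then /replication/promote). A sketch in Python under loud assumptions: this page does not document response bodies or error semantics for those endpoints, so `post` is an injected callable assumed to raise on failure, `caught_up` is a caller-supplied check (for example one that compares the two sequence numbers with `is_caught_up`), and every function name here is hypothetical:

```python
import time
from typing import Callable


def is_caught_up(leader_seq: int, follower_seq: int, max_lag: int = 0) -> bool:
    """True when the follower trails the leader by at most max_lag operations."""
    return leader_seq - follower_seq <= max_lag


def planned_handover(
    post: Callable[[str, dict], object],   # assumed to raise on failure
    old_leader: str,
    new_leader: str,
    port: int,
    caught_up: Callable[[], bool],
    attempts: int = 30,
    interval: float = 1.0,
) -> list[str]:
    """Run read-only then promote, in that order, and report completed steps."""
    steps = []
    # Step 1: the old leader stops accepting writes.
    post(f"{old_leader}/replication/read-only", {})
    steps.append("read-only")
    # Wait until the new leader has applied everything before promoting it.
    for _ in range(attempts):
        if caught_up():
            break
        time.sleep(interval)
    else:
        # Restoring writes on the old leader is left to the caller.
        raise RuntimeError("new leader not caught up; do not promote")
    # Step 2: only now may the new leader take over.
    post(f"{new_leader}/replication/promote", {"port": port})
    steps.append("promote")
    return steps
```

Injecting `post` keeps the ordering logic testable without a running cluster and leaves the choice of HTTP client to the management tool.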
What could go wrong, and how you’d detect it:
- If DC-B doesn’t catch up within the timeout, BeginPlannedHandover throws and re-enables writes on DC-A. Abort the handover, investigate the network, retry.
- If DC-B promotes while DC-A is still accepting writes, you’d have split-brain. The protocol prevents this: DC-A is read-only before DC-B is promoted. Never bypass that order.
- Clients writing to DC-A during the window see an error; make sure your client code has retry logic that can fail over to DC-B.
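The client-side retry from the last point might look like the following sketch in Python. It assumes the application already has some `send(base_url, payload)` function that raises when a write is rejected; that function, the endpoint URLs, and `write_with_failover` are all hypothetical, since this page does not describe the write API:

```python
import time
from typing import Callable, Sequence


def write_with_failover(
    send: Callable[[str, dict], object],
    endpoints: Sequence[str],       # preferred node first, fallback after it
    payload: dict,
    rounds: int = 3,
    delay: float = 1.0,
) -> object:
    """Try each endpoint in order; repeat the whole list up to `rounds` times."""
    last_error = None
    for attempt in range(rounds):
        for base_url in endpoints:
            try:
                return send(base_url, payload)
            except Exception as exc:  # broad on purpose: send() is application code
                last_error = exc
        if attempt < rounds - 1:
            time.sleep(delay)         # pause before the next round
    raise RuntimeError("write failed on all endpoints") from last_error
```

Listing DC-A first and DC-B second lets writes move to DC-B as soon as it is promoted. Writes retried this way should be safe to repeat.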