Chapter 04

Replication & Handover

Read scaling with coherent followers. Zero-downtime leader handover for datacenter moves.

Concepts

DocumentForge supports two replication modes. They solve different problems and can be used together.

                                Logical                                 Physical
  Stream contents               Operations (Insert / Delete / Index)   Raw page bytes
  Follower can serve queries    ✓ — indexes stay coherent              ✗ — indexes are stale
  Follower file byte-identical  No (layout may differ)                 Yes (byte-for-byte)
  Best for                      Read scale, planned handover           Hot backup, disaster recovery
  Sequence numbers              Yes — guaranteed ordering              No
  Catchup on reconnect          Yes — from last-seen seq               No (manual resync)

Logical replication

On a write, the leader broadcasts an operation record to every connected follower. The follower runs the operation through its own engine, so indexes, location maps, and cached state all stay consistent with the data file.

           Leader                           Follower
       ┌──────────┐                      ┌──────────┐
  write│  Insert  │ ──(seq 42, Insert)──▶│  Insert  │
       │  ↓       │ ──(seq 43, Insert)──▶│  ↓       │
       │  engine  │ ──(seq 44, CreIdx)──▶│  engine  │
       │  ↓       │ ───(heartbeat)──────▶│          │
       │  disk    │                      │  disk    │
       └──────────┘                      └──────────┘

Each op carries a monotonic seq. Followers persist the last-applied seq to a sidecar file. On reconnect, they tell the leader their position, and the leader replays everything after it.

Physical replication

Page-level byte streaming: the follower ends up with a byte-identical data file. Use this when you want a hot backup, or when you plan to promote the follower to leader manually without any engine warmup.

Physical replication doesn't solve the "can the follower serve reads" problem — its in-memory structures (indexes, location map) were built from the original data and aren't updated by page writes. Close and reopen the follower to rebuild them.
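If you do promote a physical follower, the close-and-reopen step is just a dispose and a fresh open. A sketch, using the OpenOrCreate call shown in the C# examples below:

```
// The data file is byte-identical, but in-memory indexes and the location
// map are stale: page writes bypassed the engine. Reopening rebuilds them.
follower.Dispose();
using var promoted = DocumentForgeDb.OpenOrCreate("replica.dfdb");
// 'promoted' now has live indexes built from the replicated file.
```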

Setting it up (C# API)

Replication runs over a dedicated TCP port, separate from the HTTP API. You can wire it up in code with a tiny bootstrap program that opens the database and calls the replication methods, or let dfdb serve bring up the replication wire for you. See From the CLI below for the flag, env-var, and node.json options.

Leader

using var leader = DocumentForgeDb.OpenOrCreate("prod.dfdb");
leader.StartLogicalReplicationServer(port: 5500);

// Normal writes broadcast automatically
leader.Insert("orders", @"{""pnr"": ""ABC123""}");

Follower (read replica)

using var follower = DocumentForgeDb.OpenOrCreate("replica.dfdb");
follower.StartLogicalReplicationFollower("leader-host", 5500);

// Reads work correctly, with live indexes
var r = follower.Execute("SELECT * FROM orders WHERE pnr = 'ABC123'");

Observability

leader.LeaderCurrentSeq       // latest seq assigned
leader.GetLogicalFollowerCount() // how many followers are connected

follower.FollowerLastSeq         // last applied op
follower.LogicallyReplicatedOps  // running total
follower.GapsDetected            // non-zero means lost ops — investigate!
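These counters compose into a simple health check. A sketch, assuming for illustration that one monitoring process holds both handles (in a real deployment you would read the same numbers over /replication/status); the 1000-op threshold is arbitrary:

```
// Replication lag, in operations: the leader's newest seq minus the
// follower's last applied seq. Zero (or near zero) means fully caught up.
ulong lag = leader.LeaderCurrentSeq - follower.FollowerLastSeq;
if (lag > 1000)
    Console.WriteLine($"follower is {lag} ops behind the leader");

if (follower.GapsDetected > 0)
    Console.WriteLine("ops were lost in transit; resync this follower");
```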

From the CLI

dfdb serve can stand up a leader or follower directly — no custom code. Pass the role via flag, env var, or a replication block in node.json, and the node will bring up both the HTTP API and the replication wire when it starts.

Leader

dfdb serve --port 5000 --data-dir ./data/leader \
           --replication-role leader \
           --replication-port 5500 \
           --replication-secret repl_shared_xyz

The HTTP API listens on :5000; the replication wire listens on :5500. Followers connect to the replication port, not the HTTP port.

Follower

dfdb serve --port 5010 --data-dir ./data/replica \
           --replication-role follower \
           --leader-host localhost --leader-port 5500 \
           --replication-secret repl_shared_xyz

The follower opens its own HTTP API on :5010 (so your app can read from it), and connects outbound to leader-host:5500 for the replication stream.

Follower with auto-failover

dfdb serve --port 5010 --data-dir ./data/replica \
           --replication-role follower \
           --leader-host localhost --leader-port 5500 \
           --auto-failover-seconds 10

If the leader goes silent for 10 seconds, this follower promotes itself and begins accepting writes on the leader's replication port. Add --auto-failover-new-port N if you want it to take over on a different port. See Auto-failover for the trade-offs.

All-in-one via node.json

For production, put everything in a file. The replication block is honored by dfdb serve out of the box:

{
  "nodeName": "replica-1",
  "port": 5010,
  "dataDir": "/var/lib/dfdb/replica",
  "security": {
    "replicationSecret": "repl_shared_xyz"
  },
  "replication": {
    "role": "follower",
    "leaderHost": "leader.internal",
    "leaderPort": 5500,
    "autoFailover": {
      "silenceSeconds": 10,
      "newLeaderPort": 5500
    }
  }
}

dfdb serve --config ./node.json

Env-var equivalents

Every flag has an env var, handy in containers:

  Flag                      Env var
  --replication-role        DFDB_REPLICATION_ROLE
  --replication-port        DFDB_REPLICATION_PORT
  --leader-host             DFDB_LEADER_HOST
  --leader-port             DFDB_LEADER_PORT
  --auto-failover-seconds   DFDB_AUTO_FAILOVER_SECONDS
  --replication-secret      DFDB_REPLICATION_SECRET
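For example, the follower command from above expressed through env vars (a sketch; in a container these would typically live in the image or compose file rather than a shell session):

```shell
export DFDB_REPLICATION_ROLE=follower
export DFDB_LEADER_HOST=leader.internal
export DFDB_LEADER_PORT=5500
export DFDB_AUTO_FAILOVER_SECONDS=10
export DFDB_REPLICATION_SECRET=repl_shared_xyz
dfdb serve --port 5010 --data-dir ./data/replica
```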

One shared secret on every node. Whatever value you set for replicationSecret on the leader must match on every follower or the handshake is rejected. See Security → Replication secret.

Verifying it's working

On startup, the banner shows the active role:

dfdb serve
  node:        replica-1
  data dir:    /var/lib/dfdb/replica
  listening:   http://localhost:5010
  security:    API key OFF  |  TLS OFF  |  replication-secret ON
  replication: FOLLOWER  following leader.internal:5500  |  auto-failover after 10s (new port :5500)

Then query /stats on both nodes — document counts should match once catchup completes.
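For example, from the shell (a sketch: it assumes the two nodes run on the ports used in the CLI examples above, and the exact shape of the /stats payload depends on your build):

```shell
# Document counts on leader and follower should converge once catchup completes.
curl -s http://localhost:5000/stats
curl -s http://localhost:5010/stats
```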

Runtime control over HTTP

Everything above is also exposed via REST, so a management tool (or the Postman collection) can drive it. See the REST API reference for the full list:

GET  /replication/status
POST /replication/start-leader       { "port": 5500 }
POST /replication/start-follower     { "host": "leader.internal", "port": 5500 }
POST /replication/read-only          # planned handover step 1
POST /replication/promote            { "port": 5500 }  # step 2
POST /replication/auto-failover/enable   { "silenceSeconds": 10, "newLeaderPort": 5500 }
POST /replication/auto-failover/disable
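For example, a planned handover driven entirely over REST (a sketch; the hostnames are illustrative, and the endpoints are the ones listed above):

```shell
# Step 1: put the old leader into read-only so no new writes can be lost.
curl -s -X POST http://old-leader:5000/replication/read-only

# Poll until the follower reports it has caught up to the leader's final seq.
curl -s http://new-leader:5010/replication/status

# Step 2: promote the caught-up follower; it opens its own replication server.
curl -s -X POST http://new-leader:5010/replication/promote \
     -H "Content-Type: application/json" -d '{"port": 5500}'
```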

Direct C# API

The CLI flags and HTTP endpoints map 1:1 to the C# methods, which remain available if you're embedding DocumentForge in your own process instead of running dfdb serve:

db.StartLogicalReplicationServer(port: 5500, sharedSecret: "...");
db.StartLogicalReplicationFollower("leader-host", 5500, sharedSecret: "...");
db.EnableAutoFailover(newLeaderPort: 5500, silenceTimeout: TimeSpan.FromSeconds(10));

Catchup on reconnect

When a follower disconnects and reconnects:

  1. Follower loads its last-applied seq from disk (the .followerseq sidecar file)
  2. Follower sends a handshake to the leader: "I'm at seq N, please bring me up to date"
  3. Leader replays all ops with seq > N from its in-memory ring buffer
  4. Leader adds the follower to its live broadcast list

Ring buffer limit: the leader keeps the last 10,000 ops by default. If a follower is further behind than that, it cannot catch up from the buffer alone. For MVP, you should resync manually (copy the leader's data file to the follower). A future version will handle this by sending a full snapshot.
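A manual resync for a follower that has fallen off the ring buffer might look like this. This is a sketch: the paths and hostname are illustrative, and how the seq sidecar is reconciled after a full file copy is implementation-specific, so treat the copy itself as the only firm step:

```shell
# 1. Stop the follower process so nothing holds the replica file open.
# 2. Copy the leader's data file across. Quiesce writes (or copy from a
#    filesystem snapshot) so the copy is internally consistent.
scp leader-host:/var/lib/dfdb/prod.dfdb ./replica.dfdb
# 3. Restart the follower; it handshakes with its new position and resumes
#    streaming only the ops that arrived after the copy.
```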

Planned handover

When you need to move the leader role — typically for maintenance or a datacenter move — use the planned-handover flow. It guarantees zero data loss because the new leader catches up fully before the old one releases the role.

// 1. Old leader enters read-only and waits for new leader to catch up
ulong finalSeq = oldLeader.BeginPlannedHandover(
    followerLastSeqProbe: () => newLeader.FollowerLastSeq,
    timeout: TimeSpan.FromMinutes(5));

// 2. Old leader is now read-only. New leader is caught up.
//    Promote new leader.
newLeader.PromoteToLeader(port: 5500);

// 3. Direct clients to new leader's address.
//    Old leader can be shut down or repurposed as a follower.

Between steps 1 and 2, writes to oldLeader throw DocumentForgeException("Database is in read-only mode..."). After step 2, newLeader accepts writes.
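Client code that writes during that window should expect the error. A sketch, assuming the DocumentForgeException type quoted above:

```
try
{
    oldLeader.Insert("orders", @"{""pnr"": ""XYZ789""}");
}
catch (DocumentForgeException ex) when (ex.Message.Contains("read-only"))
{
    // Handover in progress: queue the write, or retry against the
    // new leader once PromoteToLeader has run.
}
```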

Auto-failover

Planned handover is the safe path — zero data loss, no split-brain risk. For crashes (leader process died, host went dark, network partition), you can opt in to automatic promotion. A follower watches the leader's heartbeat stream and, if nothing arrives within a silence threshold, promotes itself to leader.

using var follower = DocumentForgeDb.OpenOrCreate("replica.dfdb");
follower.StartLogicalReplicationFollower("leader-host", 5500);

// Enable auto-promotion if the leader goes silent for 10 seconds.
follower.EnableAutoFailover(
    newLeaderPort: 5500,
    silenceTimeout: TimeSpan.FromSeconds(10),
    onPromoted: port => Console.WriteLine($"Promoted to leader on :{port}"));

The watcher has a 3-second grace period after startup (so normal connect-time silence doesn't trigger promotion). Once promoted, the follower stops consuming from the old leader, opens its own replication server on newLeaderPort, and begins accepting writes.

Useful properties while running:

follower.WasAutoFailoverPromoted   // true once it has taken over
follower.DisableAutoFailover()     // stop watching (safe to call any time)

Trade-offs. Auto-failover accepts a small data loss window — any ops the leader committed but hadn't yet broadcast are lost. If your application cannot tolerate this, use planned handover for maintenance and reserve auto-failover for genuine crashes. To avoid split-brain, make sure only one follower has auto-failover enabled, or use a witness/quorum mechanism in front of it.

Datacenter move runbook

A concrete recipe for moving a production database from Datacenter A to Datacenter B:

  1. Week before: provision the new DocumentForge instance in DC-B. Have it run as a logical follower of the DC-A leader. Verify FollowerLastSeq matches LeaderCurrentSeq (or is within seconds).
  2. Maintenance window (15 min): inform clients of brief write unavailability.
  3. T+0: call oldLeader.BeginPlannedHandover(...). Writes on DC-A now fail with a clear error.
  4. T+seconds: the call returns the finalSeq once DC-B is fully caught up.
  5. T+immediately: call newLeader.PromoteToLeader(port). DC-B is now the writable leader.
  6. T+now: flip your load-balancer / DNS / connection string to point at DC-B.
  7. Optional: reconfigure DC-A as a follower of DC-B for ongoing disaster recovery.

What could go wrong, and how you'd detect it:
1) If DC-B doesn't catch up within the timeout, BeginPlannedHandover throws and re-enables writes on DC-A. Abort the handover, investigate the network, retry.
2) If DC-B promotes while DC-A is still accepting writes, you'd have split-brain. The protocol prevents this: DC-A is read-only before DC-B is promoted. Never bypass that order.
3) Clients writing to DC-A during the window see an error — make sure your client code has retry logic that can fail over to DC-B.
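Item 3 in concrete form: a sketch of client-side failover over HTTP, assuming a REST write endpoint (the /documents/orders path and the hostnames are illustrative, not from the reference):

```
using var http = new HttpClient();
var payload = new StringContent(@"{""pnr"": ""ABC123""}",
                                Encoding.UTF8, "application/json");

// Try DC-A first, then DC-B; the first healthy, writable node wins.
foreach (var baseUrl in new[] { "http://dc-a.internal:5000",
                                "http://dc-b.internal:5010" })
{
    try
    {
        var resp = await http.PostAsync($"{baseUrl}/documents/orders", payload);
        if (resp.IsSuccessStatusCode) break;  // write accepted by current leader
        // Non-success (e.g. read-only during handover): try the next node.
    }
    catch (HttpRequestException)
    {
        // Node unreachable: try the next node.
    }
}
```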

Caveats and current limits