CLI Reference

One binary, every feature. dfdb runs a node, queries a file, drives a cluster, and rebalances shards — all from the same executable.

Install / run

There are two ways to run the CLI. Both accept identical subcommands.

Dev mode (source checkout)


dotnet run --project src/DocumentForge.Cli -- <command> [args]

Published binary


# Build the single-file binary for your platform
./scripts/publish-dfdb.sh     # or publish-dfdb.ps1 on Windows
 
# Then run it from dist/ — no .NET runtime required on the target
./dist/linux-x64/dfdb <command> [args]
./dist/win-x64/dfdb.exe <command> [args]

In the rest of this page we write dfdb; substitute whichever form fits your setup.

Top-level help


dfdb help        # or --help, -h
dfdb version     # or --version, -v

serve

Start the REST API for a single node. This is what the admin UI, the LINQ client, and the HttpShardTransport talk to.


dfdb serve [--config node.json]
           [--node-name NAME]
           [--port N]
           [--data-dir DIR]
           [--api-key KEY]
           [--replication-secret SECRET]
           [--bind-all]
           [--replication-role leader|follower]
           [--replication-port N]
           [--leader-host H] [--leader-port N]
           [--auto-failover-seconds N] [--auto-failover-new-port N]

Flag	Default	Notes
`--config`	—	Path to a `node.json` file. Loaded first; other flags override.
`--node-name`	`node-1`	Shows up in logs, health output, and shard descriptors.
`--port`	`5000`	HTTP (or HTTPS, with TLS configured) port.
`--data-dir`	`./data`	Directory for `data.dfdb`, WAL, and recovery log.
`--api-key`	off	If set, every request must carry `Authorization: Bearer <key>`.
`--replication-secret`	off	Shared secret enforced on the replication handshake.
`--bind-all`	loopback	Bind to `0.0.0.0` instead of `127.0.0.1`. Only enable behind a trusted network / firewall.
`--replication-role`	off	`leader` or `follower`. Omit for a single-node setup.
`--replication-port`	`5500`	Leader: TCP port for the replication listener. Must differ from `--port`.
`--leader-host`	—	Follower: hostname of the leader.
`--leader-port`	—	Follower: replication port on the leader.
`--auto-failover-seconds`	off	Follower: promote if the leader is silent this many seconds.
`--auto-failover-new-port`	= leader port	Follower: which port to bind as leader once promoted.
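When `--api-key` is set, every request except `GET /health` must carry the bearer header. Below is a minimal standard-library Python sketch of a client-side helper that adds it. The helper name `build_request`, the base URL, and the key are illustrative assumptions. The helper only builds the request and does not send it.

```python
from typing import Optional
from urllib.request import Request


def build_request(base_url: str, path: str, api_key: Optional[str],
                  method: str = "GET", body: Optional[bytes] = None) -> Request:
    """Build (but do not send) a request to a dfdb node.

    Hypothetical helper: adds "Authorization: Bearer <key>" to every path
    except /health, which the REST reference documents as always public.
    """
    headers = {}
    if api_key and path != "/health":
        headers["Authorization"] = f"Bearer {api_key}"
    if body is not None:
        headers["Content-Type"] = "application/json"
    return Request(base_url.rstrip("/") + path, data=body,
                   headers=headers, method=method)
```

To send a request, pass the result to `urllib.request.urlopen`. For example, `build_request("http://localhost:5000", "/collections/orders/by/pnr/ABC123", os.environ["DFDB_API_KEY"])` builds a by-key lookup against a protected node.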

Examples


# Single-node dev
dfdb serve --port 5000 --data-dir ./data
 
# Production node with API key + shared secret
dfdb serve --config ./node-shard-a.json \
           --api-key sk_prod_abc123 \
           --replication-secret repl_shared_xyz
 
# Everything via env vars (useful in containers)
DFDB_NODE_NAME=shard-a \
DFDB_PORT=5001 \
DFDB_DATA_DIR=/var/lib/dfdb/shard-a \
DFDB_API_KEY=sk_prod_abc123 \
dfdb serve
 
# Leader with replication enabled
dfdb serve --port 5000 --data-dir ./data/leader \
           --replication-role leader --replication-port 5500
 
# Follower with auto-failover
dfdb serve --port 5010 --data-dir ./data/replica \
           --replication-role follower \
           --leader-host localhost --leader-port 5500 \
           --auto-failover-seconds 10

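Replication can be configured entirely from the CLI. Pass --replication-role (and, on a follower, the leader and auto-failover flags), and dfdb serve stands up the replication listener or follower loop alongside the HTTP API. The same controls are also available at runtime through the Replication REST endpoints listed further down. See Concepts → Replication for the full protocol and failure modes.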
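`--auto-failover-seconds` tells a follower to promote itself once the leader has been silent for that long. dfdb's real detection logic is described in Concepts → Replication. The Python below is only a hypothetical sketch of a silence timer of this kind, with an injectable clock so it can be tested. `SilenceWatchdog`, `heard_from_leader`, and `should_promote` are invented names, not dfdb APIs.

```python
import time
from typing import Callable


class SilenceWatchdog:
    """Hypothetical follower-side timer (not dfdb's implementation).

    Trips once no traffic has been seen from the leader for at least
    `silence_seconds`.
    """

    def __init__(self, silence_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if silence_seconds <= 0:
            raise ValueError("silence_seconds must be positive")
        self.silence_seconds = silence_seconds
        self._clock = clock
        self._last_heard = clock()

    def heard_from_leader(self) -> None:
        # Call whenever a frame or heartbeat arrives from the leader.
        self._last_heard = self._clock()

    def silent_for(self) -> float:
        return self._clock() - self._last_heard

    def should_promote(self) -> bool:
        return self.silent_for() >= self.silence_seconds
```

The default clock is `time.monotonic` rather than wall-clock time, so an NTP adjustment on the follower cannot fake a long silence and trigger a spurious promotion.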

inspect

Offline introspection. Opens a .dfdb file locally and prints stats — size, page counts, collections, and indexes. Does not start a server.


dfdb inspect <file>
 
# Example
dfdb inspect ./data/shard-a/data.dfdb

query

Run a single SQL statement against a local file and print the results. Great for scripts and sanity checks. See the SQL reference for the query surface.


dfdb query <file> "<sql>"
 
# Examples
dfdb query ./data/shard-a/data.dfdb "SELECT COUNT(*) FROM orders"
dfdb query ./data/shard-a/data.dfdb "SELECT * FROM orders WHERE pnr = 'ABC123'"

repl

Interactive SQL console. Opens the file, gives you a prompt, executes statements on Enter. Type stats to see storage info, exit to quit.


dfdb repl <file>
 
dfdb> SELECT * FROM orders LIMIT 5
dfdb> SELECT status, COUNT(*) FROM orders GROUP BY status
dfdb> CREATE INDEX idx_pnr ON orders (pnr)
dfdb> stats
dfdb> exit

compact

Reclaim physical space left by DELETE. Rewrites the collection’s pages in-place, releasing freed slots back to the allocator.


dfdb compact <file> <collection>
 
# Example
dfdb compact ./data/shard-a/data.dfdb orders

Note: compaction takes an exclusive write lock. Don’t run dfdb compact against a file that a live dfdb serve has open. Stop the server first, compact, then restart it. To compact a running node, use POST /admin/compact/{collection} instead.

seed

Loads realistic airline-reservation sample data (orders with nested passengers and flights). Useful for demos and smoke tests.


dfdb seed <file> [count]
 
# Examples
dfdb seed ./data/demo.dfdb               # default 10,000 orders
dfdb seed ./data/demo.dfdb 250000        # a quarter million

cluster

Build and inspect the cluster.json topology file. This file describes every shard and what’s sharded vs replicated.


dfdb cluster init            <config>
dfdb cluster show            <config>
dfdb cluster add-shard       <config> <name> <endpoint>
dfdb cluster add-collection  <config> <name> (hash <path>|replicated)

Examples


# Fresh cluster config
dfdb cluster init cluster.json
 
# Register three shards
dfdb cluster add-shard cluster.json shard-a http://localhost:5001
dfdb cluster add-shard cluster.json shard-b http://localhost:5002
dfdb cluster add-shard cluster.json shard-c http://localhost:5003
 
# orders are hash-sharded on pnr; airports are fully replicated to every shard
dfdb cluster add-collection cluster.json orders   hash pnr
dfdb cluster add-collection cluster.json airports replicated
 
# Inspect what we built
dfdb cluster show cluster.json

health

Ping every shard in a cluster config and report status + basic stats. Intended to be safe to run from anywhere — it only reads.


dfdb health <cluster-config>
 
# Example output: three green dots, response times, doc counts per shard
dfdb health cluster.json

rebalance

Plan or execute a topology change (adding or removing shards). --plan-only prints what would move without touching data.


dfdb rebalance <old-config> <new-config> [--plan-only]
 
# Preview only — no data moves
dfdb rebalance old-cluster.json new-cluster.json --plan-only
 
# Execute the migration
dfdb rebalance old-cluster.json new-cluster.json

See Concepts → Sharding for the online-rebalance protocol and what happens to reads/writes during migration.
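The Python below is a hypothetical sketch of what a `--plan-only` diff computes. It diffs two topologies and lists which keys would change shards. It assumes a simple stable placement rule (SHA-256 of the key, modulo the shard count). dfdb's actual hash function and placement scheme are not documented on this page and may differ, so treat the numbers as illustrative only. `shard_for` and `plan_moves` are invented names.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def shard_for(key: str, shards: List[str]) -> str:
    # Hypothetical placement rule: SHA-256 of the key, first 8 bytes as an
    # integer, modulo the number of shards. Not necessarily dfdb's scheme.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def plan_moves(keys: Iterable[str], old_shards: List[str],
               new_shards: List[str]) -> Dict[str, Tuple[str, str]]:
    """Dry run: {key: (old_owner, new_owner)} for every key that changes shard."""
    moves: Dict[str, Tuple[str, str]] = {}
    for key in keys:
        src, dst = shard_for(key, old_shards), shard_for(key, new_shards)
        if src != dst:
            moves[key] = (src, dst)
    return moves


if __name__ == "__main__":
    keys = [f"PNR{i:05d}" for i in range(10_000)]
    old = ["shard-a", "shard-b", "shard-c"]
    moves = plan_moves(keys, old, old + ["shard-d"])
    print(f"{len(moves)} of {len(keys)} keys would move")
```

With plain modulo placement, growing from 3 to 4 shards leaves a key in place only when h mod 3 equals h mod 4. That holds for 3 of every 12 hash values, so roughly three quarters of the keys move. Consistent hashing or a fixed set of virtual buckets cuts that to roughly the new shard's share. Check Concepts → Sharding for what dfdb actually does.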

router

Run a stateless cluster gateway. The router loads a cluster.json and applies hash / replicated routing to every request, so a single URL fronts the whole cluster.


dfdb router --config <cluster.json> [--port N] [--bind-all]

Flag	Default	Purpose
`--config`, `-c`	(required)	Path to the `cluster.json` describing shards and collection strategies
`--port`, `-p`	from config	HTTP listen port; overrides `router.port` in the config
`--bind-all`	off	Listen on all interfaces instead of just localhost


dfdb router --config cluster.json --port 5000

Build the config with dfdb cluster and check shard health with dfdb health.
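The router's two strategies work as follows. A hash-sharded collection sends each document to exactly one shard, chosen from its shard key. A replicated collection fans writes out to every shard. The Python below is a hypothetical sketch of that write-routing decision only. The dict layout loosely mirrors what `dfdb cluster add-shard` / `add-collection` record, but it is not the real cluster.json schema, and the SHA-256-modulo hash is an assumption.

```python
import hashlib
from typing import Any, Dict, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    # Follow a dot path such as "pnr" or "booking.pnr" ([n] indexing not handled).
    value: Any = doc
    for part in path.split("."):
        value = value[part]
    return value


def write_targets(cluster: Dict[str, Any], collection: str,
                  doc: Dict[str, Any]) -> List[str]:
    """Hypothetical routing: which shard(s) should receive this insert?"""
    shards: List[str] = cluster["shards"]
    strategy = cluster["collections"][collection]
    if strategy == "replicated":
        return list(shards)  # every shard keeps a full copy
    key = str(get_path(doc, strategy["hash"]))  # e.g. {"hash": "pnr"}
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return [shards[int.from_bytes(digest[:8], "big") % len(shards)]]
```

The router can stay stateless because the shard for a hash-sharded document depends only on the topology and the key. Any router instance loaded with the same cluster.json reaches the same answer.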

node.json format

Full schema. Every field except nodeName, port, and dataDir is optional.


{
  "nodeName": "shard-a",
  "port": 5001,
  "dataDir": "/var/lib/dfdb/shard-a",
  "bindAllInterfaces": false,
  "security": {
    "apiKey": "sk_prod_abc123",
    "replicationSecret": "repl_shared_xyz",
    "tls": {
      "certPath": "/etc/dfdb/cert.pfx",
      "certPassword": "env:DFDB_CERT_PASSWORD"
    }
  },
  "replication": {
    "role": "follower",
    "port": 5500,
    "leaderHost": "leader.internal",
    "leaderPort": 5500,
    "autoFailover": {
      "silenceSeconds": 10,
      "newLeaderPort": 5500
    }
  }
}

Notes:

bindAllInterfaces: true is equivalent to --bind-all — only enable behind a firewall.
certPassword supports env:VAR_NAME indirection so you don’t commit passwords to the config file.
Omit the security block entirely for local dev.
Omit the replication block for a single-node setup. role is either "leader" or "follower".
On a leader, only role and port are read. On a follower, leaderHost + leaderPort are required; autoFailover is optional.
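The `certPassword` field accepts `env:VAR_NAME` in place of a literal value. Below is a minimal Python sketch of how such a value can be resolved when node.json is loaded. `resolve_secret` and `load_node_config` are hypothetical helpers. Failing on an unset variable is an assumed behaviour, since dfdb's own handling of that case isn't specified here.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional


def resolve_secret(value: Optional[str],
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Turn "env:VAR_NAME" into the value of VAR_NAME; pass literals through."""
    if value is None or not value.startswith("env:"):
        return value
    name = value[len("env:"):]
    if name not in env:
        # Assumption: fail loudly rather than start TLS with an empty password.
        raise KeyError(f"environment variable {name!r} is not set")
    return env[name]


def load_node_config(text: str,
                     env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    # Hypothetical loader: parse node.json and resolve the cert password.
    config = json.loads(text)
    tls = config.get("security", {}).get("tls")
    if tls and "certPassword" in tls:
        tls["certPassword"] = resolve_secret(tls["certPassword"], env)
    return config
```

Taking `env` as a parameter keeps the resolver testable without touching the real process environment.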

Environment variables

The following dfdb serve flags have env-var equivalents, which is useful for containers and orchestrators.

Flag	Env var
`--node-name`	`DFDB_NODE_NAME`
`--port`	`DFDB_PORT`
`--data-dir`	`DFDB_DATA_DIR`
`--api-key`	`DFDB_API_KEY`
`--replication-secret`	`DFDB_REPLICATION_SECRET`
`--replication-role`	`DFDB_REPLICATION_ROLE`
`--replication-port`	`DFDB_REPLICATION_PORT`
`--leader-host`	`DFDB_LEADER_HOST`
`--leader-port`	`DFDB_LEADER_PORT`
`--auto-failover-seconds`	`DFDB_AUTO_FAILOVER_SECONDS`

The admin UI also reads one env var at build time:


NEXT_PUBLIC_DFDB_URL=http://localhost:5000 npm run dev

Config precedence

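When the same setting is provided by multiple sources, dfdb applies them in the order below. CLI flags override the config file, but env vars override neither. They only fill in settings that are still at their defaults.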

--config node.json — loaded first
CLI flags (--port, --data-dir, etc.) — override the config file
Env vars (DFDB_*) — fill in anything still at its default
Defaults — localhost:5000, ./data, no security

Rule of thumb: put stable topology (nodeName, port, dataDir) in the config file. Pass secrets (DFDB_API_KEY, DFDB_REPLICATION_SECRET, and the cert password via env: indirection) through env vars or a secret manager rather than committing them to the config file.
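The precedence list can be expressed as a small merge. This is a hypothetical Python sketch that assumes each source has already been parsed into a flat dict of setting name to value. The keys in `DEFAULTS` are illustrative, not dfdb's internal names. Following the list, env vars only fill settings that are still at their default after the config file and flags have been applied.

```python
from typing import Any, Dict

# Hypothetical defaults mirroring the documented ones (localhost:5000, ./data, no security).
DEFAULTS: Dict[str, Any] = {"port": 5000, "dataDir": "./data", "apiKey": None}


def resolve_settings(config_file: Dict[str, Any], cli_flags: Dict[str, Any],
                     env_vars: Dict[str, Any]) -> Dict[str, Any]:
    settings = dict(DEFAULTS)
    settings.update(config_file)  # 1. --config node.json is loaded first
    settings.update(cli_flags)    # 2. CLI flags override the config file
    for key, value in env_vars.items():
        # 3. env vars only fill in settings still at their default
        if settings.get(key) == DEFAULTS.get(key):
            settings[key] = value
    return settings               # 4. anything untouched keeps its default
```

One edge case follows from "still at its default": in this reading, a value explicitly set to the default (say `--port 5000`) is indistinguishable from an unset one, so an env var would replace it. Whether dfdb tracks "explicitly set" separately isn't stated here.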

REST API endpoints

Once dfdb serve is running, every database operation is available over HTTP. See the REST API reference for more.

Application (CRUD + query)

The natural way to use the API: address documents by their business key (PNR, email, SKU…), not by DocumentForge’s internal id.

Method	Path	Purpose
`GET`	`/health`	Liveness + node name, version, read-only flag, uptime. Always public — never gated by the API key, so platform health checks (Render, Docker HEALTHCHECK, K8s probes) don’t need credentials.
`POST`	`/query`	Run arbitrary SQL. Body: `{ "sql": "..." }`. Default response is the full materialised JSON envelope. For large result sets, request NDJSON streaming via `Accept: application/x-ndjson` or `?stream=true` — line 1 is the meta envelope, then one document per line. Plan / count / executionMs are also returned in `X-DFDB-Plan` / `X-DFDB-Count` / `X-DFDB-ExecutionMs` headers. 400 on parse error or unknown collection.
`GET`	`/stats`	File size, pages, per-collection counts, all indexes.
`GET`	`/collections`	List all collection names.
`POST`	`/collections/{name}`	Insert a single JSON document. Returns `{ id }`.
`POST`	`/collections/{name}/bulk`	Bulk insert (JSON array). Response includes `ids[]` for every successful insert and `errors[{ index, error }]` for failures. Pass `?atomic=true` for all-or-nothing semantics. Pass `?skipIndexes=true` for cold-load throughput; you must then call `POST /admin/rebuild-indexes/{name}` before any indexed query.
`GET`	`/collections/{name}`	List documents. Supports `?limit=N`.
`GET`	`/collections/{name}/by/{field}/{value}`	Find by your own key — uses an index when one exists. Field name accepts dot/bracket paths (e.g. `passenger.lastName`, `flights[0].departureAirport`).
`PUT`	`/collections/{name}/by/{field}/{value}`	Replace by your own key. Body is the full new document JSON. The internal `_id` of the matched document is preserved server-side.
`DELETE`	`/collections/{name}/by/{field}/{value}`	Delete every document matching that field value.
`DELETE`	`/collections/{name}`	Drop the entire collection. Requires header `X-Confirm: true`.
`GET`	`/indexes/{collection}`	List indexes on a collection.
`POST`	`/index`	Create an index. Body: `{ collection, path, name?, unique? }`.
`POST`	`/seed`	Populate the demo airline dataset. Body: `{ "orders": 500 }`.

Application (advanced — by internal _id)

The internal _id is a 16-byte sequential identifier returned by inserts. You normally don’t want this — prefer /by/{field}/{value} above. Use these only when you’ve cached the _id from a prior write and want to skip the index lookup.

Method	Path	Purpose
`GET`	`/collections/{name}/{_id}`	Direct location-map lookup by the internal id (the value of `_id`, formatted as a Guid).
`PUT`	`/collections/{name}/{_id}`	Replace the document at that internal id. Body is the full new JSON. The `_id` is preserved automatically.
`DELETE`	`/collections/{name}/{_id}`	Direct delete by internal id.

Admin

Method	Path	Purpose
`POST`	`/admin/flush`	Flush dirty pages to disk, truncate recovery log.
`POST`	`/admin/checkpoint`	Drain the WAL to a consistent on-disk state.
`POST`	`/admin/compact/{collection}`	Reclaim space from deletes. Returns pages compacted + bytes reclaimed.
`POST`	`/admin/rebuild-indexes/{collection}`	Rebuild every index on the collection from scratch. Needed after a bulk insert that used `?skipIndexes=true`, or any time you suspect index drift.
`POST`	`/admin/rebuild-index/{collection}/{indexName}`	Surgical rebuild of a single named index. Use when one specific index has drifted (faster than the all-indexes variant on a large collection).
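The cold-load pattern has two phases. First, bulk insert with `?skipIndexes=true`. Then rebuild the collection's indexes before issuing any indexed query. Below is a hypothetical Python sketch that builds the requests in that order and pairs per-item `errors[]` back to the input documents. The helper names are invented. Sending, retries, and the bearer header for a node started with `--api-key` are left to the caller.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import quote
from urllib.request import Request


def cold_load_requests(base_url: str, collection: str,
                       batches: List[List[Dict[str, Any]]]) -> List[Request]:
    """Hypothetical plan: one bulk POST per batch, then one index rebuild."""
    base = base_url.rstrip("/")
    name = quote(collection, safe="")
    reqs = [
        Request(f"{base}/collections/{name}/bulk?skipIndexes=true",
                data=json.dumps(batch).encode("utf-8"),
                headers={"Content-Type": "application/json"}, method="POST")
        for batch in batches
    ]
    # Indexes are stale until this runs; don't issue indexed queries before it.
    reqs.append(Request(f"{base}/admin/rebuild-indexes/{name}", method="POST"))
    return reqs


def failed_items(batch: List[Dict[str, Any]],
                 response: Dict[str, Any]) -> List[Tuple[Dict[str, Any], str]]:
    """Pair each errors[{index, error}] entry with the document that failed."""
    return [(batch[e["index"]], e["error"]) for e in response.get("errors", [])]
```

The requests must be sent in list order. The rebuild only helps once every batch has landed.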

Replication

Method	Path	Purpose
`GET`	`/replication/status`	Role, seqs, follower count, gaps, auto-failover state. Safe to poll.
`POST`	`/replication/start-leader`	Body: `{ port, sharedSecret? }`. Opens the replication listener.
`POST`	`/replication/start-follower`	Body: `{ host, port, sharedSecret? }`.
`POST`	`/replication/read-only`	Enter read-only mode (planned handover step 1).
`POST`	`/replication/read-write`	Exit read-only mode (recover / abort handover).
`POST`	`/replication/promote`	Body: `{ port }`. Take over as leader (handover step 2).
`POST`	`/replication/auto-failover/enable`	Body: `{ silenceSeconds, newLeaderPort }`.
`POST`	`/replication/auto-failover/disable`	Stop watching for leader silence.
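A planned handover calls these endpoints in sequence. The Python below is a hypothetical sketch of the ordering only:

1. Put the current leader into read-only mode.
2. Promote the chosen follower.
3. If the promotion fails, switch the old leader back to read-write.

The `send` callable, node URLs, and port are assumptions, and treating any 2xx as success is also assumed. The sketch omits waiting for the follower to catch up (for example by polling `/replication/status`), because the status fields aren't specified on this page.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib.request import Request

# Hypothetical transport: sends a request and returns its HTTP status code.
Send = Callable[[Request], int]


def _post(url: str, body: Optional[Dict[str, Any]] = None) -> Request:
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(url, data=data, headers=headers, method="POST")


def _ok(status: int) -> bool:
    return 200 <= status < 300  # assumption: any 2xx means success


def planned_handover(leader: str, follower: str, new_port: int, send: Send) -> bool:
    """Hypothetical sketch of the two-step planned handover."""
    # Step 1: stop writes on the current leader.
    if not _ok(send(_post(f"{leader}/replication/read-only"))):
        return False
    # A real script would wait here until the follower has caught up.
    # Step 2: promote the follower; it becomes leader on new_port.
    if _ok(send(_post(f"{follower}/replication/promote", {"port": new_port}))):
        return True
    # Promotion failed: abort by re-opening writes on the old leader.
    send(_post(f"{leader}/replication/read-write"))
    return False
```

A real `send` could wrap `urllib.request.urlopen` and return `resp.status`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so the wrapper should catch it and return its `code`.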

Sharding control is CLI-only on purpose. Cluster topology lives in a cluster.json file you version alongside your infra — not per-node state. Use dfdb cluster / dfdb health / dfdb rebalance to manage it. Each individual shard is still a regular node, so every endpoint above works on each one.