CLI Reference
One binary, every feature. dfdb runs a node, queries a file, drives a
cluster, and rebalances shards — all from the same executable.
Install / run
There are two ways to run the CLI. Both accept identical subcommands.
Dev mode (source checkout)
dotnet run --project src/DocumentForge.Cli -- <command> [args]
Published binary
# Build the single-file binary for your platform
./scripts/publish-dfdb.sh # or publish-dfdb.ps1 on Windows
# Then run it from dist/ — no .NET runtime required on the target
./dist/linux-x64/dfdb <command> [args]
./dist/win-x64/dfdb.exe <command> [args]
In the rest of this page we write dfdb; substitute whichever form fits your
setup.
Top-level help
dfdb help # or --help, -h
dfdb version # or --version, -v
serve
Start the REST API for a single node. This is what the admin UI, the LINQ
client, and the HttpShardTransport talk to.
dfdb serve [--config node.json]
[--node-name NAME]
[--port N]
[--data-dir DIR]
[--api-key KEY]
[--replication-secret SECRET]
[--bind-all]
[--replication-role leader|follower]
[--replication-port N]
[--leader-host H] [--leader-port N]
[--auto-failover-seconds N] [--auto-failover-new-port N]
| Flag | Default | Notes |
|---|---|---|
| --config | — | Path to a node.json file. Loaded first; other flags override. |
| --node-name | node-1 | Shows up in logs, health output, and shard descriptors. |
| --port | 5000 | HTTP (or HTTPS, with TLS configured) port. |
| --data-dir | ./data | Directory for data.dfdb, WAL, and recovery log. |
| --api-key | off | If set, every request must carry Authorization: Bearer <key>. |
| --replication-secret | off | Shared secret enforced on the replication handshake. |
| --bind-all | loopback | Bind to 0.0.0.0 instead of 127.0.0.1. Only enable behind a trusted network / firewall. |
| --replication-role | off | leader or follower. Omit for a single-node setup. |
| --replication-port | 5500 | Leader: TCP port for the replication listener. Must differ from --port. |
| --leader-host | — | Follower: hostname of the leader. |
| --leader-port | — | Follower: replication port on the leader. |
| --auto-failover-seconds | off | Follower: promote if the leader is silent this many seconds. |
| --auto-failover-new-port | = leader port | Follower: which port to bind as leader once promoted. |
Examples
# Single-node dev
dfdb serve --port 5000 --data-dir ./data
# Production node with API key + shared secret
dfdb serve --config ./node-shard-a.json \
--api-key sk_prod_abc123 \
--replication-secret repl_shared_xyz
# Everything via env vars (useful in containers)
DFDB_NODE_NAME=shard-a \
DFDB_PORT=5001 \
DFDB_DATA_DIR=/var/lib/dfdb/shard-a \
DFDB_API_KEY=sk_prod_abc123 \
dfdb serve
# Leader with replication enabled
dfdb serve --port 5000 --data-dir ./data/leader \
--replication-role leader --replication-port 5500
# Follower with auto-failover
dfdb serve --port 5010 --data-dir ./data/replica \
--replication-role follower \
--leader-host localhost --leader-port 5500 \
--auto-failover-seconds 10
Replication is fully CLI-driven. Pass --replication-role (and the
follower flags) and dfdb serve stands up the replication listener or
follower loop alongside the HTTP API. See
Concepts → Replication for the full protocol and
failure modes.
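As an illustration of how the leader and follower flags fit together, here is a minimal Python sketch that assembles a dfdb serve argument list. The helper name serve_args is hypothetical; it only uses flags listed in the table above, and the follower check mirrors the node.json rule that a follower needs a leader host and port.

```python
def serve_args(port, data_dir, role=None, replication_port=5500,
               leader_host=None, leader_port=None, auto_failover_seconds=None):
    """Build an argv list for `dfdb serve` (hypothetical helper, not part of dfdb)."""
    args = ["dfdb", "serve", "--port", str(port), "--data-dir", data_dir]
    if role is None:
        # Single-node setup: omit every replication flag.
        return args
    if role == "leader":
        # The flag table says the replication port must differ from --port.
        if replication_port == port:
            raise ValueError("--replication-port must differ from --port")
        return args + ["--replication-role", "leader",
                       "--replication-port", str(replication_port)]
    if role == "follower":
        if leader_host is None or leader_port is None:
            raise ValueError("a follower needs --leader-host and --leader-port")
        args += ["--replication-role", "follower",
                 "--leader-host", leader_host,
                 "--leader-port", str(leader_port)]
        if auto_failover_seconds is not None:
            args += ["--auto-failover-seconds", str(auto_failover_seconds)]
        return args
    raise ValueError("role must be 'leader' or 'follower'")
```

The returned list can be handed to subprocess.run in a launcher script; passing a list rather than a shell string avoids quoting problems with paths.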
inspect
Offline introspection. Opens a .dfdb file locally and prints stats — size,
page counts, collections, and indexes. Does not start a server.
dfdb inspect <file>
# Example
dfdb inspect ./data/shard-a/data.dfdb
query
Run a single SQL statement against a local file and print the results. Great for scripts and sanity checks. See the SQL reference for the query surface.
dfdb query <file> "<sql>"
# Examples
dfdb query ./data/shard-a/data.dfdb "SELECT COUNT(*) FROM orders"
dfdb query ./data/shard-a/data.dfdb "SELECT * FROM orders WHERE pnr = 'ABC123'"
repl
Interactive SQL console. Opens the file, gives you a prompt, executes
statements on Enter. Type stats to see storage info, exit to quit.
dfdb repl <file>
dfdb> SELECT * FROM orders LIMIT 5
dfdb> SELECT status, COUNT(*) FROM orders GROUP BY status
dfdb> CREATE INDEX idx_pnr ON orders (pnr)
dfdb> stats
dfdb> exit
compact
Reclaim physical space left by DELETE. Rewrites the collection’s pages
in-place, releasing freed slots back to the allocator.
dfdb compact <file> <collection>
# Example
dfdb compact ./data/shard-a/data.dfdb orders
Note: compaction takes an exclusive write lock. Don’t run it against a
file that a live dfdb serve has open — stop the server first, compact, then
restart.
seed
Loads realistic airline-reservation sample data (orders with nested passengers and flights). Useful for demos and smoke tests.
dfdb seed <file> [count]
# Examples
dfdb seed ./data/demo.dfdb # default 10,000 orders
dfdb seed ./data/demo.dfdb 250000 # a quarter million
cluster
Build and inspect the cluster.json topology file. This file describes every
shard and what’s sharded vs replicated.
dfdb cluster init <config>
dfdb cluster show <config>
dfdb cluster add-shard <config> <name> <endpoint>
dfdb cluster add-collection <config> <name> (hash <path>|replicated)
Examples
# Fresh cluster config
dfdb cluster init cluster.json
# Register three shards
dfdb cluster add-shard cluster.json shard-a http://localhost:5001
dfdb cluster add-shard cluster.json shard-b http://localhost:5002
dfdb cluster add-shard cluster.json shard-c http://localhost:5003
# orders are hash-sharded on pnr; airports are fully replicated to every shard
dfdb cluster add-collection cluster.json orders hash pnr
dfdb cluster add-collection cluster.json airports replicated
# Inspect what we built
dfdb cluster show cluster.json
health
Ping every shard in a cluster config and report status + basic stats. Intended to be safe to run from anywhere — it only reads.
dfdb health <cluster-config>
# Example output: three green dots, response times, doc counts per shard
dfdb health cluster.json
rebalance
Plan or execute a topology change (adding or removing shards). --plan-only
prints what would move without touching data.
dfdb rebalance <old-config> <new-config> [--plan-only]
# Preview only — no data moves
dfdb rebalance old-cluster.json new-cluster.json --plan-only
# Execute the migration
dfdb rebalance old-cluster.json new-cluster.json
See Concepts → Sharding for the online-rebalance protocol and what happens to reads/writes during migration.
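To make the idea of a plan concrete, the following Python sketch computes which keys would change shards between two topologies. It is purely illustrative: this page does not describe the hash function or placement scheme dfdb uses, so the SHA-256-modulo placement and the names shard_for and plan_moves are assumptions, not the real algorithm.

```python
import hashlib


def shard_for(key, shards):
    """Pick a shard by hashing the key (stand-in placement, NOT dfdb's actual scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def plan_moves(keys, old_shards, new_shards):
    """List (key, source, destination) for every key whose shard changes."""
    moves = []
    for key in keys:
        src = shard_for(key, old_shards)
        dst = shard_for(key, new_shards)
        if src != dst:
            moves.append((key, src, dst))
    return moves


if __name__ == "__main__":
    keys = [f"PNR{i:04d}" for i in range(1000)]
    old = ["shard-a", "shard-b", "shard-c"]
    new = old + ["shard-d"]
    moves = plan_moves(keys, old, new)
    print(f"{len(moves)} of {len(keys)} keys would move")
```

A preview like this touches no data, which is the same contract --plan-only offers: inspect the list of moves first, then run the migration without the flag.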
router
Run a stateless cluster gateway. The router loads a cluster.json and applies
hash / replicated routing to every request, so a single URL fronts the whole
cluster.
dfdb router --config <cluster.json> [--port N] [--bind-all]
| Flag | Default | Purpose |
|---|---|---|
| --config, -c | (required) | Path to the cluster.json describing shards and collection strategies |
| --port, -p | from config | HTTP listen port; overrides router.port in the config |
| --bind-all | off | Listen on all interfaces instead of just localhost |
dfdb router --config cluster.json --port 5000
Build the config with dfdb cluster and check shard health with dfdb health.
node.json format
Full schema. Every field except nodeName, port, and dataDir is optional.
{
"nodeName": "shard-a",
"port": 5001,
"dataDir": "/var/lib/dfdb/shard-a",
"bindAllInterfaces": false,
"security": {
"apiKey": "sk_prod_abc123",
"replicationSecret": "repl_shared_xyz",
"tls": {
"certPath": "/etc/dfdb/cert.pfx",
"certPassword": "env:DFDB_CERT_PASSWORD"
}
},
"replication": {
"role": "follower",
"port": 5500,
"leaderHost": "leader.internal",
"leaderPort": 5500,
"autoFailover": {
"silenceSeconds": 10,
"newLeaderPort": 5500
}
}
}
Notes:
- bindAllInterfaces: true is equivalent to --bind-all; only enable it behind a firewall.
- certPassword supports env:VAR_NAME indirection so you don’t commit passwords to the config file.
- Omit the security block entirely for local dev.
- Omit the replication block for a single-node setup. role is either "leader" or "follower".
- On a leader, only role and port are read. On a follower, leaderHost + leaderPort are required; autoFailover is optional.
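The rules in these notes can be checked before a node starts. The following is a minimal Python sketch of such a check; validate_node_config and resolve_secret are hypothetical helper names, and the way dfdb itself validates the file or resolves env: values is not documented here.

```python
import os

REQUIRED_FIELDS = ("nodeName", "port", "dataDir")


def resolve_secret(value, env=os.environ):
    """Resolve the env:VAR_NAME indirection; any other value passes through unchanged."""
    if isinstance(value, str) and value.startswith("env:"):
        return env[value[len("env:"):]]  # KeyError if the variable is unset
    return value


def validate_node_config(cfg):
    """Return a list of problems found in a parsed node.json dict (empty if none)."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in cfg]
    replication = cfg.get("replication")
    if replication is not None:
        role = replication.get("role")
        if role not in ("leader", "follower"):
            errors.append("replication.role must be 'leader' or 'follower'")
        elif role == "follower":
            # Followers must say where the leader is; autoFailover stays optional.
            for name in ("leaderHost", "leaderPort"):
                if name not in replication:
                    errors.append(f"replication.{name} is required on a follower")
    return errors
```

Load the file with json.load and pass the resulting dict in; running this in CI catches a follower config with no leaderHost before it reaches a deployment.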
Environment variables
The dfdb serve flags below each have an env-var equivalent, which is useful for containers and
orchestrators.
| Flag | Env var |
|---|---|
| --node-name | DFDB_NODE_NAME |
| --port | DFDB_PORT |
| --data-dir | DFDB_DATA_DIR |
| --api-key | DFDB_API_KEY |
| --replication-secret | DFDB_REPLICATION_SECRET |
| --replication-role | DFDB_REPLICATION_ROLE |
| --replication-port | DFDB_REPLICATION_PORT |
| --leader-host | DFDB_LEADER_HOST |
| --leader-port | DFDB_LEADER_PORT |
| --auto-failover-seconds | DFDB_AUTO_FAILOVER_SECONDS |
The admin UI also reads one env var at build time:
NEXT_PUBLIC_DFDB_URL=http://localhost:5000 npm run dev
Config precedence
When the same setting is provided by multiple sources, they are resolved in this order:
- --config node.json: loaded first
- CLI flags (--port, --data-dir, etc.): override the config file
- Env vars (DFDB_*): fill in anything still at its default
- Defaults: localhost:5000, ./data, no security
Rule of thumb: put stable topology (nodeName, port, dataDir) in the
config file, and pass secrets (--api-key, DFDB_REPLICATION_SECRET, cert
password) via env vars or a secret manager.
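One way to read the precedence list is sketched below in Python: a CLI flag beats the config file, and an env var is consulted only when neither of those set the value. That interpretation of "still at its default" is an assumption, and resolve_settings is a hypothetical name rather than dfdb's own code.

```python
# Defaults taken from the serve flag table.
DEFAULTS = {"nodeName": "node-1", "port": 5000, "dataDir": "./data"}
ENV_NAMES = {"nodeName": "DFDB_NODE_NAME", "port": "DFDB_PORT", "dataDir": "DFDB_DATA_DIR"}


def resolve_settings(config_file, cli_flags, env):
    """Merge the four sources for the three core settings (illustrative only)."""
    resolved = {}
    for key, default in DEFAULTS.items():
        if key in cli_flags:
            resolved[key] = cli_flags[key]          # CLI flag overrides the config file
        elif key in config_file:
            resolved[key] = config_file[key]        # value from node.json
        elif ENV_NAMES[key] in env:
            raw = env[ENV_NAMES[key]]               # env vars are always strings
            resolved[key] = int(raw) if isinstance(default, int) else raw
        else:
            resolved[key] = default
    return resolved
```

For example, with port 5001 in node.json, --port 6000 on the command line, and DFDB_PORT=7000 in the environment, this sketch resolves the port to 6000.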
REST API endpoints
Once dfdb serve is running, every database operation is available over HTTP.
See the REST API reference for more.
Application (CRUD + query)
The natural way to use the API: address documents by their business key (PNR, email, SKU…), not by DocumentForge’s internal id.
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness + node name, version, read-only flag, uptime. Always public: never gated by the API key, so platform health checks (Render, Docker HEALTHCHECK, K8s probes) don’t need credentials. |
| POST | /query | Run arbitrary SQL. Body: { "sql": "..." }. Default response is the full materialised JSON envelope. For large result sets, request NDJSON streaming via Accept: application/x-ndjson or ?stream=true; line 1 is the meta envelope, then one document per line. Plan / count / executionMs are also returned in X-DFDB-Plan / X-DFDB-Count / X-DFDB-ExecutionMs headers. Returns 400 on parse error or unknown collection. |
| GET | /stats | File size, pages, per-collection counts, all indexes. |
| GET | /collections | List all collection names. |
| POST | /collections/{name} | Insert a single JSON document. Returns { id }. |
| POST | /collections/{name}/bulk | Bulk insert (JSON array). Response includes ids[] for every successful insert and errors[{ index, error }] for failures. Pass ?atomic=true for all-or-nothing semantics. Pass ?skipIndexes=true for cold-load throughput; you must then call POST /admin/rebuild-indexes/{name} before any indexed query. |
| GET | /collections/{name} | List documents. Supports ?limit=N. |
| GET | /collections/{name}/by/{field}/{value} | Find by your own key; uses an index when one exists. Field name accepts dot/bracket paths (e.g. passenger.lastName, flights[0].departureAirport). |
| PUT | /collections/{name}/by/{field}/{value} | Replace by your own key. Body is the full new document JSON. The internal _id of the matched document is preserved server-side. |
| DELETE | /collections/{name}/by/{field}/{value} | Delete every document matching that field value. |
| DELETE | /collections/{name} | Drop the entire collection. Requires header X-Confirm: true. |
| GET | /indexes/{collection} | List indexes on a collection. |
| POST | /index | Create an index. Body: { collection, path, name?, unique? }. |
| POST | /seed | Populate the demo airline dataset. Body: { "orders": 500 }. |
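A client for POST /query could look roughly like the Python sketch below, which builds the request and parses a streamed NDJSON body. The function names are hypothetical, the Content-Type header is an assumption (the table only specifies the body shape), and the fields inside the meta envelope are not assumed.

```python
import json
import urllib.request


def query_request(base_url, sql, api_key=None, stream=False):
    """Build (but do not send) a POST /query request."""
    headers = {"Content-Type": "application/json"}  # assumed; not stated in the table
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if stream:
        headers["Accept"] = "application/x-ndjson"  # opt in to NDJSON streaming
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/query", data=body,
                                  headers=headers, method="POST")


def parse_ndjson(lines):
    """Split an NDJSON response into (meta envelope, list of documents)."""
    parsed = (json.loads(line) for line in lines if line.strip())
    meta = next(parsed, None)   # line 1 is the meta envelope
    return meta, list(parsed)   # every remaining line is one document
```

To send the request, pass it to urllib.request.urlopen and feed the decoded response lines to parse_ndjson; for very large results, iterate the lines instead of collecting them into a list.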
Application (advanced — by internal _id)
The internal _id is a 16-byte sequential identifier returned by inserts. You
normally don’t want this — prefer /by/{field}/{value} above. Use these only
when you’ve cached the _id from a prior write and want to skip the index
lookup.
| Method | Path | Purpose |
|---|---|---|
| GET | /collections/{name}/{_id} | Direct location-map lookup by the internal id (the value of _id, formatted as a Guid). |
| PUT | /collections/{name}/{_id} | Replace the document at that internal id. Body is the full new JSON. The _id is preserved automatically. |
| DELETE | /collections/{name}/{_id} | Direct delete by internal id. |
Admin
| Method | Path | Purpose |
|---|---|---|
| POST | /admin/flush | Flush dirty pages to disk, truncate recovery log. |
| POST | /admin/checkpoint | Drain the WAL to a consistent on-disk state. |
| POST | /admin/compact/{collection} | Reclaim space from deletes. Returns pages compacted + bytes reclaimed. |
| POST | /admin/rebuild-indexes/{collection} | Rebuild every index on the collection from scratch. Needed after a bulk insert that used ?skipIndexes=true, or any time you suspect index drift. |
| POST | /admin/rebuild-index/{collection}/{indexName} | Surgical rebuild of a single named index. Use when one specific index has drifted (faster than the all-indexes variant on a large collection). |
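The cold-load flow (bulk insert with ?skipIndexes=true, then a full index rebuild) can be sketched as an ordered list of calls. The Python helpers below are hypothetical names; only the paths and the ids / errors response shape come from the tables on this page.

```python
def cold_load_plan(base_url, collection):
    """Ordered (method, url) calls for a cold load; the rebuild must come last."""
    return [
        ("POST", f"{base_url}/collections/{collection}/bulk?skipIndexes=true"),
        ("POST", f"{base_url}/admin/rebuild-indexes/{collection}"),
    ]


def split_bulk_result(documents, bulk_response):
    """Return (inserted ids, documents that failed) from a bulk-insert response.

    Assumes each errors[].index refers to a position in the submitted array.
    """
    failed_positions = {err["index"] for err in bulk_response.get("errors", [])}
    failed_docs = [doc for i, doc in enumerate(documents) if i in failed_positions]
    return bulk_response.get("ids", []), failed_docs
```

Keeping the failed documents separate makes it straightforward to retry only those before running the rebuild step.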
Replication
| Method | Path | Purpose |
|---|---|---|
| GET | /replication/status | Role, seqs, follower count, gaps, auto-failover state. Safe to poll. |
| POST | /replication/start-leader | Body: { port, sharedSecret? }. Opens the replication listener. |
| POST | /replication/start-follower | Body: { host, port, sharedSecret? }. |
| POST | /replication/read-only | Enter read-only mode (planned handover step 1). |
| POST | /replication/read-write | Exit read-only mode (recover / abort handover). |
| POST | /replication/promote | Body: { port }. Take over as leader (handover step 2). |
| POST | /replication/auto-failover/enable | Body: { silenceSeconds, newLeaderPort }. |
| POST | /replication/auto-failover/disable | Stop watching for leader silence. |
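The two handover steps named in the table can be laid out as an ordered call plan. In the Python sketch below, the helper names are hypothetical, and the choice of which node receives each call (read-only on the current leader, promote on the follower) is an assumption; see Concepts → Replication for the authoritative procedure.

```python
def handover_plan(leader_url, follower_url, new_leader_port):
    """Ordered (method, url, body) calls for a planned handover (sketch only)."""
    return [
        # Step 1: stop accepting writes on the current leader (assumed target).
        ("POST", f"{leader_url}/replication/read-only", None),
        # Step 2: the follower takes over as leader on the given replication port.
        ("POST", f"{follower_url}/replication/promote", {"port": new_leader_port}),
    ]


def abort_handover(leader_url):
    """Call that returns the old leader to read-write mode if the handover is abandoned."""
    return ("POST", f"{leader_url}/replication/read-write", None)
```

Between the two steps it is reasonable to poll GET /replication/status on the follower until it has caught up, since that endpoint is described as safe to poll.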
Sharding control is CLI-only on purpose. Cluster topology lives in a
cluster.json file you version alongside your infra — not per-node state. Use
dfdb cluster / dfdb health / dfdb rebalance to manage it. Each
individual shard is still a regular node, so every endpoint above works on
each one.