Chapter 09

CLI Reference

One binary, every feature. dfdb runs a node, queries a file, drives a cluster, and rebalances shards — all from the same executable.

Install / run

There are two ways to run the CLI. Both accept identical subcommands.

Dev mode (source checkout)

dotnet run --project src/DocumentForge.Cli -- <command> [args]

Published binary

# Build the single-file binary for your platform
./scripts/publish-dfdb.sh     # or publish-dfdb.ps1 on Windows

# Then run it from dist/ — no .NET runtime required on the target
./dist/linux-x64/dfdb <command> [args]
./dist/win-x64/dfdb.exe <command> [args]

In the rest of this page we write dfdb; substitute whichever form fits your setup.

Top-level help

dfdb help        # or --help, -h
dfdb version     # or --version, -v

serve

Start the REST API for a single node. This is what the admin UI, the LINQ client, and the HttpShardTransport talk to.

dfdb serve [--config node.json]
            [--node-name NAME]
            [--port N]
            [--data-dir DIR]
            [--api-key KEY]
            [--replication-secret SECRET]
            [--bind-all]
            [--replication-role leader|follower]
            [--replication-port N]
            [--leader-host H] [--leader-port N]
            [--auto-failover-seconds N] [--auto-failover-new-port N]

| Flag | Default | Notes |
| --- | --- | --- |
| --config | (none) | Path to a node.json file. Loaded first; other flags override. |
| --node-name | node-1 | Shows up in logs, health output, and shard descriptors. |
| --port | 5000 | HTTP (or HTTPS, with TLS configured) port. |
| --data-dir | ./data | Directory for data.dfdb, WAL, and recovery log. |
| --api-key | off | If set, every request must carry Authorization: Bearer <key>. |
| --replication-secret | off | Shared secret enforced on the replication handshake. |
| --bind-all | loopback | Bind to 0.0.0.0 instead of 127.0.0.1. Only enable behind a trusted network / firewall. |
| --replication-role | off | leader or follower. Omit for a single-node setup. |
| --replication-port | 5500 | Leader: TCP port for the replication listener. Must differ from --port. |
| --leader-host | (none) | Follower: hostname of the leader. |
| --leader-port | (none) | Follower: replication port on the leader. |
| --auto-failover-seconds | off | Follower: promote if the leader is silent this many seconds. |
| --auto-failover-new-port | = leader port | Follower: which port to bind as leader once promoted. |

Examples

# Single-node dev
dfdb serve --port 5000 --data-dir ./data

# Production node with API key + shared secret
dfdb serve --config ./node-shard-a.json \
           --api-key sk_prod_abc123 \
           --replication-secret repl_shared_xyz

# Everything via env vars (useful in containers)
DFDB_NODE_NAME=shard-a \
DFDB_PORT=5001 \
DFDB_DATA_DIR=/var/lib/dfdb/shard-a \
DFDB_API_KEY=sk_prod_abc123 \
dfdb serve

# Leader with replication enabled
dfdb serve --port 5000 --data-dir ./data/leader \
           --replication-role leader --replication-port 5500

# Follower with auto-failover
dfdb serve --port 5010 --data-dir ./data/replica \
           --replication-role follower \
           --leader-host localhost --leader-port 5500 \
           --auto-failover-seconds 10
Replication is fully CLI-driven. Pass --replication-role (and the follower flags) and dfdb serve stands up the replication listener or follower loop alongside the HTTP API. See Replication → From the CLI for the full protocol and failure modes.
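
The follower's promotion rule is essentially a silence timer: any traffic from the leader resets it, and sustained silence beyond --auto-failover-seconds triggers a one-time promotion. A minimal sketch of that logic, assuming a heartbeat-reset model (the class and method names here are illustrative, not dfdb's actual internals):

```python
import time


class FailoverWatcher:
    """Illustrative sketch of a follower's auto-failover decision."""

    def __init__(self, silence_seconds, clock=time.monotonic):
        self.silence_seconds = silence_seconds
        self.clock = clock
        self.last_heartbeat = clock()
        self.promoted = False

    def on_heartbeat(self):
        # Any replication traffic from the leader resets the timer.
        self.last_heartbeat = self.clock()

    def check(self):
        # Called periodically; returns True once the follower should promote.
        if not self.promoted and self.clock() - self.last_heartbeat > self.silence_seconds:
            self.promoted = True  # promote exactly once, then stay promoted
        return self.promoted
```

With --auto-failover-seconds 10, a follower that sees no leader traffic for just over ten seconds would flip to leader on the configured --auto-failover-new-port.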

inspect

Offline introspection. Opens a .dfdb file locally and prints stats — size, page counts, collections, and indexes. Does not start a server.

dfdb inspect <file>

# Example
dfdb inspect ./data/shard-a/data.dfdb

query

Run a single SQL statement against a local file and print the results. Great for scripts and sanity checks.

dfdb query <file> "<sql>"

# Examples
dfdb query ./data/shard-a/data.dfdb "SELECT COUNT(*) FROM orders"
dfdb query ./data/shard-a/data.dfdb "SELECT * FROM orders WHERE pnr = 'ABC123'"

repl

Interactive SQL console. Opens the file, gives you a prompt, executes statements on Enter. Type stats to see storage info, exit to quit.

dfdb repl <file>

dfdb> SELECT * FROM orders LIMIT 5
dfdb> SELECT status, COUNT(*) FROM orders GROUP BY status
dfdb> CREATE INDEX idx_pnr ON orders (pnr)
dfdb> stats
dfdb> exit

compact

Reclaim physical space left by DELETE. Rewrites the collection's pages in-place, releasing freed slots back to the allocator.

dfdb compact <file> <collection>

# Example
dfdb compact ./data/shard-a/data.dfdb orders
Note: compaction takes an exclusive write lock. Don't run it against a file that a live dfdb serve has open — stop the server first, compact, then restart.

seed

Loads realistic airline-reservation sample data (orders with nested passengers and flights). Useful for demos and smoke tests.

dfdb seed <file> [count]

# Examples
dfdb seed ./data/demo.dfdb               # default 10,000 orders
dfdb seed ./data/demo.dfdb 250000        # a quarter million

cluster

Build and inspect the cluster.json topology file. This file describes every shard and what's sharded vs replicated.

dfdb cluster init            <config>
dfdb cluster show            <config>
dfdb cluster add-shard       <config> <name> <endpoint>
dfdb cluster add-collection  <config> <name> (hash <path>|replicated)

Examples

# Fresh cluster config
dfdb cluster init cluster.json

# Register three shards
dfdb cluster add-shard cluster.json shard-a http://localhost:5001
dfdb cluster add-shard cluster.json shard-b http://localhost:5002
dfdb cluster add-shard cluster.json shard-c http://localhost:5003

# orders are hash-sharded on pnr; airports are fully replicated to every shard
dfdb cluster add-collection cluster.json orders   hash pnr
dfdb cluster add-collection cluster.json airports replicated

# Inspect what we built
dfdb cluster show cluster.json
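
Hash-sharding as declared above gives each document a stable home based on its shard key. dfdb's actual hash function isn't documented on this page; the sketch below uses MD5 purely to illustrate the stable key-to-shard mapping:

```python
import hashlib


def shard_for(key: str, shards: list[str]) -> str:
    # Stable hash of the shard key (e.g. the pnr), reduced modulo the
    # shard count. Illustrative only: dfdb's real hash may differ.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


shards = ["shard-a", "shard-b", "shard-c"]
# The same pnr always routes to the same shard.
assert shard_for("ABC123", shards) == shard_for("ABC123", shards)
```

Replicated collections (airports above) skip this routing entirely: every shard holds a full copy.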

health

Ping every shard in a cluster config and report status + basic stats. Intended to be safe to run from anywhere — it only reads.

dfdb health <cluster-config>

# Example output: three green dots, response times, doc counts per shard
dfdb health cluster.json

rebalance

Plan or execute a topology change (adding or removing shards). --plan-only prints what would move without touching data.

dfdb rebalance <old-config> <new-config> [--plan-only]

# Preview only — no data moves
dfdb rebalance old-cluster.json new-cluster.json --plan-only

# Execute the migration
dfdb rebalance old-cluster.json new-cluster.json

See the Sharding guide for the online-rebalance protocol and what happens to reads/writes during migration.
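
What --plan-only computes can be pictured as a diff of key placement under the two topologies: every key whose owner changes between old and new configs must migrate. A hedged sketch (again with a stand-in hash, not dfdb's real one):

```python
import hashlib


def shard_for(key: str, shards: list[str]) -> str:
    # Stand-in stable hash; dfdb's actual function may differ.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def plan_moves(keys, old_shards, new_shards):
    # Compare each key's owner under the old and new topologies; keys
    # whose owner changes are the ones a rebalance must migrate.
    return [(k, shard_for(k, old_shards), shard_for(k, new_shards))
            for k in keys
            if shard_for(k, old_shards) != shard_for(k, new_shards)]


old = ["shard-a", "shard-b"]
new = ["shard-a", "shard-b", "shard-c"]
moves = plan_moves([f"PNR{i}" for i in range(10)], old, new)
```

Keys that hash to the same owner in both configs never move, which is why adding one shard migrates only a fraction of the data.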

node.json format

Full schema. Every field except nodeName, port, and dataDir is optional.

{
  "nodeName": "shard-a",
  "port": 5001,
  "dataDir": "/var/lib/dfdb/shard-a",
  "bindAllInterfaces": false,
  "security": {
    "apiKey": "sk_prod_abc123",
    "replicationSecret": "repl_shared_xyz",
    "tls": {
      "certPath": "/etc/dfdb/cert.pfx",
      "certPassword": "env:DFDB_CERT_PASSWORD"
    }
  },
  "replication": {
    "role": "follower",
    "port": 5500,
    "leaderHost": "leader.internal",
    "leaderPort": 5500,
    "autoFailover": {
      "silenceSeconds": 10,
      "newLeaderPort": 5500
    }
  }
}
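
The env: prefix on certPassword suggests the value is read from the process environment at startup rather than stored in the file. A hypothetical resolver for that convention (the exact dfdb behavior is an assumption, not documented here):

```python
import os


def resolve_secret(value: str) -> str:
    # Values written as "env:NAME" are looked up in the environment;
    # anything else is used literally.
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in os.environ:
            raise RuntimeError(f"config references unset env var {name}")
        return os.environ[name]
    return value
```

This keeps the certificate password out of node.json, so the file can be committed without leaking secrets.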


Environment variables

Every flag on dfdb serve has an env-var equivalent. Useful for containers and orchestrators.

| Flag | Env var |
| --- | --- |
| --node-name | DFDB_NODE_NAME |
| --port | DFDB_PORT |
| --data-dir | DFDB_DATA_DIR |
| --api-key | DFDB_API_KEY |
| --replication-secret | DFDB_REPLICATION_SECRET |
| --replication-role | DFDB_REPLICATION_ROLE |
| --replication-port | DFDB_REPLICATION_PORT |
| --leader-host | DFDB_LEADER_HOST |
| --leader-port | DFDB_LEADER_PORT |
| --auto-failover-seconds | DFDB_AUTO_FAILOVER_SECONDS |
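
The mapping in the table follows one mechanical rule, captured by this one-liner (illustrative; the real CLI may simply hard-code the pairs):

```python
def env_var_for(flag: str) -> str:
    # Strip leading dashes, swap '-' for '_', uppercase, prefix DFDB_.
    # e.g. --replication-secret  ->  DFDB_REPLICATION_SECRET
    return "DFDB_" + flag.lstrip("-").replace("-", "_").upper()


print(env_var_for("--data-dir"))  # DFDB_DATA_DIR
```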

The admin UI also reads one env var at build time:

NEXT_PUBLIC_DFDB_URL=http://localhost:5000 npm run dev

Config precedence

When the same setting is provided by multiple sources, resolution works like this:

  1. --config node.json is loaded first.
  2. CLI flags (--port, --data-dir, etc.) override the config file.
  3. Env vars (DFDB_*) fill in anything still at its default; they do not override explicit flags or config values.
  4. Built-in defaults (localhost:5000, ./data, no security) cover everything else.
Rule of thumb: put stable topology (nodeName, port, dataDir) in the config file, and pass secrets (--api-key, DFDB_REPLICATION_SECRET, cert password) via env vars or a secret manager.
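
The resolution order above can be sketched as a layered merge, where None means "not provided" (illustrative, not the actual loader):

```python
DEFAULTS = {"port": 5000, "dataDir": "./data", "apiKey": None}


def effective_config(config_file, flags, env):
    # Layering: defaults first, env vars fill remaining gaps, the config
    # file overrides env, and explicit CLI flags win over everything.
    merged = dict(DEFAULTS)
    for layer in (env, config_file, flags):
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```

So a DFDB_PORT env var is honored only when neither the config file nor a --port flag names the port.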

REST API endpoints

Once dfdb serve is running, every database operation is available over HTTP. The Postman collection covers them all with working examples.

Application (CRUD + query)

The natural way to use the API: address documents by their business key (PNR, email, SKU…), not by DocumentForge's internal id.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | Liveness + node name, version, read-only flag, uptime. Always public — never gated by the API key, so platform health checks (Render, Docker HEALTHCHECK, K8s probes) don't need credentials. |
| POST | /query | Run arbitrary SQL. Body: { "sql": "..." }. Default response is the full materialised JSON envelope. For large result sets, request NDJSON streaming via Accept: application/x-ndjson or ?stream=true — line 1 is the meta envelope, then one document per line. Plan / count / executionMs are also returned in X-DFDB-Plan / X-DFDB-Count / X-DFDB-ExecutionMs headers. |
| GET | /stats | File size, pages, per-collection counts, all indexes. |
| GET | /collections | List all collection names. |
| POST | /collections/{name} | Insert a single JSON document. Returns { id }. |
| POST | /collections/{name}/bulk | Bulk insert (JSON array). Rebuilds indexes after by default. Pass ?skipIndexes=true for cold-load throughput; you must then call POST /admin/rebuild-indexes/{name} before any indexed query. |
| GET | /collections/{name} | List documents. Supports ?limit=N. |
| GET | /collections/{name}/by/{field}/{value} | Find by your own key — uses an index when one exists. Field name accepts dot/bracket paths (e.g. passenger.lastName, flights[0].departureAirport). |
| PUT | /collections/{name}/by/{field}/{value} | Replace by your own key. Body is the full new document JSON. The internal _id of the matched document is preserved server-side — you don't need to include it. |
| DELETE | /collections/{name}/by/{field}/{value} | Delete every document matching that field value. |
| DELETE | /collections/{name} | Drop the entire collection. Requires header X-Confirm: true. |
| GET | /indexes/{collection} | List indexes on a collection. |
| POST | /index | Create an index. Body: { collection, path, name?, unique? }. |
| POST | /seed | Populate the demo airline dataset. Body: { "orders": 500 }. |
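
For streaming /query responses, a client consumes the NDJSON body line by line. A minimal parsing sketch (the exact meta-envelope fields shown here are an assumption):

```python
import json


def parse_ndjson(body: str):
    # Line 1 is the meta envelope; each subsequent non-empty line
    # is one result document.
    lines = [ln for ln in body.splitlines() if ln.strip()]
    return json.loads(lines[0]), [json.loads(ln) for ln in lines[1:]]


body = '{"count": 2}\n{"pnr": "ABC123"}\n{"pnr": "XYZ789"}\n'
meta, docs = parse_ndjson(body)
print(meta["count"], len(docs))  # 2 2
```

A real client would read the response stream incrementally rather than buffering the whole body, which is the point of requesting NDJSON in the first place.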

Application (advanced — by internal _id)

The internal _id is a 16-byte sequential identifier returned by inserts. You normally don't want this — prefer /by/{field}/{value} above. Use these only when you've cached the _id from a prior write and want to skip the index lookup.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /collections/{name}/{_id} | Direct location-map lookup by the internal id (the value of _id, formatted as a Guid). |
| PUT | /collections/{name}/{_id} | Replace the document at that internal id. Body is the full new JSON. The _id is preserved on the new doc automatically. |
| DELETE | /collections/{name}/{_id} | Direct delete by internal id. |
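
The dot/bracket field paths accepted by the /by/{field}/{value} routes (and by POST /index) can be resolved against a document roughly like this illustrative sketch (not dfdb's actual parser):

```python
import re


def resolve_path(doc, path):
    # 'flights[0].departureAirport' -> ['flights', '0', 'departureAirport']
    current = doc
    for part in re.findall(r"[^.\[\]]+", path):
        current = current[int(part)] if isinstance(current, list) else current[part]
    return current


order = {"passenger": {"lastName": "Ng"},
         "flights": [{"departureAirport": "SFO"}]}
print(resolve_path(order, "flights[0].departureAirport"))  # SFO
```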

Admin

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /admin/flush | Flush dirty pages to disk, truncate recovery log. |
| POST | /admin/checkpoint | Drain the WAL to a consistent on-disk state. |
| POST | /admin/compact/{collection} | Reclaim space from deletes. Returns pages compacted + bytes reclaimed. |
| POST | /admin/rebuild-indexes/{collection} | Rebuild every index on the collection from scratch. Needed after a bulk insert that used ?skipIndexes=true, or any time you suspect index drift. |

Replication

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /replication/status | Role, seqs, follower count, gaps, auto-failover state. Safe to poll. |
| POST | /replication/start-leader | Body: { port, sharedSecret? }. Opens the replication listener. |
| POST | /replication/start-follower | Body: { host, port, sharedSecret? }. |
| POST | /replication/read-only | Enter read-only mode (planned handover step 1). |
| POST | /replication/read-write | Exit read-only mode (recover / abort handover). |
| POST | /replication/promote | Body: { port }. Take over as leader (handover step 2). |
| POST | /replication/auto-failover/enable | Body: { silenceSeconds, newLeaderPort }. |
| POST | /replication/auto-failover/disable | Stop watching for leader silence. |
Sharding control is CLI-only on purpose. Cluster topology lives in a cluster.json file you version alongside your infra — not per-node state. Use dfdb cluster / dfdb health / dfdb rebalance to manage it. Each individual shard is still a regular node, so every endpoint above works on each one.

Postman collection

A ready-to-import Postman collection lives at docs/postman/DocumentForge.postman_collection.json. Two folders:

Setting it up

  1. Open Postman → File → Import → choose the DocumentForge.postman_collection.json file.
  2. On the collection, set baseUrl (default http://localhost:5000) and, if you started the node with --api-key, set apiKey.
  3. Run Application → Seed sample airline data to get ~500 realistic orders + flights with 5 indexes.
  4. Pre-request scripts auto-save returned ids to collection variables so Insert → Find → Delete chains work without hand-editing URLs.

See docs/postman/README.md for the variable reference and recommended first-run order.