
CLI Reference

One binary, every feature. dfdb runs a node, queries a file, drives a cluster, and rebalances shards — all from the same executable.

Install / run

There are two ways to run the CLI. Both accept identical subcommands.

Dev mode (source checkout)

dotnet run --project src/DocumentForge.Cli -- <command> [args]

Published binary

# Build the single-file binary for your platform
./scripts/publish-dfdb.sh   # or publish-dfdb.ps1 on Windows

# Then run it from dist/ (no .NET runtime required on the target)
./dist/linux-x64/dfdb <command> [args]
./dist/win-x64/dfdb.exe <command> [args]

In the rest of this page we write dfdb; substitute whichever form fits your setup.

Top-level help

dfdb help      # or --help, -h
dfdb version   # or --version, -v

serve

Start the REST API for a single node. This is what the admin UI, the LINQ client, and the HttpShardTransport talk to.

dfdb serve [--config node.json] [--node-name NAME] [--port N] [--data-dir DIR]
           [--api-key KEY] [--replication-secret SECRET] [--bind-all]
           [--replication-role leader|follower] [--replication-port N]
           [--leader-host H] [--leader-port N]
           [--auto-failover-seconds N] [--auto-failover-new-port N]

| Flag | Default | Notes |
| --- | --- | --- |
| --config | | Path to a node.json file. Loaded first; other flags override. |
| --node-name | node-1 | Shows up in logs, health output, and shard descriptors. |
| --port | 5000 | HTTP (or HTTPS, with TLS configured) port. |
| --data-dir | ./data | Directory for data.dfdb, WAL, and recovery log. |
| --api-key | off | If set, every request must carry Authorization: Bearer <key>. |
| --replication-secret | off | Shared secret enforced on the replication handshake. |
| --bind-all | loopback | Bind to 0.0.0.0 instead of 127.0.0.1. Only enable behind a trusted network / firewall. |
| --replication-role | off | leader or follower. Omit for a single-node setup. |
| --replication-port | 5500 | Leader: TCP port for the replication listener. Must differ from --port. |
| --leader-host | | Follower: hostname of the leader. |
| --leader-port | | Follower: replication port on the leader. |
| --auto-failover-seconds | off | Follower: promote if the leader is silent this many seconds. |
| --auto-failover-new-port | = leader port | Follower: which port to bind as leader once promoted. |

Examples

# Single-node dev
dfdb serve --port 5000 --data-dir ./data

# Production node with API key + shared secret
dfdb serve --config ./node-shard-a.json \
  --api-key sk_prod_abc123 \
  --replication-secret repl_shared_xyz

# Everything via env vars (useful in containers)
DFDB_NODE_NAME=shard-a \
DFDB_PORT=5001 \
DFDB_DATA_DIR=/var/lib/dfdb/shard-a \
DFDB_API_KEY=sk_prod_abc123 \
dfdb serve

# Leader with replication enabled
dfdb serve --port 5000 --data-dir ./data/leader \
  --replication-role leader --replication-port 5500

# Follower with auto-failover
dfdb serve --port 5010 --data-dir ./data/replica \
  --replication-role follower \
  --leader-host localhost --leader-port 5500 \
  --auto-failover-seconds 10

Replication is fully CLI-driven. Pass --replication-role (and the follower flags) and dfdb serve stands up the replication listener or follower loop alongside the HTTP API. See Concepts → Replication for the full protocol and failure modes.
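To make the --auto-failover-seconds behaviour concrete, here is a minimal Python sketch of a "leader silence" check. It is an illustration only, not dfdb's implementation: the class name SilenceWatchdog, the injected clock, and the choice of ">=" at the threshold are all assumptions.

```python
import time
from typing import Callable


class SilenceWatchdog:
    """Tracks how long a follower has gone without hearing from its leader.

    Hypothetical helper for illustration; dfdb's real failover logic may differ.
    """

    def __init__(self, silence_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.silence_seconds = silence_seconds
        self._clock = clock
        self._last_heard = clock()

    def heard_from_leader(self) -> None:
        # Call whenever any replication traffic arrives from the leader.
        self._last_heard = self._clock()

    def should_promote(self) -> bool:
        # ASSUMPTION: promotion triggers once silence reaches the threshold.
        return self._clock() - self._last_heard >= self.silence_seconds
```

With silence_seconds=10, should_promote() stays False while the leader keeps sending traffic and becomes True only after ten quiet seconds. A monotonic clock is used so that wall-clock adjustments cannot trigger a spurious promotion.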

inspect

Offline introspection. Opens a .dfdb file locally and prints stats — size, page counts, collections, and indexes. Does not start a server.

dfdb inspect <file>

# Example
dfdb inspect ./data/shard-a/data.dfdb

query

Run a single SQL statement against a local file and print the results. Great for scripts and sanity checks. See the SQL reference for the query surface.

dfdb query <file> "<sql>"

# Examples
dfdb query ./data/shard-a/data.dfdb "SELECT COUNT(*) FROM orders"
dfdb query ./data/shard-a/data.dfdb "SELECT * FROM orders WHERE pnr = 'ABC123'"

repl

Interactive SQL console. Opens the file, gives you a prompt, executes statements on Enter. Type stats to see storage info, exit to quit.

dfdb repl <file>
dfdb> SELECT * FROM orders LIMIT 5
dfdb> SELECT status, COUNT(*) FROM orders GROUP BY status
dfdb> CREATE INDEX idx_pnr ON orders (pnr)
dfdb> stats
dfdb> exit

compact

Reclaim physical space left by DELETE. Rewrites the collection’s pages in place, releasing freed slots back to the allocator.

dfdb compact <file> <collection>

# Example
dfdb compact ./data/shard-a/data.dfdb orders

Note: compaction takes an exclusive write lock. Don’t run it against a file that a live dfdb serve has open — stop the server first, compact, then restart.

seed

Loads realistic airline-reservation sample data (orders with nested passengers and flights). Useful for demos and smoke tests.

dfdb seed <file> [count]

# Examples
dfdb seed ./data/demo.dfdb           # default 10,000 orders
dfdb seed ./data/demo.dfdb 250000    # a quarter million

cluster

Build and inspect the cluster.json topology file. This file describes every shard and what’s sharded vs replicated.

dfdb cluster init <config>
dfdb cluster show <config>
dfdb cluster add-shard <config> <name> <endpoint>
dfdb cluster add-collection <config> <name> (hash <path>|replicated)

Examples

# Fresh cluster config
dfdb cluster init cluster.json

# Register three shards
dfdb cluster add-shard cluster.json shard-a http://localhost:5001
dfdb cluster add-shard cluster.json shard-b http://localhost:5002
dfdb cluster add-shard cluster.json shard-c http://localhost:5003

# orders are hash-sharded on pnr; airports are fully replicated to every shard
dfdb cluster add-collection cluster.json orders hash pnr
dfdb cluster add-collection cluster.json airports replicated

# Inspect what we built
dfdb cluster show cluster.json

health

Ping every shard in a cluster config and report status + basic stats. Intended to be safe to run from anywhere — it only reads.

dfdb health <cluster-config>

# Example output: three green dots, response times, doc counts per shard
dfdb health cluster.json
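The same kind of read-only check can be approximated from a script by calling each shard's public GET /health endpoint. The Python sketch below is an approximation under assumptions: it takes a plain name-to-endpoint mapping (the cluster.json schema is not shown on this page), and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Dict


def fetch_health(endpoint: str, timeout: float = 2.0) -> dict:
    # GET /health is always public, so no Authorization header is sent.
    url = endpoint.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_shards(shards: Dict[str, str],
                 fetch: Callable[[str], dict] = fetch_health) -> Dict[str, dict]:
    """Ping every shard and collect a per-shard report without raising."""
    report: Dict[str, dict] = {}
    for name, endpoint in shards.items():
        try:
            report[name] = {"ok": True, "health": fetch(endpoint)}
        except Exception as exc:  # network errors, timeouts, invalid JSON
            report[name] = {"ok": False, "error": str(exc)}
    return report
```

Calling check_shards({"shard-a": "http://localhost:5001"}) returns one entry per shard, so one unreachable node does not hide the status of the others.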

rebalance

Plan or execute a topology change (adding or removing shards). --plan-only prints what would move without touching data.

dfdb rebalance <old-config> <new-config> [--plan-only]

# Preview only: no data moves
dfdb rebalance old-cluster.json new-cluster.json --plan-only

# Execute the migration
dfdb rebalance old-cluster.json new-cluster.json

See Concepts → Sharding for the online-rebalance protocol and what happens to reads/writes during migration.
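As a rough mental model of what a --plan-only run reports, the Python sketch below compares where each key would live under the old and new shard lists. The placement rule (SHA-256 modulo shard count) is purely an assumption for illustration. This page does not specify dfdb's hash function or migration protocol, and shard_for and plan_moves are hypothetical names.

```python
import hashlib
from typing import Iterable, List, Tuple


def shard_for(key: str, shards: List[str]) -> str:
    # ASSUMPTION: SHA-256 modulo shard count; dfdb's real placement may differ.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def plan_moves(keys: Iterable[str], old_shards: List[str],
               new_shards: List[str]) -> List[Tuple[str, str, str]]:
    """Return (key, source shard, destination shard) for every key that moves."""
    moves = []
    for key in keys:
        src = shard_for(key, old_shards)
        dst = shard_for(key, new_shards)
        if src != dst:
            moves.append((key, src, dst))
    return moves
```

For example, plan_moves(["ABC123", "XYZ789"], ["shard-a", "shard-b"], ["shard-a", "shard-b", "shard-c"]) lists only the keys whose owner changes. Comparing an identical old and new topology yields an empty plan.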

router

Run a stateless cluster gateway. The router loads a cluster.json and applies hash / replicated routing to every request, so a single URL fronts the whole cluster.

dfdb router --config <cluster.json> [--port N] [--bind-all]

| Flag | Default | Purpose |
| --- | --- | --- |
| --config, -c | (required) | Path to the cluster.json describing shards and collection strategies |
| --port, -p | from config | HTTP listen port; overrides router.port in the config |
| --bind-all | off | Listen on all interfaces instead of just localhost |

dfdb router --config cluster.json --port 5000

Build the config with dfdb cluster and check shard health with dfdb health.

node.json format

Full schema. Every field except nodeName, port, and dataDir is optional.

{
  "nodeName": "shard-a",
  "port": 5001,
  "dataDir": "/var/lib/dfdb/shard-a",
  "bindAllInterfaces": false,
  "security": {
    "apiKey": "sk_prod_abc123",
    "replicationSecret": "repl_shared_xyz",
    "tls": {
      "certPath": "/etc/dfdb/cert.pfx",
      "certPassword": "env:DFDB_CERT_PASSWORD"
    }
  },
  "replication": {
    "role": "follower",
    "port": 5500,
    "leaderHost": "leader.internal",
    "leaderPort": 5500,
    "autoFailover": {
      "silenceSeconds": 10,
      "newLeaderPort": 5500
    }
  }
}

Notes:

  • bindAllInterfaces: true is equivalent to --bind-all — only enable behind a firewall.
  • certPassword supports env:VAR_NAME indirection so you don’t commit passwords to the config file.
  • Omit the security block entirely for local dev.
  • Omit the replication block for a single-node setup. role is either "leader" or "follower".
  • On a leader, only role and port are read. On a follower, leaderHost + leaderPort are required; autoFailover is optional.
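The notes above can be turned into a small pre-flight check. The Python sketch below validates the required fields and the follower requirements, and resolves the env:VAR_NAME indirection for certPassword. It is a hypothetical helper written from the rules on this page, not dfdb's own loader; the function names and error messages are assumptions.

```python
import json
import os
from typing import Mapping

REQUIRED = ("nodeName", "port", "dataDir")


def resolve_env_ref(value, environ: Mapping[str, str] = os.environ):
    # "env:VAR_NAME" becomes the value of VAR_NAME; other values pass through.
    if isinstance(value, str) and value.startswith("env:"):
        name = value[len("env:"):]
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]
    return value


def load_node_config(text: str, environ: Mapping[str, str] = os.environ) -> dict:
    cfg = json.loads(text)
    missing = [k for k in REQUIRED if k not in cfg]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    repl = cfg.get("replication")
    if repl is not None:
        role = repl.get("role")
        if role not in ("leader", "follower"):
            raise ValueError('replication.role must be "leader" or "follower"')
        if role == "follower":
            absent = [k for k in ("leaderHost", "leaderPort") if k not in repl]
            if absent:
                raise ValueError("follower requires: " + ", ".join(absent))

    tls = cfg.get("security", {}).get("tls")
    if tls and "certPassword" in tls:
        tls["certPassword"] = resolve_env_ref(tls["certPassword"], environ)
    return cfg
```

Passing the environment as a parameter keeps the check easy to exercise in CI without touching real secrets.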

Environment variables

The dfdb serve flags listed below have env-var equivalents. Useful for containers and orchestrators.

| Flag | Env var |
| --- | --- |
| --node-name | DFDB_NODE_NAME |
| --port | DFDB_PORT |
| --data-dir | DFDB_DATA_DIR |
| --api-key | DFDB_API_KEY |
| --replication-secret | DFDB_REPLICATION_SECRET |
| --replication-role | DFDB_REPLICATION_ROLE |
| --replication-port | DFDB_REPLICATION_PORT |
| --leader-host | DFDB_LEADER_HOST |
| --leader-port | DFDB_LEADER_PORT |
| --auto-failover-seconds | DFDB_AUTO_FAILOVER_SECONDS |

The admin UI also reads one env var at build time:

NEXT_PUBLIC_DFDB_URL=http://localhost:5000 npm run dev

Config precedence

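When the same setting is provided by multiple sources, CLI flags take priority over the config file, the config file takes priority over env vars, and built-in defaults apply last. The sources are applied in this order: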

  1. --config node.json — loaded first
  2. CLI flags (--port, --data-dir, etc.) — override the config file
  3. Env vars (DFDB_*) — fill in anything still at its default
  4. Defaults — localhost:5000, ./data, no security

Rule of thumb: put stable topology (nodeName, port, dataDir) in the config file, and pass secrets (--api-key, DFDB_REPLICATION_SECRET, cert password) via env vars or a secret manager.
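One way to read the precedence list is as a lookup chain: a CLI flag overrides the config file, env vars fill in only what neither of those set, and defaults come last. The Python sketch below models that reading. The function name, the dictionary keys, and the use of None for "not set" are illustrative assumptions.

```python
from typing import Any, Mapping

# Defaults taken from the flag table on this page.
DEFAULTS = {"node_name": "node-1", "port": 5000, "data_dir": "./data"}


def resolve_setting(name: str,
                    cli: Mapping[str, Any],
                    config: Mapping[str, Any],
                    env: Mapping[str, Any]) -> Any:
    """Return the effective value of one setting.

    Priority: CLI flag, then config file, then env var, then default.
    """
    for source in (cli, config, env):
        value = source.get(name)
        if value is not None:
            return value
    return DEFAULTS[name]
```

For instance, resolve_setting("port", {}, {"port": 5001}, {"port": 7000}) yields 5001: the env var is ignored because the config file already set the port.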

REST API endpoints

Once dfdb serve is running, every database operation is available over HTTP. See the REST API reference for more.

Application (CRUD + query)

The natural way to use the API: address documents by their business key (PNR, email, SKU…), not by DocumentForge’s internal id.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | Liveness + node name, version, read-only flag, uptime. Always public: never gated by the API key, so platform health checks (Render, Docker HEALTHCHECK, K8s probes) don’t need credentials. |
| POST | /query | Run arbitrary SQL. Body: { "sql": "..." }. Default response is the full materialised JSON envelope. For large result sets, request NDJSON streaming via Accept: application/x-ndjson or ?stream=true; line 1 is the meta envelope, then one document per line. Plan / count / executionMs are also returned in X-DFDB-Plan / X-DFDB-Count / X-DFDB-ExecutionMs headers. 400 on parse error or unknown collection. |
| GET | /stats | File size, pages, per-collection counts, all indexes. |
| GET | /collections | List all collection names. |
| POST | /collections/{name} | Insert a single JSON document. Returns { id }. |
| POST | /collections/{name}/bulk | Bulk insert (JSON array). Response includes ids[] for every successful insert and errors[{ index, error }] for failures. Pass ?atomic=true for all-or-nothing semantics. Pass ?skipIndexes=true for cold-load throughput; you must then call POST /admin/rebuild-indexes/{name} before any indexed query. |
| GET | /collections/{name} | List documents. Supports ?limit=N. |
| GET | /collections/{name}/by/{field}/{value} | Find by your own key; uses an index when one exists. Field name accepts dot/bracket paths (e.g. passenger.lastName, flights[0].departureAirport). |
| PUT | /collections/{name}/by/{field}/{value} | Replace by your own key. Body is the full new document JSON. The internal _id of the matched document is preserved server-side. |
| DELETE | /collections/{name}/by/{field}/{value} | Delete every document matching that field value. |
| DELETE | /collections/{name} | Drop the entire collection. Requires header X-Confirm: true. |
| GET | /indexes/{collection} | List indexes on a collection. |
| POST | /index | Create an index. Body: { collection, path, name?, unique? }. |
| POST | /seed | Populate the demo airline dataset. Body: { "orders": 500 }. |
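For the POST /query row, a client might look like the Python sketch below. It builds the request with the documented body and headers, and splits an NDJSON stream into the meta envelope (line 1) and the documents that follow. The helper names are hypothetical, and the fields inside the meta envelope are not specified on this page, so the code treats it as an opaque dict.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional, Tuple


def build_query_request(base_url: str, sql: str,
                        api_key: Optional[str] = None,
                        stream: bool = False) -> urllib.request.Request:
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Required only when the node was started with --api-key.
        headers["Authorization"] = f"Bearer {api_key}"
    if stream:
        # Ask for NDJSON streaming instead of the materialised envelope.
        headers["Accept"] = "application/x-ndjson"
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(base_url.rstrip("/") + "/query",
                                  data=body, headers=headers, method="POST")


def parse_ndjson(lines: Iterable[bytes]) -> Tuple[dict, Iterator[dict]]:
    """Split an NDJSON stream into (meta envelope, iterator of documents)."""
    records = (json.loads(line) for line in lines if line.strip())
    meta = next(records)  # line 1 is the meta envelope
    return meta, records


def run_streaming_query(base_url: str, sql: str,
                        api_key: Optional[str] = None) -> None:
    # Performs a real HTTP call; needs a running `dfdb serve`.
    req = build_query_request(base_url, sql, api_key=api_key, stream=True)
    with urllib.request.urlopen(req) as resp:
        meta, docs = parse_ndjson(resp)
        print(meta)
        for doc in docs:
            print(doc)
```

Because parse_ndjson returns a lazy iterator, documents are decoded one line at a time, so a large result set is never held in memory all at once.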

Application (advanced — by internal _id)

The internal _id is a 16-byte sequential identifier returned by inserts. You normally don’t want this — prefer /by/{field}/{value} above. Use these only when you’ve cached the _id from a prior write and want to skip the index lookup.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /collections/{name}/{_id} | Direct location-map lookup by the internal id (the value of _id, formatted as a Guid). |
| PUT | /collections/{name}/{_id} | Replace the document at that internal id. Body is the full new JSON. The _id is preserved automatically. |
| DELETE | /collections/{name}/{_id} | Direct delete by internal id. |

Admin

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /admin/flush | Flush dirty pages to disk, truncate recovery log. |
| POST | /admin/checkpoint | Drain the WAL to a consistent on-disk state. |
| POST | /admin/compact/{collection} | Reclaim space from deletes. Returns pages compacted + bytes reclaimed. |
| POST | /admin/rebuild-indexes/{collection} | Rebuild every index on the collection from scratch. Needed after a bulk insert that used ?skipIndexes=true, or any time you suspect index drift. |
| POST | /admin/rebuild-index/{collection}/{indexName} | Surgical rebuild of a single named index. Use when one specific index has drifted (faster than the all-indexes variant on a large collection). |
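The cold-load workflow (bulk insert with ?skipIndexes=true, then POST /admin/rebuild-indexes/{collection}) can be sketched in Python as two requests sent in order. The helper names are hypothetical, and failed_docs assumes that each error's index is the position of the document in the submitted array, which this page implies but does not state.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple


def cold_load_requests(base_url: str, collection: str, docs: list,
                       api_key: Optional[str] = None) -> List[urllib.request.Request]:
    """Build the bulk insert and the follow-up index rebuild, in call order."""
    base = base_url.rstrip("/")
    name = urllib.parse.quote(collection, safe="")
    auth = {"Authorization": f"Bearer {api_key}"} if api_key else {}

    bulk = urllib.request.Request(
        f"{base}/collections/{name}/bulk?skipIndexes=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json", **auth},
        method="POST",
    )
    # Must run before any indexed query touches the collection.
    rebuild = urllib.request.Request(
        f"{base}/admin/rebuild-indexes/{name}",
        data=b"", headers=auth, method="POST",
    )
    return [bulk, rebuild]


def failed_docs(docs: list, response: dict) -> List[Tuple[dict, str]]:
    # ASSUMPTION: errors[].index is the position in the submitted array.
    return [(docs[e["index"]], e["error"]) for e in response.get("errors", [])]
```

Send each request with urllib.request.urlopen in the returned order, and pass the decoded bulk response to failed_docs to see which documents need a retry.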

Replication

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /replication/status | Role, seqs, follower count, gaps, auto-failover state. Safe to poll. |
| POST | /replication/start-leader | Body: { port, sharedSecret? }. Opens the replication listener. |
| POST | /replication/start-follower | Body: { host, port, sharedSecret? }. |
| POST | /replication/read-only | Enter read-only mode (planned handover step 1). |
| POST | /replication/read-write | Exit read-only mode (recover / abort handover). |
| POST | /replication/promote | Body: { port }. Take over as leader (handover step 2). |
| POST | /replication/auto-failover/enable | Body: { silenceSeconds, newLeaderPort }. |
| POST | /replication/auto-failover/disable | Stop watching for leader silence. |
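The two handover steps in the table can be scripted as an ordered list of calls. The Python sketch below is one possible sequencing, not a documented procedure. Sending read-only to the current leader and checking /replication/status on the follower before promoting are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

# (node base URL, HTTP method, path, JSON body or None)
Step = Tuple[str, str, str, Optional[dict]]


def planned_handover(old_leader: str, new_leader: str,
                     new_leader_port: int) -> List[Step]:
    return [
        # Step 1: stop accepting writes on the current leader.
        (old_leader, "POST", "/replication/read-only", None),
        # ASSUMPTION: check the follower's status before promoting it.
        (new_leader, "GET", "/replication/status", None),
        # Step 2: the follower takes over as leader.
        (new_leader, "POST", "/replication/promote", {"port": new_leader_port}),
    ]


def abort_handover(old_leader: str) -> List[Step]:
    # Return the old leader to read-write if the handover is abandoned.
    return [(old_leader, "POST", "/replication/read-write", None)]


def to_request(step: Step, api_key: Optional[str] = None) -> urllib.request.Request:
    base_url, method, path, body = step
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(base_url.rstrip("/") + path, data=data,
                                  headers=headers, method=method)
```

Keeping the plan as data makes it easy to print and review before anything is sent. Each step becomes a real call only when to_request(step) is passed to urllib.request.urlopen.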

Sharding control is CLI-only on purpose. Cluster topology lives in a cluster.json file you version alongside your infra — not per-node state. Use dfdb cluster / dfdb health / dfdb rebalance to manage it. Each individual shard is still a regular node, so every endpoint above works on each one.
