## Install / run

There are two ways to run the CLI. Both accept identical subcommands.

### Dev mode (source checkout)

```bash
dotnet run --project src/DocumentForge.Cli -- <command> [args]
```

### Published binary

```bash
# Build the single-file binary for your platform
./scripts/publish-dfdb.sh    # or publish-dfdb.ps1 on Windows

# Then run it from dist/ — no .NET runtime required on the target
./dist/linux-x64/dfdb <command> [args]
./dist/win-x64/dfdb.exe <command> [args]
```

In the rest of this page we write `dfdb`; substitute whichever form fits your setup.

## Top-level help

```bash
dfdb help       # or --help, -h
dfdb version    # or --version, -v
```
## serve

Start the REST API for a single node. This is what the admin UI, the LINQ client, and the `HttpShardTransport` talk to.

```bash
dfdb serve [--config node.json]
           [--node-name NAME]
           [--port N]
           [--data-dir DIR]
           [--api-key KEY]
           [--replication-secret SECRET]
           [--bind-all]
           [--replication-role leader|follower]
           [--replication-port N]
           [--leader-host H] [--leader-port N]
           [--auto-failover-seconds N] [--auto-failover-new-port N]
```
| Flag | Default | Notes |
|---|---|---|
| `--config` | — | Path to a `node.json` file. Loaded first; other flags override. |
| `--node-name` | `node-1` | Shows up in logs, health output, and shard descriptors. |
| `--port` | `5000` | HTTP (or HTTPS, with TLS configured) port. |
| `--data-dir` | `./data` | Directory for `data.dfdb`, WAL, and recovery log. |
| `--api-key` | off | If set, every request must carry `Authorization: Bearer <key>`. |
| `--replication-secret` | off | Shared secret enforced on the replication handshake. |
| `--bind-all` | loopback | Bind to `0.0.0.0` instead of `127.0.0.1`. Only enable behind a trusted network / firewall. |
| `--replication-role` | off | `leader` or `follower`. Omit for a single-node setup. |
| `--replication-port` | `5500` | Leader: TCP port for the replication listener. Must differ from `--port`. |
| `--leader-host` | — | Follower: hostname of the leader. |
| `--leader-port` | — | Follower: replication port on the leader. |
| `--auto-failover-seconds` | off | Follower: promote if the leader is silent this many seconds. |
| `--auto-failover-new-port` | = leader port | Follower: which port to bind as leader once promoted. |
### Examples

```bash
# Single-node dev
dfdb serve --port 5000 --data-dir ./data

# Production node with API key + shared secret
dfdb serve --config ./node-shard-a.json \
  --api-key sk_prod_abc123 \
  --replication-secret repl_shared_xyz

# Everything via env vars (useful in containers)
DFDB_NODE_NAME=shard-a \
DFDB_PORT=5001 \
DFDB_DATA_DIR=/var/lib/dfdb/shard-a \
DFDB_API_KEY=sk_prod_abc123 \
dfdb serve

# Leader with replication enabled
dfdb serve --port 5000 --data-dir ./data/leader \
  --replication-role leader --replication-port 5500

# Follower with auto-failover
dfdb serve --port 5010 --data-dir ./data/replica \
  --replication-role follower \
  --leader-host localhost --leader-port 5500 \
  --auto-failover-seconds 10
```
Pass `--replication-role` (and the follower flags) and `dfdb serve` stands up the replication listener or follower loop alongside the HTTP API. See Replication → From the CLI for the full protocol and failure modes.
## inspect

Offline introspection. Opens a `.dfdb` file locally and prints stats — size, page counts, collections, and indexes. Does not start a server.

```bash
dfdb inspect <file>

# Example
dfdb inspect ./data/shard-a/data.dfdb
```
## query

Run a single SQL statement against a local file and print the results. Great for scripts and sanity checks.

```bash
dfdb query <file> "<sql>"

# Examples
dfdb query ./data/shard-a/data.dfdb "SELECT COUNT(*) FROM orders"
dfdb query ./data/shard-a/data.dfdb "SELECT * FROM orders WHERE pnr = 'ABC123'"
```
## repl

Interactive SQL console. Opens the file, gives you a prompt, and executes statements on Enter. Type `stats` to see storage info, `exit` to quit.

```
dfdb repl <file>

dfdb> SELECT * FROM orders LIMIT 5
dfdb> SELECT status, COUNT(*) FROM orders GROUP BY status
dfdb> CREATE INDEX idx_pnr ON orders (pnr)
dfdb> stats
dfdb> exit
```
## compact

Reclaim physical space left by DELETE. Rewrites the collection's pages in place, releasing freed slots back to the allocator.

```bash
dfdb compact <file> <collection>

# Example
dfdb compact ./data/shard-a/data.dfdb orders
```
Don't compact a file that a running `dfdb serve` has open — stop the server first, compact, then restart.
## seed

Loads realistic airline-reservation sample data (orders with nested passengers and flights). Useful for demos and smoke tests.

```bash
dfdb seed <file> [count]

# Examples
dfdb seed ./data/demo.dfdb          # default 10,000 orders
dfdb seed ./data/demo.dfdb 250000   # a quarter million
```
## cluster

Build and inspect the `cluster.json` topology file. This file describes every shard and what's sharded vs replicated.

```bash
dfdb cluster init <config>
dfdb cluster show <config>
dfdb cluster add-shard <config> <name> <endpoint>
dfdb cluster add-collection <config> <name> (hash <path>|replicated)
```

### Examples

```bash
# Fresh cluster config
dfdb cluster init cluster.json

# Register three shards
dfdb cluster add-shard cluster.json shard-a http://localhost:5001
dfdb cluster add-shard cluster.json shard-b http://localhost:5002
dfdb cluster add-shard cluster.json shard-c http://localhost:5003

# orders are hash-sharded on pnr; airports are fully replicated to every shard
dfdb cluster add-collection cluster.json orders hash pnr
dfdb cluster add-collection cluster.json airports replicated

# Inspect what we built
dfdb cluster show cluster.json
```
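The `hash pnr` placement above means the router picks a shard by hashing each order's PNR. DocumentForge's actual hash function isn't documented on this page; the sketch below just illustrates the idea with a stable MD5-based slot assignment.

```python
import hashlib

def shard_for(key: str, shards: list[str]) -> str:
    """Map a shard-key value (e.g. a PNR) to one shard.

    Uses a stable digest (md5) rather than Python's built-in hash(),
    which is randomized per process and would break routing across runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[slot]

shards = ["shard-a", "shard-b", "shard-c"]
print(shard_for("ABC123", shards))  # deterministic: same shard every run
```

The key property is determinism: every node that hashes `ABC123` must agree on the owning shard, which is why a process-randomized hash is unusable here.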
## health

Ping every shard in a cluster config and report status + basic stats. Intended to be safe to run from anywhere — it only reads.

```bash
dfdb health <cluster-config>

# Example output: three green dots, response times, doc counts per shard
dfdb health cluster.json
```
## rebalance

Plan or execute a topology change (adding or removing shards). `--plan-only` prints what would move without touching data.

```bash
dfdb rebalance <old-config> <new-config> [--plan-only]

# Preview only — no data moves
dfdb rebalance old-cluster.json new-cluster.json --plan-only

# Execute the migration
dfdb rebalance old-cluster.json new-cluster.json
```
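Conceptually, a plan-only run is a diff between where each key lives under the old topology and where it would live under the new one. A toy sketch, assuming simple modulo placement (not necessarily DocumentForge's real algorithm):

```python
import hashlib

def assign(key: str, shards: list[str]) -> str:
    # Hypothetical placement: stable hash, modulo shard count
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

def plan(keys: list[str], old_shards: list[str], new_shards: list[str]) -> dict:
    """Return {key: (source_shard, destination_shard)} for keys that move."""
    moves = {}
    for k in keys:
        src, dst = assign(k, old_shards), assign(k, new_shards)
        if src != dst:
            moves[k] = (src, dst)
    return moves

old = ["shard-a", "shard-b"]
new = ["shard-a", "shard-b", "shard-c"]
print(plan(["ABC123", "XYZ789", "QQQ111"], old, new))
```

With plain modulo placement, growing the shard count relocates a large fraction of keys; this is exactly the cost a plan preview lets you see before committing.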
See the Sharding guide for the online-rebalance protocol and what happens to reads/writes during migration.
## node.json format

Full schema. Every field except `nodeName`, `port`, and `dataDir` is optional.
```json
{
  "nodeName": "shard-a",
  "port": 5001,
  "dataDir": "/var/lib/dfdb/shard-a",
  "bindAllInterfaces": false,
  "security": {
    "apiKey": "sk_prod_abc123",
    "replicationSecret": "repl_shared_xyz",
    "tls": {
      "certPath": "/etc/dfdb/cert.pfx",
      "certPassword": "env:DFDB_CERT_PASSWORD"
    }
  },
  "replication": {
    "role": "follower",
    "port": 5500,
    "leaderHost": "leader.internal",
    "leaderPort": 5500,
    "autoFailover": {
      "silenceSeconds": 10,
      "newLeaderPort": 5500
    }
  }
}
```
Notes:

- `bindAllInterfaces: true` is equivalent to `--bind-all` — only enable behind a firewall.
- `certPassword` supports `env:VAR_NAME` indirection so you don't commit passwords to the config file.
- Omit the `security` block entirely for local dev.
- Omit the `replication` block for a single-node setup. `role` is either `"leader"` or `"follower"`.
- On a leader, only `role` and `port` are read. On a follower, `leaderHost` + `leaderPort` are required; `autoFailover` is optional.
## Environment variables

Every flag on `dfdb serve` has an env-var equivalent. Useful for containers and orchestrators.

| Flag | Env var |
|---|---|
| `--node-name` | `DFDB_NODE_NAME` |
| `--port` | `DFDB_PORT` |
| `--data-dir` | `DFDB_DATA_DIR` |
| `--api-key` | `DFDB_API_KEY` |
| `--replication-secret` | `DFDB_REPLICATION_SECRET` |
| `--replication-role` | `DFDB_REPLICATION_ROLE` |
| `--replication-port` | `DFDB_REPLICATION_PORT` |
| `--leader-host` | `DFDB_LEADER_HOST` |
| `--leader-port` | `DFDB_LEADER_PORT` |
| `--auto-failover-seconds` | `DFDB_AUTO_FAILOVER_SECONDS` |
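For every flag in the table, the mapping is mechanical: strip the leading dashes, swap `-` for `_`, uppercase, and prefix `DFDB_`. A one-liner sketch of that rule:

```python
def env_var_for(flag: str) -> str:
    """--auto-failover-seconds -> DFDB_AUTO_FAILOVER_SECONDS"""
    return "DFDB_" + flag.lstrip("-").replace("-", "_").upper()

print(env_var_for("--node-name"))  # DFDB_NODE_NAME
```

This covers the flags listed above; flags without an env equivalent (such as `--config`) are not part of the mapping.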
The admin UI also reads one env var at build time:
```bash
NEXT_PUBLIC_DFDB_URL=http://localhost:5000 npm run dev
```
## Config precedence

When the same setting is provided by multiple sources, it is resolved in this order:

1. `--config node.json` — loaded first
2. CLI flags (`--port`, `--data-dir`, etc.) — override the config file
3. Env vars (`DFDB_*`) — fill in anything still at its default
4. Defaults — localhost:5000, `./data`, no security
Recommended split: keep stable settings (`nodeName`, `port`, `dataDir`) in the config file, and pass secrets (`--api-key`, `DFDB_REPLICATION_SECRET`, cert password) via env vars or a secret manager.
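The resolution rules can be sketched as a tiny resolver. The field names and `DEFAULTS` shape here are illustrative, not the real implementation; the point is that explicit config/flags beat env vars, which only fill settings still at their default.

```python
DEFAULTS = {"port": 5000, "dataDir": "./data", "apiKey": None}

def resolve(config: dict, flags: dict, env: dict) -> dict:
    """Apply the precedence described above."""
    effective = dict(DEFAULTS)
    effective.update(config)   # --config node.json, loaded first
    effective.update(flags)    # explicit CLI flags win over the file
    for key, value in env.items():
        if effective[key] == DEFAULTS[key]:  # still at its default?
            effective[key] = value           # env var fills the gap
    return effective

print(resolve(config={"port": 5001},
              flags={"dataDir": "/var/lib/dfdb"},
              env={"port": 9999, "apiKey": "sk_test"}))
# {'port': 5001, 'dataDir': '/var/lib/dfdb', 'apiKey': 'sk_test'}
```

Note that `DFDB_PORT=9999` loses here because the config file already set `port`, while `apiKey` was still at its default and so is taken from the environment.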
## REST API endpoints

Once `dfdb serve` is running, every database operation is available over HTTP. The Postman collection covers them all with working examples.
### Application (CRUD + query)

The natural way to use the API: address documents by their business key (PNR, email, SKU…), not by DocumentForge's internal id.
| Method | Path | Purpose |
|---|---|---|
| GET | `/health` | Liveness + node name, version, read-only flag, uptime. Always public — never gated by the API key, so platform health checks (Render, Docker HEALTHCHECK, K8s probes) don't need credentials. |
| POST | `/query` | Run arbitrary SQL. Body: `{ "sql": "..." }`. Default response is the full materialised JSON envelope. For large result sets, request NDJSON streaming via `Accept: application/x-ndjson` or `?stream=true` — line 1 is the meta envelope, then one document per line. Plan / count / executionMs are also returned in `X-DFDB-Plan` / `X-DFDB-Count` / `X-DFDB-ExecutionMs` headers. |
| GET | `/stats` | File size, pages, per-collection counts, all indexes. |
| GET | `/collections` | List all collection names. |
| POST | `/collections/{name}` | Insert a single JSON document. Returns `{ id }`. |
| POST | `/collections/{name}/bulk` | Bulk insert (JSON array). Rebuilds indexes after by default. Pass `?skipIndexes=true` for cold-load throughput; you must then call `POST /admin/rebuild-indexes/{name}` before any indexed query. |
| GET | `/collections/{name}` | List documents. Supports `?limit=N`. |
| GET | `/collections/{name}/by/{field}/{value}` | Find by your own key — uses an index when one exists. Field name accepts dot/bracket paths (e.g. `passenger.lastName`, `flights[0].departureAirport`). |
| PUT | `/collections/{name}/by/{field}/{value}` | Replace by your own key. Body is the full new document JSON. The internal `_id` of the matched document is preserved server-side — you don't need to include it. |
| DELETE | `/collections/{name}/by/{field}/{value}` | Delete every document matching that field value. |
| DELETE | `/collections/{name}` | Drop the entire collection. Requires header `X-Confirm: true`. |
| GET | `/indexes/{collection}` | List indexes on a collection. |
| POST | `/index` | Create an index. Body: `{ collection, path, name?, unique? }`. |
| POST | `/seed` | Populate the demo airline dataset. Body: `{ "orders": 500 }`. |
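The dot/bracket paths accepted by the `/by/{field}/{value}` routes can be resolved against a nested document roughly as follows. This is a sketch of the idea, not the server's actual parser:

```python
import re

def resolve_path(doc, path: str):
    """Walk 'flights[0].departureAirport' through nested dicts/lists."""
    # Split into field-name tokens and [N] index tokens
    tokens = re.findall(r"[^.\[\]]+|\[\d+\]", path)
    current = doc
    for tok in tokens:
        if tok.startswith("["):
            current = current[int(tok[1:-1])]   # list index
        else:
            current = current[tok]              # dict field
    return current

order = {
    "pnr": "ABC123",
    "passenger": {"lastName": "Ng"},
    "flights": [{"departureAirport": "AMS"}],
}
print(resolve_path(order, "passenger.lastName"))           # Ng
print(resolve_path(order, "flights[0].departureAirport"))  # AMS
```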
### Application (advanced — by internal `_id`)

The internal `_id` is a 16-byte sequential identifier returned by inserts. You normally don't want this — prefer `/by/{field}/{value}` above. Use these only when you've cached the `_id` from a prior write and want to skip the index lookup.

| Method | Path | Purpose |
|---|---|---|
| GET | `/collections/{name}/{_id}` | Direct location-map lookup by the internal id (the value of `_id`, formatted as a Guid). |
| PUT | `/collections/{name}/{_id}` | Replace the document at that internal id. Body is the full new JSON. The `_id` is preserved on the new doc automatically. |
| DELETE | `/collections/{name}/{_id}` | Direct delete by internal id. |
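To make "16 bytes formatted as a Guid" concrete: assuming a straightforward big-endian byte-to-Guid mapping (an assumption for illustration, not a documented layout), the round-trip between the raw id and the URL form looks like this:

```python
import uuid

raw = (42).to_bytes(16, "big")       # a 16-byte sequential counter value
guid = str(uuid.UUID(bytes=raw))     # the form used in /collections/{name}/{_id}
print(guid)  # 00000000-0000-0000-0000-00000000002a

assert uuid.UUID(guid).bytes == raw  # lossless round-trip
```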
### Admin

| Method | Path | Purpose |
|---|---|---|
| POST | `/admin/flush` | Flush dirty pages to disk, truncate recovery log. |
| POST | `/admin/checkpoint` | Drain the WAL to a consistent on-disk state. |
| POST | `/admin/compact/{collection}` | Reclaim space from deletes. Returns pages compacted + bytes reclaimed. |
| POST | `/admin/rebuild-indexes/{collection}` | Rebuild every index on the collection from scratch. Needed after a bulk insert that used `?skipIndexes=true`, or any time you suspect index drift. |
### Replication

| Method | Path | Purpose |
|---|---|---|
| GET | `/replication/status` | Role, seqs, follower count, gaps, auto-failover state. Safe to poll. |
| POST | `/replication/start-leader` | Body: `{ port, sharedSecret? }`. Opens the replication listener. |
| POST | `/replication/start-follower` | Body: `{ host, port, sharedSecret? }`. |
| POST | `/replication/read-only` | Enter read-only mode (planned handover step 1). |
| POST | `/replication/read-write` | Exit read-only mode (recover / abort handover). |
| POST | `/replication/promote` | Body: `{ port }`. Take over as leader (handover step 2). |
| POST | `/replication/auto-failover/enable` | Body: `{ silenceSeconds, newLeaderPort }`. |
| POST | `/replication/auto-failover/disable` | Stop watching for leader silence. |
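The auto-failover rule (promote once the leader has been silent for `silenceSeconds`) can be pictured as a watchdog with an injected clock. The class and method names below are hypothetical; a real follower would, at the promotion point, call `POST /replication/promote`.

```python
class FailoverWatchdog:
    """Promote when no leader heartbeat arrives for `silence_seconds`."""

    def __init__(self, silence_seconds: float, now: float = 0.0):
        self.silence_seconds = silence_seconds
        self.last_heartbeat = now
        self.promoted = False

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """True from the moment promotion should happen onwards."""
        if not self.promoted and now - self.last_heartbeat >= self.silence_seconds:
            self.promoted = True  # caller would now promote itself to leader
        return self.promoted

w = FailoverWatchdog(silence_seconds=10, now=0.0)
w.on_heartbeat(5.0)
print(w.check(12.0))  # False  (only 7s of silence so far)
print(w.check(15.0))  # True   (10s elapsed since last heartbeat)
```

Using an injected `now` instead of reading the system clock keeps the decision logic deterministic and testable.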
Sharding has no HTTP endpoints of its own — the topology is a `cluster.json` file you version alongside your infra, not per-node state. Use `dfdb cluster` / `dfdb health` / `dfdb rebalance` to manage it. Each individual shard is still a regular node, so every endpoint above works on each one.
## Postman collection

A ready-to-import Postman collection lives at `docs/postman/DocumentForge.postman_collection.json`. Two folders:
- Application — how an app uses the database: seed, insert, bulk-insert, find-by-id, SQL queries, create index, delete.
- Database Operations — how a management app / SRE uses it: admin (flush, checkpoint, compact, drop), replication (status, start leader/follower, promote, auto-failover), and a sharding-via-CLI reference.
### Setting it up

- Open Postman → File → Import → choose the `DocumentForge.postman_collection.json` file.
- On the collection, set `baseUrl` (default `http://localhost:5000`) and, if you started the node with `--api-key`, set `apiKey`.
- Run Application → Seed sample airline data to get ~500 realistic orders + flights with 5 indexes.
- Pre-request scripts auto-save returned ids to collection variables so Insert → Find → Delete chains work without hand-editing URLs.
See docs/postman/README.md for the variable reference and recommended first-run order.