Five minutes, end to end

Quickstart

Download the binary, seed ten thousand orders, run a benchmark, and see real numbers on your own machine. No .NET install, no server config, no schema migration. Everything happens against a single file.

Run it with Claude — or any AI agent

The fastest path to real numbers: hand the prompt below to your coding agent (Claude Code, Claude.ai, Cursor, …). It clones the repo, runs the benchmark, and reads the results back to you. Prefer to drive yourself? The manual steps 1–8 are right below.

Paste this to your AI agent:
Clone the DocumentForge repo and run its benchmark, then summarise the results for me.

Repo: https://github.com/aerotoysio/documentforge
Quick mode is ~40s on 100k airline-order documents:

  git clone https://github.com/aerotoysio/documentforge.git
  cd documentforge
  DFDB_BENCH_QUICK=1 dotnet run -c Release --project samples/DocumentForge.Benchmark

You'll need the .NET 9 SDK. When it finishes, show me the QPS + latency
table it prints and tell me whether the indexed point lookup really is
sub-millisecond.

Or run one of these yourself:

# ~40s · 100k docs · needs the .NET 9 SDK
git clone https://github.com/aerotoysio/documentforge.git
cd documentforge
DFDB_BENCH_QUICK=1 dotnet run -c Release --project samples/DocumentForge.Benchmark

# 10M docs · 30s per query · ~2 GB RAM free · .NET 9 SDK
git clone https://github.com/aerotoysio/documentforge.git
cd documentforge
dotnet run -c Release --project samples/DocumentForge.Benchmark

# Linux x64 · no .NET needed · grab the dfdb release binary
curl -L -o dfdb.zip https://github.com/aerotoysio/documentforge/releases/latest/download/dfdb-linux-x64.zip
unzip dfdb.zip && chmod +x dfdb
./dfdb seed  ./bench.dfdb 100000
./dfdb query ./bench.dfdb "SELECT * FROM orders LIMIT 5"

# Windows · PowerShell · no .NET needed
Invoke-WebRequest https://github.com/aerotoysio/documentforge/releases/latest/download/dfdb-win-x64.zip -OutFile dfdb.zip
Expand-Archive dfdb.zip -DestinationPath dfdb; cd dfdb
.\dfdb.exe seed  .\bench.dfdb 100000
.\dfdb.exe query .\bench.dfdb "SELECT * FROM orders LIMIT 5"

1. Download the binary

Self-contained, no .NET runtime needed on the target machine. Pick your platform:

macOS & Linux: after unzipping, you may need to mark the binary executable: chmod +x dfdb. On macOS the first run also needs xattr -d com.apple.quarantine dfdb.

2. Unzip and verify

Open a terminal in the folder where you put the binary and check it runs:

# Windows · PowerShell
.\dfdb.exe --version
# dfdb 0.1.0

.\dfdb.exe --help
# Lists every subcommand: serve, repl, query, seed, cluster, health, rebalance, ...

# macOS & Linux
./dfdb --version
# dfdb 0.1.0

./dfdb --help
# Lists every subcommand: serve, repl, query, seed, cluster, health, rebalance, ...
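
If you script the install (for example in CI), the version check can be automated. The following is a minimal sketch, assuming the binary prints a line of the form "dfdb 0.1.0" as in the sample output; the function names parse_version and installed_version are illustrative and not part of DocumentForge.

```python
import re
import subprocess


def parse_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a line such as 'dfdb 0.1.0'."""
    match = re.search(r"\bdfdb\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised version output: {output!r}")
    return tuple(int(part) for part in match.groups())


def installed_version(binary: str = "./dfdb") -> tuple[int, int, int]:
    """Run `<binary> --version` and parse whatever it prints to stdout."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)
```

Calling installed_version("./dfdb") raises if the binary is missing, not executable, or prints something unexpected, which makes it a reasonable first gate before seeding any data.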

3. Seed 10,000 orders

The seed command writes ten thousand realistic, IATA-shaped order documents into a fresh .dfdb file. This is your sandbox dataset for the rest of the walkthrough.

# Windows · PowerShell
mkdir data
.\dfdb.exe seed .\data\airline.dfdb 10000
# Seeded 10,000 orders in 0.18s (55,500 docs/sec)
# File: .\data\airline.dfdb (3.2 MB)

# macOS & Linux
mkdir -p data
./dfdb seed ./data/airline.dfdb 10000
# Seeded 10,000 orders in 0.18s (55,500 docs/sec)
# File: ./data/airline.dfdb (3.2 MB)

You now have a single file containing ten thousand JSON orders, each with passenger, journey, fare, and ancillary nested structure. The file is portable — copy it, mail it, mount it on another machine; it just works.

4. Query it with SQL

Open the interactive REPL and run SQL directly against the file. No server, no port, no daemon.

./dfdb repl ./data/airline.dfdb

dfdb> SELECT COUNT(*) FROM orders
10000   (0.4 ms)

dfdb> SELECT * FROM orders WHERE pnr = 'ORD000042' LIMIT 1
{ "pnr": "ORD000042", "passenger": { "lastName": "Smith" }, ... }   (0.0 ms)

dfdb> SELECT cabin, COUNT(*) FROM orders GROUP BY cabin
ECONOMY 6312
PREMIUM_ECONOMY 1844
BUSINESS 1318
FIRST 526   (12 ms)

dfdb> .quit

5. Run the benchmark

The seed command above gave you a small dataset for kicking the tires. The full benchmark scales up to 10 million documents and measures bulk-insert rate, indexed point lookups, and range queries. It runs end to end in about ten minutes on a modern laptop and produces a results table you can paste into a memo.

Run it directly with the binary

./dfdb seed ./data/bench.dfdb 10000000
# Builds the 10M-doc dataset (~5 minutes, ~5 GB on disk)

./dfdb query ./data/bench.dfdb "SELECT COUNT(*) FROM orders"
./dfdb query ./data/bench.dfdb "SELECT * FROM orders WHERE pnr = 'ORD000042'"
./dfdb query ./data/bench.dfdb "SELECT * FROM orders WHERE flights[0].fareAmount > 500 LIMIT 100"
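
To put rough numbers on those one-off queries, you can time them from a script. This is a sketch, not the project's benchmark harness: it only assumes the `dfdb query <file> <sql>` invocation used here, and the helper names (query_command, time_runs, time_query) are made up for illustration. Each run pays for process start-up and opening the file, so the figures overstate the engine's own query latency.

```python
import statistics
import subprocess
import time
from typing import Callable


def query_command(binary: str, db_path: str, sql: str) -> list[str]:
    """Build the argv list for `dfdb query <file> <sql>`."""
    return [binary, "query", db_path, sql]


def time_runs(run: Callable[[], object], repeats: int = 5) -> dict[str, float]:
    """Call `run` several times and report wall-clock milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def time_query(binary: str, db_path: str, sql: str, repeats: int = 5) -> dict[str, float]:
    """Time one SQL statement end to end, including process start-up."""
    cmd = query_command(binary, db_path, sql)
    return time_runs(
        lambda: subprocess.run(cmd, capture_output=True, check=True), repeats
    )
```

For example, time_query("./dfdb", "./data/bench.dfdb", "SELECT COUNT(*) FROM orders") returns the minimum, median, and maximum wall time over five runs.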

Or run the full benchmark project

The repository ships a benchmark project that drives a more rigorous workload (sustained QPS over a 30-second window per query type, multi-phase insert, RAM tracking). Build from source:

git clone https://github.com/aerotoysio/documentforge.git
cd documentforge
dotnet run --project samples/DocumentForge.Benchmark -c Release

What you'll see, on a typical modern laptop with an NVMe SSD:

Workload                 250K docs      10M docs
Bulk insert rate         70K docs/sec   41K docs/sec
Indexed point lookup     210K QPS       163K QPS
Range query LIMIT 100    144 QPS        47 QPS
File size on disk        120 MB         4.77 GB

Read the full performance methodology →
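
A little arithmetic makes the table easier to read. The sketch below derives bytes per document, how much throughput is retained when going from 250K to 10M documents, and the implied bulk-load time. It assumes decimal units (1 MB = 10^6 bytes), and the QPS-to-latency conversion assumes a single client issuing queries one after another, which the table does not state; treat that figure as a rough bound only.

```python
# Figures copied from the results table; units assumed decimal.
RESULTS = {
    "250K": {"docs": 250_000, "insert_per_s": 70_000, "lookup_qps": 210_000,
             "range_qps": 144, "file_bytes": 120e6},
    "10M": {"docs": 10_000_000, "insert_per_s": 41_000, "lookup_qps": 163_000,
            "range_qps": 47, "file_bytes": 4.77e9},
}


def bytes_per_doc(row: dict) -> float:
    """Average on-disk footprint of one document, indexes included."""
    return row["file_bytes"] / row["docs"]


def retained(metric: str) -> float:
    """Fraction of the 250K-doc throughput that survives at 10M docs."""
    return RESULTS["10M"][metric] / RESULTS["250K"][metric]


def mean_service_time_us(qps: float) -> float:
    """Microseconds per query if queries were served strictly one at a time."""
    return 1e6 / qps


def bulk_load_seconds(row: dict) -> float:
    """Time to insert every document at the reported bulk-insert rate."""
    return row["docs"] / row["insert_per_s"]
```

With these inputs the footprint is about 480 bytes per document at both sizes, point lookups retain roughly 78% of their throughput while range queries retain about 33%, and loading 10M documents at 41K docs/sec takes about 244 seconds, which is in line with the "~5 minutes" quoted for seeding.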

6. Talk to it over REST (any language)

If you want to drive DocumentForge from a non-.NET stack, run it as a server. One command, one port, JSON-in / JSON-out.

./dfdb serve --port 5000 --data-dir ./data
# DocumentForge listening on http://localhost:5000
# Health: GET /health   Query: POST /query   Docs: /collections/{name}/documents
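
When the server is started from a script, it helps to wait for GET /health before sending traffic. The following is a minimal sketch using only the Python standard library. It assumes a healthy node answers /health with HTTP 200; the response body is not inspected because its shape is not documented here, and the names http_ok and wait_until_healthy are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_healthy(
    base_url: str = "http://localhost:5000",
    attempts: int = 20,
    delay: float = 0.25,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll <base_url>/health until it succeeds or attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The probe parameter exists so the polling loop can be tested without a running server; in normal use you would call wait_until_healthy() with the defaults.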

From any language with an HTTP client:

# curl

# Insert
curl -X POST http://localhost:5000/collections/orders/documents \
     -H "Content-Type: application/json" \
     -d '{
       "pnr": "ABC123",
       "passenger": { "lastName": "Smith" },
       "flights": [{ "flightNumber": "AA100" }]
     }'

# SQL query
curl -X POST http://localhost:5000/query \
     -H "Content-Type: application/json" \
     -d '{"sql": "SELECT * FROM orders WHERE pnr = '\''ABC123'\''"}'

// C# (.NET)
using System.Net.Http.Json;
var http = new HttpClient { BaseAddress = new("http://localhost:5000") };

// Insert
await http.PostAsJsonAsync("collections/orders/documents", new {
    pnr = "ABC123",
    passenger = new { lastName = "Smith" },
    flights = new[] { new { flightNumber = "AA100" } }
});

// SQL query
var r = await http.PostAsJsonAsync("query", new {
    sql = "SELECT * FROM orders WHERE pnr = 'ABC123'"
});
Console.WriteLine(await r.Content.ReadAsStringAsync());
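
// Java 15+ (uses text blocks)
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse.BodyHandlers;

var http = HttpClient.newHttpClient();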

// Insert
http.send(HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:5000/collections/orders/documents"))
    .header("Content-Type", "application/json")
    .POST(BodyPublishers.ofString("""
        {"pnr":"ABC123","passenger":{"lastName":"Smith"},"flights":[{"flightNumber":"AA100"}]}
        """))
    .build(), BodyHandlers.ofString());

// Query
var resp = http.send(HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:5000/query"))
    .header("Content-Type", "application/json")
    .POST(BodyPublishers.ofString("""
        {"sql":"SELECT * FROM orders WHERE pnr = 'ABC123'"}
        """))
    .build(), BodyHandlers.ofString());
System.out.println(resp.body());

# Python (httpx)
import httpx

# Insert
httpx.post(
    "http://localhost:5000/collections/orders/documents",
    json={
        "pnr": "ABC123",
        "passenger": {"lastName": "Smith"},
        "flights": [{"flightNumber": "AA100"}],
    },
)

# Query
r = httpx.post(
    "http://localhost:5000/query",
    json={"sql": "SELECT * FROM orders WHERE pnr = 'ABC123'"},
)
print(r.json())

// Node 20+ (global fetch)
await fetch("http://localhost:5000/collections/orders/documents", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    pnr: "ABC123",
    passenger: { lastName: "Smith" },
    flights: [{ flightNumber: "AA100" }],
  }),
});

const r = await fetch("http://localhost:5000/query", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ sql: "SELECT * FROM orders WHERE pnr = 'ABC123'" }),
});
console.log(await r.json());
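
If you would rather not install a third-party HTTP client, the same two calls can be made with the Python standard library alone. This is a sketch under the assumption that the endpoints behave as in the examples above (POST /collections/{name}/documents and POST /query with a JSON body containing "sql"); the helper names are illustrative. Building the body with json.dumps also sidesteps the shell quoting needed in the curl version.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # wherever `dfdb serve` is listening


def json_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def insert_request(collection: str, document: dict) -> urllib.request.Request:
    """Request that inserts one document into a collection."""
    return json_post(f"/collections/{collection}/documents", document)


def query_request(sql: str) -> urllib.request.Request:
    """Request that runs one SQL statement."""
    return json_post("/query", {"sql": sql})


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, send(insert_request("orders", {"pnr": "ABC123"})) followed by send(query_request("SELECT * FROM orders WHERE pnr = 'ABC123'")) mirrors the curl example.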

7. Or just import the Postman collection

Don't want to write code yet? Drop the included Postman collection on top of a running dfdb serve and click your way through every endpoint — health, seed, insert, point lookup by PNR, indexed queries, bulk-insert with ?atomic=true, replication status, admin actions, the lot.

In Postman: File → Import → drop the file. Set the baseUrl variable to http://localhost:5000 (or wherever your node is). Run Application → Health check first, then Seed sample airline data, then any of the queries — the test scripts auto-populate {{orderId}} for chained requests.

8. Embed as a library (.NET)

If your service is .NET, skip the REST hop entirely and use DocumentForge as an in-process library. The same data file format, the same SQL, but with sub-millisecond function-call latency instead of HTTP.

dotnet add package DocumentForge

using DocumentForge.Engine;

// Open or create
using var db = DocumentForgeDb.OpenOrCreate("airline.dfdb");

// Insert
db.Insert("orders", """
{
    "pnr": "ABC123",
    "passenger": { "firstName": "John", "lastName": "Smith" }
}
""");

// Indexes
db.CreateIndex("orders", "pnr", "idx_pnr", unique: true);

// Query
var r = db.Execute("SELECT * FROM orders WHERE pnr = 'ABC123'");
Console.WriteLine(r.Documents[0].ToJson());

// Or LINQ — strongly typed, captured variables work
// Order is your own class (not shown here) exposing a Pnr property
var orders = db.Collection<Order>("orders");
var match = orders.Where(o => o.Pnr == "ABC123").FirstOrDefault();

Next steps