Run it with Claude — or any AI agent
The fastest path to real numbers: hand the prompt below to your coding agent (Claude Code, Claude.ai, Cursor, …). It clones the repo, runs the benchmark, and reads the results back to you. Prefer to drive yourself? The manual steps 1–8 are right below.
Clone the DocumentForge repo and run its benchmark, then summarise the results for me.
Repo: https://github.com/aerotoysio/documentforge
Quick mode is ~40s on 100k airline-order documents:
git clone https://github.com/aerotoysio/documentforge.git
cd documentforge
DFDB_BENCH_QUICK=1 dotnet run -c Release --project samples/DocumentForge.Benchmark
You'll need the .NET 9 SDK. When it finishes, show me the QPS + latency table it prints and tell me whether the indexed point lookup really is sub-millisecond.
Or run one of these yourself:
# ~40s · 100k docs · needs the .NET 9 SDK
git clone https://github.com/aerotoysio/documentforge.git
cd documentforge
DFDB_BENCH_QUICK=1 dotnet run -c Release --project samples/DocumentForge.Benchmark
# 10M docs · 30s per query · ~2 GB RAM free · .NET 9 SDK
git clone https://github.com/aerotoysio/documentforge.git
cd documentforge
dotnet run -c Release --project samples/DocumentForge.Benchmark
# No .NET needed — grab the dfdb release binary
curl -L -o dfdb.zip https://github.com/aerotoysio/documentforge/releases/latest/download/dfdb-linux-x64.zip
unzip dfdb.zip && chmod +x dfdb
./dfdb seed ./bench.dfdb 100000
./dfdb query ./bench.dfdb "SELECT * FROM orders LIMIT 5"
# Windows · PowerShell · no .NET needed
Invoke-WebRequest https://github.com/aerotoysio/documentforge/releases/latest/download/dfdb-win-x64.zip -OutFile dfdb.zip
Expand-Archive dfdb.zip -DestinationPath dfdb; cd dfdb
.\dfdb.exe seed .\bench.dfdb 100000
.\dfdb.exe query .\bench.dfdb "SELECT * FROM orders LIMIT 5"
1. Download the binary
Self-contained, no .NET runtime needed on the target machine. Pick your platform:
On Linux and macOS, make the binary executable with chmod +x dfdb. On macOS the first run also needs xattr -d com.apple.quarantine dfdb.
2. Unzip and verify
Open a terminal in the folder where you put the binary and check it runs:
.\dfdb.exe --version
# dfdb 0.1.0
.\dfdb.exe --help
# Lists every subcommand: serve, repl, query, seed, cluster, health, rebalance, ...

./dfdb --version
# dfdb 0.1.0
./dfdb --help
# Lists every subcommand: serve, repl, query, seed, cluster, health, rebalance, ...
3. Seed 10,000 orders
The seed command writes ten thousand realistic, IATA-shaped order documents into a fresh .dfdb file. This is your sandbox dataset for the rest of the walkthrough.
mkdir data
.\dfdb.exe seed .\data\airline.dfdb 10000
# Seeded 10,000 orders in 0.18s (55,500 docs/sec)
# File: .\data\airline.dfdb (3.2 MB)

mkdir -p data
./dfdb seed ./data/airline.dfdb 10000
# Seeded 10,000 orders in 0.18s (55,500 docs/sec)
# File: ./data/airline.dfdb (3.2 MB)
You now have a single file containing ten thousand JSON orders, each with passenger, journey, fare, and ancillary nested structure. The file is portable — copy it, mail it, mount it on another machine; it just works.
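To make the nested shape concrete, here is a rough sketch of one order and of how a dotted path such as flights[0].fareAmount (used in the queries later in this walkthrough) picks a value out of it. The document uses only field names that appear elsewhere on this page; the values and the `resolve` helper are illustrative assumptions, not the seeded schema or the engine's own path evaluator.

```python
import re

# Illustrative order: field names come from the examples on this page,
# values are made up, and the real seeded documents carry more fields.
order = {
    "pnr": "ORD000042",
    "cabin": "ECONOMY",
    "passenger": {"firstName": "John", "lastName": "Smith"},
    "flights": [{"flightNumber": "AA100", "fareAmount": 612.50}],
}

# A path is a sequence of field names and [index] steps.
_TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def resolve(doc, path):
    """Follow a path like 'flights[0].fareAmount'; return None if a step is missing."""
    current = doc
    for name, index in _TOKEN.findall(path):
        try:
            current = current[name] if name else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return current


print(resolve(order, "passenger.lastName"))      # Smith
print(resolve(order, "flights[0].fareAmount"))   # 612.5
print(resolve(order, "flights[3].fareAmount"))   # None
```

Returning None for a missing step is a choice made for this sketch; how DocumentForge itself treats a missing field in a WHERE clause is not stated here.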
4. Query it with SQL
Open the interactive REPL and run SQL directly against the file. No server, no port, no daemon.
./dfdb repl ./data/airline.dfdb
dfdb> SELECT COUNT(*) FROM orders
10000
(0.4 ms)
dfdb> SELECT * FROM orders WHERE pnr = 'ORD000042' LIMIT 1
{ "pnr": "ORD000042", "passenger": { "lastName": "Smith" }, ... }
(0.0 ms)
dfdb> SELECT cabin, COUNT(*) FROM orders GROUP BY cabin
ECONOMY          6312
PREMIUM_ECONOMY  1844
BUSINESS         1318
FIRST             526
(12 ms)
dfdb> .quit
5. Run the benchmark
The seed command above gave you a small dataset for kicking the tires. The included full benchmark scales it up to 10 million documents and measures bulk-insert rate, indexed point lookups, and range queries. It runs end to end in about ten minutes on a modern laptop and produces a table of numbers you can paste into a memo.
Run it directly with the binary
./dfdb seed ./data/bench.dfdb 10000000
# Builds the 10M-doc dataset (~5 minutes, ~5 GB on disk)
./dfdb query ./data/bench.dfdb "SELECT COUNT(*) FROM orders"
./dfdb query ./data/bench.dfdb "SELECT * FROM orders WHERE pnr = 'ORD000042'"
./dfdb query ./data/bench.dfdb "SELECT * FROM orders WHERE flights[0].fareAmount > 500 LIMIT 100"
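If you want rough timings for these three queries without reading them off the terminal, a small wrapper can run each one and record the wall-clock time. This is a minimal sketch that assumes the binary sits at ./dfdb and the data file at ./data/bench.dfdb, as in the commands above. Each measurement includes process start-up and opening the file, so the numbers are not comparable to the sustained QPS figures from the benchmark project.

```python
import os
import subprocess
import time

DFDB = "./dfdb"              # path to the binary (assumed location)
DATA = "./data/bench.dfdb"   # file produced by the seed command above

QUERIES = [
    "SELECT COUNT(*) FROM orders",
    "SELECT * FROM orders WHERE pnr = 'ORD000042'",
    "SELECT * FROM orders WHERE flights[0].fareAmount > 500 LIMIT 100",
]


def build_command(sql, dfdb=DFDB, data=DATA):
    """Argument list for `dfdb query <file> <sql>`; no shell quoting needed."""
    return [dfdb, "query", data, sql]


def time_query(sql):
    """Run one query and return wall-clock seconds, including process start-up."""
    start = time.perf_counter()
    subprocess.run(build_command(sql), check=True, capture_output=True)
    return time.perf_counter() - start


def main():
    if not (os.path.exists(DFDB) and os.path.exists(DATA)):
        print("dfdb binary or data file not found; nothing to time")
        return
    for sql in QUERIES:
        print(f"{time_query(sql) * 1000:8.1f} ms  {sql}")


if __name__ == "__main__":
    main()
```

Passing the SQL as a single list element avoids the quoting problems that come with building a shell string around a query containing single quotes.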
Or run the full benchmark project
The repository ships a benchmark project that drives a more rigorous workload (sustained QPS over a 30-second window per query type, multi-phase insert, RAM tracking). Build from source:
git clone https://github.com/aerotoysio/documentforge.git
cd documentforge
dotnet run --project samples/DocumentForge.Benchmark -c Release
What you'll see, on a typical modern laptop with an NVMe SSD:
| Workload | 250K docs | 10M docs |
|---|---|---|
| Bulk insert rate | 70K docs/sec | 41K docs/sec |
| Indexed point lookup | 210K QPS | 163K QPS |
| Range query LIMIT 100 | 144 QPS | 47 QPS |
| File size on disk | 120 MB | 4.77 GB |
Read the full performance methodology →
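As a sanity check on the table above, two derived figures are easy to compute: the average on-disk footprint per document and the time a bulk insert would take at the reported rate. The sketch below copies the table's numbers and assumes MB and GB are decimal units (10^6 and 10^9 bytes), which the table does not state.

```python
# Figures copied from the results table; decimal MB/GB is an assumption.
runs = {
    "250K": {"docs": 250_000, "insert_rate": 70_000, "file_bytes": 120e6},
    "10M": {"docs": 10_000_000, "insert_rate": 41_000, "file_bytes": 4.77e9},
}


def derived(run):
    """Average bytes per document and seconds to insert at the reported rate."""
    return {
        "bytes_per_doc": run["file_bytes"] / run["docs"],
        "insert_seconds": run["docs"] / run["insert_rate"],
    }


for name, run in runs.items():
    d = derived(run)
    print(f"{name}: {d['bytes_per_doc']:.0f} bytes/doc, "
          f"{d['insert_seconds']:.0f} s to insert")
# 250K: 480 bytes/doc, 4 s to insert
# 10M: 477 bytes/doc, 244 s to insert
```

The per-document footprint is nearly identical at both scales, and roughly four minutes of insert time at 10M documents is in line with the "~5 minutes" quoted for seeding the dataset.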
6. Talk to it over REST (any language)
If you want to drive DocumentForge from a non-.NET stack, run it as a server. One command, one port, JSON-in / JSON-out.
./dfdb serve --port 5000 --data-dir ./data
# DocumentForge listening on http://localhost:5000
# Health: GET /health   Query: POST /query   Docs: /collections/{name}/documents
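When the server is started from a script, it helps to wait until it is actually accepting requests before sending the first insert. The sketch below polls the GET /health endpoint listed in the startup banner, using only the Python standard library. Treating a 200 response as "ready" is an assumption, as are the default timeout and polling interval.

```python
import http.client
import time
import urllib.request


def wait_for_health(base_url="http://localhost:5000", timeout=10.0, interval=0.25):
    """Poll GET /health until it answers 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2.0) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not listening yet, or an error reply; try again
        time.sleep(interval)
    return False
```

A caller would typically write `if not wait_for_health(): raise SystemExit("server did not come up")` right after launching `dfdb serve`.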
From any language with an HTTP client:
# Insert
curl -X POST http://localhost:5000/collections/orders/documents \
  -H "Content-Type: application/json" \
  -d '{ "pnr": "ABC123", "passenger": { "lastName": "Smith" }, "flights": [{ "flightNumber": "AA100" }] }'

# SQL query
curl -X POST http://localhost:5000/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM orders WHERE pnr = '\''ABC123'\''"}'
using System.Net.Http.Json;

var http = new HttpClient { BaseAddress = new("http://localhost:5000") };

// Insert
await http.PostAsJsonAsync("collections/orders/documents", new
{
    pnr = "ABC123",
    passenger = new { lastName = "Smith" },
    flights = new[] { new { flightNumber = "AA100" } }
});

// SQL query
var r = await http.PostAsJsonAsync("query", new { sql = "SELECT * FROM orders WHERE pnr = 'ABC123'" });
Console.WriteLine(await r.Content.ReadAsStringAsync());
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse.BodyHandlers;

var http = HttpClient.newHttpClient();

// Insert
http.send(HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:5000/collections/orders/documents"))
        .header("Content-Type", "application/json")
        .POST(BodyPublishers.ofString("""
            {"pnr":"ABC123","passenger":{"lastName":"Smith"},"flights":[{"flightNumber":"AA100"}]}
            """))
        .build(),
    BodyHandlers.ofString());

// Query
var resp = http.send(HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:5000/query"))
        .header("Content-Type", "application/json")
        .POST(BodyPublishers.ofString("""
            {"sql":"SELECT * FROM orders WHERE pnr = 'ABC123'"}
            """))
        .build(),
    BodyHandlers.ofString());
System.out.println(resp.body());
import httpx

# Insert
httpx.post(
    "http://localhost:5000/collections/orders/documents",
    json={
        "pnr": "ABC123",
        "passenger": {"lastName": "Smith"},
        "flights": [{"flightNumber": "AA100"}],
    },
)

# Query
r = httpx.post(
    "http://localhost:5000/query",
    json={"sql": "SELECT * FROM orders WHERE pnr = 'ABC123'"},
)
print(r.json())
// Node 20+ (global fetch)
await fetch("http://localhost:5000/collections/orders/documents", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    pnr: "ABC123",
    passenger: { lastName: "Smith" },
    flights: [{ flightNumber: "AA100" }],
  }),
});

const r = await fetch("http://localhost:5000/query", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ sql: "SELECT * FROM orders WHERE pnr = 'ABC123'" }),
});
console.log(await r.json());
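All of the clients above splice the PNR straight into the SQL text. That is fine for a literal, but once the value comes from user input it is worth validating before building the request body. The sketch below is one cautious way to do that. The helper name and the "uppercase letters and digits, up to 16 characters" rule are assumptions made for illustration; this page does not show whether the /query endpoint supports bound parameters, so the sketch rejects unexpected values rather than trying to escape them.

```python
import json
import re

# Assumed shape of a PNR-like key; adjust to whatever your data actually uses.
_PNR = re.compile(r"[A-Z0-9]{1,16}")


def pnr_query_body(pnr):
    """JSON body for POST /query that looks up one order by PNR."""
    if not _PNR.fullmatch(pnr):
        raise ValueError(f"unexpected PNR value: {pnr!r}")
    return json.dumps({"sql": f"SELECT * FROM orders WHERE pnr = '{pnr}'"})


print(pnr_query_body("ABC123"))
# {"sql": "SELECT * FROM orders WHERE pnr = 'ABC123'"}
```

Building the body with `json.dumps` also sidesteps the nested-quote escaping that the curl example needs.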
7. Or just import the Postman collection
Don't want to write code yet? Drop the included Postman collection on top of a running dfdb serve and click your way through every endpoint — health, seed, insert, point lookup by PNR, indexed queries, bulk-insert with ?atomic=true, replication status, admin actions, the lot.
In Postman: File → Import → drop the file. Set the baseUrl variable to http://localhost:5000 (or wherever your node is). Run Application → Health check first, then Seed sample airline data, then any of the queries — the test scripts auto-populate {{orderId}} for chained requests.
8. Embed as a library (.NET)
If your service is .NET, skip the REST hop entirely and use DocumentForge as an in-process library. The same data file format, the same SQL, but with sub-millisecond function-call latency instead of HTTP.
dotnet add package DocumentForge
using DocumentForge.Engine;

// Open or create
using var db = DocumentForgeDb.OpenOrCreate("airline.dfdb");

// Insert
db.Insert("orders", """
    {
      "pnr": "ABC123",
      "passenger": { "firstName": "John", "lastName": "Smith" }
    }
    """);

// Indexes
db.CreateIndex("orders", "pnr", "idx_pnr", unique: true);

// Query
var r = db.Execute("SELECT * FROM orders WHERE pnr = 'ABC123'");
Console.WriteLine(r.Documents[0].ToJson());

// Or LINQ: strongly typed, captured variables work
// (Order is your own class with a Pnr property)
var orders = db.Collection<Order>("orders");
var match = orders.Where(o => o.Pnr == "ABC123").FirstOrDefault();
Next steps
- Use Cases — why DocumentForge for Offer & Order Management.
- Reference — query language, data modeling, replication, sharding, deployment, security, CLI.
- Performance — full benchmark methodology and bottleneck analysis.
- GitHub — source, issues, releases.