Install
DocumentForge is a .NET 9 library with zero external dependencies. Add it to your solution by cloning the repository and referencing the DocumentForge.Engine project, or install the NuGet package once it is published.
From source
git clone https://github.com/tailwind-retailing/documentforge.git
cd documentforge
dotnet build
# Add a reference from your project
dotnet add reference path/to/src/DocumentForge.Engine
Requirements
- .NET 9 SDK or later
- Windows, macOS, or Linux
- Write access to the directory where your .dfdb file lives
Your first database
A DocumentForge database is a single file on disk. Opening one is a constructor call.
using DocumentForge.Engine;

using var db = DocumentForgeDb.OpenOrCreate("airline.dfdb");
That's it. If airline.dfdb exists, it's opened; if not, it's created. The using ensures the file is closed and flushed on dispose.
Opening a database can create up to three files: airline.dfdb (data), airline.dfdb.wal (write-ahead log), and airline.dfdb.recovery (crash recovery log). Keep them together when moving or backing up.
Insert documents
Documents are JSON. You can pass a JSON string, a BsonDocument, or an anonymous C# object (via BsonDocument.FromJson(JsonSerializer.Serialize(obj))).
// JSON string (simplest)
db.Insert("orders", @"{
    ""pnr"": ""ABC123"",
    ""status"": ""CONFIRMED"",
    ""passenger"": { ""firstName"": ""John"", ""lastName"": ""Smith"" },
    ""flights"": [ { ""flightNumber"": ""AA100"", ""departureAirport"": ""JFK"" } ]
}");
Collections are created automatically on first insert. The document's _id field is auto-generated (sequential, time-ordered) if you don't supply one.
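The anonymous-object route mentioned above can be sketched like this; the field values are illustrative, and System.Text.Json is assumed for serialization:

```csharp
using System.Text.Json;

// Illustrative document; any serializable shape works.
var order = new
{
    pnr = "XYZ789",
    status = "TICKETED",
    passenger = new { firstName = "Grace", lastName = "Hopper" }
};

// Serialize to JSON, then wrap in a BsonDocument before inserting.
db.Insert("orders", BsonDocument.FromJson(JsonSerializer.Serialize(order)));
```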
Bulk insert
For loading large datasets, BulkInsert acquires the write lock once and defers index updates until the batch completes. Expect 50,000–70,000 docs/sec on a modern laptop.
var batch = new List<BsonDocument>();
for (int i = 0; i < 10_000; i++)
{
    batch.Add(BsonDocument.FromJson($@"{{ ""pnr"": ""ORD{i:D6}"" }}"));
}
db.BulkInsert("orders", batch);
Query
SQL-like queries hit a single Execute method. Dot notation navigates nested objects. Bracket notation indexes arrays.
// Point lookup
var r1 = db.Execute("SELECT * FROM orders WHERE pnr = 'ABC123'");

// Nested field
var r2 = db.Execute("SELECT * FROM orders WHERE passenger.lastName = 'Smith'");

// Array element
var r3 = db.Execute("SELECT * FROM orders WHERE flights[0].departureAirport = 'JFK'");

// Range, ordering, limit
var r4 = db.Execute(@"
    SELECT pnr, passenger.lastName
    FROM orders
    WHERE flights[0].fareAmount > 500
    ORDER BY flights[0].fareAmount DESC
    LIMIT 10
");

// Iterate results
foreach (var doc in r4.Documents)
    Console.WriteLine(doc.ToJson());
Every query returns a QueryResult containing the documents, the plan used (INDEX_SCAN, COLLECTION_SCAN, etc.), and the execution time in milliseconds.
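Inspecting that metadata might look like the sketch below. The Documents property and ToJson appear in the examples above, but the plan and timing member names (Plan, ElapsedMs) are assumptions to verify against the actual QueryResult type:

```csharp
var result = db.Execute("SELECT * FROM orders WHERE pnr = 'ABC123'");

// Plan and ElapsedMs are hypothetical member names; check QueryResult for the real API.
Console.WriteLine($"plan: {result.Plan}, elapsed: {result.ElapsedMs} ms");

foreach (var doc in result.Documents)
    Console.WriteLine(doc.ToJson());
```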
Create indexes
Indexes are the difference between a 1ms lookup and a 1-second scan. Create one for any JSON path you query frequently.
// Single-field index
db.CreateIndex("orders", "pnr", "idx_pnr", unique: true);

// Nested path
db.CreateIndex("orders", "passenger.lastName", "idx_lastname");

// Composite - good for multi-field WHERE clauses
db.Execute("CREATE INDEX idx_status_date ON orders (status, createdAt)");
Indexes are persistent — they survive database restart without being rebuilt. They're also incrementally maintained on every insert, update, and delete, so queries always see fresh results.
Interactive REPL
Included in the repo is an interactive console for experimenting with SQL queries against your database.
dfdb repl ./data/data.dfdb

dfdb> SELECT * FROM orders WHERE pnr = 'ABC123'
dfdb> SELECT status, COUNT(*) FROM orders GROUP BY status
dfdb> stats
REST API
For testing from Postman or wiring other services in, use dfdb serve:
dfdb serve --port 5000 --data-dir ./data
# Then from another terminal
curl -X POST http://localhost:5000/query \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT * FROM orders LIMIT 5"}'
The same endpoints power the admin UI, so pointing it at any dfdb serve instance just works.
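From .NET code, the same /query endpoint can be exercised with HttpClient; a minimal sketch, assuming a dfdb serve instance is listening on localhost:5000:

```csharp
using System.Net.Http;
using System.Text;

using var http = new HttpClient();

// Same payload shape as the curl example above.
var payload = new StringContent(
    @"{""sql"": ""SELECT * FROM orders LIMIT 5""}",
    Encoding.UTF8,
    "application/json");

var response = await http.PostAsync("http://localhost:5000/query", payload);
Console.WriteLine(await response.Content.ReadAsStringAsync());
```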
Next steps
- Query Language reference — all the SQL features we support
- Data Modeling guide — when to embed, when to reference
- Replication — read scaling and zero-downtime handover
- Deployment — running across multiple machines and datacenters
- Sharding — horizontal scale with consistent hashing and online rebalance
- Security — API keys, replication secrets, TLS