Install
DocumentForge is a .NET 9 library with zero external dependencies. Add it to your solution by cloning the repository and referencing the DocumentForge.Engine project, or install the NuGet package once it is published.
From source
git clone https://github.com/tailwind-retailing/documentforge.git
cd documentforge
dotnet build
# Add a reference from your project
dotnet add reference path/to/src/DocumentForge.Engine
Requirements
- .NET 9 SDK or later
- Windows, macOS, or Linux
- Write access to the directory where your .dfdb file lives
Your first database
A DocumentForge database is a single file on disk. Opening one is a constructor call.
using DocumentForge.Engine;

using var db = DocumentForgeDb.OpenOrCreate("airline.dfdb");
That's it. If airline.dfdb exists, it's opened; if not, it's created. The using ensures the file is closed and flushed on dispose.
Opening a database can create up to three files: airline.dfdb (data), airline.dfdb.wal (write-ahead log), and airline.dfdb.recovery (crash recovery log). Keep them together when moving or backing up.
Insert documents
Documents are JSON. You can pass a JSON string, a BsonDocument, or an anonymous C# object (via BsonDocument.FromJson(JsonSerializer.Serialize(obj))).
// JSON string (simplest)
db.Insert("orders", @"{
    ""pnr"": ""ABC123"",
    ""status"": ""CONFIRMED"",
    ""passenger"": { ""firstName"": ""John"", ""lastName"": ""Smith"" },
    ""flights"": [ { ""flightNumber"": ""AA100"", ""departureAirport"": ""JFK"" } ]
}");
Collections are created automatically on first insert. The document's _id field is auto-generated (sequential, time-ordered) if you don't supply one.
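The anonymous-object route mentioned above can be sketched like this; the field values are illustrative, and System.Text.Json is assumed for serialization:

```csharp
using System.Text.Json;

// Illustrative document; any serializable shape works.
var order = new
{
    pnr = "XYZ789",
    status = "TICKETED",
    passenger = new { firstName = "Grace", lastName = "Hopper" }
};

// Serialize to JSON, then wrap in a BsonDocument before inserting.
db.Insert("orders", BsonDocument.FromJson(JsonSerializer.Serialize(order)));
```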
Bulk insert
For loading large datasets, BulkInsert acquires the write lock once and defers index updates until the batch completes. Expect 50,000–70,000 docs/sec on a modern laptop.
var batch = new List<BsonDocument>();
for (int i = 0; i < 10_000; i++)
{
    batch.Add(BsonDocument.FromJson($@"{{ ""pnr"": ""ORD{i:D6}"" }}"));
}
db.BulkInsert("orders", batch);
Query
SQL-like queries hit a single Execute method. Dot notation navigates nested objects. Bracket notation indexes arrays.
// Point lookup
var r1 = db.Execute("SELECT * FROM orders WHERE pnr = 'ABC123'");

// Nested field
var r2 = db.Execute("SELECT * FROM orders WHERE passenger.lastName = 'Smith'");

// Array element
var r3 = db.Execute("SELECT * FROM orders WHERE flights[0].departureAirport = 'JFK'");

// Range, ordering, limit
var r4 = db.Execute(@"
    SELECT pnr, passenger.lastName
    FROM orders
    WHERE flights[0].fareAmount > 500
    ORDER BY flights[0].fareAmount DESC
    LIMIT 10
");

// Iterate results
foreach (var doc in r4.Documents)
    Console.WriteLine(doc.ToJson());
Every query returns a QueryResult containing the documents, the plan used (INDEX_SCAN, COLLECTION_SCAN, etc.), and the execution time in milliseconds.
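Inspecting that metadata might look like the sketch below. The Documents property and ToJson appear in the examples above, but the plan and timing member names (Plan, ElapsedMs) are assumptions to verify against the actual QueryResult type:

```csharp
var result = db.Execute("SELECT * FROM orders WHERE pnr = 'ABC123'");

// Plan and ElapsedMs are hypothetical member names; check QueryResult for the real API.
Console.WriteLine($"plan: {result.Plan}, elapsed: {result.ElapsedMs} ms");

foreach (var doc in result.Documents)
    Console.WriteLine(doc.ToJson());
```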
Create indexes
Indexes are the difference between a 1ms lookup and a 1-second scan. Create one for any JSON path you query frequently.
// Single-field index
db.CreateIndex("orders", "pnr", "idx_pnr", unique: true);

// Nested path
db.CreateIndex("orders", "passenger.lastName", "idx_lastname");

// Composite - good for multi-field WHERE clauses
db.Execute("CREATE INDEX idx_status_date ON orders (status, createdAt)");
Indexes are persistent — they survive database restart without being rebuilt. They're also incrementally maintained on every insert, update, and delete, so queries always see fresh results.
Interactive REPL
Included in the repo is an interactive console for experimenting with SQL queries against your database.
dfdb repl ./data/data.dfdb

dfdb> SELECT * FROM orders WHERE pnr = 'ABC123'
dfdb> SELECT status, COUNT(*) FROM orders GROUP BY status
dfdb> stats
REST API
For testing from Postman or wiring other services in, use dfdb serve:
dfdb serve --port 5000 --data-dir ./data
# Then from another terminal
curl -X POST http://localhost:5000/query \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT * FROM orders LIMIT 5"}'
The same endpoints power the admin UI, so pointing it at any dfdb serve instance just works.
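From .NET code, the same /query endpoint can be exercised with HttpClient; a minimal sketch, assuming a dfdb serve instance is listening on localhost:5000:

```csharp
using System.Net.Http;
using System.Text;

using var http = new HttpClient();

// Same payload shape as the curl example above.
var payload = new StringContent(
    @"{""sql"": ""SELECT * FROM orders LIMIT 5""}",
    Encoding.UTF8,
    "application/json");

var response = await http.PostAsync("http://localhost:5000/query", payload);
Console.WriteLine(await response.Content.ReadAsStringAsync());
```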
Next steps
- Query Language reference — all the SQL features we support
- Data Modeling guide — when to embed, when to reference
- Replication — read scaling and zero-downtime handover
- Deployment — running across multiple machines and datacenters
- Sharding — horizontal scale with consistent hashing and online rebalance
- Security — API keys, replication secrets, TLS