Roadmap — what's coming next in DocumentForge

Just shipped

Recent work that has already landed in master. Pull the latest binary or sync your build to get it.

Atomic bulk inserts — POST /collections/{name}/bulk?atomic=true rolls back the whole batch on any failure; partial-mode batches instead return ids[] and errors[].
SSMS-style Studio in the admin UI — collections explorer, tabbed query workspace, status bar with node + role.
Swarm — fleet-wide management, broadcast operations to selected nodes, RTT/role/uptime per node.
Multi-connection sidebar — register every endpoint you operate (dev, staging, prod-leader, shards), switch with one dropdown.
WHERE IN (…) and OR-of-equalities use a multi-key INDEX_SCAN instead of a collection scan.
NDJSON streaming on POST /query — set Accept: application/x-ndjson for low-memory streaming consumers.
Unknown collection in SQL — clear error with suggestions, no more silent empty results. Case-insensitive lookups across the catalog.
Drop cascade, per-index rebuild, unique-index atomicity — index lifecycle hardening.
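
To make the NDJSON item above concrete, here is a minimal sketch of a low-memory consumer in Python. It assumes only what the item states (POST /query with Accept: application/x-ndjson) plus the NDJSON convention of one JSON value per line. The field names _id and status are hypothetical sample data, and the byte list stands in for a streamed response body; with a real HTTP client you would pass its line iterator instead.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON value per non-empty line, without buffering the whole body."""
    for raw in lines:
        line = raw.strip()
        if not line:
            # Tolerate blank lines between records.
            continue
        yield json.loads(line)


# Stand-in for a streamed response body (hypothetical documents).
sample_body = [
    b'{"_id": "a1", "status": "open"}\n',
    b'\n',
    b'{"_id": "a2", "status": "filled"}\n',
]

rows = list(iter_ndjson(sample_body))
print(rows)
```

Because iter_ndjson is a generator, a consumer that handles rows one at a time holds only the current line in memory.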

Engine — next quarter

Top of the queue. These have either started, have a design doc, or are next pickup. Treat the sizing as a working estimate, not a contract.

Multi-statement transactions — BEGIN / COMMIT / ROLLBACK. The WAL, recovery log, and ?atomic=true bulk path already prove the durability and rollback machinery, so explicit transactions across arbitrary statements are the natural next step. Because the engine is single-writer, serializable isolation comes for free; MVCC for concurrent readers is a separate, larger item below.
Live tail / change feed. A read endpoint that emits the WAL stream as JSON events — “document inserted”, “document updated”, “document deleted” — for downstream consumers (search index sync, audit pipelines, cache invalidation). The sequence numbers already exist; this is mostly an HTTP framing job.
Better query planner stats. Per-index histograms for cost-based WHERE ordering and join-order selection. Today the planner is rule-based, which is good enough for simple queries; histograms would let it choose better plans for multi-predicate queries.
Bulk UPDATE / DELETE with row-count return. Already supported in SQL, but the API surface returns “ok” rather than the affected-rows count and a sample of impacted ids. Useful for OMS amend/cancel flows.
NuGet package on nuget.org. Today it's a project reference. We've held off until the API surface settles; we're close.
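
As an illustration of how a downstream consumer might use the change feed described above, here is a hedged Python sketch of a cache that follows the stream and ignores redelivered events. The event shape (seq, op, id, doc) is an assumption made for the example; the roadmap item only says that events carry the existing sequence numbers and cover inserts, updates, and deletes.

```python
from typing import Dict


class CacheFollower:
    """Keeps a local copy of a collection in step with a change feed."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.last_seq = 0  # highest sequence number applied so far

    def apply(self, event: dict) -> bool:
        """Apply one event; return False if it was a duplicate and skipped."""
        if event["seq"] <= self.last_seq:
            return False  # already applied, e.g. redelivered after a reconnect
        if event["op"] == "delete":
            self.docs.pop(event["id"], None)
        elif event["op"] in ("insert", "update"):
            self.docs[event["id"]] = event["doc"]
        else:
            raise ValueError(f"unknown op: {event['op']}")
        self.last_seq = event["seq"]
        return True


# Hypothetical event stream, including one redelivered event.
events = [
    {"seq": 1, "op": "insert", "id": "o1", "doc": {"qty": 10}},
    {"seq": 2, "op": "update", "id": "o1", "doc": {"qty": 7}},
    {"seq": 2, "op": "update", "id": "o1", "doc": {"qty": 7}},  # redelivery
    {"seq": 3, "op": "insert", "id": "o2", "doc": {"qty": 1}},
    {"seq": 4, "op": "delete", "id": "o2"},
]

follower = CacheFollower()
applied = [follower.apply(e) for e in events]
print(applied, follower.docs, follower.last_seq)
```

Persisting last_seq alongside the cache would let a consumer reconnect and resume from where it left off, assuming the feed accepts a starting sequence number.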

Engine — medium horizon

Real engineering, scoped but not yet started. A quarter or two of focused work each, except where an item states its own estimate.

MVCC for concurrent readers. Today's writer holds a global lock; readers wait. Switching to multi-version snapshots gives concurrent reads zero contention with writers. Per-document version chains, snapshot timestamps, retired-version GC. Real work but well-trodden territory.
Multi-master replication (document-level last-writer-wins). Today's replication is single-leader. The plan is bidirectional replication with a (timestamp, nodeId) tag on every write and last-writer-wins resolution at the document level. Two to four weeks of focused work, plus the conflict-resolution test matrix.
Vector index — HNSW or IVF. Brute-force cosine similarity over a small corpus works today via SQL; an approximate-nearest-neighbour index is on the roadmap once a real workload lands. Likely HNSW behind a familiar VECTOR_DISTANCE() SQL function.
Compressed page format. Per-page LZ4 or Zstd to halve disk and memory footprint at the cost of ~15% CPU on hot reads. Optional, per-collection.
Online schema-style index drop without a write pause. Today drop-index is fast but takes the write lock briefly. Concurrent drop is a small but visible polish item.
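
The last-writer-wins rule in the multi-master item can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the planned implementation: the Versioned type and resolve function are hypothetical names, timestamps are taken to be integers, and ties on timestamp are broken by comparing node ids as strings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Versioned:
    doc: dict
    timestamp: int  # e.g. milliseconds since epoch at the writing node
    node_id: str    # breaks ties when two nodes write at the same timestamp

    def tag(self) -> Tuple[int, str]:
        return (self.timestamp, self.node_id)


def resolve(local: Versioned, remote: Versioned) -> Versioned:
    """Return the winner under last-writer-wins: the higher (timestamp, node_id) tag."""
    return remote if remote.tag() > local.tag() else local


a = Versioned({"qty": 1}, 100, "node-a")
b = Versioned({"qty": 2}, 100, "node-b")  # same timestamp, higher node id
c = Versioned({"qty": 3}, 99, "node-z")   # older write

print(resolve(a, b).doc, resolve(a, c).doc)
```

Comparing the full tuple rather than the timestamp alone means every node picks the same winner regardless of the order in which writes arrive, which is what lets replicas converge.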

Engine — on the horizon

Bigger calls. Each is feasible, each is a real commitment, each is on the table when a customer needs it badly enough to fund the work.

Per-field CRDT semantics. Document-level merge is straightforward; per-field merging over arbitrary nested JSON (G-counters, OR-sets, sequence types) is research-track but doable.
Geospatial indexes. R-tree or geohash-grid for “all flights within 200 km of LHR”-style queries.
Time-series collections. Specialised on-disk format for high-write append-mostly workloads with TTL — telemetry, metrics, audit logs.
JSON Schema validation on insert/update, with rejection or audit modes.
Encrypted-at-rest data files. AES-GCM page encryption with KMS-backed keys. Today the file is plaintext; OS-level encryption (LUKS, BitLocker) covers most threat models.
Read-replica routing inside the client. Library-level “send reads to whichever follower has the lowest RTT” without a separate proxy.
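
For readers unfamiliar with the CRDT types named above, here is a minimal Python sketch of the simplest one, a G-counter (grow-only counter). It shows the standard textbook construction and says nothing about how DocumentForge would store or expose it; the function names and node ids are illustrative.

```python
from typing import Dict

GCounter = Dict[str, int]  # node id -> total increments made at that node


def increment(counter: GCounter, node_id: str, amount: int = 1) -> GCounter:
    """Return a new counter with node_id's slot raised by amount."""
    if amount < 0:
        raise ValueError("a G-counter only grows")
    updated = dict(counter)
    updated[node_id] = updated.get(node_id, 0) + amount
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Take the per-node maximum; this is commutative, associative, and idempotent."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """The counter's value is the sum of all per-node slots."""
    return sum(counter.values())


shared = {"n1": 2}
replica_a = increment(shared, "n1", 3)  # n1 keeps counting on replica a
replica_b = increment(shared, "n2", 4)  # n2 counts independently on replica b
merged = merge(replica_a, replica_b)
print(merged, value(merged))
```

Each node only ever raises its own slot, so merging by per-node maximum never loses an increment, whichever order replicas exchange state in.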

Admin UI — next

Studio and Swarm are the major recent landings. Next on the UI side:

Result-set virtualisation in Studio — render the first N rows immediately, lazy-load on scroll, so 100K-row results don't lock up the tab.
Saved queries / snippets per connection. Right now query tabs are ephemeral; saved snippets with a name and a one-line description make Studio a real working environment.
Index advisor. Run a query → see the plan → if it's a collection scan, surface a suggested CREATE INDEX right there in the results pane.
JSON document editor — open a row, edit fields with type-aware widgets, save back as an UPDATE. Today it's view-only.
Bulk-import wizard. Drag a JSON or NDJSON file onto Studio, pick a collection, optionally enable ?atomic=true, watch progress.
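
The result-set virtualisation idea above comes down to fetching fixed-size pages on demand and caching them. The Python sketch below shows that logic in isolation. It is not Studio code: VirtualResultSet, fetch_page, and the page size are hypothetical, and the in-memory list stands in for whatever paged query the UI would actually issue.

```python
from typing import Callable, Dict, List


class VirtualResultSet:
    """Loads fixed-size pages of rows on demand and caches them."""

    def __init__(self, fetch_page: Callable[[int, int], List[dict]], page_size: int = 100) -> None:
        self._fetch_page = fetch_page  # (offset, limit) -> rows
        self._page_size = page_size
        self._pages: Dict[int, List[dict]] = {}

    def rows(self, start: int, count: int) -> List[dict]:
        """Return rows [start, start + count), fetching only the pages that cover them."""
        if count <= 0:
            return []
        first = start // self._page_size
        last = (start + count - 1) // self._page_size
        out: List[dict] = []
        for page_no in range(first, last + 1):
            if page_no not in self._pages:
                offset = page_no * self._page_size
                self._pages[page_no] = self._fetch_page(offset, self._page_size)
            out.extend(self._pages[page_no])
        skip = start - first * self._page_size
        return out[skip:skip + count]


# Stand-in backend: a 100K-row result held in memory, with fetches recorded.
backing = [{"n": i} for i in range(100_000)]
fetches: List[int] = []


def fetch(offset: int, limit: int) -> List[dict]:
    fetches.append(offset)
    return backing[offset:offset + limit]


results = VirtualResultSet(fetch, page_size=100)
first_screen = results.rows(0, 50)     # initial render
after_scroll = results.rows(50, 100)   # user scrolls; only page 1 is new
print(len(first_screen), len(after_scroll), fetches)
```

Only two pages are ever fetched for these two calls, however large the full result is, which is the property that keeps a 100K-row result from locking up the tab.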

Admin UI — medium

Live query feed. The change-feed engine work above, surfaced as a streaming pane in Studio for “what's happening on this collection right now”.
Replication topology graph. A live node-and-edge view of leader/follower relationships across registered Connections, with replication lag per follower.
Schema inference. Sample N documents from a collection, infer a JSON Schema, surface field frequencies and types — useful for “what shape is this collection actually” on inherited systems.
Per-connection access control. Today a Connection is one bearer token. Role-based per-collection permissions surfaced in the UI when the engine supports it.
Cluster topology builder → live deploy. Today the builder exports a cluster.json; a future revision applies it directly to a swarm of registered Connections, reconfiguring shards in place.
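
The schema-inference item above can be approximated with a short Python sketch: sample some documents, then report how often each field appears and which JSON types it takes. This is a simplified illustration. It looks only at top-level fields, produces a summary rather than a full JSON Schema, and the sample documents are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def type_name(value) -> str:
    """Map a decoded JSON value to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def infer_fields(docs: Iterable[dict]) -> Dict[str, dict]:
    """Return, per top-level field, its frequency across docs and its observed types."""
    total = 0
    types = defaultdict(Counter)
    for doc in docs:
        total += 1
        for field, value in doc.items():
            types[field][type_name(value)] += 1
    return {
        field: {"frequency": sum(counts.values()) / total, "types": dict(counts)}
        for field, counts in types.items()
    }


# Hypothetical sample of four documents.
sample = [
    {"id": "a", "qty": 1},
    {"id": "b", "qty": 2.5, "note": None},
    {"id": "c"},
    {"id": "d", "qty": 3},
]

summary = infer_fields(sample)
print(summary)
```

A field with frequency below 1.0 is a candidate for "optional" in an inferred schema, and a field with more than one observed type is exactly the kind of surprise worth surfacing on an inherited system.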

Honest framing: we ship features when they earn their way in — a real workload needs them, the design holds up under scrutiny, and the test surface is sane. We'd rather under-promise here and surprise on the upside than miss publicly. If something on this list matters to you, tell us — funded use cases jump the queue.

Suggest something

This isn't a closed roadmap. The fastest way to get something onto it is to open an issue at github.com/aerotoysio/documentforge/issues with the use case, the workload shape, and what you'd be willing to test. We read all of them.
