Just shipped
Recent work that has already landed in master. Pull the binary or sync your build to get it.
- Atomic bulk inserts: POST /collections/{name}/bulk?atomic=true rolls back the whole batch on any failure, returning ids[] and errors[] for partial-mode batches.
- SSMS-style Studio in the admin UI: collections explorer, tabbed query workspace, status bar with node + role.
- Swarm — fleet-wide management, broadcast operations to selected nodes, RTT/role/uptime per node.
- Multi-connection sidebar — register every endpoint you operate (dev, staging, prod-leader, shards), switch with one dropdown.
- WHERE IN (…) and OR-of-equalities use a multi-key INDEX_SCAN instead of a collection scan.
- NDJSON streaming on POST /query. Set Accept: application/x-ndjson for low-memory streaming consumers.
- Unknown collection in SQL: clear error with suggestions, no more silent empty results. Case-insensitive lookups across the catalog.
- Drop cascade, per-index rebuild, unique-index atomicity — index lifecycle hardening.
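The NDJSON streaming item above is easiest to appreciate from the consumer side. What follows is a minimal sketch, assuming only that the server emits one JSON document per line; the chunked byte source, the `iter_ndjson` helper, and the sample field names are hypothetical stand-ins for a real HTTP response body.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per newline-delimited line.

    Chunks may split a line at any byte boundary, so incomplete
    trailing data is buffered until its newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the rest.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # A final line without a trailing newline is still a valid record.
    if buffer.strip():
        yield json.loads(buffer)


# Hypothetical response body, split mid-record to mimic network chunking.
sample_chunks = [
    b'{"id": 1, "sym',
    b'bol": "A"}\n{"id": 2,',
    b' "symbol": "B"}\n',
]
rows = list(iter_ndjson(sample_chunks))
```

Because only complete lines are parsed, memory use stays bounded by the longest single document rather than by the size of the result set.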
Engine — next quarter
Top of the queue. These have either started, have a design doc, or are next pickup. Treat the sizing as a working estimate, not a contract.
- Multi-statement transactions (BEGIN/COMMIT/ROLLBACK). The WAL, recovery log, and ?atomic=true bulk path already prove the durability and rollback machinery. Adding explicit transactions across arbitrary statements is the natural next step. Single-writer means serializable comes free; concurrent-reader MVCC is a separate, larger item below.
- Live tail / change feed. A read endpoint that emits the WAL stream as JSON events (“document inserted”, “document updated”, “document deleted”) for downstream consumers such as search index sync, audit pipelines, and cache invalidation. The sequence numbers already exist; this is mostly an HTTP framing job.
- Better query planner stats. Per-index histograms for cost-based WHERE ordering and join-order selection. Today the planner is rule-based and good-enough; this lifts it to be smart-enough on multi-predicate queries.
- Bulk UPDATE/DELETE with row-count return. Already supported in SQL, but the API surface returns “ok” rather than the affected-rows count and a sample of impacted ids. Useful for OMS amend/cancel flows.
- NuGet package on nuget.org. Today it's a project reference. We've held off until the API surface settles; we're close.
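To make the change-feed item above concrete, here is a rough sketch of a cache-invalidation consumer. The feed does not exist yet, so the event shape is purely an assumption: the `seq`, `op`, and `id` field names and the `CacheInvalidator` class are hypothetical, and the only thing taken from the roadmap is that events carry a sequence number.

```python
from dataclasses import dataclass, field


@dataclass
class CacheInvalidator:
    """Drops cached documents as change events arrive, in sequence order."""

    cache: dict = field(default_factory=dict)
    last_seq: int = 0  # highest sequence number applied so far

    def apply(self, event: dict) -> bool:
        """Apply one event; return False if it was a replayed duplicate."""
        # Skip events already seen, e.g. after reconnecting and replaying.
        if event["seq"] <= self.last_seq:
            return False
        # Inserts have nothing to invalidate; updates and deletes do.
        if event["op"] in ("updated", "deleted"):
            self.cache.pop(event["id"], None)
        self.last_seq = event["seq"]
        return True


# Hypothetical usage: one cached document, then an update event for it.
invalidator = CacheInvalidator(cache={"doc-1": {"qty": 5}})
applied = invalidator.apply({"seq": 1, "op": "updated", "id": "doc-1"})
replayed = invalidator.apply({"seq": 1, "op": "updated", "id": "doc-1"})
```

Tracking the last applied sequence number is what would let a consumer reconnect and resume without reprocessing events, whatever the final wire format turns out to be.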
Engine — medium horizon
Real engineering, scoped but not yet started. Quarter or two of focused work each.
- MVCC for concurrent readers. Today's writer holds a global lock; readers wait. Switching to multi-version snapshots gives concurrent reads zero contention with writers. Per-document version chains, snapshot timestamps, retired-version GC. Real work but well-trodden territory.
- Multi-master replication (document-level last-writer-wins). Today's replication is single-leader. Bidirectional replication with (timestamp, nodeId) tagging on every write, last-writer-wins resolution at the document level. Two-to-four weeks of focused work plus the conflict-resolution test matrix.
- Vector index (HNSW or IVF). Brute-force cosine similarity over a small corpus works today via SQL; an approximate-nearest-neighbour index is on the roadmap once a real workload lands. Likely HNSW behind a familiar VECTOR_DISTANCE() SQL function.
- Compressed page format. Per-page LZ4 or Zstd to halve disk and memory footprint at the cost of ~15% CPU on hot reads. Optional, per-collection.
- Online schema-style index drop without a write pause. Today drop-index is fast but takes the write lock briefly. Concurrent drop is a small but visible polish item.
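The last-writer-wins rule in the multi-master item above reduces to comparing (timestamp, nodeId) tags. The sketch below shows one way that comparison could work; the `Versioned` type, the `resolve_lww` function, and the millisecond timestamp unit are illustrative assumptions, not the engine's design.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    timestamp: int  # write time; milliseconds since epoch is an assumed unit
    node_id: str    # identifier of the node that accepted the write
    document: dict


def resolve_lww(local: Versioned, remote: Versioned) -> Versioned:
    """Pick the winner by comparing (timestamp, node_id) tuples.

    The node id only matters when timestamps are equal; it makes the
    outcome deterministic so every replica converges on the same copy.
    """
    return max(local, remote, key=lambda v: (v.timestamp, v.node_id))


# Hypothetical conflict: two nodes wrote the same document in the same tick.
a = Versioned(1000, "node-a", {"status": "open"})
b = Versioned(1000, "node-b", {"status": "cancelled"})
winner = resolve_lww(a, b)
```

Since the tie-break depends only on the tags, both sides of a bidirectional link pick the same winner regardless of which copy each considers local.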
Engine — on the horizon
Bigger calls. Each is feasible, each is a real commitment, each is on the table when a customer needs it badly enough to fund the work.
- Per-field CRDT semantics. Document-level merge is straightforward; per-field merges over arbitrary nested JSON (G-counters, OR-sets, sequence types) is research-track but doable.
- Geospatial indexes. R-tree or geohash-grid for “all flights within 200 km of LHR”-style queries.
- Time-series collections. Specialised on-disk format for high-write append-mostly workloads with TTL — telemetry, metrics, audit logs.
- JSON Schema validation on insert/update, with rejection or audit modes.
- Encrypted-at-rest data files. AES-GCM page encryption with KMS-backed keys. Today the file is plaintext; OS-level encryption (LUKS, BitLocker) covers most threat models.
- Read-replica routing inside the client. Library-level “send reads to whichever follower has the lowest RTT” without a separate proxy.
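Of the CRDT types named in the per-field CRDT item above, the G-counter is the simplest to illustrate. This is a textbook-style sketch of a grow-only counter, not a proposal for how the engine would store one; the function names and the plain-dict representation are assumptions.

```python
def increment(counter: dict, node_id: str, amount: int = 1) -> dict:
    """Return a new G-counter with node_id's slot increased."""
    if amount < 0:
        raise ValueError("a G-counter only grows")
    updated = dict(counter)
    updated[node_id] = updated.get(node_id, 0) + amount
    return updated


def merge(a: dict, b: dict) -> dict:
    """Merge two replicas by taking the per-node maximum."""
    return {
        node: max(a.get(node, 0), b.get(node, 0))
        for node in a.keys() | b.keys()
    }


def value(counter: dict) -> int:
    """The counter's value is the sum of every node's slot."""
    return sum(counter.values())


# Two replicas count independently, then exchange state.
replica_a = increment(increment({}, "node-a"), "node-a")  # node-a counted 2
replica_b = increment({}, "node-b", 3)                    # node-b counted 3
merged = merge(replica_a, replica_b)
```

Taking the per-node maximum makes the merge commutative, associative, and idempotent, which is why replicas can exchange state in any order and still agree.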
Admin UI — next
Studio and Swarm are the major recent landings. Next on the UI side:
- Result-set virtualisation in Studio — render the first N rows immediately, lazy-load on scroll, so 100K-row results don't lock up the tab.
- Saved queries / snippets per connection. Right now query tabs are ephemeral; saved snippets with a name and a one-line description make Studio a real working environment.
- Index advisor. Run a query → see the plan → if it's a collection scan, surface a suggested CREATE INDEX right there in the results pane.
- JSON document editor: open a row, edit fields with type-aware widgets, save back as an UPDATE. Today it's view-only.
- Bulk-import wizard. Drag a JSON or NDJSON file onto Studio, pick a collection, optionally enable ?atomic=true, watch progress.
Admin UI — medium
- Live query feed. The change-feed engine work above, surfaced as a streaming pane in Studio for “what's happening on this collection right now”.
- Replication topology graph. A live node-and-edge view of leader/follower relationships across registered Connections, with replication lag per follower.
- Schema inference. Sample N documents from a collection, infer a JSON Schema, surface field frequencies and types — useful for “what shape is this collection actually” on inherited systems.
- Per-connection access control. Today a Connection is one bearer token. Role-based per-collection permissions surfaced in the UI when the engine supports it.
- Cluster topology builder → live deploy. Today the builder exports a cluster.json; a future revision applies it directly to a swarm of registered Connections, reconfiguring shards in place.
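The field-frequency part of the schema inference item above can be sketched in a few lines. This is only an illustration over already-sampled documents: the `json_type` and `summarise_fields` helpers are hypothetical, only top-level fields are inspected, and producing a full JSON Schema is left out.

```python
from collections import Counter, defaultdict


def json_type(value) -> str:
    """Map a Python value to its JSON type name."""
    # bool must be checked before int: bool is a subclass of int in Python.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def summarise_fields(documents: list) -> dict:
    """Report how often each top-level field appears and with which types."""
    types = defaultdict(Counter)
    for doc in documents:
        for name, value in doc.items():
            types[name][json_type(value)] += 1
    total = len(documents)
    return {
        name: {"frequency": sum(seen.values()) / total, "types": dict(seen)}
        for name, seen in types.items()
    }


# Hypothetical sample with a field whose type drifted over time.
sample = [{"id": 1, "tags": ["a"]}, {"id": "2"}, {"id": 3, "note": None}]
summary = summarise_fields(sample)
```

A field that reports more than one type, like `id` in the sample, is exactly the kind of drift this feature is meant to surface on inherited collections.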
Suggest something
This isn't a closed roadmap. The fastest way to get something onto it is to open an issue at github.com/aerotoysio/documentforge/issues with the use case, the workload shape, and what you'd be willing to test. We read all of them.