SQL Reference
DocumentForge speaks a familiar SQL dialect extended with JSON paths: dots
navigate into objects, brackets index arrays. Everything below runs through
the dfdb CLI (query, repl) and over the
REST API POST /query endpoint.
SELECT
SELECT * FROM orders
SELECT pnr, passenger.lastName FROM orders
SELECT * FROM orders WHERE pnr = 'ABC123'
SELECT * FROM orders ORDER BY createdAt DESC LIMIT 100 OFFSET 50Returns matching documents. The * selector returns the full document,
including all nested fields. A comma-separated list projects specific fields,
which may be nested paths or array elements.
Collection names
Names are case-insensitive and may contain letters, digits, dots,
underscores, and hyphens. Dotted names like aerotoys.tax.users or
v2.audit.events work in every position a collection name appears (FROM, JOIN,
INSERT INTO, UPDATE, DELETE, CREATE INDEX) and through the REST API path:
-- All four refer to the SAME collection
SELECT * FROM aerotoys.tax.users
SELECT * FROM AeroToys.Tax.Users
POST /collections/aerotoys.tax.users
GET /collections/aerotoys.tax.usersInternally the catalog folds names to lowercase on store; lookups go through
the same normalisation, so case-mismatched calls resolve to the same physical
collection. The IndexDefinition.Name on indexes preserves the original
casing supplied at creation time, but lookups are also case-insensitive — see
CREATE / DROP INDEX.
Unknown collection
SQL against a collection that doesn’t exist returns a clear error rather than an empty result. The error message lists known collections and, when the only difference is case, suggests the canonical name:
{
"error": "Collection 'aerotoys.tax.environments' not found.
Known collections: orders, flights, users, aerotoys.tax.rules."
}{
"error": "Collection 'TaxVersions' not found.
Did you mean 'taxversions'? Names are stored case-insensitively."
}The HTTP status is 400 for these cases — no more silently-empty result sets
to debug.
DISTINCT
Strip duplicate rows from the result. Place DISTINCT right after SELECT and
before the field list. The dedupe key is the projected row, not the underlying
document, so SELECT DISTINCT airline FROM flights returns one row per unique
airline (the internal _id is omitted from the projection so duplicates
collapse).
SELECT DISTINCT airline FROM flights
SELECT DISTINCT airline, origin, destination FROM flights
SELECT DISTINCT passenger.lastName FROM orders WHERE status = 'CONFIRMED'
SELECT DISTINCT flights[0].departureAirport FROM orders LIMIT 10Plays nicely with WHERE, ORDER BY, LIMIT, and OFFSET. With
SELECT DISTINCT * every row is unique by definition (the _id field is
per-document), so the modifier is effectively a no-op.
Query plans gain a + DISTINCT suffix. LIMIT is not pushed down to the
scan when DISTINCT is present (otherwise the scan would short-circuit before
all duplicates are seen).
JSON paths
Any place you can reference a field, you can use a JSON path. Dots navigate into objects; brackets index arrays.
| Path | Meaning |
|---|---|
pnr | Top-level field |
passenger.lastName | Nested object field |
flights[0].flightNumber | First array element’s field |
flights[2].fare.total | Nested inside array |
flights[*].flightNumber | Any array element (index-only) |
The [*] wildcard is supported when creating indexes (index every element’s
value), but in WHERE clauses you need a specific index. Use arrays with known
positions for WHERE predicates, or combine with aggregations to
filter-by-contents.
WHERE clauses
Supported operators: =, !=, >, <, >=, <=, LIKE, IS NULL,
IS NOT NULL. Combine with AND / OR / NOT and parentheses.
SELECT * FROM orders WHERE status = 'CONFIRMED' AND passenger.age >= 18
SELECT * FROM users WHERE email LIKE '%@example.com'
SELECT * FROM orders WHERE cancelledAt IS NULL
SELECT * FROM orders WHERE (status = 'CONFIRMED' OR status = 'TICKETED') AND total > 1000Equality uses indexes
A = predicate on an indexed path uses INDEX_SCAN. Range predicates (>,
<, >=, <=) use INDEX_RANGE_SCAN when an index exists.
Composite indexes
An index on (status, createdAt) fires when both columns appear with equality
in an AND clause. See CREATE INDEX.
OR of equalities on the same indexed column
WHERE col = a OR col = b OR col = c on an indexed col uses
INDEX_SCAN_MULTI(idx_name, N keys) — one indexed lookup per branch, results
unioned with dedupe. No more accidental collection scans for set lookups.
IN clause
WHERE col IN (v1, v2, …, vN) is the natural shape for “any of these”.
Internally it lowers to col = v1 OR col = v2 OR …, so it fires the same
multi-key index scan when an index covers the column.
SELECT id FROM rules WHERE id IN ('rule-bag-policy', 'rule-tier-bonus')
SELECT * FROM orders WHERE status IN ('CONFIRMED', 'TICKETED')
SELECT * FROM users WHERE email IN ('a@example.com', 'b@example.com', 'c@example.com')
SELECT * FROM orders WHERE qty IN (1, 5, 10, 100)- Empty
IN ()is a parse error — at least one value is required. - Duplicate values (
IN ('x', 'y', 'x')) are deduped at the result level, not the input level — fine to pass them in. - Numeric, string, boolean, and null literals all work.
- Plan:
INDEX_SCAN_MULTI(idx_name, N keys)with an index,COLLECTION_SCANwithout.
ORDER BY · LIMIT · OFFSET
SELECT * FROM orders ORDER BY createdAt DESC LIMIT 50 OFFSET 100LIMIT without ORDER BY is pushed down into the scan, meaning only N
documents are materialized. With ORDER BY, the full matching set is
materialized before sorting.
JOIN
A hash join between two collections on an equality predicate. Useful when you’ve referenced a shared entity (like a flight) rather than embedding it.
SELECT * FROM orders
JOIN flights ON orders.bookedFlights[0].flightRef = flights.flightId
WHERE orders.pnr = 'ABC123'The result document has both collections nested under their name:
{
"orders": { "pnr": "ABC123", "bookedFlights": [] },
"flights": { "flightId": "AA100_2024-06-15", "status": "DELAYED" }
}In WHERE and SELECT, prefix paths with the collection name: orders.pnr,
flights.status.
Aggregations
Five aggregate functions: COUNT, SUM, AVG, MIN, MAX. Combine freely
in the SELECT list.
SELECT COUNT(*) FROM orders
SELECT SUM(tickets[0].fare.total) FROM orders WHERE status = 'CONFIRMED'
SELECT MIN(fare.total), MAX(fare.total), AVG(fare.total) FROM ticketsThe result key in the output document matches the aggregate expression:
COUNT(*), SUM(tickets[0].fare.total), etc.
GROUP BY
Bucket documents by one or more fields; aggregates run per bucket.
SELECT status, COUNT(*), SUM(fare.total)
FROM orders
GROUP BY statusSELECT airline, cabin, COUNT(*), AVG(fare.total)
FROM orders
GROUP BY airline, cabinEach result row includes the grouping columns plus all aggregates.
INSERT
INSERT INTO orders VALUES { "pnr": "XYZ789", "status": "PENDING" }The document is parsed as JSON. Missing _id fields are filled with a
sequential time-ordered ID.
UPDATE
UPDATE orders SET status = 'CANCELLED' WHERE pnr = 'ABC123'
UPDATE orders SET passenger.vip = true WHERE passenger.lastName = 'Smith'Paths in SET can be nested. All matching documents are updated; the affected-count is returned.
DELETE
DELETE FROM orders WHERE status = 'CANCELLED' AND createdAt < '2020-01-01'Deletes all matching documents. A DELETE without WHERE is supported but should be used with care.
CREATE / DROP INDEX
CREATE INDEX idx_pnr ON orders (pnr)
CREATE UNIQUE INDEX idx_pnr ON orders (pnr)
CREATE INDEX idx_status_created ON orders (status, createdAt)
DROP INDEX idx_pnr ON ordersIndexes are built from existing data synchronously, then incrementally maintained. Unique indexes reject duplicates on insert. See Concepts → Indexes for index strategy.
Query plans
Every QueryResult returns a QueryPlan string describing how the query was
executed.
| Plan | Meaning |
|---|---|
INDEX_SCAN(name) | Equality lookup via single-field index |
INDEX_RANGE_SCAN(name) | Range lookup via B-tree walk |
INDEX_SCAN_COMPOSITE(name) | Multi-field equality via composite index |
COLLECTION_SCAN | Full scan, no matching index |
... + HASH_JOIN(coll) | Hash join was applied to the result |
... + AGGREGATE | Aggregation / GROUP BY was applied |
Check the plan during development — if you see COLLECTION_SCAN on a query you
run often, add an index.