Chapter 02

Query Language

SQL syntax extended with JSON paths. Familiar on the left, flexible on the right.

SELECT

SELECT * FROM orders
SELECT pnr, passenger.lastName FROM orders
SELECT * FROM orders WHERE pnr = 'ABC123'
SELECT * FROM orders ORDER BY createdAt DESC LIMIT 100 OFFSET 50

Returns matching documents. The * selector returns the full document, including all nested fields. A comma-separated list projects specific fields.

DISTINCT

Strip duplicate rows from the result. Place DISTINCT right after SELECT and before the field list. The dedupe key is the projected row, not the underlying document, so SELECT DISTINCT airline FROM flights returns one row per unique airline (the internal _id is omitted from the projection so duplicates collapse).

SELECT DISTINCT airline FROM flights
SELECT DISTINCT airline, origin, destination FROM flights
SELECT DISTINCT passenger.lastName FROM orders WHERE status = 'CONFIRMED'
SELECT DISTINCT flights[0].departureAirport FROM orders LIMIT 10

Plays nicely with WHERE, ORDER BY, LIMIT, and OFFSET. With SELECT DISTINCT * every row is unique by definition (the _id field is per-document), so the modifier is effectively a no-op.

Query plans gain a + DISTINCT suffix. LIMIT is not pushed down to the scan when DISTINCT is present (otherwise the scan would short-circuit before all duplicates are seen).

JSON paths

Any place you can reference a field, you can use a JSON path. Dots navigate into objects; brackets index arrays.

PathMeaning
pnrTop-level field
passenger.lastNameNested object field
flights[0].flightNumberFirst array element's field
flights[2].fare.totalNested inside array
flights[*].flightNumberAny array element (index-only)
Tip: [*] wildcard is supported when creating indexes (index every element's value) but in WHERE clauses you need a specific index. Use arrays with known positions for WHERE predicates, or combine with aggregations to filter-by-contents.

WHERE clauses

Supported operators: =, !=, >, <, >=, <=, LIKE, IS NULL, IS NOT NULL. Combine with AND / OR / NOT and parentheses.

SELECT * FROM orders WHERE status = 'CONFIRMED' AND passenger.age >= 18
SELECT * FROM users  WHERE email LIKE '%@example.com'
SELECT * FROM orders WHERE cancelledAt IS NULL
SELECT * FROM orders WHERE (status = 'CONFIRMED' OR status = 'TICKETED') AND total > 1000

Equality uses indexes

A = predicate on an indexed path uses INDEX_SCAN. Range predicates (>, <, >=, <=) use INDEX_RANGE_SCAN when an index exists.

Composite indexes

An index on (status, createdAt) fires when both columns appear with equality in an AND clause. See CREATE INDEX.

ORDER BY · LIMIT · OFFSET

SELECT * FROM orders ORDER BY createdAt DESC LIMIT 50 OFFSET 100

LIMIT without ORDER BY is pushed down into the scan, meaning only N documents are materialized. With ORDER BY, the full matching set is materialized before sorting.

JOIN

A hash join between two collections on an equality predicate. Useful when you've referenced a shared entity (like a flight) rather than embedding it.

SELECT * FROM orders
JOIN flights ON orders.bookedFlights[0].flightRef = flights.flightId
WHERE orders.pnr = 'ABC123'

The result document has both collections nested under their name:

{
  "orders":  { "pnr": "ABC123", "bookedFlights": [...] },
  "flights": { "flightId": "AA100_2024-06-15", "status": "DELAYED" }
}

In WHERE and SELECT, prefix paths with the collection name: orders.pnr, flights.status.

Aggregations

Five aggregate functions: COUNT, SUM, AVG, MIN, MAX. Combine freely in the SELECT list.

SELECT COUNT(*) FROM orders
SELECT SUM(tickets[0].fare.total) FROM orders WHERE status = 'CONFIRMED'
SELECT MIN(fare.total), MAX(fare.total), AVG(fare.total) FROM tickets

The result key in the output document matches the aggregate expression: COUNT(*), SUM(tickets[0].fare.total), etc.

GROUP BY

Bucket documents by one or more fields; aggregates run per bucket.

SELECT status, COUNT(*), SUM(fare.total)
FROM orders
GROUP BY status

SELECT airline, cabin, COUNT(*), AVG(fare.total)
FROM orders
GROUP BY airline, cabin

Each result row includes the grouping columns plus all aggregates.

INSERT

INSERT INTO orders VALUES { "pnr": "XYZ789", "status": "PENDING" }

The document is parsed as JSON. Missing _id fields are filled with a sequential time-ordered ID.

UPDATE

UPDATE orders SET status = 'CANCELLED' WHERE pnr = 'ABC123'
UPDATE orders SET passenger.vip = true WHERE passenger.lastName = 'Smith'

Paths in SET can be nested. All matching documents are updated; affected-count is returned.

DELETE

DELETE FROM orders WHERE status = 'CANCELLED' AND createdAt < '2020-01-01'

Deletes all matching documents. A DELETE without WHERE is supported but should be used with care.

CREATE / DROP INDEX

CREATE INDEX idx_pnr ON orders (pnr)
CREATE UNIQUE INDEX idx_pnr ON orders (pnr)
CREATE INDEX idx_status_created ON orders (status, createdAt)
DROP INDEX idx_pnr ON orders

Indexes are built from existing data synchronously, then incrementally maintained. Unique indexes reject duplicates on insert.

Query plans

Every QueryResult returns a QueryPlan string describing how the query was executed.

PlanMeaning
INDEX_SCAN(name)Equality lookup via single-field index
INDEX_RANGE_SCAN(name)Range lookup via B-tree walk
INDEX_SCAN_COMPOSITE(name)Multi-field equality via composite index
COLLECTION_SCANFull scan, no matching index
... + HASH_JOIN(coll)Hash join was applied to the result
... + AGGREGATEAggregation / GROUP BY was applied

Check the plan during development — if you see COLLECTION_SCAN on a query you run often, add an index.