Two patterns
Every document database gives you two ways to relate entities. Both are valid. The mistake is picking one dogmatically.
Embed — nest the related data inside the parent
{
"pnr": "ABC123",
"passenger": { "firstName": "John", "lastName": "Smith" },
"flights": [
{ "flightNumber": "AA100", "cabin": "BUSINESS", "fare": 450 }
]
}
Reference — store an ID; use JOIN when needed
// orders collection
{ "pnr": "ABC123", "flightIds": ["AA100_2024-06-15"] }
// flights collection
{ "flightId": "AA100_2024-06-15", "status": "DELAYED", "gate": "B12" }
The rule of thumb
| Factor | Embed | Reference |
|---|---|---|
| Lifetime | Dies with parent | Lives independently |
| Access pattern | Always read together | Sometimes alone |
| Shared by many | No — owned | Yes |
| Update frequency | Rarely / as a whole | Often, independently |
| Bounded size | Yes (small N) | Can be large |
| Atomic read | Single query | Needs JOIN |
Case study: Airline PNR
This is where the two patterns really shine together. Real production airline systems combine them — and it's elegant.
Embed inside the order (snapshot at booking time)
{
"pnr": "ABC123",
"createdAt": "2024-06-01T10:30:00Z",
"passenger": { // embed — owned by this order
"firstName": "John",
"lastName": "Smith",
"frequentFlyerNumber": "AA123456"
},
"bookedFlights": [ // embed — SNAPSHOT of what was booked
{
"flightNumber": "AA100",
"cabin": "BUSINESS",
"fareAtBooking": 450,
"flightRef": "AA100_2024-06-15" // pointer to current state
}
],
"payment": { // embed — belongs to this order
"method": "CREDIT_CARD",
"last4": "4242",
"amount": 450
}
}
Reference separately (current state)
// flights — current operational state, changes independently
{
"flightId": "AA100_2024-06-15",
"status": "DELAYED",
"actualDepartureTime": "...",
"gate": "B23",
"aircraftRegistration": "N12345"
}
// airports — master data
{ "code": "JFK", "name": "John F. Kennedy", "timezone": "America/New_York" }
// aircraft — fleet data
{ "registration": "N12345", "type": "B738", "status": "IN_SERVICE" }
Why this is the right design
- One query reads an order — all customer-facing info is embedded.
- Flight status updates don't rewrite every order — one row in
flights. - History is preserved — "I booked business class for $450" even if the flight is cancelled.
- JOIN when you need live status — see below.
Joining them
-- Get order with current flight status SELECT * FROM orders JOIN flights ON orders.bookedFlights[0].flightRef = flights.flightId WHERE orders.pnr = 'ABC123' -- Find orders affected by a cancelled flight SELECT orders.pnr, orders.passenger FROM orders JOIN flights ON orders.bookedFlights[0].flightRef = flights.flightId WHERE flights.status = 'CANCELLED'
IDs & references
Every document has an _id field. If you don't supply one on insert, a sequential time-ordered 128-bit ID is generated. Time-ordering keeps recently-written documents near each other on disk, improving cache locality.
For references, you can:
- Use the auto-generated
_idfrom the target document - Use a domain-meaningful ID (like
flightId: "AA100_2024-06-15") - Index both sides of the reference for fast JOIN performance
Index patterns
Indexing decisions should follow your query patterns. Here are the common ones:
Primary lookup
CREATE UNIQUE INDEX idx_pnr ON orders (pnr)
Sub-field lookup (search by passenger)
CREATE INDEX idx_lastname ON orders (passenger.lastName)
First array element (common in reservations)
CREATE INDEX idx_first_flight ON orders (flights[0].flightNumber)
Composite for multi-field filters
CREATE INDEX idx_status_created ON orders (status, createdAt)
Range queries
-- Index on a numeric/date field, then use range predicates
CREATE INDEX idx_fare ON orders (fare.total)
SELECT * FROM orders WHERE fare.total > 1000 ORDER BY fare.total LIMIT 100
Schema evolution
DocumentForge is schemaless. You can add new fields to documents at any time without migrating existing data. Old documents simply don't have the new field.
When you query a field that some documents don't have, they evaluate as null. This is usually what you want:
-- Find documents that DO have the new field SELECT * FROM orders WHERE loyaltyTier IS NOT NULL -- Find orders from before the loyaltyTier field existed SELECT * FROM orders WHERE loyaltyTier IS NULL
History as a feature
The embed-the-snapshot pattern gives you something surprisingly valuable: a historical record. An order booked in 2023 remembers the 2023 fare, the 2023 cabin class, the 2023 passenger name. Even if the airline re-fared the flight or the passenger changed their email.
Don't fight this — embrace it. Call the embedded bit bookedFlights or snapshotAtBooking to make the intent explicit.
When you need current information, query the referenced collection. When you need history, read the embedded snapshot. Both are right.