ADR-003: Mercury Storage
| Date | Author | Status |
|---|---|---|
| 2026-04-27 | Fabian Beyerlein | Accepted |
Context
Mercury's initial storage implementation takes a simple file-based approach, keeping the required data in JSON files.
Data stored:
- Agent Identity (sensitive)
- Integration Configuration (sensitive)
In addition, Mercury currently has no redundancy mechanism for outbound messages
to Nexus. These messages should also be persisted and removed only once the
cloud has acknowledged them (ACK).
Out of Scope
While this ADR lays the groundwork for an outbox implementation for the TunnelService (by providing the storage port it would build on), the outbox design itself is covered in ADR-004.
Decision
Introduce etcd-io/bbolt as an embedded database, wrapped in a storage module
that adds namespacing (via bbolt buckets) and value encryption.
Port
// ports/storage.go
package ports

import "context"

// KVIteratorFn is called once per visited key/value pair; returning an
// error stops the iteration.
type KVIteratorFn func(key, value []byte) error

// KVStore is the storage port handed to consuming modules.
type KVStore interface {
	Get(ctx context.Context, key []byte) ([]byte, error)
	Put(ctx context.Context, key, value []byte) error
	Delete(ctx context.Context, key []byte) error
	Iterate(ctx context.Context, opts KVIterateOpts, fn KVIteratorFn) error
}

type KVIterateOpts struct {
	Prefix  []byte // optional key prefix filter
	Reverse bool
	Limit   int // 0 = unlimited
}
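To make the intended Iterate semantics easier to discuss, the following is a minimal in-memory sketch of the port, not the bbolt adapter. The memStore type, the errNotFound value, and the seed and keysOf helpers are hypothetical names introduced for illustration. It also assumes that Limit is applied after Prefix filtering and Reverse ordering, which the port definition does not state.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"sort"
)

// Port types, repeated here so the sketch compiles on its own.
type KVIteratorFn func(key, value []byte) error

type KVIterateOpts struct {
	Prefix  []byte // optional key prefix filter
	Reverse bool
	Limit   int // 0 = unlimited
}

type KVStore interface {
	Get(ctx context.Context, key []byte) ([]byte, error)
	Put(ctx context.Context, key, value []byte) error
	Delete(ctx context.Context, key []byte) error
	Iterate(ctx context.Context, opts KVIterateOpts, fn KVIteratorFn) error
}

// errNotFound and memStore are hypothetical stand-ins for illustration.
var errNotFound = errors.New("key not found")

type memStore struct{ data map[string][]byte }

var _ KVStore = (*memStore)(nil)

func (m *memStore) Get(_ context.Context, key []byte) ([]byte, error) {
	v, ok := m.data[string(key)]
	if !ok {
		return nil, errNotFound
	}
	return append([]byte(nil), v...), nil // return a copy, never internal memory
}

func (m *memStore) Put(_ context.Context, key, value []byte) error {
	m.data[string(key)] = append([]byte(nil), value...)
	return nil
}

func (m *memStore) Delete(_ context.Context, key []byte) error {
	delete(m.data, string(key))
	return nil
}

func (m *memStore) Iterate(ctx context.Context, opts KVIterateOpts, fn KVIteratorFn) error {
	keys := make([]string, 0, len(m.data))
	for k := range m.data {
		if bytes.HasPrefix([]byte(k), opts.Prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // byte-wise order, matching a B+ tree cursor
	if opts.Reverse {
		sort.Sort(sort.Reverse(sort.StringSlice(keys)))
	}
	if opts.Limit > 0 && len(keys) > opts.Limit {
		keys = keys[:opts.Limit] // assumption: limit applies after ordering
	}
	for _, k := range keys {
		if err := ctx.Err(); err != nil {
			return err
		}
		if err := fn([]byte(k), m.data[k]); err != nil {
			return err
		}
	}
	return nil
}

// seed builds a store containing the given keys with a dummy value.
func seed(keys ...string) *memStore {
	m := &memStore{data: map[string][]byte{}}
	for _, k := range keys {
		_ = m.Put(context.Background(), []byte(k), []byte("v"))
	}
	return m
}

// keysOf collects the keys visited by Iterate for the given options.
func keysOf(s KVStore, opts KVIterateOpts) []string {
	var out []string
	_ = s.Iterate(context.Background(), opts, func(k, _ []byte) error {
		out = append(out, string(k))
		return nil
	})
	return out
}

func main() {
	s := seed("cfg/a", "cfg/b", "id/agent", "cfg/c")
	fmt.Println(keysOf(s, KVIterateOpts{Prefix: []byte("cfg/"), Reverse: true, Limit: 2}))
}
```

A stand-in like this can also back unit tests of consuming modules, since they depend only on the port.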
Namespacing
The App Shell (app/app.go) will instantiate the Storage module and hand out
namespaced instances to other consuming modules.
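// modules/storage/infra/bolt/store.go
package bolt

import (
	"context"

	"go.etcd.io/bbolt"
	// plus Mercury's own domain and ports packages (import paths omitted)
)

// Store owns the single bbolt database and the shared cipher.
type Store struct {
	db     *bbolt.DB
	cipher domain.Cipher
}

// Namespace returns a KVStore scoped to one bucket.
func (s *Store) Namespace(name string) ports.KVStore {
	return &namespacedStore{
		db:     s.db,
		bucket: []byte(name),
		cipher: s.cipher,
	}
}

type namespacedStore struct {
	db     *bbolt.DB
	bucket []byte
	cipher domain.Cipher
}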
func (ns *namespacedStore) Get(ctx context.Context, key []byte) ([]byte, error) {
	var out []byte
	err := ns.db.View(func(tx *bbolt.Tx) error {
		b := tx.Bucket(ns.bucket)
		if b == nil {
			return domain.NewBucketNotFoundErr(string(ns.bucket))
		}
		v := b.Get(key)
		if v == nil {
			return domain.NewNotFoundErr(string(key))
		}
		decrypted, err := ns.cipher.Decrypt(v)
		if err != nil {
			return domain.NewDecryptionErr(err)
		}
		// Copy so the result never aliases memory owned by the transaction.
		out = make([]byte, len(decrypted))
		copy(out, decrypted)
		return nil
	})
	return out, err
}
func (ns *namespacedStore) Put(ctx context.Context, key, value []byte) error {
	return ns.db.Update(func(tx *bbolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists(ns.bucket)
		if err != nil {
			return err
		}
		encrypted, err := ns.cipher.Encrypt(value)
		if err != nil {
			return domain.NewEncryptionErr(err)
		}
		return b.Put(key, encrypted)
	})
}
// ...
Encryption
Implement a Cipher interface in pkg/crypto/cipher.go using
ChaCha20-Poly1305 AEAD. Use it to replace the two existing usages of
ChaCha20-Poly1305 in Nexus's and Mercury's EncryptionServices.
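A minimal sketch of what such a Cipher could look like follows. The Encrypt/Decrypt shape is taken from how the storage adapter calls the cipher; everything else (the aeadCipher type, NewAEADCipher, the nonce-prefixed layout, random nonces) is an assumption. To stay within the standard library, the sketch uses AES-256-GCM as a stand-in AEAD. ChaCha20-Poly1305 from golang.org/x/crypto also satisfies cipher.AEAD, so it could be passed to the same wrapper.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Cipher mirrors the Encrypt/Decrypt calls made by the storage adapter.
// The exact interface in pkg/crypto is an assumption.
type Cipher interface {
	Encrypt(plaintext []byte) ([]byte, error)
	Decrypt(ciphertext []byte) ([]byte, error)
}

// aeadCipher wraps any cipher.AEAD (hypothetical type for illustration).
type aeadCipher struct{ aead cipher.AEAD }

func NewAEADCipher(aead cipher.AEAD) Cipher { return &aeadCipher{aead: aead} }

// Encrypt returns nonce || ciphertext || tag, using a fresh random nonce.
func (c *aeadCipher) Encrypt(plaintext []byte) ([]byte, error) {
	n := c.aead.NonceSize()
	nonce := make([]byte, n, n+len(plaintext)+c.aead.Overhead())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the sealed message to its first argument (the nonce).
	return c.aead.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt splits off the nonce prefix and authenticates the remainder.
func (c *aeadCipher) Decrypt(data []byte) ([]byte, error) {
	n := c.aead.NonceSize()
	if len(data) < n+c.aead.Overhead() {
		return nil, errors.New("ciphertext too short")
	}
	return c.aead.Open(nil, data[:n], data[n:], nil)
}

// newDemoAEAD builds AES-256-GCM from a 32-byte key. This is a stdlib
// stand-in, not the ChaCha20-Poly1305 construction chosen by this ADR.
func newDemoAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// mustCipher is a demo helper that panics on an invalid key.
func mustCipher(key []byte) Cipher {
	aead, err := newDemoAEAD(key)
	if err != nil {
		panic(err)
	}
	return NewAEADCipher(aead)
}

func main() {
	c := mustCipher(make([]byte, 32)) // all-zero demo key, never use in production
	ct, _ := c.Encrypt([]byte("agent identity"))
	pt, err := c.Decrypt(ct)
	fmt.Printf("%d-byte ciphertext, plaintext %q, err %v\n", len(ct), pt, err)
}
```

Depending only on cipher.AEAD keeps the wrapper independent of the concrete algorithm, which is what would let Nexus and Mercury share one implementation.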
Also implement a Master Key mechanism for Mercury that will be read from an
environment variable (CA_MASTER_KEY or CA_MASTER_KEY_FILE).
Ensure all encryption material is zeroed when no longer needed.
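The master key lookup and zeroing could be sketched as below. Several details are assumptions, since the ADR does not fix them: the key is taken to be 32 bytes in hex encoding, CA_MASTER_KEY takes precedence over CA_MASTER_KEY_FILE, and the loadMasterKeyFrom and zero helpers are hypothetical names.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"strings"
)

// loadMasterKeyFrom resolves the master key using the given environment
// lookup (os.Getenv in production, a stub in tests).
// Assumptions: hex-encoded 32-byte key; the inline variable wins over the file.
func loadMasterKeyFrom(getenv func(string) string) ([]byte, error) {
	raw := getenv("CA_MASTER_KEY")
	if raw == "" {
		path := getenv("CA_MASTER_KEY_FILE")
		if path == "" {
			return nil, errors.New("neither CA_MASTER_KEY nor CA_MASTER_KEY_FILE is set")
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return nil, fmt.Errorf("read master key file: %w", err)
		}
		raw = string(b)
		zero(b) // the file buffer is no longer needed
	}
	key, err := hex.DecodeString(strings.TrimSpace(raw))
	if err != nil {
		return nil, fmt.Errorf("decode master key: %w", err)
	}
	if len(key) != 32 {
		zero(key)
		return nil, fmt.Errorf("master key must be 32 bytes, got %d", len(key))
	}
	return key, nil
}

// zero overwrites key material in place. This is best effort only: Go
// strings (such as the raw environment value) are immutable and cannot
// be wiped this way.
func zero(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

func main() {
	key, err := loadMasterKeyFrom(os.Getenv)
	if err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	defer zero(key) // wipe once the cipher has been constructed
	fmt.Println("master key loaded, bytes:", len(key))
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the loader testable without touching the process environment.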
Key Rotation
On startup, if CA_MASTER_KEY_OLD is set, Mercury attempts decryption
with the current key first. If that fails, it falls back to the old key,
re-encrypts all values under the new key in a single atomic transaction,
and continues boot. An audit event is emitted to the cloud once the
tunnel is established.
Mercury refuses to start if both keys are present but identical, to prevent stale configuration.
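One possible reading of this rotation flow is sketched below over an in-memory map instead of bbolt. The rotate, seal, open, demoKey and mustSeal names are hypothetical, and AES-256-GCM from the standard library stands in for ChaCha20-Poly1305. Changes are staged and applied only if every value could be processed, which is meant to mirror the single atomic transaction. Deciding per value whether re-encryption is needed is an assumption.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/subtle"
	"errors"
	"fmt"
)

func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block) // stdlib stand-in for ChaCha20-Poly1305
}

// seal returns nonce || ciphertext || tag under the given key.
func seal(key, plaintext []byte) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the key is wrong or data was altered.
func open(key, data []byte) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	n := aead.NonceSize()
	if len(data) < n {
		return nil, errors.New("ciphertext too short")
	}
	return aead.Open(nil, data[:n], data[n:], nil)
}

// rotate tries the current key first, falls back to the old key, and
// re-encrypts under the current key. It returns how many values changed.
func rotate(values map[string][]byte, newKey, oldKey []byte) (int, error) {
	if subtle.ConstantTimeCompare(newKey, oldKey) == 1 {
		return 0, errors.New("CA_MASTER_KEY and CA_MASTER_KEY_OLD are identical")
	}
	staged := make(map[string][]byte, len(values))
	rotated := 0
	for k, v := range values {
		if _, err := open(newKey, v); err == nil {
			continue // already readable with the current key
		}
		pt, err := open(oldKey, v)
		if err != nil {
			return 0, fmt.Errorf("value %q: neither key decrypts it: %w", k, err)
		}
		ct, err := seal(newKey, pt)
		if err != nil {
			return 0, err
		}
		staged[k] = ct
		rotated++
	}
	for k, v := range staged { // "commit": reached only if nothing failed
		values[k] = v
	}
	return rotated, nil
}

// demoKey and mustSeal are helpers for the demo only.
func demoKey(b byte) []byte {
	k := make([]byte, 32)
	for i := range k {
		k[i] = b
	}
	return k
}

func mustSeal(key []byte, s string) []byte {
	ct, err := seal(key, []byte(s))
	if err != nil {
		panic(err)
	}
	return ct
}

func main() {
	oldKey, newKey := demoKey(1), demoKey(2)
	values := map[string][]byte{"identity": mustSeal(oldKey, "id")}
	n, err := rotate(values, newKey, oldKey)
	fmt.Println("re-encrypted values:", n, "error:", err)
}
```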
Schema Evolution
bbolt has no schema in the "SQL" sense - the "schema" is bucket existence and
key encoding conventions. Required buckets are created idempotently on startup
via CreateBucketIfNotExists.
Consumer-level data evolution is owned by each consuming module via versioned
key roots (e.g. identity-v1 -> identity-v2) with backward-compatible reads.
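A backward-compatible read across versioned key roots might look like the sketch below. It treats identity-v2 and identity-v1 as plain keys, which is an assumption (they could equally be bucket names or key prefixes). The getter type, readVersioned, mapGetter and errNotFound are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the storage module's not-found error.
var errNotFound = errors.New("not found")

// getter abstracts a single-key read from a namespaced store.
type getter func(key string) ([]byte, error)

// readVersioned tries each root in order (newest first) and returns the
// first value found together with the root it came from. Errors other
// than "not found" abort the lookup instead of being masked.
func readVersioned(get getter, roots ...string) ([]byte, string, error) {
	for _, root := range roots {
		v, err := get(root)
		if err == nil {
			return v, root, nil
		}
		if !errors.Is(err, errNotFound) {
			return nil, "", err
		}
	}
	return nil, "", errNotFound
}

// mapGetter adapts a map to the getter type for demos and tests.
func mapGetter(m map[string][]byte) getter {
	return func(key string) ([]byte, error) {
		v, ok := m[key]
		if !ok {
			return nil, errNotFound
		}
		return v, nil
	}
}

func main() {
	get := mapGetter(map[string][]byte{"identity-v1": []byte("legacy identity")})
	v, root, err := readVersioned(get, "identity-v2", "identity-v1")
	fmt.Printf("value %q from %s, err %v\n", v, root, err)
}
```

Returning the root alongside the value lets the consuming module decide whether to upgrade the record by writing it back under the newer root.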
Key Encoding Conventions
- Keys are not encrypted - encryption applies to values only, preserving bbolt's B+ tree sort order for prefix scans and ordered iteration
- Sequence-based keys (e.g. outbox) use big-endian uint64 encoding for natural ordering
- String-based keys (e.g. config, identity) use plain UTF-8 byte representation
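The ordering property behind the big-endian convention can be checked with a short sketch. Byte-wise comparison of fixed-width big-endian keys agrees with numeric comparison of the sequence numbers, which is why a cursor walking such keys visits them in sequence order. The seqKey, seqFromKey and sortedSeqs helpers are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// seqKey encodes a sequence number as an 8-byte big-endian key.
func seqKey(seq uint64) []byte {
	k := make([]byte, 8)
	binary.BigEndian.PutUint64(k, seq)
	return k
}

// seqFromKey decodes a key produced by seqKey.
func seqFromKey(k []byte) uint64 {
	return binary.BigEndian.Uint64(k)
}

// sortedSeqs encodes the inputs, sorts the keys byte-wise (the order a
// B+ tree keeps them in), and decodes them again.
func sortedSeqs(seqs ...uint64) []uint64 {
	keys := make([][]byte, len(seqs))
	for i, s := range seqs {
		keys[i] = seqKey(s)
	}
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i], keys[j]) < 0
	})
	out := make([]uint64, len(keys))
	for i, k := range keys {
		out[i] = seqFromKey(k)
	}
	return out
}

func main() {
	fmt.Println(sortedSeqs(256, 2, 65536, 1))
}
```

Decimal string keys would not have this property without zero padding, since "10" sorts before "9" byte-wise.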
Alternatives Considered
SQLite (via modernc.org/sqlite or mattn/go-sqlite3)
Full relational model with SQL query capabilities. Rejected because:
- Mercury's data access patterns are pure key-value (identity blob, config blob, ordered outbox queue). SQL adds complexity without benefit.
- mattn/go-sqlite3 requires CGO, complicating cross-compilation for on-premise targets. The pure-Go port (modernc.org) is significantly slower and less battle-tested.
- Schema migrations require tooling (Atlas, goose, etc.), an overhead that bbolt avoids entirely.
Flat JSON files (status quo)
The current approach. Rejected because:
- No atomicity — a crash mid-write can corrupt the file. Workarounds (write-tmp-then-rename) are fragile across OS/filesystem combinations encountered in hospital IT environments.
- No built-in encryption at rest — would require wrapping every read/write with ad-hoc crypto, duplicating what a storage module would centralize.
- Cannot serve as a reliable outbox/WAL. Ordered iteration, atomic delete-after-ACK, and crash recovery are not feasible without effectively reimplementing a database.
BadgerDB
Pure-Go LSM-tree KV store with more features (TTLs, transactions, value log separation). Rejected because:
- LSM architecture creates multiple files and requires background compaction — operationally more complex for hospital IT to back up and reason about compared to bbolt's single-file model.
- Higher memory footprint due to in-memory tables and bloom filters.
- Mercury's dataset is small (< 1 MB typically) — BadgerDB's write optimization for large volumes is unnecessary overhead.
Consequences
Positive
- Single-file persistence — the entire database is one file, trivial to back up, restore, or relocate in hospital environments.
- Transparent encryption: consuming modules interact with a KVStore port and never handle cryptographic material. Sensitive data (agent identity, integration config) is encrypted at rest by default.
- Crash safety: bbolt's B+ tree with copy-on-write semantics provides ACID transactions. A crash mid-write cannot corrupt the database.
- Zero external dependencies — no database server, no CGO, no runtime configuration beyond the master key. Simplifies on-premise deployment.
- Shared Cipher implementation: extracting ChaCha20-Poly1305 into pkg/crypto eliminates the duplicate cipher code currently in both Nexus and Mercury.
- Outbox foundation: the storage port provides the ordered-iteration and atomic-delete primitives needed for a reliable outbox/WAL (ADR-004).
Negative
- Single-writer constraint — bbolt allows only one write transaction at a time (readers are concurrent). Acceptable for Mercury's low write volume but would not scale to high-throughput scenarios.
- No query language: modules that need filtering or aggregation must implement it in Go on top of Iterate. This is a conscious trade-off: Mercury's access patterns are simple enough that this is preferable to carrying a SQL engine.
- Master key operational burden: hospital IT must provision and secure the encryption key. Loss of the key means the database is unrecoverable. This must be clearly documented in the deployment guide.
Operational Notes
- The database file should be included in the host's backup strategy. bbolt
never compacts or rewrites the file in the background; compaction happens
only when explicitly requested. A raw file copy can still overlap with an
in-flight write transaction, so a consistent copy is best taken from a
read-only transaction (bbolt's Tx.WriteTo hot backup) or from a
filesystem-level snapshot.
- Beyond the startup-time rotation described under Key Rotation, a CLI command or migration step that re-encrypts all values under a new key should be planned as follow-up work.