ADR-003: Mercury Storage

Date: 2026-04-27
Author: Fabian Beyerlein
Status: Accepted

Context

Mercury's initial storage implementation is a simple file-based approach that keeps the required data in JSON files.

Data stored:

  • Agent Identity (sensitive)
  • Integration Configuration (sensitive)

In addition, Mercury currently has no durability mechanism for outbound messages to Nexus. These messages should also be persisted locally and removed only once the cloud has acknowledged them (ACK).

Out of Scope

While this ADR lays the groundwork for an outbox implementation for the TunnelService (by providing the storage port it would build on), the outbox design itself is covered in ADR-004.

Decision

Introduce etcd-io/bbolt [1] as an embedded database. Namespacing is provided by bbolt buckets; encryption is not built into bbolt and is applied to values by the storage module.

Port

// ports/storage.go

type KVIteratorFn func(key, value []byte) error

type KVStore interface {
    Get(ctx context.Context, key []byte) ([]byte, error)
    Put(ctx context.Context, key, value []byte) error
    Delete(ctx context.Context, key []byte) error
    Iterate(ctx context.Context, opts KVIterateOpts, fn KVIteratorFn) error
}

type KVIterateOpts struct {
    Prefix []byte // optional key prefix filter
    Reverse bool
    Limit   int  // 0 = unlimited
}
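To make the intended behaviour of Iterate more concrete, the following is a minimal in-memory sketch of the port. memStore, seed and collectKeys are hypothetical names that are not part of this ADR, and the semantics shown (prefix filter first, then ordering, then limit; an error returned by the callback stops iteration) are assumptions rather than decided behaviour.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"sort"
)

type KVIteratorFn func(key, value []byte) error

type KVIterateOpts struct {
	Prefix  []byte // optional key prefix filter
	Reverse bool
	Limit   int // 0 = unlimited
}

type KVStore interface {
	Get(ctx context.Context, key []byte) ([]byte, error)
	Put(ctx context.Context, key, value []byte) error
	Delete(ctx context.Context, key []byte) error
	Iterate(ctx context.Context, opts KVIterateOpts, fn KVIteratorFn) error
}

// memStore is a hypothetical in-memory KVStore used only for illustration.
type memStore struct{ data map[string][]byte }

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

func (m *memStore) Get(_ context.Context, key []byte) ([]byte, error) {
	v, ok := m.data[string(key)]
	if !ok {
		return nil, errors.New("key not found")
	}
	return append([]byte(nil), v...), nil // hand out a copy, not internal state
}

func (m *memStore) Put(_ context.Context, key, value []byte) error {
	m.data[string(key)] = append([]byte(nil), value...)
	return nil
}

func (m *memStore) Delete(_ context.Context, key []byte) error {
	delete(m.data, string(key))
	return nil
}

// Iterate visits keys in byte-wise order (assumed to match bbolt's ordering),
// applying the prefix filter, then the direction, then the limit.
func (m *memStore) Iterate(ctx context.Context, opts KVIterateOpts, fn KVIteratorFn) error {
	keys := make([]string, 0, len(m.data))
	for k := range m.data {
		if bytes.HasPrefix([]byte(k), opts.Prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if opts.Reverse {
		for i, j := 0, len(keys)-1; i < j; i, j = i+1, j-1 {
			keys[i], keys[j] = keys[j], keys[i]
		}
	}
	if opts.Limit > 0 && len(keys) > opts.Limit {
		keys = keys[:opts.Limit]
	}
	for _, k := range keys {
		if err := ctx.Err(); err != nil {
			return err
		}
		if err := fn([]byte(k), m.data[k]); err != nil {
			return err // a callback error stops the iteration
		}
	}
	return nil
}

// seed stores a placeholder value under each key.
func seed(s KVStore, keys ...string) {
	for _, k := range keys {
		_ = s.Put(context.Background(), []byte(k), []byte("v"))
	}
}

// collectKeys gathers the keys that an Iterate call visits, in order.
func collectKeys(s KVStore, opts KVIterateOpts) []string {
	var out []string
	_ = s.Iterate(context.Background(), opts, func(k, _ []byte) error {
		out = append(out, string(k))
		return nil
	})
	return out
}

func main() {
	s := newMemStore()
	seed(s, "cfg/a", "cfg/b", "cfg/c", "id/agent")
	fmt.Println(collectKeys(s, KVIterateOpts{Prefix: []byte("cfg/"), Reverse: true, Limit: 2}))
	// prints [cfg/c cfg/b]
}
```

A stand-in like this could also serve consuming modules as a test double, since they depend only on the port and not on bbolt.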

Namespacing

The App Shell (app/app.go) will instantiate the Storage module and hand out namespaced instances to other consuming modules.

// modules/storage/infra/bolt/store.go
package bolt

type Store struct {
    db     *bbolt.DB
    cipher domain.Cipher
}

func (s *Store) Namespace(name string) ports.KVStore {
    return &namespacedStore{
        db:     s.db,
        bucket: []byte(name),
        cipher: s.cipher,
    }
}

type namespacedStore struct {
    db     *bbolt.DB
    bucket []byte
    cipher domain.Cipher
}

func (ns *namespacedStore) Get(ctx context.Context, key []byte) ([]byte, error) {
    var out []byte
    err := ns.db.View(func(tx *bbolt.Tx) error {
        b := tx.Bucket(ns.bucket)
        if b == nil {
            return domain.NewBucketNotFoundErr(string(ns.bucket))
        }
        }

        v := b.Get(key)
        if v == nil {
            return domain.NewNotFoundErr(string(key))
        }

        decrypted, err := ns.cipher.Decrypt(v)
        if err != nil {
            return domain.NewDecryptionErr(err)
        }

        out = make([]byte, len(decrypted))
        copy(out, decrypted)
        return nil
    })
    return out, err
}

func (ns *namespacedStore) Put(ctx context.Context, key, value []byte) error {
    return ns.db.Update(func(tx *bbolt.Tx) error {
        b, err := tx.CreateBucketIfNotExists(ns.bucket)
        if err != nil {
            return err
        }

        encrypted, err := ns.cipher.Encrypt(value)
        if err != nil {
            return domain.NewEncryptionErr(err)
        }

        return b.Put(key, encrypted)
    })
}

// ...

Encryption

Implement a Cipher interface in pkg/crypto/cipher.go using ChaCha20-Poly1305 [2] AEAD [3]. Use it to replace the two existing ChaCha20-Poly1305 usages in Nexus's and Mercury's EncryptionServices.
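As a rough sketch of what such a Cipher could look like: the Encrypt/Decrypt signatures below are inferred from how the store code calls the cipher, while aeadCipher, newDemoCipher and the nonce-prefixed ciphertext layout are assumptions made for illustration. To keep the example standard-library only, it uses AES-256-GCM in place of ChaCha20-Poly1305; both are exposed through Go's cipher.AEAD interface (chacha20poly1305.New in golang.org/x/crypto returns one), so the wrapper itself would not need to change.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Cipher mirrors the calls made by the store code: byte slices in and out.
type Cipher interface {
	Encrypt(plaintext []byte) ([]byte, error)
	Decrypt(ciphertext []byte) ([]byte, error)
}

// aeadCipher is a hypothetical wrapper around any cipher.AEAD.
// Assumed ciphertext layout: nonce || sealed data (including the auth tag).
type aeadCipher struct{ aead cipher.AEAD }

// newDemoCipher builds the wrapper on AES-256-GCM (a stand-in for
// ChaCha20-Poly1305, chosen only to avoid a non-stdlib import here).
func newDemoCipher(key []byte) (Cipher, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &aeadCipher{aead: aead}, nil
}

func (c *aeadCipher) Encrypt(plaintext []byte) ([]byte, error) {
	// A fresh random nonce per value; it is stored in front of the ciphertext.
	nonce := make([]byte, c.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return c.aead.Seal(nonce, nonce, plaintext, nil), nil
}

func (c *aeadCipher) Decrypt(ciphertext []byte) ([]byte, error) {
	n := c.aead.NonceSize()
	if len(ciphertext) < n+c.aead.Overhead() {
		return nil, errors.New("ciphertext too short")
	}
	// Open fails if the key is wrong or the data was modified.
	return c.aead.Open(nil, ciphertext[:n], ciphertext[n:], nil)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; never use a fixed key in practice
	c, err := newDemoCipher(key)
	if err != nil {
		panic(err)
	}
	ct, _ := c.Encrypt([]byte("agent-identity"))
	pt, _ := c.Decrypt(ct)
	fmt.Println("round trip ok:", string(pt) == "agent-identity")
}
```

Because an AEAD authenticates as well as encrypts, a failed Decrypt is a reliable signal that the value was written under a different key, which is what the key rotation fallback relies on.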

Also implement a master key mechanism for Mercury. The key is read from the environment (CA_MASTER_KEY or CA_MASTER_KEY_FILE).

Ensure all encryption material is zeroed when no longer needed.
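A possible shape for loading and zeroing the master key is sketched below. Only the two variable names come from this ADR; everything else is an assumption made for the example: that the key is hex-encoded and 32 bytes long, that CA_MASTER_KEY_FILE names a file containing the encoded key, that setting both variables is an error, and the helper names loadMasterKey and zero.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
)

const keySize = 32 // ChaCha20-Poly1305 key length in bytes

// zero overwrites b in place so key material does not linger in memory.
func zero(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

// loadMasterKey reads the key from CA_MASTER_KEY or from the file named by
// CA_MASTER_KEY_FILE. getenv and readFile are injected to keep it testable.
// Assumption: the key is hex-encoded; the ADR does not specify an encoding.
func loadMasterKey(getenv func(string) string, readFile func(string) ([]byte, error)) ([]byte, error) {
	inline, path := getenv("CA_MASTER_KEY"), getenv("CA_MASTER_KEY_FILE")

	var encoded []byte
	switch {
	case inline != "" && path != "":
		return nil, errors.New("set only one of CA_MASTER_KEY and CA_MASTER_KEY_FILE")
	case inline != "":
		// Go strings are immutable, so only this byte copy can be zeroed.
		encoded = []byte(inline)
	case path != "":
		raw, err := readFile(path)
		if err != nil {
			return nil, fmt.Errorf("read master key file: %w", err)
		}
		encoded = raw
	default:
		return nil, errors.New("no master key configured")
	}
	defer zero(encoded)

	trimmed := bytes.TrimSpace(encoded)
	if len(trimmed) != hex.EncodedLen(keySize) {
		return nil, errors.New("master key has unexpected length")
	}
	key := make([]byte, keySize)
	if _, err := hex.Decode(key, trimmed); err != nil {
		zero(key)
		return nil, errors.New("master key is not valid hex")
	}
	return key, nil
}

func main() {
	key, err := loadMasterKey(os.Getenv, os.ReadFile)
	if err != nil {
		fmt.Println("master key unavailable:", err)
		return
	}
	defer zero(key) // in Mercury this would happen once the cipher is built
	fmt.Println("loaded master key of", len(key), "bytes")
}
```

Zeroing in Go is best effort: the runtime may have copied the data, and the environment string itself cannot be wiped, which is one argument for preferring the file-based variant.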

Key Rotation

On startup, if CA_MASTER_KEY_OLD is set, Mercury attempts decryption with the current key first. If that fails, it falls back to the old key, re-encrypts all values under the current key in a single atomic transaction, and continues booting. An audit event is emitted to the cloud once the tunnel is established.

Mercury refuses to start if both keys are present but identical, to prevent stale configuration.
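The startup rotation described above could look roughly like the sketch below. It is an interpretation, not the decided implementation: rotate and sealer are hypothetical names, a map stands in for the bbolt buckets, and building a complete new map before returning stands in for the single atomic transaction. As in other examples, AES-256-GCM from the standard library replaces ChaCha20-Poly1305 to keep the code self-contained.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/subtle"
	"errors"
	"fmt"
)

// sealer is a hypothetical stand-in for the Cipher (AES-256-GCM here).
type sealer struct{ aead cipher.AEAD }

func newSealer(key []byte) (*sealer, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &sealer{aead: aead}, nil
}

func (s *sealer) Encrypt(plain []byte) ([]byte, error) {
	nonce := make([]byte, s.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return s.aead.Seal(nonce, nonce, plain, nil), nil
}

func (s *sealer) Decrypt(ct []byte) ([]byte, error) {
	n := s.aead.NonceSize()
	if len(ct) < n {
		return nil, errors.New("ciphertext too short")
	}
	return s.aead.Open(nil, ct[:n], ct[n:], nil)
}

// rotate returns a copy of values in which everything is encrypted under
// curKey, plus a flag telling whether any value had to be re-encrypted.
// On error nothing is returned, so the caller never sees a half-rotated set.
func rotate(values map[string][]byte, curKey, oldKey []byte) (map[string][]byte, bool, error) {
	if subtle.ConstantTimeCompare(curKey, oldKey) == 1 {
		return nil, false, errors.New("CA_MASTER_KEY and CA_MASTER_KEY_OLD are identical")
	}
	cur, err := newSealer(curKey)
	if err != nil {
		return nil, false, err
	}
	old, err := newSealer(oldKey)
	if err != nil {
		return nil, false, err
	}
	out := make(map[string][]byte, len(values))
	rotated := false
	for k, v := range values {
		if _, err := cur.Decrypt(v); err == nil {
			out[k] = v // already under the current key
			continue
		}
		plain, err := old.Decrypt(v) // fall back to the old key
		if err != nil {
			return nil, false, fmt.Errorf("value %q matches neither key: %w", k, err)
		}
		if out[k], err = cur.Encrypt(plain); err != nil {
			return nil, false, err
		}
		rotated = true
	}
	return out, rotated, nil
}

func main() {
	oldKey, curKey := make([]byte, 32), make([]byte, 32)
	curKey[0] = 1 // demo keys only; real keys come from the environment
	old, _ := newSealer(oldKey)
	ct, _ := old.Encrypt([]byte("agent-identity"))
	_, rotated, err := rotate(map[string][]byte{"identity": ct}, curKey, oldKey)
	fmt.Println("rotated:", rotated, "err:", err)
}
```

The rotated flag is what would later trigger the audit event once the tunnel is established.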

Schema Evolution

bbolt has no schema in the "SQL" sense - the "schema" is bucket existence and key encoding conventions. Required buckets are created idempotently on startup via CreateBucketIfNotExists.

Consumer-level data evolution is owned by each consuming module via versioned key roots (e.g. identity-v1 -> identity-v2) with backward-compatible reads.
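A backward-compatible read in a consuming module might look like the following sketch. The value formats are invented for the example (v1 as a bare agent ID, v2 as JSON with an extra field), as are the names identityV2, getFn and readIdentity, and the key roots are treated as plain keys for simplicity.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// identityV2 is a hypothetical v2 record; Region is an invented new field.
type identityV2 struct {
	AgentID string `json:"agent_id"`
	Region  string `json:"region,omitempty"`
}

// getFn abstracts a lookup in the module's namespaced store.
type getFn func(key string) ([]byte, bool)

// readIdentity prefers the newest key root and falls back to the older one,
// upgrading the old representation in memory.
func readIdentity(get getFn) (identityV2, error) {
	if raw, ok := get("identity-v2"); ok {
		var id identityV2
		if err := json.Unmarshal(raw, &id); err != nil {
			return identityV2{}, fmt.Errorf("decode identity-v2: %w", err)
		}
		return id, nil
	}
	if raw, ok := get("identity-v1"); ok {
		// Assumed v1 format: the value is the bare agent ID.
		return identityV2{AgentID: string(raw)}, nil
	}
	return identityV2{}, errors.New("no identity stored")
}

func main() {
	legacy := map[string][]byte{"identity-v1": []byte("agent-42")}
	id, err := readIdentity(func(k string) ([]byte, bool) {
		v, ok := legacy[k]
		return v, ok
	})
	fmt.Println(id.AgentID, err)
	// prints agent-42 <nil>
}
```

The module can then write the upgraded record under the new root at its own pace, without any storage-level migration step.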

Key Encoding Conventions

  • Keys are not encrypted. Encryption applies to values only, which preserves bbolt's B+ tree sort order for prefix scans and ordered iteration.
  • Sequence-based keys (e.g. outbox) use big-endian uint64 encoding for natural ordering
  • String-based keys (e.g. config, identity) use plain UTF-8 byte representation
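The reason for big-endian encoding can be shown in a few lines: bbolt orders keys byte-wise, and only a fixed-width big-endian encoding makes that byte order agree with numeric order. seqKey, seqFromKey and keyLess are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// seqKey encodes a sequence number as a fixed-width, big-endian 8-byte key.
func seqKey(seq uint64) []byte {
	k := make([]byte, 8)
	binary.BigEndian.PutUint64(k, seq)
	return k
}

// seqFromKey decodes a key produced by seqKey.
func seqFromKey(k []byte) (uint64, bool) {
	if len(k) != 8 {
		return 0, false
	}
	return binary.BigEndian.Uint64(k), true
}

// keyLess reports whether a's key sorts before b's key in byte order.
func keyLess(a, b uint64) bool {
	return bytes.Compare(seqKey(a), seqKey(b)) < 0
}

func main() {
	fmt.Println(keyLess(255, 256)) // true: byte order matches numeric order

	// Little-endian would break this: 255 encodes as FF 00 ..., 256 as 00 01 ...
	le255, le256 := make([]byte, 8), make([]byte, 8)
	binary.LittleEndian.PutUint64(le255, 255)
	binary.LittleEndian.PutUint64(le256, 256)
	fmt.Println(bytes.Compare(le255, le256) < 0) // false

	// Decimal strings break it as well.
	fmt.Println("9" < "10") // false
}
```

With keys built this way, an Iterate call without Reverse visits outbox entries oldest first.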

Alternatives Considered

SQLite (via modernc.org/sqlite or mattn/go-sqlite3)

Full relational model with SQL query capabilities. Rejected because:

  • Mercury's data access patterns are pure key-value (identity blob, config blob, ordered outbox queue). SQL adds complexity without benefit.
  • mattn/go-sqlite3 requires CGO, complicating cross-compilation for on-premise targets. The pure-Go port (modernc.org) is significantly slower and less battle-tested.
  • Schema migrations require tooling (Atlas, goose, etc.) — overhead that bbolt avoids entirely.

Flat JSON files (status quo)

The current approach. Rejected because:

  • No atomicity — a crash mid-write can corrupt the file. Workarounds (write-tmp-then-rename) are fragile across OS/filesystem combinations encountered in hospital IT environments.
  • No built-in encryption at rest — would require wrapping every read/write with ad-hoc crypto, duplicating what a storage module would centralize.
  • Cannot serve as a reliable outbox/WAL. Ordered iteration, atomic delete-after-ACK, and crash recovery are not feasible without effectively reimplementing a database.

BadgerDB

Pure-Go LSM-tree KV store with more features (TTLs, transactions, value log separation). Rejected because:

  • LSM architecture creates multiple files and requires background compaction — operationally more complex for hospital IT to back up and reason about compared to bbolt's single-file model.
  • Higher memory footprint due to in-memory tables and bloom filters.
  • Mercury's dataset is small (< 1 MB typically) — BadgerDB's write optimization for large volumes is unnecessary overhead.

Consequences

Positive

  • Single-file persistence — the entire database is one file, trivial to back up, restore, or relocate in hospital environments.
  • Transparent encryption — consuming modules interact with a KVStore port and never handle cryptographic material. Sensitive data (agent identity, integration config) is encrypted at rest by default.
  • Crash safety — bbolt's B+ tree with copy-on-write semantics provides ACID transactions. A crash mid-write cannot corrupt the database.
  • Zero external dependencies — no database server, no CGO, no runtime configuration beyond the master key. Simplifies on-premise deployment.
  • Shared Cipher implementation — extracting ChaCha20-Poly1305 into pkg/crypto eliminates the duplicate cipher code currently in both Nexus and Mercury.
  • Outbox foundation — the storage port provides the ordered-iteration and atomic-delete primitives needed for a reliable outbox/WAL (ADR-004).

Negative

  • Single-writer constraint — bbolt allows only one write transaction at a time (readers are concurrent). Acceptable for Mercury's low write volume but would not scale to high-throughput scenarios.
  • No query language — modules that need filtering or aggregation must implement it in Go on top of Iterate. This is a conscious trade-off: Mercury's access patterns are simple enough that this is preferable to carrying a SQL engine.
  • Master key operational burden — hospital IT must provision and secure the encryption key. Loss of the key means the database is unrecoverable. This must be clearly documented in the deployment guide.

Operational Notes

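  • The database file should be included in the host's backup strategy. Copying the file while Mercury is running is only safe if no write transaction commits during the copy. For a consistent hot backup, bbolt provides Tx.WriteTo and Tx.CopyFile, which write a snapshot from within a read transaction without blocking writers. bbolt never compacts in the background; compaction happens only when explicitly requested (for example via the bbolt compact command).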
  • For key rotation, a CLI command or migration step that re-encrypts all values under a new key should be planned as follow-up work.