fix(acl): enforce token grant lifecycle live at sign time (#24, #25) #27

Merged
padreug merged 6 commits from issue-25-live-grant-lifecycle into dev 2026-06-19 16:05:19 +00:00
Owner

Implements the Option D redesign ratified in #25, closing the token-TTL bug #24 and the materialization-drift family behind it.

The bug (#24)

Backend.applyToken photocopied a token's policy rules into SigningCondition rows at redeem. checkIfPubkeyAllowed matched those rows (step 3) and short-circuited before the live Token join (step 4) — so an expired or revoked token kept signing forever, because the copy carried no lifecycle. The live join existed but was dead code for token-paired users, and even it only filtered revokedAt, never expiresAt.

Root cause: a cache (the materialized SigningCondition set) with no invalidation. The same shape produced siblings — token-revoke (spirekeeper#22) and unenforced usage caps. Upstream Signet re-shipped the identical bug (see the prior-art survey doc), which is what convinced us to fix the family, not the instance.

The fix (Option D)

  • grantIsLive(grant, now) — the single "valid right now?" predicate (revokedAt null ∧ not past expiresAt), extracted to lib/acl/lifecycle.ts and used identically at redeem (Backend.validateToken) and sign (checkIfPubkeyAllowed). The two can no longer drift — that drift was the root of #24.
  • Backend.applyToken records only the KeyUser ← Token binding. No more photocopy. The token's policy is evaluated live, every request.
  • checkIfPubkeyAllowed step 4 filters tokens through liveWhere(now) (revoke + expiry) and grants connect off a live bound token; the manual-override layer (step 3) now honors SigningCondition.expiresAt/revokedAt too, with denials beating grants.

A new lifecycle rule is now one more predicate, never a forgotten photocopy.

Commits

  1. docs(#25) — lnbits/nostr_bunker comparison
  2. docs(#25) — source-verified ACL prior-art survey (Signet/Amber/FROSTR/promenade/NDK) + keep-our-fork decision
  3. feat(schema)Request.keyUserId + SigningCondition lifecycle columns (additive migration 20260619125847_live_grant_lifecycle_schema)
  4. fix(acl) — the live-enforcement gatekeeper rewrite + de-materialization
  5. test(acl) — extract pure grantIsLive/liveWhere + node:test unit tests (npm test → 7 passing)

Verification

  • npm test → 7 passing (boundary conditions: past expiry → dead, expiry == now → dead, revoke beats future expiry, liveWhere pinned in lockstep with grantIsLive).
  • tsc --noEmit → no new errors (3 pre-existing, unrelated, in authorize.ts/web/authorize.ts).
  • npm run build (tsup/esbuild) → clean.

Deferred (follow-ups, called out deliberately)

  • Usage-cap enforcement. PolicyRule.maxUsageCount is still unenforced (as before — no regression). Doing it right needs a durable record of allowed signings to COUNT: today Request rows are ephemeral (only written on manual approval, auto-deleted after 60s) and Log is never written. The schema groundwork (Request.keyUserId + index) is in place. Needs its own issue.
  • DB integration tests for checkIfPubkeyAllowed (expired/revoked token denied end-to-end, override lifecycle, connect-off-live-token). Blocked on adding a real test harness — pnpm add -D vitest fails against the nix-built node_modules hoist pattern, so this PR ships node:test unit coverage of the predicate only. Needs its own issue.

Notes

  • Strict-from-the-start (pre-launch): no migration of historical materialized SigningCondition rows from already-redeemed tokens. A clean dev DB is assumed; leftover rows have no provenance flag to target.
  • Related: #26 (NDK get_public_key bypasses the permit seam).

🤖 Generated with Claude Code

Implements the Option D redesign ratified in #25, closing the token-TTL bug #24 and the materialization-drift family behind it. ## The bug (#24) `Backend.applyToken` photocopied a token's policy rules into `SigningCondition` rows at redeem. `checkIfPubkeyAllowed` matched those rows (step 3) and **short-circuited before the live `Token` join (step 4)** — so an expired or revoked token kept signing forever, because the copy carried no lifecycle. The live join existed but was dead code for token-paired users, and even it only filtered `revokedAt`, never `expiresAt`. Root cause: a cache (the materialized `SigningCondition` set) with no invalidation. The same shape produced siblings — token-revoke (`spirekeeper#22`) and unenforced usage caps. Upstream Signet re-shipped the identical bug (see the prior-art survey doc), which is what convinced us to fix the *family*, not the instance. ## The fix (Option D) - **`grantIsLive(grant, now)`** — the single "valid right now?" predicate (`revokedAt` null ∧ not past `expiresAt`), extracted to `lib/acl/lifecycle.ts` and used **identically at redeem** (`Backend.validateToken`) **and sign** (`checkIfPubkeyAllowed`). The two can no longer drift — that drift was the root of #24. - **`Backend.applyToken` records only the `KeyUser ← Token` binding.** No more photocopy. The token's policy is evaluated live, every request. - **`checkIfPubkeyAllowed` step 4** filters tokens through `liveWhere(now)` (revoke + expiry) and grants `connect` off a live bound token; the manual-override layer (step 3) now honors `SigningCondition.expiresAt/revokedAt` too, with denials beating grants. A new lifecycle rule is now one more predicate, never a forgotten photocopy. ## Commits 1. `docs(#25)` — lnbits/nostr_bunker comparison 2. `docs(#25)` — source-verified ACL prior-art survey (Signet/Amber/FROSTR/promenade/NDK) + keep-our-fork decision 3. `feat(schema)` — `Request.keyUserId` + `SigningCondition` lifecycle columns (additive migration `20260619125847_live_grant_lifecycle_schema`) 4. `fix(acl)` — the live-enforcement gatekeeper rewrite + de-materialization 5. `test(acl)` — extract pure `grantIsLive`/`liveWhere` + `node:test` unit tests (`npm test` → 7 passing) ## Verification - `npm test` → 7 passing (boundary conditions: past expiry → dead, expiry == now → dead, revoke beats future expiry, `liveWhere` pinned in lockstep with `grantIsLive`). - `tsc --noEmit` → no new errors (3 pre-existing, unrelated, in `authorize.ts`/`web/authorize.ts`). - `npm run build` (tsup/esbuild) → clean. ## Deferred (follow-ups, called out deliberately) - **Usage-cap enforcement.** `PolicyRule.maxUsageCount` is still unenforced (as before — no regression). Doing it right needs a durable record of *allowed* signings to `COUNT`: today `Request` rows are ephemeral (only written on manual approval, auto-deleted after 60s) and `Log` is never written. The schema groundwork (`Request.keyUserId` + index) is in place. Needs its own issue. - **DB integration tests** for `checkIfPubkeyAllowed` (expired/revoked token denied end-to-end, override lifecycle, connect-off-live-token). Blocked on adding a real test harness — `pnpm add -D vitest` fails against the nix-built `node_modules` hoist pattern, so this PR ships `node:test` unit coverage of the predicate only. Needs its own issue. ## Notes - Strict-from-the-start (pre-launch): no migration of historical materialized `SigningCondition` rows from already-redeemed tokens. A clean dev DB is assumed; leftover rows have no provenance flag to target. - Related: #26 (NDK `get_public_key` bypasses the permit seam). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Surveys Signet, Amber, FROSTR, promenade, NDK/rust-nostr/nak against
actual source; records the decision to keep our fork and treat Signet
as a parts donor (NIP-46 wire boundary keeps the signer substitutable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Additive, non-breaking schema prep for the Option D live-evaluation ACL:

- Request gains keyUserId (FK) + @@index([keyUserId, method]) so token
  usage caps can be derived live by COUNTing allowed Requests, replacing
  the never-enforced mutable PolicyRule.currentUsageCount (derive-don't-count,
  per lnbits/nostr_bunker prior art).
- SigningCondition gains createdAt/expiresAt/revokedAt so the manual-override
  layer carries its own lifecycle and runs through the same grantIsLive(now)
  predicate as token grants (D1: two typed sources, one shared rule).

No behavior change yet; the ACL hot path and applyToken de-materialization
follow in subsequent commits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The bug (#24): applyToken photocopied a token's policy rules into
SigningCondition rows at redeem; checkIfPubkeyAllowed matched those rows
(step 3) and short-circuited before the live Token join (step 4), so an
expired or revoked token kept signing forever — the copy carried no
lifecycle. Same cause re-shipped by upstream Signet (see docs survey).

Option D fix:
- grantIsLive(grant, now): the single 'valid right now?' predicate
  (revokedAt null AND not past expiresAt), used identically at redeem
  (Backend.validateToken) and sign (checkIfPubkeyAllowed). Redeem and
  sign can no longer disagree.
- Backend.applyToken records ONLY the KeyUser<-Token binding; it no
  longer materializes SigningCondition rows. Token policy is evaluated
  live every request.
- checkIfPubkeyAllowed step 4 filters tokens through liveWhere(now)
  (revoke + expiry) and grants connect off a live bound token; the
  manual-override layer (step 3) now honors SigningCondition
  expiresAt/revokedAt too (denials beat grants).

Closes the materialization-drift family: a new lifecycle rule is one
more predicate, never a forgotten photocopy. Token-revoke sibling
(spirekeeper#22) falls out of the same seam. Usage caps deferred (no
durable signing log exists yet to count) — follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test(acl)(#25): extract pure grantIsLive/liveWhere + unit tests
Some checks failed
Docker image / build-and-push-image (push) Has been cancelled
e2cf10a66d
Move the lifecycle predicate into lib/acl/lifecycle.ts (re-exported from
the ACL module) so it can be unit-tested without a database. Adds Node
built-in test-runner coverage for the boundary conditions that define
the fix: past expiry -> dead, expiry == now -> dead (exclusive), revoke
beats a future expiry, and liveWhere kept in lockstep with grantIsLive.

Runner is node:test via ts-node (no new dependency; pnpm add is blocked
by the nix-built node_modules hoist pattern). 'npm test' -> 7 passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
padreug left a comment

Review — approve with nits

Faithful implementation of the Option D direction from #25. The core fix is correct and well-built: grantIsLive is the single predicate used at both redeem (validateToken) and sign (checkIfPubkeyAllowed), de-materialization is complete, and the live token join now filters on expiry+revoke — which actually closes #24. The findings below are one deploy-safety gate, one testing gap, and two pre-existing things this PR is well-placed to clean up. None block the design; #1 should be resolved before merge.

What's right

  • The keystone landed. grantIsLive/liveWhere extracted to a pure, DB-free lifecycle.ts, used identically at redeem and sign — exactly the "define valid-right-now in one place" discipline #25 demanded. Boundary semantics are consistent (<= now dead in the predicate ↔ gt: now live in the where-fragment), with a test pinning the shape so the two can't silently drift.
  • De-materialization is complete and correct. applyToken records only the binding; the per-rule + connect photocopy loop is gone. Multi-token users are now handled correctly — an expired token stops matching findFirst({keyUserId, ...live}) while a second live token still pairs, which the old flat photocopy couldn't express.
  • Deny-beats-grant preserved and made explicit (liveDeny then liveGrant); the bootstrap path (allowAllRequestsFromKey(..., {kind:'all'})) still matches via kind: {in:[K,'all']}.
  • Honest, well-scoped deferral of usage-caps and DB tests, with real reasons.

Findings

1. (Resolve before merge — deploy safety) Leftover materialized rows silently re-open #24 on any non-clean DB. The migration recreates SigningCondition and copies existing rows forward (INSERT INTO new_SigningCondition SELECT … FROM SigningCondition). Any token redeemed before this PR left allowed=true photocopies with expiresAt=NULL → those are now permanent live manual-grants via step 3b, bypassing the new live token check entirely. So for already-paired clients the bug persists, invisibly. The pre-launch clean-DB assumption is consistent with our strict-from-the-start posture, but two asks:

  • Confirm no instance (incl. the regtest/dev bunker DB) has redeemed tokens — otherwise testing this PR there could look fixed because leftover photocopies grant old pairings while new pairings are clean. That mixed state masks the regression.
  • No provenance flag exists to target the stale rows (correctly noted), so move the DB-wipe/verify step out of the PR prose and into the explicit merge checklist.

2. (Medium — testing) The actual #24 fix is untested; only the pure predicate is. Good boundary coverage on grantIsLive/liveWhere, but the wiring that closes the bug — step 4 applying liveWhere to the Token join so an expired/revoked token is denied at sign time — has no assertion. For a security fix I'd want at least one end-to-end "expired token → checkIfPubkeyAllowed returns falsy". Deferred for a real reason (vitest vs the nix node_modules hoist), but I'd treat it as a high-priority follow-up, not a parked one — the predicate passing tells us little about whether the query is shaped right.

3. (Low — pre-existing, but this is the moment) The explicit-reject sentinel doesn't match its writer. Step 3a queries method: '*', and the docstring attributes it to rejectAllRequestsFromKey — but that function writes a row with no method (null), and appears to have no callers. So "reject all" looks like it's been a no-op. Since the new docstring re-asserts the method='*' invariant, either wire reject-all correctly (write method:'*') or drop the dead branch so comment matches reality.

4. (Low — confirm intent) Expired manual denies now go inert. Applying ...live to the deny queries means a lapsed deny stops denying — correct under D1, but it means a lapsed deny could re-expose access a token grants. In practice rejectAllRequestsFromKey/web-approval denies are written with null expiry, so they stay permanent; only an explicitly-TTL'd deny lapses. Just confirming that's intended (I believe it is).

5. (Nit) Request.keyUserId + index ship ahead of their consumer (usage-caps deferred). Fine as groundwork; just noting it.

Recommendation

Land after resolving #1 (clean-DB gate → verify/wipe, in the merge steps) and deciding #3 (wire or remove reject-all). #2 as a high-priority follow-up issue; #4 a one-line confirm. The design is right, and the inline comments are genuinely excellent — they tie each change back to #24/#25 so the next reader inherits the reasoning.

## Review — approve with nits Faithful implementation of the Option D direction from #25. The core fix is correct and well-built: `grantIsLive` is the single predicate used at **both** redeem (`validateToken`) and sign (`checkIfPubkeyAllowed`), de-materialization is complete, and the live token join now filters on expiry+revoke — which actually closes #24. The findings below are one deploy-safety gate, one testing gap, and two pre-existing things this PR is well-placed to clean up. None block the design; **#1 should be resolved before merge.** ### What's right - **The keystone landed.** `grantIsLive`/`liveWhere` extracted to a pure, DB-free `lifecycle.ts`, used identically at redeem and sign — exactly the "define valid-right-now in one place" discipline #25 demanded. Boundary semantics are consistent (`<= now` dead in the predicate ↔ `gt: now` live in the where-fragment), with a test pinning the shape so the two can't silently drift. - **De-materialization is complete and correct.** `applyToken` records only the binding; the per-rule + `connect` photocopy loop is gone. Multi-token users are now handled correctly — an expired token stops matching `findFirst({keyUserId, ...live})` while a second live token still pairs, which the old flat photocopy couldn't express. - **Deny-beats-grant** preserved and made explicit (`liveDeny` then `liveGrant`); the bootstrap path (`allowAllRequestsFromKey(..., {kind:'all'})`) still matches via `kind: {in:[K,'all']}`. - Honest, well-scoped deferral of usage-caps and DB tests, with real reasons. ### Findings **1. (Resolve before merge — deploy safety) Leftover materialized rows silently re-open #24 on any non-clean DB.** The migration recreates `SigningCondition` and copies existing rows forward (`INSERT INTO new_SigningCondition SELECT … FROM SigningCondition`). Any token redeemed *before* this PR left `allowed=true` photocopies with `expiresAt=NULL` → those are now permanent **live manual-grants** via step 3b, bypassing the new live token check entirely. So for already-paired clients the bug persists, invisibly. The pre-launch clean-DB assumption is consistent with our strict-from-the-start posture, but two asks: - Confirm no instance (incl. the **regtest/dev bunker DB**) has redeemed tokens — otherwise testing this PR there could *look* fixed because leftover photocopies grant old pairings while new pairings are clean. That mixed state masks the regression. - No provenance flag exists to target the stale rows (correctly noted), so move the DB-wipe/verify step out of the PR prose and into the explicit merge checklist. **2. (Medium — testing) The actual #24 fix is untested; only the pure predicate is.** Good boundary coverage on `grantIsLive`/`liveWhere`, but the wiring that *closes the bug* — step 4 applying `liveWhere` to the `Token` join so an expired/revoked token is denied at sign time — has no assertion. For a security fix I'd want at least one end-to-end "expired token → `checkIfPubkeyAllowed` returns falsy". Deferred for a real reason (vitest vs the nix `node_modules` hoist), but I'd treat it as a **high-priority** follow-up, not a parked one — the predicate passing tells us little about whether the query is shaped right. **3. (Low — pre-existing, but this is the moment) The explicit-reject sentinel doesn't match its writer.** Step 3a queries `method: '*'`, and the docstring attributes it to `rejectAllRequestsFromKey` — but that function writes a row with **no `method`** (null), and appears to have no callers. So "reject all" looks like it's been a no-op. Since the new docstring re-asserts the `method='*'` invariant, either wire reject-all correctly (write `method:'*'`) or drop the dead branch so comment matches reality. **4. (Low — confirm intent) Expired manual *denies* now go inert.** Applying `...live` to the deny queries means a lapsed deny stops denying — correct under D1, but it means a lapsed deny could re-expose access a token grants. In practice `rejectAllRequestsFromKey`/web-approval denies are written with null expiry, so they stay permanent; only an explicitly-TTL'd deny lapses. Just confirming that's intended (I believe it is). **5. (Nit)** `Request.keyUserId` + index ship ahead of their consumer (usage-caps deferred). Fine as groundwork; just noting it. ### Recommendation Land after resolving **#1** (clean-DB gate → verify/wipe, in the merge steps) and deciding **#3** (wire or remove reject-all). **#2** as a high-priority follow-up issue; **#4** a one-line confirm. The design is right, and the inline comments are genuinely excellent — they tie each change back to #24/#25 so the next reader inherits the reasoning.
refactor(acl)(#27 review): remove dead reject-all sentinel
Some checks failed
Docker image / build-and-push-image (push) Has been cancelled
7dcf97a296
PR #27 review finding #3: step 3a queried SigningCondition method='*'
and the docstring attributed it to rejectAllRequestsFromKey — but that
function writes method=null (never '*') and has zero callers, so the
'reject all' branch could never match. Subject-level reject is already
KeyUser.revokedAt (step 2, via the revoke_user admin command).

Drop the dead step-3a branch and the orphaned rejectAllRequestsFromKey
so the code matches reality. Per-(method,kind) denies (step 3, written
by add_signing_condition) are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author
Owner

Thanks — sharp review. Addressed below; pushed 7dcf97a.

#1 (deploy safety) — investigated + merge checklist added

Verified the local dev bunker DB (prisma/nsecbunker.db): .tables is empty — uninitialized, no Token table, no redeemed tokens, no SigningCondition rows. So there's no leftover-photocopy masking risk locally, and the local clean-DB assumption holds.

I can't reach the regtest / any deployed dev bunker DB from here, so per your ask I'm moving the wipe/verify out of prose into an explicit gate. Merge checklist:

  • For every instance this lands on (regtest bunker volume, any deployed dev bunker): confirm SELECT COUNT(*) FROM Token WHERE redeemedAt IS NOT NULL and SELECT COUNT(*) FROM SigningCondition are both 0.
  • If either is non-zero: there's no provenance flag to target stale photocopies, so wipe SigningCondition (or the whole bunker DB on a pre-launch instance) before deploying — otherwise pre-PR pairings keep signing via permanent expiresAt=NULL grants and mask the fix (clean new pairings look correct while old ones bypass the live check).
  • Re-pair any clients whose grants were wiped.

Strict-from-the-start / pre-launch means a DB wipe is acceptable; the gate just makes sure we don't deploy onto a mixed state that hides the regression.

#3 (dead reject-all) — fixed in 7dcf97a

Confirmed your read exactly: rejectAllRequestsFromKey writes method=null (never '*') and has zero callers; revoke_user does subject-level revoke via KeyUser.revokedAt (step 2). So step 3a could never match. Removed the dead step-3a branch and the orphaned function so the code matches reality. Per-(method,kind) denies (step 3, written by add_signing_condition) are untouched.

#4 (expired denies go inert) — confirmed intended

Yes. Verified in source: add_signing_condition and the web-approval path write conditions with null expiresAt/revokedAt → permanent, never lapse. Only a deliberately TTL'd deny decays — which is the D1 behavior we want (a lapsed grant and a lapsed deny both fall back to the live evaluation). Default denies stay sticky.

#2 (untested wiring) — agreed, bumping priority

You're right that the predicate passing says little about whether the query is shaped right — the actual #24 closure (step 4 applying liveWhere to the Token join) is the untested part. Tracked in #29; I'll treat it as high-priority, not parked. The blocker is real (pnpm add vitest vs the nix node_modules hoist pattern, + .js-specifier resolution under CommonJS), and #29 spells out the harness path and the exact case list (incl. the expired-token regression guard and the "applyToken writes no SigningConditions" assertion).

#5 (groundwork ahead of consumer) — acknowledged

Request.keyUserId + index ship for the deferred usage-cap work (#28); intentional groundwork, no consumer yet.


Net: #1 is now a checklist gate rather than an assumption, #3 is fixed, #4 confirmed. Re-running locally after the change: npm test → 7 passing, tsc → only the 3 pre-existing unrelated errors, npm run build → clean. Ready for re-review / merge at your discretion (I'll leave the merge to you per the Forgejo-UI ritual).

Thanks — sharp review. Addressed below; pushed `7dcf97a`. ### #1 (deploy safety) — investigated + merge checklist added Verified the **local dev bunker DB** (`prisma/nsecbunker.db`): `.tables` is empty — uninitialized, **no `Token` table, no redeemed tokens, no `SigningCondition` rows**. So there's no leftover-photocopy masking risk locally, and the local clean-DB assumption holds. I can't reach the **regtest / any deployed dev bunker** DB from here, so per your ask I'm moving the wipe/verify out of prose into an explicit gate. **Merge checklist:** - [ ] For every instance this lands on (regtest bunker volume, any deployed dev bunker): confirm `SELECT COUNT(*) FROM Token WHERE redeemedAt IS NOT NULL` and `SELECT COUNT(*) FROM SigningCondition` are both `0`. - [ ] If either is non-zero: there's no provenance flag to target stale photocopies, so **wipe `SigningCondition`** (or the whole bunker DB on a pre-launch instance) before deploying — otherwise pre-PR pairings keep signing via permanent `expiresAt=NULL` grants and **mask the fix** (clean new pairings look correct while old ones bypass the live check). - [ ] Re-pair any clients whose grants were wiped. Strict-from-the-start / pre-launch means a DB wipe is acceptable; the gate just makes sure we don't deploy onto a mixed state that hides the regression. ### #3 (dead reject-all) — fixed in `7dcf97a` Confirmed your read exactly: `rejectAllRequestsFromKey` writes `method=null` (never `'*'`) and has **zero callers**; `revoke_user` does subject-level revoke via `KeyUser.revokedAt` (step 2). So step 3a could never match. **Removed** the dead step-3a branch and the orphaned function so the code matches reality. Per-(method,kind) denies (step 3, written by `add_signing_condition`) are untouched. ### #4 (expired denies go inert) — confirmed intended Yes. Verified in source: `add_signing_condition` and the web-approval path write conditions with **null `expiresAt`/`revokedAt`** → permanent, never lapse. Only a deliberately TTL'd deny decays — which is the D1 behavior we want (a lapsed grant *and* a lapsed deny both fall back to the live evaluation). Default denies stay sticky. ### #2 (untested wiring) — agreed, bumping priority You're right that the predicate passing says little about whether the query is shaped right — the actual #24 closure (step 4 applying `liveWhere` to the `Token` join) is the untested part. Tracked in **#29**; I'll treat it as high-priority, not parked. The blocker is real (`pnpm add vitest` vs the nix `node_modules` hoist pattern, + `.js`-specifier resolution under CommonJS), and #29 spells out the harness path and the exact case list (incl. the expired-token regression guard and the "applyToken writes no SigningConditions" assertion). ### #5 (groundwork ahead of consumer) — acknowledged `Request.keyUserId` + index ship for the deferred usage-cap work (**#28**); intentional groundwork, no consumer yet. --- Net: **#1** is now a checklist gate rather than an assumption, **#3** is fixed, **#4** confirmed. Re-running locally after the change: `npm test` → 7 passing, `tsc` → only the 3 pre-existing unrelated errors, `npm run build` → clean. Ready for re-review / merge at your discretion (I'll leave the merge to you per the Forgejo-UI ritual).
Author
Owner

Note: the "clean DB assumed" stance has a shelf life (it ends at launch / #18)

This PR's remediation posture — strict-from-the-start, assume a clean DB, no migration of historical materialized rows — is correct now, but it's a property of being pre-launch, not a durable guarantee. Flagging it explicitly so a future deploy doesn't lean on it after it's expired.

It holds only while both are true:

  1. Everything in the bunker DB is derivable/disposable — keys + admin live in nsecbunker.json (config), not the DB; policies are trivially recreatable; there's no audit trail to retain.
  2. Every client can be re-paired on demand — no unattended field devices, no third-party clients, no zero-downtime requirement.

Confirmed on cfaun (first real instance)

Non-clean (15 redeemed tokens, 390 SigningCondition), but a pure photocopy state: all allowed=1, all on token-paired KeyUsers, one shared Policy (25 rules), User/Request/Log empty, keys + admin in config (keys matches SELECT DISTINCT keyName FROM Token). So DELETE FROM SigningCondition; is equivalent to a full wipe here, and a full wipe + re-pair is also harmless. The review's merge-checklist gate applies per-instance.

Where the stance expires — the #18 endgame

Once LNbits operators and the server identity route signing through this bunker, neither condition survives:

  • clients become live payment infra (ATMs / LNbits instances) that can't be casually re-paired, under a zero-downtime requirement;
  • a durable signing log (#28) becomes must-retain audit data inside the DB — a wipe destroys it outright, and even archiving it first doesn't preserve continuity across a re-pair (audit rows attribute by the autoincrement KeyUser.id; the FK is ON DELETE SET NULL, so deleting+recreating a binding nulls/orphans the history even though the client's npub is unchanged);
  • other hosts may not keep keys/admin in config the way cfaun does — on a deployment that stores key material in the DB, a wipe is irrecoverable.

At that point the only correct path is a provenance-aware data migration (keep manual overrides + bindings, drop only derived photocopies) plus a zero-downtime cutover — not a wipe.

Ask

The provenance-aware migration needs a flag this PR doesn't add. Before launch, either:

  • add a provenance marker on SigningCondition (e.g. source: 'token' | 'manual') so derived rows are targetable by a real migration, or
  • commit to wiping every instance while it's still clean, and treat "no instance survives to launch un-wiped" as an explicit precondition.

Cross-refs: #28 (durable log is what makes this urgent), #29 (integration harness), #18 (the endgame that triggers the expiry).

## Note: the "clean DB assumed" stance has a shelf life (it ends at launch / #18) This PR's remediation posture — *strict-from-the-start, assume a clean DB, no migration of historical materialized rows* — is correct **now**, but it's a property of being pre-launch, not a durable guarantee. Flagging it explicitly so a future deploy doesn't lean on it after it's expired. It holds only while **both** are true: 1. **Everything in the bunker DB is derivable/disposable** — keys + admin live in `nsecbunker.json` (config), not the DB; policies are trivially recreatable; there's no audit trail to retain. 2. **Every client can be re-paired on demand** — no unattended field devices, no third-party clients, no zero-downtime requirement. ### Confirmed on cfaun (first real instance) Non-clean (15 redeemed tokens, 390 `SigningCondition`), but a **pure photocopy** state: all `allowed=1`, all on token-paired KeyUsers, one shared `Policy` (25 rules), `User`/`Request`/`Log` empty, keys + admin in config (`keys` matches `SELECT DISTINCT keyName FROM Token`). So `DELETE FROM SigningCondition;` is equivalent to a full wipe here, and a full wipe + re-pair is also harmless. The review's merge-checklist gate applies per-instance. ### Where the stance expires — the #18 endgame Once LNbits operators *and* the server identity route signing through this bunker, neither condition survives: - clients become live payment infra (ATMs / LNbits instances) that can't be casually re-paired, under a zero-downtime requirement; - a durable signing log (#28) becomes must-retain audit data **inside the DB** — a wipe destroys it outright, and even archiving it first doesn't preserve continuity across a re-pair (audit rows attribute by the autoincrement `KeyUser.id`; the FK is `ON DELETE SET NULL`, so deleting+recreating a binding nulls/orphans the history even though the client's npub is unchanged); - other hosts may not keep keys/admin in config the way cfaun does — on a deployment that stores key material in the DB, a wipe is irrecoverable. At that point the only correct path is a **provenance-aware data migration** (keep manual overrides + bindings, drop only derived photocopies) plus a **zero-downtime cutover** — not a wipe. ### Ask The provenance-aware migration needs a flag this PR doesn't add. Before launch, either: - add a provenance marker on `SigningCondition` (e.g. `source: 'token' | 'manual'`) so derived rows are targetable by a real migration, **or** - commit to wiping every instance while it's still clean, and treat "no instance survives to launch un-wiped" as an explicit precondition. Cross-refs: #28 (durable log is what makes this urgent), #29 (integration harness), #18 (the endgame that triggers the expiry).
padreug deleted branch issue-25-live-grant-lifecycle 2026-06-19 16:05:19 +00:00
Author
Owner

Correction: on LNbits-connected instances, the remediation is targeted delete only — never a full wipe

The review checklist and my shelf-life note both offered "wipe SigningCondition or the whole bunker DB" as equivalent remediations. On any instance where LNbits is the bunker client, they are not equivalent — a full wipe breaks signing and forces identity-changing re-provisioning. Learned the hard way deploying this to cfaun today; details so the next deploy doesn't repeat it.

Why a full wipe is wrong here

The nsecbunkerd↔LNbits pairing is split across both systems:

  • Bunker (nsecbunker.db): the KeyUser binding (keyed by LNbits's stable client pubkey) + the redeemed Token.
  • LNbits (accounts.signer_config, RemoteBunkerSigner): { token, client_nsec, policy_id }.

At runtime, RemoteBunkerSigner.sign_event() sends sign_event directly with the stored client_nsec — no connect/re-redeem — relying on the bunker binding already existing. provision() runs only at new-account creation; there's no auto-repair, and restarting LNbits does not re-pair. So a full bunker-DB wipe deletes the KeyUser bindings → LNbits's stored configs dangle → every account fails to sign. Re-provisioning isn't a clean fix either: the standard provision() calls create_new_key → a new npub, changing the user's identity.

The correct remediation

-- LNbits-connected bunker: strip ONLY the #24 photocopies.
-- Keeps KeyUser + Token + Policy, so live-token clients keep signing untouched
-- (post-#27 the token grant is evaluated live at ACL step 4).
DELETE FROM SigningCondition;

This is safe precisely when the rows are pure token photocopies (all allowed=1, all on token-paired KeyUsers, no manual overrides) — which is exactly what cfaun had (390 rows = 15 clients × (1 connect + 25 policy-rule grants), 0 denies, 0 non-token rows).

cfaun worked example (2026-06-19)

  1. Deployed #27. Migration applied at deploy time (_prisma_migrations had 20260619125847_live_grant_lifecycle_schema); the daemon's own npm run prisma:migrate ENOENTs harmlessly (#31). Schema correct, data intact: KeyUser=15, Token=15, SigningCondition=390, Policy=1.
  2. Mistake: ran a full data wipe (all tables). Bunker came up clean — but this orphaned all 15 LNbits bindings. A restart would not have re-paired them.
  3. Recovery: restored the pre-wipe nsecbunker.db from backup, then ran the targeted DELETE FROM SigningCondition; instead. Result: KeyUser=15, Token=15, SigningCondition=0, Policy=1, all 15 tokens live (null expiresAt/revokedAt). LNbits signs again with zero re-provisioning, and #24 is closed (grants now come from the live token join, SigningCondition stays 0 after a fresh pairing).

Checklist correction

  • For LNbits-connected instances: only DELETE FROM SigningCondition; (never a full DB wipe). Take a DB backup first so a mis-step is recoverable.
  • A full wipe + re-pair is acceptable only on a throwaway/no-client instance.
  • Always back up nsecbunker.db before either; keys live in nsecbunker.json config and are never at risk from a DB operation.

(Reinforces the shelf-life point above: once LNbits is a client, "re-pairing is cheap" no longer holds — the pairing state is split and a wipe is destructive. The provenance-flag ask stands.)

## Correction: on LNbits-connected instances, the remediation is **targeted delete only — never a full wipe** The review checklist and my shelf-life note both offered "wipe `SigningCondition` **or** the whole bunker DB" as equivalent remediations. On any instance where **LNbits** is the bunker client, they are **not** equivalent — a full wipe breaks signing and forces identity-changing re-provisioning. Learned the hard way deploying this to **cfaun** today; details so the next deploy doesn't repeat it. ### Why a full wipe is wrong here The nsecbunkerd↔LNbits pairing is **split across both systems**: - **Bunker** (`nsecbunker.db`): the `KeyUser` binding (keyed by LNbits's stable client pubkey) + the redeemed `Token`. - **LNbits** (`accounts.signer_config`, `RemoteBunkerSigner`): `{ token, client_nsec, policy_id }`. At runtime, `RemoteBunkerSigner.sign_event()` sends `sign_event` **directly with the stored `client_nsec` — no `connect`/re-redeem** — relying on the bunker binding already existing. `provision()` runs **only at new-account creation**; there's no auto-repair, and **restarting LNbits does not re-pair**. So a full bunker-DB wipe deletes the `KeyUser` bindings → LNbits's stored configs dangle → every account fails to sign. Re-provisioning isn't a clean fix either: the standard `provision()` calls `create_new_key` → a **new npub**, changing the user's identity. ### The correct remediation ```sql -- LNbits-connected bunker: strip ONLY the #24 photocopies. -- Keeps KeyUser + Token + Policy, so live-token clients keep signing untouched -- (post-#27 the token grant is evaluated live at ACL step 4). DELETE FROM SigningCondition; ``` This is safe precisely when the rows are pure token photocopies (all `allowed=1`, all on token-paired KeyUsers, no manual overrides) — which is exactly what cfaun had (390 rows = 15 clients × (1 connect + 25 policy-rule grants), 0 denies, 0 non-token rows). ### cfaun worked example (2026-06-19) 1. Deployed #27. Migration applied at deploy time (`_prisma_migrations` had `20260619125847_live_grant_lifecycle_schema`); the daemon's own `npm run prisma:migrate` ENOENTs harmlessly (#31). Schema correct, data intact: `KeyUser=15, Token=15, SigningCondition=390, Policy=1`. 2. **Mistake:** ran a full data wipe (all tables). Bunker came up clean — but this orphaned all 15 LNbits bindings. A restart would *not* have re-paired them. 3. **Recovery:** restored the pre-wipe `nsecbunker.db` from backup, then ran the targeted `DELETE FROM SigningCondition;` instead. Result: `KeyUser=15, Token=15, SigningCondition=0, Policy=1`, all 15 tokens live (null `expiresAt`/`revokedAt`). LNbits signs again with **zero re-provisioning**, and #24 is closed (grants now come from the live token join, `SigningCondition` stays 0 after a fresh pairing). ### Checklist correction - For **LNbits-connected** instances: **only** `DELETE FROM SigningCondition;` (never a full DB wipe). Take a DB backup first so a mis-step is recoverable. - A full wipe + re-pair is acceptable **only** on a throwaway/no-client instance. - Always back up `nsecbunker.db` before either; keys live in `nsecbunker.json` config and are never at risk from a DB operation. (Reinforces the shelf-life point above: once LNbits is a client, "re-pairing is cheap" no longer holds — the pairing state is split and a wipe is destructive. The provenance-flag ask stands.)
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
aiolabs/nsecbunkerd!27
No description provided.