aiolabs/nsecbunkerd

Design discussion / RFC: enforce token + grant lifecycle at sign time (the root behind #24) #25

Closed

opened 2026-06-19 06:49:30 +00:00 by padreug · 4 comments

padreug commented

2026-06-19 06:49:30 +00:00

Owner

Why this is a discussion, not a patch

#24 (token expiresAt ignored after connect) looks like a one-line fix. It isn't. It's one symptom of a structural decision, and that decision has at least three symptoms already shipped — so before patching #24 in isolation, let's agree on the foundation, because the choice we make here decides whether this class of bug can recur.

This is pre-release / pre-public-launch code. We have time, no stored-data migration burden, and the freedom to make the strict-from-the-start choice now. That's exactly when this kind of decision is cheap to get right and expensive to defer.

📎 Onboarding explainer attached (token-ttl-acl-explainer.pdf, below): a diagram-heavy, jargon-free walkthrough of this bug and the C-vs-D decision — no Nostr or codebase background assumed. It establishes the shared vocabulary (materialization, cache-without-invalidation, Options C/D, D1/D2) the rest of this thread builds on. The engineering detail is in the comments below. (A companion onboarding doc covering the prior-art survey and how we chose a direction is attached on the survey comment further down.)

The pattern (one cause, three symptoms)

At connect time, applyToken (src/daemon/backend/index.ts:99) materializes a token's policy rules into per-KeyUser SigningCondition rows. At sign time, checkIfPubkeyAllowed (src/daemon/lib/acl/index.ts:23) reads those materialized rows first (step 3b) and short-circuits — so anything that lives on the Token/Policy but isn't copied onto the SigningCondition is invisible to signing:

| Lifecycle rule | Source of truth | Enforced at sign time? |
|---|---|---|
| Token expiry (TTL) | `Token.expiresAt` | No: this issue (#24) |
| Token revoke | `Token.revokedAt` | No: sibling, `aiolabs/spirekeeper#22` |
| Usage caps | `PolicyRule.maxUsageCount` | No: written/displayed, never decremented or checked |

SigningCondition has no expiresAt/revokedAt/usage column at all (prisma/schema.prisma:59). Compounding it: "is this token valid right now?" is defined in two places — validateToken (redeem, checks expiry) and ACL step 4 (sign, doesn't) — and they have already drifted. That drift is the bug.

Root cause, stated once: the materialized SigningCondition set is a cache of a derivation, and it has no invalidation. Every lifecycle feature anyone adds will arrive broken for the same reason.
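To make the pattern concrete, here is a minimal in-memory sketch of the drift. The row shapes and the functions `applyTokenMaterialized` and `checkMaterialized` are hypothetical simplifications written for this thread, not the real `applyToken` / `checkIfPubkeyAllowed` code. The only point is that the copied rows carry no lifecycle fields, so the sign-time check has nothing to compare against the clock.

```typescript
// Hypothetical, simplified shapes; the real Prisma models carry more fields.
interface TokenRow {
  expiresAt: Date;
  rules: { method: string; allowed: boolean }[];
}

interface SigningConditionRow {
  keyUserPubkey: string;
  method: string;
  allowed: boolean;
  // No expiresAt / revokedAt / usage field: this mirrors the schema gap.
}

const materialized: SigningConditionRow[] = [];

// Connect time: expiry is checked once, then the rules are copied.
function applyTokenMaterialized(token: TokenRow, pubkey: string, now: Date): void {
  if (token.expiresAt.getTime() <= now.getTime()) {
    throw new Error("token expired");
  }
  for (const rule of token.rules) {
    materialized.push({ keyUserPubkey: pubkey, method: rule.method, allowed: rule.allowed });
  }
}

// Sign time: only the copies are consulted, so no clock is needed
// and the token's expiry cannot influence the answer.
function checkMaterialized(pubkey: string, method: string): boolean {
  return materialized.some(
    (c) => c.keyUserPubkey === pubkey && c.method === method && c.allowed,
  );
}

const demoToken: TokenRow = {
  expiresAt: new Date("2026-01-01T00:00:00Z"),
  rules: [{ method: "sign_event", allowed: true }],
};

// Paired one day before expiry.
applyTokenMaterialized(demoToken, "pubkey-a", new Date("2025-12-31T00:00:00Z"));

// Asked at any later time, the copy still authorizes signing.
const allowedAfterExpiry = checkMaterialized("pubkey-a", "sign_event");
```

Note that `checkMaterialized` does not even take a `now` argument: in this simplified model there is no field it could be compared with.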

The decision

Two principled ways to make a cached system correct:

Option C — keep the cache, invalidate it. Reaper job for expiry + a revoke hook + a usage hook + a new hook for every future lifecycle rule. Smallest diff today; reuses the partly-working revoke path. But: correctness depends on never forgetting a hook, the reaper leaves a timer-window where an expired token still signs, and tearing down a user's copies is coarse (can knock out other still-valid grants). This is the design that generated the bug family.
Option D — don't cache; decide live. applyToken records only the binding (this pubkey is paired via this token) and nothing else. Every sign request computes the answer fresh from the real Token/Policy, checking revoke and expiry and usage in one indexed join. The whole bug family becomes impossible by construction — a new lifecycle rule is just one more predicate, never a forgotten photocopy. The usual "live is slow" objection doesn't apply: it's one indexed join on a path already spending ~5–10 ms on NIP-44 + Schnorr per round-trip.

Recommendation: Option D, with one non-negotiable discipline that prevents the original drift from returning:

Define "is this token/grant valid right now?" in exactly one place, and use that same predicate at both redeem time and sign time. The bug existed because those two disagreed; make them one definition and they cannot.
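A minimal sketch of that discipline, under stated assumptions: `TokenGrant`, `redeem`, and `maySign` are hypothetical names, and `expiresAt` is made mandatory here only to match the strict-from-the-start stance. The real predicate would also need the usage and subject checks discussed elsewhere in this thread.

```typescript
// Hypothetical grant shape; field names follow the thread, not the real schema.
interface TokenGrant {
  expiresAt: Date;        // mandatory: no "absent means pass" path
  revokedAt: Date | null; // null = not revoked
}

// The single definition of "valid right now". Pure: the clock is an argument.
function grantIsLive(grant: TokenGrant, now: Date): boolean {
  if (grant.revokedAt !== null) return false;
  return now.getTime() < grant.expiresAt.getTime();
}

// Redeem path: refuses to pair with a dead token.
function redeem(grant: TokenGrant, now: Date): void {
  if (!grantIsLive(grant, now)) {
    throw new Error("token is not live");
  }
}

// Sign path: asks the same question with the same predicate on every request.
function maySign(grant: TokenGrant, now: Date): boolean {
  return grantIsLive(grant, now);
}

const demoGrant: TokenGrant = {
  expiresAt: new Date("2026-07-01T00:00:00Z"),
  revokedAt: null,
};

const signBeforeExpiry = maySign(demoGrant, new Date("2026-06-30T00:00:00Z"));
const signAfterExpiry = maySign(demoGrant, new Date("2026-07-02T00:00:00Z"));
```

Because both paths call `grantIsLive`, a rule added to the predicate reaches redeem and sign at the same time; there is no second definition to forget.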

A clean result that falls out of D

The model also forces a useful distinction we should adopt in the vocabulary:

Revoke = subject-level ban (KeyUser.revokedAt): sticky, beats everything; un-banning is a deliberate admin action; re-pairing a banned subject should fail loudly (today applyToken's update:{} would silently create a dead binding).
Expiry = grant-level lapse (Token.expiresAt): not a ban; re-pairing with a fresh token simply adds a new live grant alongside. Naturally correct because expiry lives on the grant, never on the subject.

This directly answers the re-pair wrinkle raised on #24 / aiolabs/spirekeeper#22.
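The two re-pair behaviours can be sketched as follows. This is an illustration only: `KeyUserRow`, `GrantRow`, `pairWithToken`, and `hasLiveGrant` are hypothetical names, and the in-memory arrays stand in for database rows.

```typescript
// Hypothetical shapes: expiry lives on the grant, the ban lives on the subject.
interface GrantRow {
  tokenId: number;
  expiresAt: Date;
}

interface KeyUserRow {
  pubkey: string;
  revokedAt: Date | null; // subject-level ban
  grants: GrantRow[];
}

// Re-pairing a banned subject fails loudly instead of creating a dead binding.
function pairWithToken(user: KeyUserRow, tokenId: number, expiresAt: Date): void {
  if (user.revokedAt !== null) {
    throw new Error(`key user ${user.pubkey} is revoked; un-ban before re-pairing`);
  }
  user.grants.push({ tokenId, expiresAt });
}

// The ban beats everything; otherwise any unexpired grant is enough.
function hasLiveGrant(user: KeyUserRow, now: Date): boolean {
  if (user.revokedAt !== null) return false;
  return user.grants.some((g) => g.expiresAt.getTime() > now.getTime());
}

const checkTime = new Date("2026-06-19T00:00:00Z");

// A subject whose only grant lapsed on 2026-06-01.
const lapsedUser: KeyUserRow = {
  pubkey: "pubkey-a",
  revokedAt: null,
  grants: [{ tokenId: 1, expiresAt: new Date("2026-06-01T00:00:00Z") }],
};

const liveBeforeRepair = hasLiveGrant(lapsedUser, checkTime);
pairWithToken(lapsedUser, 2, new Date("2026-07-01T00:00:00Z"));
const liveAfterRepair = hasLiveGrant(lapsedUser, checkTime);
```

The lapsed grant stays in place after the re-pair; the fresh grant simply sits alongside it, which is the "expiry is not a ban" behaviour described above.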

The open question for the team

If we take D, one genuinely hard-to-reverse modelling choice remains — this is the part I want input on:

D1 — Two typed sources, one shared predicate. Keep policy-derived grants (Token+Policy) and manual admin overrides (add_signing_condition, the web-approval allowAllRequestsFromKey path, create_account bootstrap) as distinct kinds, but give the override layer a real lifecycle (createdAt/expiresAt/revokedAt) and run both through the one shared grantIsLive(now) predicate. Lower ceremony on the interactive path; clean audit trail. (My lean.)
D2 — Everything is a Token+Policy. Even an interactive "allow this" mints a synthetic single-rule, short-lived policy, so there's exactly one evaluation path. Maximally uniform, at the cost of ceremony on the manual path and a lot of trivial one-rule policies cluttering the data.

The robustness win (killing the bug family) comes from D itself, not from this sub-choice — so D1 vs D2 is a taste call about uniformity vs. simplicity. Both are defensible; I'd like a second opinion before committing to a schema.
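For discussion, D1 might look roughly like the discriminated union below. Every name here (`Lifecycle`, `TokenGrant`, `OverrideGrant`, `describeGrant`, `liveGrants`) is a hypothetical placeholder rather than a proposed schema; the sketch only shows two typed sources sharing one lifecycle shape and one predicate.

```typescript
// Shared lifecycle fields that both grant kinds would carry under D1.
interface Lifecycle {
  createdAt: Date;
  expiresAt: Date;
  revokedAt: Date | null;
}

// Policy-derived grant (Token + Policy).
interface TokenGrant extends Lifecycle {
  kind: "token";
  tokenId: number;
}

// Manual admin override (e.g. an interactive "allow this").
interface OverrideGrant extends Lifecycle {
  kind: "override";
  grantedBy: string;
}

type Grant = TokenGrant | OverrideGrant;

// One predicate for both kinds: it only looks at the shared lifecycle fields.
function grantIsLive(g: Lifecycle, now: Date): boolean {
  const t = now.getTime();
  return g.revokedAt === null && g.createdAt.getTime() <= t && t < g.expiresAt.getTime();
}

// The kind tag stays available for the audit trail.
function describeGrant(g: Grant): string {
  switch (g.kind) {
    case "token":
      return `token #${g.tokenId}`;
    case "override":
      return `override by ${g.grantedBy}`;
  }
}

function liveGrants(grants: Grant[], now: Date): string[] {
  return grants.filter((g) => grantIsLive(g, now)).map(describeGrant);
}

const sampleGrants: Grant[] = [
  { kind: "token", tokenId: 7, createdAt: new Date("2026-06-01T00:00:00Z"),
    expiresAt: new Date("2026-06-10T00:00:00Z"), revokedAt: null },
  { kind: "override", grantedBy: "admin", createdAt: new Date("2026-06-15T00:00:00Z"),
    expiresAt: new Date("2026-06-30T00:00:00Z"), revokedAt: null },
];

const liveNow = liveGrants(sampleGrants, new Date("2026-06-19T00:00:00Z"));
```

D2 would collapse `Grant` to the single `"token"` variant and mint a synthetic token for each override; the predicate itself would be unchanged, which is why the sub-choice does not affect the robustness argument.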

Scope note

Strict-from-the-start: no back-compat shims, no "absent → pass" graceful paths. If we adopt D, the same change also fixes the revoke no-op (aiolabs/spirekeeper#22) and unlocks live usage-cap enforcement — three findings closed at one seam.

Cross-refs

#24 — the TTL finding (sign-time ACL ignores expiresAt).
#11 — the ACL algorithm this builds on.
aiolabs/spirekeeper#22 — sibling finding: token-revoke is a post-redeem no-op for the same materialization reason.

Please weigh in on Option C vs D, and (if D) on D1 vs D2.

token-ttl-acl-explainer.pdf

740 KiB

padreug commented

2026-06-19 06:50:45 +00:00

Author

Owner

Prior art: `lnbits/nostr_bunker` is a shipping Option D

I went and read upstream lnbits/nostr_bunker (services.py/models.py/crud.py, verified against main on 2026-06-19) to see whether anyone in the NIP-46 space has already made this exact decision. They have — and they landed on D, which I think strengthens the recommendation above with an empirical existence-proof.

Their design is the degenerate-but-instructive case: the grant is the bunker:// URL record (UrlData). There's no materialization step — every signing request re-reads the live grant. Concretely, all three of our broken lifecycle rules are enforced live per request:

| Lifecycle rule | nsecbunkerd (today) | `lnbits/nostr_bunker` |
|---|---|---|
| Expiry | ignored at sign time (#24) | `_assert_url_is_active()` checks `expires_at` every request |
| Revoke | post-redeem no-op (spirekeeper#22) | no photocopy to outlive the original; disable the row, next request sees it |
| Usage cap | written, never checked | `_assert_post_rate_limit()` enforced live |

So a real, shipping NIP-46 bunker enforces the exact trio we drop, with zero invalidation machinery. That's the direct answer to the "won't deciding live be too fiddly / too slow?" objection against D — somebody already runs it in production on the same per-request crypto path.

Two mechanisms worth stealing regardless of D1/D2

1. Usage caps by counting the source of truth, not maintaining a counter. This is the one I'd actually change in our schema. Our PolicyRule.maxUsageCount/currentUsageCount is a mutable counter — a second cache you have to remember to decrement, which is its own drift hazard (and is partly why the usage sibling is broken). Upstream instead counts signing-request rows in the trailing window (get_signing_requests_since(24h)) — nothing to increment, nothing to invalidate, the count is derived from records we already write. If we go D, I'd drop currentUsageCount entirely and count Request rows the same way. This is the same "source of truth, don't re-derive a copy" principle the RFC argues, applied to the usage rule.

2. _assert_url_is_active() as a single named predicate is literally the "define 'valid right now' in exactly one place" discipline from the RFC, already factored out. Concrete template for our grantIsLive(now).
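Mechanism 1 can be sketched in a few lines. This is an in-memory illustration with hypothetical names (`RequestRow`, `usageInWindow`, `underCap`); in the daemon the count would presumably be a database `COUNT` over indexed `Request` rows rather than an array filter.

```typescript
// Hypothetical shape of a logged signing request.
interface RequestRow {
  keyUserPubkey: string;
  method: string;
  allowed: boolean;
  createdAt: Date;
}

// Usage is derived from the request log; nothing is incremented anywhere.
function usageInWindow(
  rows: RequestRow[],
  pubkey: string,
  method: string,
  now: Date,
  windowMs: number,
): number {
  const since = now.getTime() - windowMs;
  return rows.filter(
    (r) =>
      r.allowed &&
      r.keyUserPubkey === pubkey &&
      r.method === method &&
      r.createdAt.getTime() > since &&
      r.createdAt.getTime() <= now.getTime(),
  ).length;
}

// The cap check is one comparison against the derived count.
function underCap(
  rows: RequestRow[],
  pubkey: string,
  method: string,
  now: Date,
  windowMs: number,
  maxUsageCount: number,
): boolean {
  return usageInWindow(rows, pubkey, method, now, windowMs) < maxUsageCount;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const usageNow = new Date("2026-06-19T12:00:00Z");

const requestLog: RequestRow[] = [
  // Outside the 24h window.
  { keyUserPubkey: "pubkey-a", method: "sign_event", allowed: true, createdAt: new Date("2026-06-17T12:00:00Z") },
  // Inside the window.
  { keyUserPubkey: "pubkey-a", method: "sign_event", allowed: true, createdAt: new Date("2026-06-19T08:00:00Z") },
  { keyUserPubkey: "pubkey-a", method: "sign_event", allowed: true, createdAt: new Date("2026-06-19T10:00:00Z") },
  // Denied requests do not consume the cap in this sketch.
  { keyUserPubkey: "pubkey-a", method: "sign_event", allowed: false, createdAt: new Date("2026-06-19T11:00:00Z") },
];

const recentUsage = usageInWindow(requestLog, "pubkey-a", "sign_event", usageNow, DAY_MS);
```

Whether denied requests should count toward the cap is a policy choice this sketch makes arbitrarily; the derive-don't-count structure is the same either way.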

What it does NOT settle — and why it nudges D1, not D2

Upstream dodges the entire family by having no indirection to drift: no token redeem/handoff, no manual admin-grant path, one grant type, one tenant per wallet, flat permission strings on the grant. It's effectively the limit case of D2 ("one evaluation path") — but only because it deleted the manual path, not because it unified it.

That's not evidence for D2 in our context; it's evidence that uniformity is free when you have one grant kind, which we don't (per-device redeemable tokens and interactive admin overrides are both real requirements). What transfers is the evaluation strategy (live read, single predicate, derive-don't-count), not the schema. If anything it reinforces my lean toward D1: keep token-derived and manual grants as distinct typed sources, give the override layer a real lifecycle, and run both through one shared grantIsLive(now) — you get upstream's robustness without amputating the manual path the way they did.

tl;dr — upstream validates D outright and hands us the counting-not-counter fix for the usage sibling for free; it stays neutral-to-favorable on D1.

COMPARISON-lnbits-nostr_bunker.md

6.3 KiB

padreug commented

2026-06-19 08:41:17 +00:00

Author

Owner

Prior art #2: `Letdown2491/signet` — a re-architecture of our own codebase, and a cautionary one

Surveyed the OSS NIP-46 field for daemons with a real policy model. The standout is Signet (TS daemon + React UI + Android companion, MIT, very active — v1.11.0, 2026-06): an extensive fork of the same kind-0/nsecbunkerd codebase we maintain, re-architected around exactly this ACL/lifecycle problem. I read acl.ts, nip46-backend.ts, and schema.prisma directly. The conclusion is more useful than "copy it" — Signet independently shipped our #24 bug, which is strong corroboration of the root-cause framing above.

What it solved, what it didn't (verified against source)

applyToken (nip46-backend.ts:807) is structurally identical to ours:

1. checks Token.expiresAt once, at redeem (nip46-backend.ts:895),
2. materializes policy.rules into SigningCondition rows (:845-862), carrying method/kind/allowed but no expiry and no usage (their SigningCondition is byte-identical to ours),
3. sign-time (acl.ts:checkRequestPermission) never reads Token again.

Grep confirms the blast radius: maxUsageCount/currentUsageCount are touched only in the policy CRUD route — never decremented, never checked on the hot path. So Signet ships the exact #24 (token TTL ignored after connect) and dead usage-caps. A second team fell into the same materialization trap → independent confirmation this is structural, not a one-off oversight.

What Signet does add over us: a coarse-cache layer for subject-level state on KeyUser — revokedAt, suspendedAt/suspendUntil, trustLevel — read live every request, invalidated on change (invalidateAclCache). That genuinely fixes live revoke (our sibling spirekeeper#22). Notably it puts revoke on KeyUser, not Token — corroborating the revoke=subject-level / expiry=grant-level split proposed above.

Why this sharpens the C-vs-D call

The fix cleaves exactly along the revoke/expiry line:

Subject-level (revoke/suspend/trust — one KeyUser row): Signet's coarse-cache-with-invalidation is the right, cheap tool. This is "Option C done carefully," and it works because the cached state is a single row whose every mutation has an invalidation hook.
Grant-level (token expiry + usage, living on Token/PolicyRule): caching a materialized photocopy cannot work — Signet is the proof. This half needs the Option D live join.

So Signet is not a drop-in: it solves the half we'd half-solved and leaves #24 proper open. The synthesis I'd propose for D (leaning D1):

1. Adopt Signet's KeyUser subject-state: + suspendedAt, + suspendUntil, optional + trustLevel, with @@index([revokedAt])/@@index([suspendedAt]). Coarse-cache it and invalidate on change. (We already have KeyUser.revokedAt.)
2. Adopt Signet's Request indexing (+ keyUserId FK, @@index([allowed, createdAt]), @@index([keyUserId])) to enable usage = COUNT(Request) in window (the lnbits/nostr_bunker derive-don't-count pattern). Drop PolicyRule.currentUsageCount: the mutable counter is itself a drift-prone cache.
3. Reject Signet's applyToken materialization. applyToken records only the Token.keyUserId binding; sign-time joins Token → Policy → PolicyRule live and runs everything through one grantIsLive(now) predicate (Token.expiresAt in the future ∧ Token.revokedAt unset ∧ subject state clear ∧ usage under cap). This is the line Signet kept and we should delete.
4. D1 in schema form: Signet already separates one-time ConnectionToken (handshake: validates, never auto-approves) from durable policy-backed Token. That's our two-typed-sources model; the manual-override SigningCondition layer then needs its own lifecycle (+ createdAt/expiresAt/revokedAt) so both sources run through the same predicate.
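Points 1 and 3 of the synthesis combine into a sign-time decision along these lines. This is a rough in-memory sketch with hypothetical names (`Tables`, `subjectIsClear`, `tokenIsLive`, `maySign`); arrays stand in for the indexed join, the subject-state cache is omitted, and the usage check is left out to keep the sketch short.

```typescript
// Hypothetical row shapes standing in for KeyUser, Token and PolicyRule.
interface SubjectRow {
  pubkey: string;
  revokedAt: Date | null;
  suspendUntil: Date | null;
}

interface TokenRow {
  keyUserPubkey: string; // the binding applyToken would record
  policyId: number;
  expiresAt: Date;
  revokedAt: Date | null;
}

interface PolicyRuleRow {
  policyId: number;
  method: string;
}

interface Tables {
  subjects: SubjectRow[];
  tokens: TokenRow[];
  rules: PolicyRuleRow[];
}

// Subject-level state: a ban is sticky, a suspension lapses on its own.
function subjectIsClear(s: SubjectRow, now: Date): boolean {
  if (s.revokedAt !== null) return false;
  return s.suspendUntil === null || s.suspendUntil.getTime() <= now.getTime();
}

// Grant-level state, read from the token itself on every request.
function tokenIsLive(t: TokenRow, now: Date): boolean {
  return t.revokedAt === null && now.getTime() < t.expiresAt.getTime();
}

// Sign time: Token -> PolicyRule is evaluated live; nothing is copied.
function maySign(db: Tables, pubkey: string, method: string, now: Date): boolean {
  const subject = db.subjects.filter((s) => s.pubkey === pubkey)[0];
  // Strict: an unknown subject never passes.
  if (subject === undefined || !subjectIsClear(subject, now)) return false;
  return db.tokens.some(
    (t) =>
      t.keyUserPubkey === pubkey &&
      tokenIsLive(t, now) &&
      db.rules.some((r) => r.policyId === t.policyId && r.method === method),
  );
}

const demoDb: Tables = {
  subjects: [{ pubkey: "pubkey-a", revokedAt: null, suspendUntil: null }],
  tokens: [{ keyUserPubkey: "pubkey-a", policyId: 1,
             expiresAt: new Date("2026-07-01T00:00:00Z"), revokedAt: null }],
  rules: [{ policyId: 1, method: "sign_event" }],
};

const allowedWhileLive = maySign(demoDb, "pubkey-a", "sign_event", new Date("2026-06-19T00:00:00Z"));
const allowedAfterTtl = maySign(demoDb, "pubkey-a", "sign_event", new Date("2026-07-02T00:00:00Z"));
```

Editing a row in `demoDb` changes the next answer immediately, since the decision holds no copy of its own.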

Full schema diff + the rest of the survey (promenade/FROST can't do NIP-04/44 decrypt — relevant to the #18 server-decrypt need; FROSTR's 3-layer revocation model; Amber's per-(app×method×kind×relay) grants; NDK's Nip46PermitCallback seam we sit behind) captured offline — happy to drop the schema-diff doc here if useful.

tl;dr: Signet confirms (a) revoke belongs on the subject and must be live — adopt their cache; (b) grant-level TTL/usage cannot be materialized — they proved it by re-shipping #24. That's the strongest case yet for D, and their schema is a ready-made reference for D1 minus the one applyToken line we must not copy.

padreug referenced this issue

2026-06-19 12:41:41 +00:00

NDK NIP-46 backend: get_public_key bypasses the permit callback — pubkey disclosure is ungated/unauditable through our ACL seam #26

padreug commented

2026-06-19 12:42:22 +00:00

Author

Owner

Prior-art survey, source-verified — the complete picture

Read the actual source of every other NIP-46 signer worth learning from (clones at the commits cited; an initial automated pass overstated several of these, so each claim below is checked against code). Full writeup with all file:line citations lands in docs/acl-prior-art-survey.md. Net: nothing unseats Option D, leaning D1 — and we now have a verified reference implementation for each piece.

Does anyone enforce grant lifecycle live at sign time?

| Impl | Live grant-expiry per request? | One-line |
|---|---|---|
| Amber (greenart7c3) | ✅ yes (the reference) | recomputes `acceptUntil > now()` every request; sweep is cleanup-only |
| Signet (our fork-cousin) | ❌ re-ships #24 | materializes a lifecycle-free photocopy; live only for subject-level revoke/suspend |
| FROSTR (igloo-server) | ❌ no layer does | 3 clean revocation layers, but zero time-based grant expiry |
| promenade (fiatjaf) | ⚠️ per-profile `Until` only | no revoke API at all; revoke = re-key |
| NDK (we embed) | n/a (blank seam) | we own 100% of policy |

Amber is the positive template (verified)

IntentUtils.isRemembered() (IntentUtils.kt:1087-1101) is the per-request verdict and recomputes the deadline against now() every call; expired → returns null → prompt. The 24h updateExpiredPermissions sweep (ApplicationDao.kt:51) is non-load-bearing — correctness doesn't depend on it firing. Three things worth lifting straight into our D design:

1. Absolute deadline on the grant row + pure-function verdict recomputed per request. That is Option D, shipping in production.
2. Denials are time-boxed too (acceptUntil and rejectUntil): "reject for 5 min" decays back to a prompt instead of a permanent no.
3. Cache rows, never verdicts (CachingApplicationDao): keeps the now() re-check on every cache hit. Same lesson Signet's coarse-cache teaches for subject state.
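Translated into our language, the three points above might look like the sketch below. It is loosely modeled on the behaviour described here, not on Amber's Kotlin source: `PermissionRow`, `verdictFor`, `rowCache`, and `decide` are hypothetical names, epoch seconds with 0 for "unset" is an assumed encoding, and letting a live rejection win over a live acceptance is my own choice for the sketch.

```typescript
// Hypothetical grant row with absolute deadlines (epoch seconds; 0 = unset).
interface PermissionRow {
  acceptUntil: number;
  rejectUntil: number;
}

type Verdict = "accept" | "reject" | "prompt";

// Pure verdict, recomputed against the clock on every call.
function verdictFor(row: PermissionRow | undefined, nowSec: number): Verdict {
  if (row === undefined) return "prompt";
  if (row.rejectUntil > nowSec) return "reject"; // time-boxed denial
  if (row.acceptUntil > nowSec) return "accept"; // time-boxed approval
  return "prompt";                               // both lapsed: ask again
}

// The cache stores rows, never verdicts, so a cache hit still re-checks time.
const rowCache: { [key: string]: PermissionRow | undefined } = {};

function decide(cacheKey: string, nowSec: number): Verdict {
  return verdictFor(rowCache[cacheKey], nowSec);
}

// "Accept until t=100" and, for another app, "reject until t=300".
rowCache["app-1:sign_event"] = { acceptUntil: 100, rejectUntil: 0 };
rowCache["app-2:sign_event"] = { acceptUntil: 0, rejectUntil: 300 };

const acceptedEarly = decide("app-1:sign_event", 50);
const promptedLate = decide("app-1:sign_event", 150);
const rejectedEarly = decide("app-2:sign_event", 200);
const promptedAfterReject = decide("app-2:sign_event", 400);
```

No sweep appears anywhere in the sketch: a stale row left in `rowCache` is harmless because the deadline comparison runs on every call.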

Corrections to the earlier secondhand summary

Signet does NOT enforce all lifecycle live (prior comment) — it re-ships our exact #24 for token TTL/usage; it only fixed subject-level revoke/suspend. Confirmed in source.
promenade "FROST can't do ECDH/encrypted DMs" is false — frost/ecdh.go implements threshold ECDH; promenade chooses not to wire it (AuthorizeEncryption → false, GroupContext.Encrypt → "not implemented"). Relevant to the #18 "bunker for everything" endgame: threshold-protecting the server identity wouldn't mathematically preclude DM decryption — but keeping ECDH on a separate non-threshold key is the cheaper path. The functional "promenade can't decrypt" stands; the reason was wrong.
FROSTR PBKDF2 is 600k iters, not ~200k; its peer policy is default-allow + explicit deny, not "deny-override"; session revoke is explicit (status='revoked'), only per-grant revoke is implicit.

What each implementation contributes to our redesign

- Amber → live-evaluation reference (deadline-on-row, recompute-vs-now, time-boxed denials, wildcard-as-distinct-tier).
- Signet → schema reference for the Token/Policy/ConnectionToken decomposition (its ConnectionToken-vs-Token split is D1), minus the one applyToken materialization line we must not copy.
- FROSTR → revocation decomposition (app-grant ≠ transport ≠ key-rotation) + auditable, revocable credentials (revoked_at checked first, last-used tracking, audit-event-on-grant-change).
- promenade → the revoke = re-key anti-pattern to avoid: keep grant-revoke independent of key rotation; never force touching the master nsec to drop one capability.
- NDK → confirmed blank seam: all lifecycle logic lives in our callback. One gotcha filed separately as #26: get_public_key bypasses pubkeyAllowed entirely, so identity disclosure is ungated/unauditable through our ACL seam (every other method gates; this one doesn't). Worth a deliberate accept-or-override decision as part of the "one predicate on every request" goal.
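
To make the "one predicate on every request" goal concrete, here is a minimal sketch of a gate that treats `get_public_key` like any other method and records every decision. It does not use or describe NDK's actual API: `makeGate`, `Predicate`, and `AuditEntry` are hypothetical names, and the method list is just a subset of NIP-46 method names chosen for illustration.

```typescript
// A subset of NIP-46 method names, used here only as labels.
type Nip46Method =
  | "connect"
  | "get_public_key"
  | "sign_event"
  | "nip04_encrypt"
  | "nip04_decrypt";

// Hypothetical audit record; the real log shape is not specified here.
interface AuditEntry {
  remotePubkey: string;
  method: Nip46Method;
  allowed: boolean;
  at: number; // epoch seconds
}

// Hypothetical stand-in for the ACL callback.
type Predicate = (
  remotePubkey: string,
  method: Nip46Method,
  now: number
) => boolean;

// Wrap the predicate so that no method, get_public_key included, can reach
// the signer without one ACL decision and one audit entry.
function makeGate(isAllowed: Predicate, audit: AuditEntry[]) {
  return function gate(
    remotePubkey: string,
    method: Nip46Method,
    now: number
  ): boolean {
    const allowed = isAllowed(remotePubkey, method, now);
    audit.push({ remotePubkey, method, allowed, at: now });
    return allowed;
  };
}
```

Whether #26 is resolved by accepting the bypass or by overriding it, a wrapper of this shape would at least make the choice explicit in one place.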

Bottom line for the open decision

Option D is the only design that closes the grant-level family, and now has a production existence-proof (Amber) plus a cautionary re-ship of our bug (Signet) on either side of it. D1 is corroborated by Signet's two-source schema and avoids promenade's revoke=re-key trap.

📎 Onboarding narrative attached (token-ttl-acl-decision-explainer.pdf, below): the same discovery → reasoning → direction story told for non-specialists (no Nostr/codebase background assumed), with diagrams — a companion to the bug/decision explainer on the issue body above. The full source-verified survey with all file:line citations (tiers C/D, key-at-rest references, the steal-list) is in docs/acl-prior-art-survey.md.


token-ttl-acl-decision-explainer.pdf

552 KiB

padreug referenced this issue from a commit

2026-06-19 13:17:25 +00:00

docs(#25): add lnbits/nostr_bunker comparison (prior art)

padreug referenced this issue from a commit

2026-06-19 13:17:25 +00:00

docs(#25): source-verified ACL prior-art survey + keep-our-fork decision

padreug referenced this issue from a commit

2026-06-19 13:17:25 +00:00

feat(schema)(#25): Request.keyUserId + SigningCondition lifecycle for live grant eval

padreug referenced this issue from a commit

2026-06-19 13:17:25 +00:00

test(acl)(#25): extract pure grantIsLive/liveWhere + unit tests

padreug referenced this issue

2026-06-19 13:17:59 +00:00

fix(acl): enforce token grant lifecycle live at sign time (#24, #25) #27

padreug referenced this issue

2026-06-19 13:29:51 +00:00

Enforce PolicyRule.maxUsageCount live at sign time (needs a durable signing log) #28

padreug referenced this issue

2026-06-19 13:30:02 +00:00

Add a DB-backed test harness + integration tests for checkIfPubkeyAllowed #29

padreug referenced this issue from a commit

2026-06-19 16:05:20 +00:00

Merge pull request 'fix(acl): enforce token grant lifecycle live at sign time (#24, #25)' (#27) from issue-25-live-grant-lifecycle into dev

padreug commented

2026-06-19 20:55:15 +00:00

Author

Owner

Option D (leaning D1) implemented and deployed to all servers via #27 (merge 992c6a8):

- single grantIsLive(now) predicate used identically at redeem (validateToken) and sign (checkIfPubkeyAllowed)
- applyToken de-materialized: token grants evaluated live off Token → Policy → PolicyRule
- manual-override SigningCondition layer carries its own lifecycle (D1)

The materialization-drift family is closed by construction. Spinoffs tracked separately: usage-cap enforcement #28, DB integration tests #29, NDK get_public_key seam #26. Prior-art survey + keep-our-fork decision landed in docs/.
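
For readers who have not opened #27, the idea of a single lifecycle predicate shared by redeem and sign can be sketched as follows. This is not the merged code: the `GrantLifecycle` shape, the treatment of any non-null `revokedAt` as revoked, the Prisma-style filter object returned by `liveWhere`, and the `canRedeem`/`canSign` wrappers are assumptions made for illustration. Only the names `grantIsLive`, `liveWhere`, `expiresAt`, and `revokedAt` appear in this thread.

```typescript
// Hypothetical lifecycle fields shared by tokens and manual overrides.
interface GrantLifecycle {
  expiresAt: Date | null; // null = no expiry
  revokedAt: Date | null; // null = not revoked
}

// Pure predicate: a grant is live when it is not revoked and not expired.
// Assumption: any non-null revokedAt means "revoked", regardless of `now`.
function grantIsLive(grant: GrantLifecycle, now: Date): boolean {
  if (grant.revokedAt !== null) return false;
  if (grant.expiresAt !== null && grant.expiresAt.getTime() <= now.getTime()) {
    return false;
  }
  return true;
}

// The same rule expressed as a query filter, so the database and the
// in-memory check cannot drift apart. The object shape mimics a
// Prisma-style `where` clause, which is an assumption of this sketch.
function liveWhere(now: Date) {
  return {
    revokedAt: null,
    OR: [{ expiresAt: null }, { expiresAt: { gt: now } }],
  };
}

// Both call sites defer to the one predicate (hypothetical wrappers).
function canRedeem(token: GrantLifecycle, now: Date): boolean {
  return grantIsLive(token, now);
}

function canSign(grant: GrantLifecycle, now: Date): boolean {
  return grantIsLive(grant, now);
}
```

Because redeem and sign both go through `grantIsLive`, there is no second copy of the rule that could fall out of date, which is what "closed by construction" refers to.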

Closing the design RFC as delivered.


padreug closed this issue

2026-06-19 20:55:39 +00:00

padreug referenced this issue from a commit

2026-06-20 19:51:20 +00:00

feat(acl)(#28): per-rule windowed usage caps enforced live at sign time

padreug referenced this issue

2026-06-20 19:52:19 +00:00

feat(acl): per-rule windowed usage caps enforced live at sign time (#28) #34

padreug referenced this issue

2026-06-20 19:52:43 +00:00

SigningLog retention/pruning — the usage-cap log grows unbounded #35

padreug referenced this issue from a commit