Permissions are unmodifiable after issue: shift sign-time auth from materialized SigningCondition snapshots to live policy lookup #11

Closed
opened 2026-05-30 10:40:29 +00:00 by padreug · 1 comment
Owner

Summary

nsecbunkerd's admin RPC surface is insert-only for everything that affects authorization:

| Operation | Available? |
|---|---|
| Create policy | ✅ create_new_policy |
| Add a rule to an existing policy | ❌ |
| Remove a rule from an existing policy | ❌ |
| Update a rule on an existing policy | ❌ |
| Grant new permission to existing KeyUser | ❌ |
| Revoke a single permission from a KeyUser | ❌ |
| Revoke a whole KeyUser binding | ✅ revoke_user (binary) |
| Revoke a token (without nuking the KeyUser) | ❌ |

In practical terms: once applyToken runs for a KeyUser, the set of (method, kind) pairs that user is allowed to sign is frozen forever. The Policy and its rules are template data consulted only at that one moment; after that, the auth check (lib/acl/checkIfPubkeyAllowed) reads SigningCondition rows scoped to keyUserId and never re-consults the source.

That's the wrong model for any organization-style deployment where permissions evolve. Adding a new event kind to a policy (e.g. NIP-52 calendar kinds 31922/31923 for an LNbits events-extension integration — a real case that surfaced this) does nothing for any KeyUser already bound to that policy. They'd need to be revoked + re-onboarded, which on the lnbits side means re-provisioning the account (new wallet, new identity, etc) — an unacceptable cost.

Repro of the staleness

  1. create_new_policy with rules: [{method: "sign_event", kind: 0}]. Policy id = N.
  2. create_new_token under policy N. Take the token.
  3. As a NIP-46 client, connect with that token. Bunker creates KeyUser + a single SigningCondition row for (method=sign_event, kind=0).
  4. sign_event for kind 0 → allowed.
  5. sign_event for kind 7 → denied (no SigningCondition row).
  6. Now imagine the operator wants to add kind 7 to the policy:
    • No add_policy_rule RPC exists. They'd have to wipe + recreate the policy.
    • Even if they did, the KeyUser from step 3 already has its SigningCondition snapshot. Its row set wouldn't change.
    • There's no add_signing_condition RPC to fix that KeyUser either.
  7. The only path is revoke_user + re-onboard. For a deployment with N existing accounts, that's N re-onboardings.
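The staleness in the steps above can be modeled in a few lines. This is a minimal in-memory sketch, not nsecbunkerd's code: `bindKeyUser` and `snapshotAllows` are hypothetical stand-ins for `applyToken` and the current auth check, and the `Policy` / `KeyUser` shapes are assumptions made only for illustration.

```typescript
// In-memory model of the snapshot behavior from the repro.
// All names are illustrative assumptions, not nsecbunkerd's actual code.
type Rule = { method: string; kind?: number };

interface Policy { id: number; rules: Rule[] }
interface KeyUser { id: number; policyId: number; snapshot: Rule[] }

// Stand-in for applyToken: copies the policy's rules once, at bind time.
function bindKeyUser(id: number, policy: Policy): KeyUser {
    return { id, policyId: policy.id, snapshot: policy.rules.map((r) => ({ ...r })) };
}

// Stand-in for the current check: only the per-user snapshot is consulted.
function snapshotAllows(user: KeyUser, method: string, kind?: number): boolean {
    return user.snapshot.some((r) => r.method === method && r.kind === kind);
}

const policy: Policy = { id: 1, rules: [{ method: "sign_event", kind: 0 }] };
const user = bindKeyUser(1, policy);

const beforeKind0 = snapshotAllows(user, "sign_event", 0); // step 4
const beforeKind7 = snapshotAllows(user, "sign_event", 7); // step 5

// Step 6: the operator edits the policy after the user was bound.
policy.rules.push({ method: "sign_event", kind: 7 });

// The snapshot was copied at bind time, so the edit is invisible to the check.
const afterKind7 = snapshotAllows(user, "sign_event", 7);
console.log({ beforeKind0, beforeKind7, afterKind7 });
```

The policy edit changes nothing for the already-bound user because the check never reads `policy.rules` again, which is the behavior this issue proposes to change.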

Proposed change — auth-by-policy at sign-time

Shift checkIfPubkeyAllowed from snapshot-of-rules to live-lookup-of-rules:

Current (src/daemon/lib/acl/index.ts:35-45):

const signingCondition = await prisma.signingCondition.findFirst({
    where: { keyUserId: keyUser.id, ...signingConditionQuery }
});

if (!signingCondition) return undefined;

Proposed:

// 1. Check for explicit deny on this KeyUser (per-user override, takes precedence)
const explicitDeny = await prisma.signingCondition.findFirst({
    where: { keyUserId: keyUser.id, allowed: false, ...signingConditionQuery }
});
if (explicitDeny) return false;

// 2. Check for explicit allow on this KeyUser (per-user grant beyond policy)
const explicitAllow = await prisma.signingCondition.findFirst({
    where: { keyUserId: keyUser.id, allowed: true, ...signingConditionQuery }
});
if (explicitAllow) return true;

// 3. Fall back to the policy attached to this KeyUser
const policyRule = await prisma.policyRule.findFirst({
    where: {
        policy: { tokens: { some: { keyUserId: keyUser.id } } },
        method,
        kind: signingConditionQuery.kind ?? undefined,
    },
});
return policyRule ? true : undefined;

SigningCondition becomes a per-user override layer (deny-list or extra-grant), not the primary auth source. Policy.rules becomes the live source of truth.
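To make the precedence order easier to reason about (and to unit-test without a database), the same three steps can be written as a pure function. This is a sketch under assumptions: the `Store` shape, `checkAllowed`, and the `policiesByKeyUser` lookup are hypothetical and only mirror the Prisma queries above. It does not cover token revocation or policy expiry.

```typescript
// Pure-function sketch of the proposed decision order:
// explicit deny > explicit allow > live policy rule > undefined (no opinion).
// Every type and name here is an illustrative assumption.
type Decision = boolean | undefined;

interface Override { keyUserId: number; method: string; kind?: number; allowed: boolean }
interface PolicyRule { policyId: number; method: string; kind?: number }

interface Store {
    overrides: Override[];      // plays the role of SigningCondition rows
    policyRules: PolicyRule[];  // plays the role of PolicyRule rows
    // keyUserId -> ids of the policies reachable through that user's tokens
    policiesByKeyUser: Record<number, number[]>;
}

function checkAllowed(store: Store, keyUserId: number, method: string, kind?: number): Decision {
    const matching = store.overrides.filter(
        (o) => o.keyUserId === keyUserId && o.method === method && o.kind === kind,
    );
    if (matching.some((o) => !o.allowed)) return false; // 1. explicit deny wins
    if (matching.some((o) => o.allowed)) return true;   // 2. explicit allow

    // 3. fall back to the live rules of the user's policies
    const policyIds = store.policiesByKeyUser[keyUserId] ?? [];
    const hit = store.policyRules.some(
        (r) => policyIds.some((id) => id === r.policyId) && r.method === method && r.kind === kind,
    );
    return hit ? true : undefined;
}

const store: Store = {
    overrides: [
        { keyUserId: 1, method: "sign_event", kind: 7, allowed: false },
        { keyUserId: 1, method: "sign_event", kind: 1, allowed: true },
    ],
    policyRules: [
        { policyId: 10, method: "sign_event", kind: 0 },
        { policyId: 10, method: "sign_event", kind: 7 },
    ],
    policiesByKeyUser: { 1: [10] },
};

const viaPolicy = checkAllowed(store, 1, "sign_event", 0);     // policy rule only
const deniedByUser = checkAllowed(store, 1, "sign_event", 7);  // deny beats policy
const grantedByUser = checkAllowed(store, 1, "sign_event", 1); // grant beyond policy
const noOpinion = checkAllowed(store, 1, "sign_event", 4);     // nothing matches

// Adding a rule to the policy takes effect on the very next check.
store.policyRules.push({ policyId: 10, method: "sign_event", kind: 4 });
const afterRuleAdded = checkAllowed(store, 1, "sign_event", 4);
console.log({ viaPolicy, deniedByUser, grantedByUser, noOpinion, afterRuleAdded });
```

Returning `undefined` rather than `false` when nothing matches keeps the "no decision" outcome of the current check distinct from an explicit denial.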

What this unlocks

Once auth is policy-driven at sign-time:

  1. Adding a kind to a policy is instant for every KeyUser bound to it. Just write a row.
  2. Revoking a kind from a policy is instant for everyone simultaneously.
  3. Per-user overrides still possible via SigningCondition (e.g. a specific user gets a kind their policy doesn't, or is denied a kind their policy allows).
  4. Revoke a token can finally exist as its own operation — delete the token's KeyUser (or mark it revoked) without needing per-policy gymnastics.
  5. Organizational permission flows (the LNbits operator-IdP use case): admins manage policies in one place, every account auto-inherits changes.

Companion admin RPCs

With the sign-time auth shift above landed, the missing-but-not-blocking RPCs become straightforward:

  • add_policy_rule(policyId, {method, kind}) — INSERT into PolicyRule. Effective immediately for every bound KeyUser.
  • remove_policy_rule(policyId, ruleId) — DELETE. Same.
  • update_policy(policyId, {name?, expiresAt?}) — UPDATE. Affects bound users only at next sign attempt.
  • add_signing_condition(keyUserId, {method, kind, allowed}) — INSERT into SigningCondition. Per-user override.
  • remove_signing_condition(signingConditionId) — DELETE. Per-user.
  • revoke_token(tokenId) — UPDATE token's revokedAt field; auth check then ignores SigningConditions / policy bindings sourced from this token.

These are all single-table mutations that need no schema migration, because the schema already supports them.

Migration concerns

The auth flip is backwards-compatible for existing deployments:

  • Existing SigningCondition rows continue to work — they're still consulted (now as the override layer).
  • New KeyUsers won't get materialized SigningCondition rows on applyToken (or we can keep that behavior — they're redundant but harmless).
  • An existing KeyUser whose Policy gains a new rule will start passing auth checks for that new rule's (method, kind) immediately. That's the whole point — but worth noting as a behavior change for any deployment that was depending on the snapshot freeze (we don't know of any, but worth flagging in release notes).

For the LNbits IdP integration specifically: dropping applyToken's SigningCondition-fanout loop becomes the cleanest implementation. The KeyUser binding remains (carries keyName, userPubkey, description, revokedAt), but the for (rule of policy.rules) await prisma.signingCondition.create(...) block at src/daemon/backend/index.ts:60-75 can be deleted entirely once auth is live-from-policy.

Cross-references

  • aiolabs/lnbits PR #33 — eager-bind chain that surfaced the LNbits-specific need for permission additions across already-onboarded accounts.
  • aiolabs/lnbits issue forthcoming — DEFAULT_POLICY_RULES is missing NIP-52 kinds 31922/31923. Today that's a hard block; with this change, fixing it would be a one-line bump + RPC call.
  • Cross-session coordination log: ~/dev/coordination/log.md entries 2026-05-30T10:30Z and later.

Out of scope for this issue

  • Concrete API design for the companion admin RPCs above — that's a follow-up once auth model lands.
  • UI for managing policies — separate concern.
  • Per-rule usage limits (maxUsageCount) — the schema has these but they're not enforced anywhere I could find. Worth a separate audit but unrelated to the propagation model.
Author
Owner

Follow-up: three gaps from the original issue body worth fleshing out before implementation starts

Three things I deferred or under-specified in the issue body. Adding them here so the design has a complete picture.

1. Concrete API shape for the companion admin RPCs

The original body just named the RPCs. Proposed parameter shapes, all following the existing create_new_policy.ts / create_new_token.ts pattern (single JSON-stringified param, response shipped over kind-24134):

// Policy-level mutations — affect every KeyUser bound to the policy
add_policy_rule(
    JSON.stringify({
        policyId: number,
        rule: { method: string, kind?: number, maxUsageCount?: number },
    })
) → "ok"

remove_policy_rule(
    JSON.stringify({ ruleId: number })
) → "ok"

update_policy(
    JSON.stringify({
        policyId: number,
        patch: { name?: string, expiresAt?: string | null },
    })
) → "ok"

// Per-KeyUser overrides — only affect one user
add_signing_condition(
    JSON.stringify({
        keyUserId: number,
        condition: { method: string, kind?: number | "all", allowed: boolean },
    })
) → "ok"

remove_signing_condition(
    JSON.stringify({ conditionId: number })
) → "ok"

// Token-level revocation without nuking the whole KeyUser
revoke_token(
    JSON.stringify({ tokenId: number })
) → "ok"

revoke_token is symmetrical with the existing revoke_user (a binary update on a single field). Each RPC is one Prisma mutation and needs no migration. The clients in aiolabs/lnbits#35's reconciliation pass would call these in a loop after get_policies returns the current state.
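Because each RPC takes a single JSON-stringified param, the handler has to parse and validate it before touching the database. A minimal sketch for the add_policy_rule shape follows. `parseAddPolicyRule` and `optionalInt` are hypothetical helpers, not existing nsecbunkerd functions, and the integer checks are an assumption about what the handler would want to enforce.

```typescript
// Typed view of the proposed add_policy_rule parameter shape.
interface AddPolicyRuleParams {
    policyId: number;
    rule: { method: string; kind?: number; maxUsageCount?: number };
}

// Hypothetical helper: accepts undefined or an integer, rejects anything else.
function optionalInt(value: unknown, name: string): number | undefined {
    if (value === undefined) return undefined;
    if (typeof value !== "number" || !Number.isInteger(value)) {
        throw new Error(`${name} must be an integer`);
    }
    return value;
}

// Hypothetical parser for the single JSON-stringified param of add_policy_rule.
function parseAddPolicyRule(raw: string): AddPolicyRuleParams {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
        throw new Error("params must be a JSON object");
    }
    const { policyId, rule } = parsed as { policyId?: unknown; rule?: unknown };
    if (typeof policyId !== "number" || !Number.isInteger(policyId)) {
        throw new Error("policyId must be an integer");
    }
    if (typeof rule !== "object" || rule === null) {
        throw new Error("rule must be an object");
    }
    const { method, kind, maxUsageCount } = rule as Record<string, unknown>;
    if (typeof method !== "string" || method.length === 0) {
        throw new Error("rule.method must be a non-empty string");
    }
    return {
        policyId,
        rule: {
            method,
            kind: optionalInt(kind, "rule.kind"),
            maxUsageCount: optionalInt(maxUsageCount, "rule.maxUsageCount"),
        },
    };
}

// Example: the NIP-52 calendar kind mentioned in the issue body.
const exampleParams = parseAddPolicyRule(
    JSON.stringify({ policyId: 3, rule: { method: "sign_event", kind: 31922 } }),
);
console.log(exampleParams.policyId, exampleParams.rule.method, exampleParams.rule.kind);
```

The other five RPCs would follow the same pattern with their own param interfaces. Rejecting malformed input up front keeps each handler down to the single Prisma mutation described above.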

2. Multi-lnbits-instance bunker sharing

The current aiolabs/lnbits provisioning convention shares one bunker across multiple lnbits instances by converging on policy_name = "lnbits-default". The existing docstring in lnbits/core/signers/remote_bunker.py:90-93 acknowledges this:

Multiple lnbits instances sharing one bunker should all use the same policy_name; the max(id) tiebreak handles the rare case where they raced to create.

Implication for the sign-time auth shift: once auth is policy-driven at sign-time, a rule change on the shared policy propagates instantly to every lnbits instance's user base. That's a feature, not a bug, but it has two operational consequences worth surfacing in the implementation:

  • Concurrent rule additions are safe because they're additive. An lnbits-v1 instance reconciling its rule set against the live policy can call add_policy_rule for any kinds it has in its DEFAULT_POLICY_RULES that aren't yet on the policy. An lnbits-v2 instance doing the same with a broader set just adds the extra rules. Both converge harmlessly.

  • Concurrent rule removals can race between instances on differing versions. If lnbits-v1's DEFAULT_POLICY_RULES shrinks (operator drops some kinds), and lnbits-v1 calls remove_policy_rule, every other lnbits instance using the same shared bunker loses those permissions for its user base too. Recommendation: lnbits-side reconciliation should ONLY ADD rules, never remove them, leaving deletions as a manual admin op. Worth noting in lnbits#35's reconciliation pass.

This is purely operational, not a schema concern. The bunker doesn't need to know about instance identity.
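The add-only recommendation amounts to a one-way set difference: compute which desired rules are missing from the live policy and never emit removals. A minimal sketch follows. `rulesToAdd` and `ruleKey` are hypothetical names, and the rule lists stand in for each instance's DEFAULT_POLICY_RULES and the result of get_policies.

```typescript
// Add-only reconciliation sketch. Names and data are illustrative assumptions.
type Rule = { method: string; kind?: number };

// Stable identity for a rule; "*" stands for "no kind specified".
const ruleKey = (r: Rule): string => `${r.method}:${r.kind ?? "*"}`;

// Returns the desired rules that are absent from the live policy.
// Rules present on the live policy but absent from `desired` are left alone.
function rulesToAdd(desired: Rule[], live: Rule[]): Rule[] {
    const present = new Set(live.map(ruleKey));
    const missing: Rule[] = [];
    for (const rule of desired) {
        const key = ruleKey(rule);
        if (!present.has(key)) {
            present.add(key); // also de-duplicates within `desired`
            missing.push(rule);
        }
    }
    return missing;
}

const signEvent = (kind: number): Rule => ({ method: "sign_event", kind });

// Live state of the shared policy, as a reconciling instance would read it.
const live: Rule[] = [signEvent(0), signEvent(1)];

// Two instances on different versions with different default rule sets.
const v1Desired = [signEvent(0), signEvent(1), signEvent(31922)];
const v2Desired = [signEvent(0), signEvent(31922), signEvent(31923)];

const v1Adds = rulesToAdd(v1Desired, live);
live.push(...v1Adds); // each entry would be one add_policy_rule call

const v2Adds = rulesToAdd(v2Desired, live);
live.push(...v2Adds);

console.log(v1Adds, v2Adds, live.map(ruleKey));
```

In this run the v2 instance no longer lists kind 1, yet the rule stays on the shared policy. That is the intended outcome: dropping it would remain a manual admin operation.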

3. Test plan for the auth-shift change

Regression cases that should cover the migration safely:

| Test | What it proves |
|---|---|
| Pre-existing KeyUser w/ snapshotted SigningCondition + matching policy rule → allowed | Backwards-compat path works |
| Pre-existing KeyUser w/ snapshotted SigningCondition + NO matching policy rule (e.g. orphan from a deleted rule) → still allowed via override layer | SigningCondition retained as override doesn't break legacy auth |
| Per-user add_signing_condition(allowed: false) denies request even when policy allows | Per-user deny precedence |
| Per-user add_signing_condition(allowed: true) allows request even when policy doesn't | Per-user grant beyond policy |
| add_policy_rule → immediately following sign request for that kind from a previously-bound KeyUser succeeds | Live propagation (the headline feature) |
| remove_policy_rule → immediately following sign request for that kind from a previously-bound KeyUser is rejected (assuming no per-user override) | Live revocation |
| revoke_user continues to deny all subsequent ops for that KeyUser regardless of policy | Binary revoke still works |
| revoke_token denies ops bound to that specific token without affecting other tokens issued to the same KeyUser | New surgical revocation works |
| maxUsageCount enforcement | Out of scope: the field isn't currently consulted anywhere, separate audit needed |

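The first two are critical for any deployment with already-onboarded users (the migration safety net). The next four exercise the new live-policy semantics: per-user deny, per-user grant, live propagation, and live revocation. The two revoke rows guard the revocation flows (the existing revoke_user and the new revoke_token). The final row is a placeholder for the out-of-scope maxUsageCount audit.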

A property-based test would also be reasonable here — generate a random policy + KeyUser graph, evaluate the auth check against both the snapshot model and the live model, assert results match for all already-bound rules. That would catch any subtle divergence from the existing behavior.
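A rough sketch of that property test, without a property-testing library, could look like the following. Everything here is an assumption for illustration: the two `...Allows` functions are simplified models rather than the real auth checks, explicit deny overrides are left out, and the generator uses a small seeded LCG so that failures are reproducible.

```typescript
// Property sketch: for every rule bound at snapshot time, the live model must
// agree with the snapshot model, whatever happens to the policy afterwards.
type Rule = { method: string; kind: number };

// Small deterministic LCG so runs are reproducible from a seed.
function makeRng(seed: number): () => number {
    let state = seed % 4294967296;
    return () => {
        state = (state * 1664525 + 1013904223) % 4294967296;
        return state / 4294967296; // in [0, 1)
    };
}

const sameRule = (a: Rule, b: Rule): boolean => a.method === b.method && a.kind === b.kind;

// Snapshot model: only rows materialized at bind time are consulted.
const snapshotAllows = (snapshot: Rule[], q: Rule): boolean =>
    snapshot.some((r) => sameRule(r, q));

// Live model: legacy snapshot rows act as allow-overrides, then the policy is consulted.
const liveAllows = (snapshot: Rule[], policy: Rule[], q: Rule): boolean =>
    snapshot.some((r) => sameRule(r, q)) || policy.some((r) => sameRule(r, q));

// Generates between 0 and `max` random rules from a tiny method/kind space.
function randomRules(rng: () => number, max: number): Rule[] {
    const methods = ["sign_event", "nip04_encrypt"];
    const count = Math.floor(rng() * (max + 1));
    const rules: Rule[] = [];
    for (let i = 0; i < count; i++) {
        rules.push({
            method: methods[Math.floor(rng() * methods.length)],
            kind: Math.floor(rng() * 10),
        });
    }
    return rules;
}

// Runs the property and returns how many (user, rule) pairs were compared.
function runProperty(iterations: number, seed: number): number {
    const rng = makeRng(seed);
    let checked = 0;
    for (let i = 0; i < iterations; i++) {
        const policy = randomRules(rng, 6);
        const snapshot = policy.map((r) => ({ ...r })); // state at bind time
        // Mutate the policy after binding: drop a random prefix, append new rules.
        const dropCount = Math.floor(rng() * (policy.length + 1));
        const mutated = policy.slice(dropCount).concat(randomRules(rng, 3));
        for (const q of snapshot) {
            if (snapshotAllows(snapshot, q) !== liveAllows(snapshot, mutated, q)) {
                throw new Error(`divergence on ${q.method}/${q.kind}`);
            }
            checked++;
        }
    }
    return checked;
}

const comparisons = runProperty(200, 42);
console.log(`compared ${comparisons} already-bound rules without divergence`);
```

In a real suite the two model functions would be replaced by the actual pre-change and post-change checks running against a test database. The sketch only pins down what "results match for all already-bound rules" means.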

Where these gaps live now

These three additions cover the design surface that the original body deferred or implied. Implementation can proceed off the issue body for the auth shift itself; this comment is the supplement for the companion RPCs + operational + test concerns.

Reference
aiolabs/nsecbunkerd#11