Trim applyToken SigningCondition fan-out once override layer is rarely-consulted #12

Closed
opened 2026-05-30 11:21:08 +00:00 by padreug · 1 comment
Owner

Context

Filed as a parking-spot follow-up to #11 (live-policy auth + companion admin RPCs). Surfaced in webapp's review of the #11 implementation plan (2026-05-30 — see comment 1473 and the coord-log entry that follows it).

Today

After #11 lands, sign-time authorization works in two layers:

  1. Override layerSigningCondition rows scoped to a KeyUser. Consulted first in checkIfPubkeyAllowed. Used for per-user grants beyond the policy and per-user denies that override the policy.
  2. Base layer — live join over KeyUser → Token → Policy → PolicyRule. Consulted only if no matching SigningCondition row exists for the (method, kind) being checked.

The applyToken helper (src/daemon/backend/index.ts:77-121) still fans out SigningCondition rows at token-bind time — one per PolicyRule on the bound Policy. That fan-out is what kept legacy auth working before #11; under #11 it becomes redundant for the happy path, because the live-policy join would have produced the same result.

Why this isn't urgent

The fan-out is not wrong under #11 — it's a pre-population of the override layer that just happens to mirror the base layer. Auth answers are identical either way. The two costs of leaving it in place:

  1. Storage — N SigningCondition rows per KeyUser instead of zero. Each row is tiny (~50 bytes), so a 1000-user bunker holds ~tens of KB of redundant data. Not a real cost yet.
  2. Drift surface — if a future PolicyRule mutation (e.g. remove_policy_rule via #11's new RPC) needs to be reflected per-user, the live-policy path handles it automatically but the pre-populated SigningCondition row would still claim "allowed" via step 3 of the auth check. Concretely: an operator removes a kind from the shared lnbits-default policy expecting global revocation, but every KeyUser still holds a SigningCondition row granting that kind, so the live-revoke headline feature of #11 silently doesn't fire.

The second is the real reason this should be trimmed eventually — it actively undermines the #11 design intent for already-bound users. But it doesn't undermine #11 for new bindings, and operators can be told to "wipe the affected SigningCondition rows manually after a remove_policy_rule" as a transitional admin op.

Proposed work

  1. Validate the live-policy path covers the happy-path by adding instrumentation to checkIfPubkeyAllowed that counts how often step 3 (override) returns true vs step 4 (live join) returns true. Run for some period on the regtest stack and (ideally) a non-prod environment with real traffic. If step 3 is consulted in >5% of sign requests, that's signal that the override layer is doing real work and trimming the fan-out would change observable behavior.
  2. Once step 4 dominates, remove the loop at backend/index.ts:96-112 that creates a SigningCondition per PolicyRule. Leave only the connect-method row at lines 88-94 (still useful as the "this user has been bound" marker).
  3. Migration story for existing rows: leave them in place — they continue to grant via the override layer, which is the intended backwards-compat behavior. Don't bulk-delete unless someone has a specific reason.
  4. Update aiolabs/lnbits#35's reconciliation pass to assume the override layer is no longer pre-populated — lnbits-default's add_policy_rule calls become the only path to permissions for new users.

Scope guard

This explicitly does NOT cover trimming the override layer entirely. Override remains useful for:

  • Per-user grants beyond policy (e.g. an admin allowlists a power-user for a kind the default policy doesn't grant).
  • Per-user denies (the method='*' allowed=false explicit-reject row).
  • The revoke_user binary path, though that's tracked on KeyUser.revokedAt not SigningCondition.

cc #11 (parent), aiolabs/lnbits#35 (downstream consumer).

## Context Filed as a parking-spot follow-up to #11 (live-policy auth + companion admin RPCs). Surfaced in webapp's review of the #11 implementation plan (2026-05-30 — see [comment 1473](https://git.atitlan.io/aiolabs/nsecbunkerd/issues/11#issuecomment-1473) and the coord-log entry that follows it). ## Today After #11 lands, sign-time authorization works in two layers: 1. **Override layer** — `SigningCondition` rows scoped to a `KeyUser`. Consulted first in `checkIfPubkeyAllowed`. Used for per-user grants beyond the policy and per-user denies that override the policy. 2. **Base layer** — live join over `KeyUser → Token → Policy → PolicyRule`. Consulted only if no matching `SigningCondition` row exists for the (method, kind) being checked. The `applyToken` helper (`src/daemon/backend/index.ts:77-121`) still fans out `SigningCondition` rows at token-bind time — one per `PolicyRule` on the bound `Policy`. That fan-out is what kept legacy auth working before #11; under #11 it becomes redundant for the happy path, because the live-policy join would have produced the same result. ## Why this isn't urgent The fan-out is **not wrong** under #11 — it's a pre-population of the override layer that just happens to mirror the base layer. Auth answers are identical either way. The two costs of leaving it in place: 1. **Storage** — N `SigningCondition` rows per `KeyUser` instead of zero. Each row is tiny (~50 bytes), so a 1000-user bunker holds ~tens of KB of redundant data. Not a real cost yet. 2. **Drift surface** — if a future `PolicyRule` mutation (e.g. `remove_policy_rule` via #11's new RPC) needs to be reflected per-user, the live-policy path handles it automatically but the pre-populated SigningCondition row would still claim "allowed" via step 3 of the auth check. Concretely: an operator removes a kind from the shared `lnbits-default` policy expecting global revocation, but every KeyUser still holds a SigningCondition row granting that kind, so the live-revoke headline feature of #11 silently doesn't fire. The second is the real reason this should be trimmed eventually — it actively undermines the #11 design intent for already-bound users. But it doesn't undermine #11 for **new** bindings, and operators can be told to "wipe the affected SigningCondition rows manually after a remove_policy_rule" as a transitional admin op. ## Proposed work 1. **Validate the live-policy path covers the happy-path** by adding instrumentation to `checkIfPubkeyAllowed` that counts how often step 3 (override) returns true vs step 4 (live join) returns true. Run for some period on the regtest stack and (ideally) a non-prod environment with real traffic. If step 3 is consulted in >5% of sign requests, that's signal that the override layer is doing real work and trimming the fan-out would change observable behavior. 2. **Once step 4 dominates**, remove the loop at `backend/index.ts:96-112` that creates a SigningCondition per PolicyRule. Leave only the connect-method row at lines 88-94 (still useful as the "this user has been bound" marker). 3. **Migration story for existing rows**: leave them in place — they continue to grant via the override layer, which is the intended backwards-compat behavior. Don't bulk-delete unless someone has a specific reason. 4. **Update `aiolabs/lnbits#35`'s reconciliation pass** to assume the override layer is no longer pre-populated — `lnbits-default`'s `add_policy_rule` calls become the only path to permissions for new users. ## Scope guard This explicitly does NOT cover trimming the override layer entirely. Override remains useful for: - Per-user grants beyond policy (e.g. an admin allowlists a power-user for a kind the default policy doesn't grant). - Per-user denies (the `method='*'` allowed=false explicit-reject row). - The `revoke_user` binary path, though that's tracked on `KeyUser.revokedAt` not SigningCondition. cc #11 (parent), aiolabs/lnbits#35 (downstream consumer).
Author
Owner

Superseded by #27 (merged + deployed). This issue asked to trim the applyToken SigningCondition fan-out once the override layer became rarely-consulted — #27 went further and removed it entirely: applyToken now records only the KeyUser ← Token binding, and token grants are evaluated live via the ACL step-4 token join (Token → Policy → PolicyRule). There is no fan-out left to trim.

Closing as done by #27.

Superseded by #27 (merged + deployed). This issue asked to *trim* the `applyToken` SigningCondition fan-out once the override layer became rarely-consulted — #27 went further and **removed it entirely**: `applyToken` now records only the `KeyUser ← Token` binding, and token grants are evaluated live via the ACL step-4 token join (`Token → Policy → PolicyRule`). There is no fan-out left to trim. Closing as done by #27.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
aiolabs/nsecbunkerd#12
No description provided.