refactor(v2): cassette transport — signer.nip44_* migration (#29 v1.1 / closes #21 partial)
Some checks failed
ci.yml / refactor(v2): cassette transport — signer.nip44_* migration (#29 v1.1 / closes #21 partial) (pull_request) Failing after 0s

Migrates the cassette transport's encrypt/decrypt paths off direct
`account.prvkey` reads to `signer.nip44_encrypt` / `signer.nip44_decrypt`
on the NostrSigner ABC landed by aiolabs/lnbits PR #38 (phase 2.4). Closes
the operator-side regression flagged at coord-log 2026-05-31T06:50Z:
Greg's RemoteBunkerSigner-migrated account had `accounts.prvkey IS NULL`
post-bunker, which the old code couldn't handle — consumer was logging
WARN every poll cycle and skipping every inbound state event.

## What changed

### cassette_transport.py

- New imports: `resolve_signer`, `SignerError`, `SignerUnavailableError`,
  `NsecBunkerTimeoutError`, `NsecBunkerRpcError` from the post-#38 lnbits
  surface. (The `try: from lnbits.core.signers import SignerError` block
  in the old code was permanently failing because `SignerError` actually
  lives in `lnbits.core.signers.base`, not the package root — fixed.)
- New `_resolve_operator_signer(operator_user_id)`: single source of
  truth for "give me the operator's account + NostrSigner, or raise an
  operator-facing error." Used by both the publish path and the consumer
  task.
- New `_nip44_encrypt_via_signer(account, signer, plaintext, peer)`
  and `_nip44_decrypt_via_signer(...)`: route through `signer.nip44_*`
  first; on `SignerUnavailableError` from a LocalSigner stub (the
  post-#38 ABC has LocalSigner raise on nip44_* explicitly — bunker
  migration required for NIP-44 v2), fall back to the hand-rolled impl
  against `account.prvkey`. Transitional until every operator on the
  instance is bunker-backed (S7).
- `_sign_as_operator` simplified: now `await signer.sign_event(event)`
  (the ABC is async; the old code passed `signer.sign_event` to the
  caller without await, returning a coroutine — also broken but never
  hit because the ImportError fallback fired first).
- `publish_to_atm` flow: `_resolve_operator_signer` → `_nip44_encrypt_
  via_signer` → `_sign_as_operator` → publish. Each step maps bunker /
  signer errors to `OperatorIdentityMissing` (400) / `SignerUnavailable`
  (503) / `CassetteTransportError` (500) for the API handler.
- `decrypt_and_parse_state_event` now `async` and takes `(event, account,
  signer)` instead of `(event, operator_privkey_hex)`. Maps
  `NsecBunkerTimeoutError` → `CassetteEventTransientError` (caller
  should retry on next poll, NOT advance `state_event_id`).
  `NsecBunkerRpcError` / `SignerUnavailableError` / `Nip44Error` / etc.
  → `CassetteEventDecodeError` (terminal — caller logs + skips).
- New `CassetteEventTransientError` class for the bunker-timeout case.
  Distinct from `CassetteEventDecodeError` so the consumer can log at
  INFO + retry vs WARNING + advance.
- Deleted `_get_operator_privkey_hex` (no longer needed).

### tasks.py — _handle_cassette_state_event

- Resolves the signer via `_resolve_operator_signer(machine.operator_
  user_id)`. On `CassetteTransportError` (OperatorIdentityMissing /
  SignerUnavailable), logs + skips.
- Awaits `decrypt_and_parse_state_event(event_obj, account, signer)`.
  On `CassetteEventTransientError`, logs at INFO + returns (state_event_
  id NOT advanced → consumer retries on next poll cycle).
  On `CassetteEventDecodeError`, logs at WARNING + returns (still
  state_event_id NOT advanced for v1; the WARN log surfaces the
  underlying issue for operator triage).

### tests/test_cassette_state_consumer.py — rewritten

- Three test doubles: `_FakeBunkerSigner` (working nip44_decrypt via
  hand-rolled impl), `_FakeLocalSignerStub` (raises like the post-#38
  LocalSigner stub), `_FakeRaisingSigner` (configurable exception).
- `_fake_account` helper using SimpleNamespace — the code under test
  only reads `.signer_type` + `.prvkey`.
- Five test classes covering: bunker-signer happy path (incl. multi-
  same-denom round-trip), LocalSigner transitional fallback,
  bunker-error mapping (timeout → transient, rpc reject → decode),
  payload validation (tamper / wrong-key / missing-fields / garbage
  JSON / wrong shape), d-tag construction (unchanged, kept as
  regression guard).
- Async coroutines driven via `asyncio.run` — matches the existing
  project pattern (no pytest-asyncio plugin in CI; see test_init.py
  failure mode).

### nip44.py — docstring update

Added a "Runtime status (post lnbits PR #38, 2026-05-31)" section
documenting that runtime usage moved to `signer.nip44_*` and this
module's role narrowed to (a) the LocalSigner transitional fallback
called from `cassette_transport`, and (b) test-only fixtures in
test_nip44_v2.py for spec-vector + bitspire cross-test validation.
"Don't add new runtime call sites here. The signer abstraction is
the path."

## Verification

- 155 passed, 1 pre-existing async-plugin failure unchanged. The 19
  consumer tests cover bunker happy path + LocalSigner fallback +
  bunker error mapping + payload validation + d-tag construction.
- Live smoke against Greg's RemoteBunkerSigner-migrated account
  on the regtest container: consumer correctly resolves the bunker
  signer, fires `NIP-46 rpc -> method=nip44_decrypt`, catches the
  resulting `NsecBunkerTimeoutError` (the local nsecbunkerd is not
  responding within 15s — separate operational concern), maps to
  `CassetteEventTransientError`, logs at INFO with "will retry next
  poll", and crucially does NOT advance `state_event_id` on the
  cassette_configs rows. Retry semantics preserved.

## Outstanding

- The bunker timeout itself is an operational issue (nsecbunkerd
  config / policy / process state for kind-less nip44_decrypt RPC) —
  not a satmachineadmin code concern; surface to the nsecbunkerd /
  lnbits sessions if it persists.
- Once every operator on the instance is on RemoteBunkerSigner (S7
  fully landed), the `_nip44_*_via_signer` helpers collapse to a
  direct `await signer.nip44_*` call, the LocalSigner fallback can
  be deleted, and `nip44.py`'s runtime exports retire (test-only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Padreug 2026-05-31 09:21:43 +02:00
commit dcb7de0c27
4 changed files with 573 additions and 199 deletions

View file

@ -44,18 +44,24 @@ import json
import time
from typing import Optional
import coincurve
from lnbits.core.crud.users import get_account
from lnbits.utils.nostr import normalize_public_key, sign_event
from lnbits.core.services.nip46_bunker_client import (
NsecBunkerRpcError,
NsecBunkerTimeoutError,
)
from lnbits.core.signers import resolve_signer
from lnbits.core.signers.base import (
NostrSigner,
SignerError,
SignerUnavailableError,
)
from lnbits.utils.nostr import normalize_public_key
from loguru import logger
from .models import Machine, PublishCassettesPayload
from .nip44 import (
Nip44Error,
decrypt_with_conversation_key,
encrypt_with_conversation_key,
get_conversation_key,
)
from .nip44 import Nip44Error
from .nip44 import decrypt_from as _nip44_local_decrypt
from .nip44 import encrypt_for as _nip44_local_encrypt
_KIND_NIP78 = 30078
_D_TAG_CONFIG_PREFIX = "bitspire-cassettes:" # operator → ATM
@ -91,7 +97,19 @@ class RelayUnavailable(CassetteTransportError):
class CassetteEventDecodeError(CassetteTransportError):
"""Inbound state event failed validation: bad signature, NIP-44 v2
decrypt failure, or payload didn't conform to PublishCassettesPayload."""
decrypt failure, or payload didn't conform to PublishCassettesPayload.
Terminal caller should log + skip, advancing past the event."""
class CassetteEventTransientError(CassetteTransportError):
"""Inbound state event couldn't be decrypted because the signer
component (typically the bunker) is transiently unavailable. Caller
should NOT advance past the event; retry on next tick.
Distinct from CassetteEventDecodeError so the consumer task can
differentiate "MAC failed, give up" from "bunker is partitioned, try
again in a few seconds" — surfaced by lnbits at coord-log
2026-05-31T07:10Z as the load-bearing distinction post-PR-#38."""
# =============================================================================
@ -129,28 +147,19 @@ def build_state_d_tags_for_machines(machines: list[Machine]) -> list[str]:
# =============================================================================
async def _sign_as_operator(
operator_user_id: str, event: dict
) -> Optional[dict]:
"""Sign `event` using the operator's stored Nostr identity.
async def _resolve_operator_signer(operator_user_id: str):
"""Fetch the operator's account + resolve to a NostrSigner.
Mutates `event` to add `created_at` (now), `pubkey`, `id`, and `sig`.
Returns the signed event, or raises a typed CassetteTransportError
on a hard failure the caller should surface to the operator.
Single source of truth for "give me the signer for this operator,
or raise an operator-facing error if we can't." Returns
`(account, signer)` so callers that need both (publish path needs
`account.pubkey` for the event author and the signer for both
encrypt + sign) don't double-fetch.
Routing: post-`aiolabs/lnbits#17` (signer abstraction) we go through
`lnbits.core.signers.resolve_signer`, which transparently handles
LocalSigner (envelope-encrypted nsec at rest, decrypted on demand)
and ClientSideOnlySigner (raises SignerUnavailableError). On pre-#17
lnbits versions the import fails and we fall back to a direct
`account.prvkey` read. Both paths produce identical signed events.
Pattern preserved from the removed nostr_publish.py at commit
e13178d / 131ff92 recovered here for the cassette transport.
Unlike the prior fleet-publish path (which soft-failed on missing
operator identity since the publish was a CRUD side-effect), the
cassette publish is operator-initiated so missing identity is a hard
error surfaced as HTTP 400 by the API caller.
Raises:
- OperatorIdentityMissing no account, or no pubkey on file
- SignerUnavailable signer resolve failed, or signer can't sign
server-side (ClientSideOnly)
"""
account = await get_account(operator_user_id)
if account is None or not account.pubkey:
@ -159,30 +168,6 @@ async def _sign_as_operator(
"Onboard via the LNbits Nostr-login flow to publish cassette "
"config to your ATMs."
)
# created_at is part of the BIP-340 event-id hash; must be set before
# signing so both code paths below see the same value.
event["created_at"] = int(time.time())
try:
from lnbits.core.signers import ( # type: ignore[import-not-found]
SignerError,
SignerUnavailableError,
resolve_signer,
)
except ImportError:
# Pre-#17 lnbits — direct prvkey read. Removed once the #17
# cascade lands on every host that runs this extension.
if not account.prvkey:
raise OperatorIdentityMissing(
f"operator {operator_user_id[:8]}... has no signing key "
"on file (pre-lnbits#17 path). Onboard via Nostr-login or "
"wait for aiolabs/lnbits#18 bunker integration."
)
private_key = coincurve.PrivateKey(bytes.fromhex(account.prvkey))
return sign_event(event, account.pubkey, private_key)
# Post-#17 lnbits — route through the signer abstraction.
try:
signer = resolve_signer(account)
except SignerError as exc:
@ -190,16 +175,32 @@ async def _sign_as_operator(
f"signer resolve failed for operator {operator_user_id[:8]}...: "
f"{exc}"
) from exc
if not signer.can_sign():
raise SignerUnavailable(
f"operator {operator_user_id[:8]}... has a client-side-only "
"signer; server can't publish on their behalf. Wait for bunker "
"integration (lnbits#18) or operator-driven publishing."
"signer; server can't sign or NIP-44-encrypt on their behalf. "
"Operator must hold their nsec via a NIP-46 bunker (lnbits#18) "
"or migrate to a server-signing account."
)
return account, signer
async def _sign_as_operator(
operator_user_id: str, event: dict
) -> Optional[dict]:
"""Sign `event` using the operator's signer (LocalSigner or
RemoteBunkerSigner). Mutates `event` to add `created_at` (now),
`pubkey`, `id`, and `sig`.
Raises typed CassetteTransportError subclasses on hard failure
(the publish endpoint maps these to HTTP statuses); never returns
None on the publish path.
"""
_account, signer = await _resolve_operator_signer(operator_user_id)
# created_at is part of the BIP-340 event-id hash; set before signing.
event["created_at"] = int(time.time())
try:
return signer.sign_event(event)
return await signer.sign_event(event)
except SignerUnavailableError as exc:
raise SignerUnavailable(
f"signer unavailable for operator {operator_user_id[:8]}...: "
@ -207,27 +208,57 @@ async def _sign_as_operator(
) from exc
async def _get_operator_privkey_hex(operator_user_id: str) -> str:
"""Fetch the operator's signing key hex for NIP-44 v2 encryption.
async def _nip44_encrypt_via_signer(
account, signer: NostrSigner, plaintext: str, peer_pubkey_hex: str
) -> str:
"""NIP-44 v2 encrypt via the signer abstraction, with a transitional
fallback to direct-prvkey for LocalSigner accounts.
NIP-44 v2 ECDH needs the raw private scalar, which the signer
abstraction's high-level `sign_event` doesn't expose. For v1 we
read `account.prvkey` directly same surface that the pre-#17
fallback in `_sign_as_operator` uses. Post-bunker (lnbits#18)
this becomes a NIP-44-over-bunker call routed through the bunker
client (the operator's nsec never leaves the bunker process), but
that path is v2 follow-up.
The bunker (RemoteBunkerSigner) implements `nip44_encrypt` natively
the operator's nsec never leaves the bunker process. LocalSigner's
`nip44_encrypt` stub explicitly raises SignerUnavailableError
("LocalSigner does not implement nip44_encrypt") per the
post-PR-#38 ABC — the spec is "migrate to bunker." For the
transitional window where some operators are still on LocalSigner
+ their `account.prvkey` is intact, we catch that signal and use
our hand-rolled NIP-44 v2 impl against the stored prvkey. Same
wire output either way.
Raises OperatorIdentityMissing on missing keys.
Removed once every operator account on this instance is bunker-
backed (S7 fully landed). At that point this helper collapses to
`return await signer.nip44_encrypt(plaintext, peer_pubkey_hex)`.
"""
account = await get_account(operator_user_id)
if account is None or not account.prvkey:
raise OperatorIdentityMissing(
f"operator {operator_user_id[:8]}... has no signing key on "
"file; can't NIP-44 v2 encrypt the cassette payload to the "
"ATM. Onboard via the LNbits Nostr-login flow."
)
return account.prvkey
try:
return await signer.nip44_encrypt(plaintext, peer_pubkey_hex)
except SignerUnavailableError:
if (
account.signer_type == "LocalSigner"
and account.prvkey
):
return _nip44_local_encrypt(
plaintext, account.prvkey, peer_pubkey_hex
)
# ClientSideOnly, or RemoteBunkerSigner with bunker comms failure
# at config time — re-raise without wrapping; caller maps it.
raise
async def _nip44_decrypt_via_signer(
account, signer: NostrSigner, ciphertext: str, peer_pubkey_hex: str
) -> str:
"""Decrypt mirror of `_nip44_encrypt_via_signer`. Same LocalSigner
transitional fallback."""
try:
return await signer.nip44_decrypt(ciphertext, peer_pubkey_hex)
except SignerUnavailableError:
if (
account.signer_type == "LocalSigner"
and account.prvkey
):
return _nip44_local_decrypt(
ciphertext, account.prvkey, peer_pubkey_hex
)
raise
# =============================================================================
@ -269,18 +300,39 @@ async def publish_to_atm(
Returns the signed event dict on success (caller may log event.id for
audit). Raises CassetteTransportError subclasses on hard failures:
- OperatorIdentityMissing 400: operator hasn't onboarded
- SignerUnavailable 503: signer offline / client-side-only
- SignerUnavailable 503: signer offline / client-side-only / bunker
timeout at the encrypt or sign step
- RelayUnavailable 503: nostrclient not installed
- CassetteTransportError 500: anything else
"""
atm_pubkey_hex = _atm_hex_pubkey(machine)
# Build the NIP-44 v2 encrypted content using the operator's privkey
# as sender and the ATM pubkey as recipient.
operator_privkey_hex = await _get_operator_privkey_hex(operator_user_id)
# Single fetch + resolve — same signer is used for both encrypt and sign.
account, signer = await _resolve_operator_signer(operator_user_id)
# NIP-44 v2 encrypt the wire payload. Bunker round-trip on
# RemoteBunkerSigner; direct prvkey on LocalSigner (transitional).
plaintext = json.dumps(payload.to_wire_dict(), separators=(",", ":"))
conversation_key = get_conversation_key(operator_privkey_hex, atm_pubkey_hex)
content = encrypt_with_conversation_key(plaintext, conversation_key)
try:
content = await _nip44_encrypt_via_signer(
account, signer, plaintext, atm_pubkey_hex
)
except NsecBunkerTimeoutError as exc:
raise SignerUnavailable(
f"bunker unreachable while encrypting cassette config for "
f"operator {operator_user_id[:8]}...: {exc}"
) from exc
except NsecBunkerRpcError as exc:
raise SignerUnavailable(
f"bunker rejected nip44_encrypt for operator "
f"{operator_user_id[:8]}... (policy / MAC / config issue): "
f"{exc}"
) from exc
except SignerUnavailableError as exc:
raise SignerUnavailable(
f"signer cannot nip44-encrypt for operator "
f"{operator_user_id[:8]}...: {exc}"
) from exc
event: dict = {
"kind": _KIND_NIP78,
@ -289,11 +341,9 @@ async def publish_to_atm(
["p", atm_pubkey_hex],
],
"content": content,
# created_at is set inside _sign_as_operator before signing.
}
signed = await _sign_as_operator(operator_user_id, event)
# _sign_as_operator raises on hard failure; a None return would mean
# an unexpected soft-path slipped through — treat as hard error here.
if signed is None:
raise CassetteTransportError(
"sign_as_operator returned None unexpectedly — soft-fail path "
@ -314,24 +364,32 @@ async def publish_to_atm(
# =============================================================================
def decrypt_and_parse_state_event(
event: dict, operator_privkey_hex: str
async def decrypt_and_parse_state_event(
event: dict, account, signer: NostrSigner
) -> PublishCassettesPayload:
"""Decrypt + parse an inbound `bitspire-cassettes-state:<atm_pubkey_hex>`
event the ATM published toward the operator. Caller is responsible
for:
event the ATM published toward the operator.
Caller is responsible for:
- filtering on `kind=30078` and the expected `#d` tag list
- verifying the event signature (lnbits.utils.nostr.verify_event)
- confirming `event["pubkey"]` matches a known ATM in the operator's
machines table (the d-tag suffix == event pubkey == machine.machine_npub
canonicalised)
- confirming `event["pubkey"]` matches a known ATM (= machine.machine_npub
canonicalised) the consumer task does this before calling here
- resolving the operator's account + signer via
`_resolve_operator_signer(...)` and passing them in
This function does:
- NIP-44 v2 decrypt of event["content"] using the sender's pubkey
from event["pubkey"] and the operator's privkey
- NIP-44 v2 decrypt of event["content"] via `signer.nip44_decrypt`
(bunker round-trip on RemoteBunkerSigner; direct prvkey on the
transitional LocalSigner path)
- JSON parse + PublishCassettesPayload validation
Raises CassetteEventDecodeError on any decode/validate failure.
Error mapping:
- CassetteEventTransientError on NsecBunkerTimeoutError caller
should NOT advance state_event_id; retry on next consumer tick
- CassetteEventDecodeError on anything else (bunker RPC reject,
signer unavailable, MAC failure, JSON parse, payload shape)
terminal; caller logs + skips
"""
sender_pubkey = event.get("pubkey")
content = event.get("content")
@ -341,16 +399,31 @@ def decrypt_and_parse_state_event(
)
try:
conversation_key = get_conversation_key(
operator_privkey_hex, sender_pubkey
plaintext = await _nip44_decrypt_via_signer(
account, signer, content, sender_pubkey
)
plaintext = decrypt_with_conversation_key(content, conversation_key)
except Nip44Error as exc:
except NsecBunkerTimeoutError as exc:
raise CassetteEventTransientError(
f"bunker unreachable while decrypting cassette state event: {exc}"
) from exc
except NsecBunkerRpcError as exc:
raise CassetteEventDecodeError(
f"NIP-44 v2 decrypt failed: {exc}"
f"bunker rejected nip44_decrypt (policy / MAC / config): {exc}"
) from exc
except SignerUnavailableError as exc:
raise CassetteEventDecodeError(
f"signer cannot nip44-decrypt: {exc}"
) from exc
except Nip44Error as exc:
# Hand-rolled LocalSigner fallback path (transitional) — MAC fail
# / version mismatch / length issue.
raise CassetteEventDecodeError(
f"NIP-44 v2 decrypt failed (LocalSigner fallback path): {exc}"
) from exc
except ValueError as exc:
# coincurve raises ValueError on a malformed pubkey hex.
# coincurve raises ValueError on a malformed pubkey hex (only
# reachable via the LocalSigner fallback path; the bunker handles
# pubkey validation server-side).
raise CassetteEventDecodeError(
f"sender pubkey is malformed: {exc}"
) from exc