refactor(v2): cassette transport — signer.nip44_* migration (#29 v1.1 / closes #21 partial)

Migrates the cassette transport's encrypt/decrypt paths off direct
`account.prvkey` reads to `signer.nip44_encrypt` / `signer.nip44_decrypt`
on the NostrSigner ABC landed by aiolabs/lnbits PR #38 (phase 2.4). Closes
the operator-side regression flagged at coord-log 2026-05-31T06:50Z:
Greg's RemoteBunkerSigner-migrated account had `accounts.prvkey IS NULL`
post-bunker, which the old code couldn't handle — consumer was logging
WARN every poll cycle and skipping every inbound state event.

## What changed

### cassette_transport.py

- New imports: `resolve_signer`, `SignerError`, `SignerUnavailableError`,
  `NsecBunkerTimeoutError`, `NsecBunkerRpcError` from the post-#38 lnbits
  surface. (The `try: from lnbits.core.signers import SignerError` block
  in the old code was permanently failing because `SignerError` actually
  lives in `lnbits.core.signers.base`, not the package root — fixed.)
- New `_resolve_operator_signer(operator_user_id)`: single source of
  truth for "give me the operator's account + NostrSigner, or raise an
  operator-facing error." Used by both the publish path and the consumer
  task.
- New `_nip44_encrypt_via_signer(account, signer, plaintext, peer)`
  and `_nip44_decrypt_via_signer(...)`: route through `signer.nip44_*`
  first; on `SignerUnavailableError` from a LocalSigner stub (the
  post-#38 ABC has LocalSigner raise on nip44_* explicitly — bunker
  migration required for NIP-44 v2), fall back to the hand-rolled impl
  against `account.prvkey`. Transitional until every operator on the
  instance is bunker-backed (S7).
- `_sign_as_operator` simplified: now `await signer.sign_event(event)`
  (the ABC is async; the old code passed `signer.sign_event` to the
  caller without await, returning a coroutine — also broken but never
  hit because the ImportError fallback fired first).
- `publish_to_atm` flow: `_resolve_operator_signer` →
  `_nip44_encrypt_via_signer` → `_sign_as_operator` → publish. Each step
  maps bunker / signer errors to `OperatorIdentityMissing` (400) /
  `SignerUnavailable` (503) / `CassetteTransportError` (500) for the API
  handler.
- `decrypt_and_parse_state_event` now `async` and takes `(event, account,
  signer)` instead of `(event, operator_privkey_hex)`. Maps
  `NsecBunkerTimeoutError` → `CassetteEventTransientError` (caller
  should retry on next poll, NOT advance `state_event_id`).
  `NsecBunkerRpcError` / `SignerUnavailableError` / `Nip44Error` / etc.
  → `CassetteEventDecodeError` (terminal — caller logs + skips).
- New `CassetteEventTransientError` class for the bunker-timeout case.
  Distinct from `CassetteEventDecodeError` so the consumer can log at
  INFO + retry vs WARNING + skip (v1 does not advance `state_event_id`
  in either case).
- Deleted `_get_operator_privkey_hex` (no longer needed).
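
The signer-first routing with the transitional LocalSigner fallback could be sketched roughly as below. This is an illustration only, not the code in this commit: `SignerUnavailableError` here is a local stand-in for the lnbits class, both fake signers and `legacy_encrypt` are hypothetical, the `nip44_encrypt(plaintext, peer)` argument order is assumed, and falling back only when `account.prvkey` is set is an assumption about the guard.

```python
import asyncio
from types import SimpleNamespace


class SignerUnavailableError(Exception):
    """Local stand-in for the lnbits exception of the same name."""


class FakeBunkerSigner:
    """Hypothetical signer whose nip44_encrypt works (bunker-backed case)."""

    async def nip44_encrypt(self, plaintext: str, peer_pubkey_hex: str) -> str:
        return f"bunker:{peer_pubkey_hex}:{plaintext}"


class FakeLocalSignerStub:
    """Hypothetical signer that refuses nip44_*, like the LocalSigner stub."""

    async def nip44_encrypt(self, plaintext: str, peer_pubkey_hex: str) -> str:
        raise SignerUnavailableError("LocalSigner does not implement nip44_*")


def legacy_encrypt(prvkey_hex: str, plaintext: str, peer_pubkey_hex: str) -> str:
    # Placeholder for the hand-rolled NIP-44 implementation; NOT real crypto.
    return f"local:{peer_pubkey_hex}:{plaintext}"


async def encrypt_via_signer(account, signer, plaintext: str, peer_pubkey_hex: str) -> str:
    """Try the signer first; fall back to the local key only if one exists."""
    try:
        return await signer.nip44_encrypt(plaintext, peer_pubkey_hex)
    except SignerUnavailableError:
        if not account.prvkey:
            # Bunker-migrated account with no local key: nothing to fall back to.
            raise
        return legacy_encrypt(account.prvkey, plaintext, peer_pubkey_hex)


if __name__ == "__main__":
    local_account = SimpleNamespace(prvkey="ab" * 32)
    print(asyncio.run(encrypt_via_signer(local_account, FakeLocalSignerStub(), "hi", "peer")))
```

Keeping the fallback inside one helper means the publish path and the consumer share the same behaviour, and the helper can later shrink to a bare `await signer.nip44_encrypt(...)`.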

### tasks.py — _handle_cassette_state_event

- Resolves the signer via
  `_resolve_operator_signer(machine.operator_user_id)`. On
  `CassetteTransportError` (OperatorIdentityMissing /
  SignerUnavailable), logs + skips.
- Awaits `decrypt_and_parse_state_event(event_obj, account, signer)`.
  On `CassetteEventTransientError`, logs at INFO + returns
  (`state_event_id` NOT advanced → consumer retries on next poll cycle).
  On `CassetteEventDecodeError`, logs at WARNING + returns
  (`state_event_id` still NOT advanced for v1; the WARN log surfaces the
  underlying issue for operator triage).
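
The transient-vs-terminal split described above might look like the following sketch. The exception classes are local stand-ins (`BunkerTimeout` and `BunkerRpcError` play the roles of `NsecBunkerTimeoutError` and `NsecBunkerRpcError`), and `FakeSigner`, `decrypt_event`, `handle_event`, and the returned status strings are hypothetical names used only for illustration.

```python
import asyncio
import logging

logger = logging.getLogger("cassette-sketch")


class BunkerTimeout(Exception):
    """Stand-in for NsecBunkerTimeoutError."""


class BunkerRpcError(Exception):
    """Stand-in for NsecBunkerRpcError."""


class CassetteEventDecodeError(Exception):
    """Terminal: the event cannot be decoded."""


class CassetteEventTransientError(Exception):
    """Transient: the same event may decode fine on a later poll."""


class FakeSigner:
    """Hypothetical signer that returns a result or raises a configured error."""

    def __init__(self, result=None, error=None):
        self.result = result
        self.error = error

    async def nip44_decrypt(self, ciphertext: str) -> str:
        if self.error is not None:
            raise self.error
        return self.result


async def decrypt_event(signer, ciphertext: str) -> str:
    """Map signer-level failures onto the two consumer-facing error classes."""
    try:
        return await signer.nip44_decrypt(ciphertext)
    except BunkerTimeout as exc:
        raise CassetteEventTransientError(str(exc)) from exc
    except BunkerRpcError as exc:
        raise CassetteEventDecodeError(str(exc)) from exc


async def handle_event(signer, ciphertext: str, apply_state) -> str:
    """Only a successful decrypt reaches apply_state; both errors return early."""
    try:
        payload = await decrypt_event(signer, ciphertext)
    except CassetteEventTransientError as exc:
        logger.info("transient signer error (will retry next poll): %s", exc)
        return "retry"
    except CassetteEventDecodeError as exc:
        logger.warning("decode failed: %s", exc)
        return "skipped"
    apply_state(payload)
    return "applied"
```

Because neither error branch calls `apply_state`, the stored event id never moves on a failure, which is the retry-via-no-advance behaviour the bullets describe.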

### tests/test_cassette_state_consumer.py — rewritten

- Three test doubles: `_FakeBunkerSigner` (working nip44_decrypt via
  hand-rolled impl), `_FakeLocalSignerStub` (raises like the post-#38
  LocalSigner stub), `_FakeRaisingSigner` (configurable exception).
- `_fake_account` helper using SimpleNamespace — the code under test
  only reads `.signer_type` + `.prvkey`.
- Five test classes covering: bunker-signer happy path (incl. multi-
  same-denom round-trip), LocalSigner transitional fallback,
  bunker-error mapping (timeout → transient, rpc reject → decode),
  payload validation (tamper / wrong-key / missing-fields / garbage
  JSON / wrong shape), d-tag construction (unchanged, kept as
  regression guard).
- Async coroutines driven via `asyncio.run` — matches the existing
  project pattern (no pytest-asyncio plugin in CI; see test_init.py
  failure mode).
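
As a rough sketch of the test style described above, a synchronous test function can drive a coroutine with `asyncio.run`, so no pytest-asyncio plugin is needed. `FakeRaisingSigner`, `fake_account`, `decrypt`, and the `"bunker"` signer type value are hypothetical names for this example, not the helpers in the test file.

```python
import asyncio
from types import SimpleNamespace


class FakeRaisingSigner:
    """Hypothetical test double that raises a configurable exception."""

    def __init__(self, exc: Exception):
        self._exc = exc

    async def nip44_decrypt(self, ciphertext: str, peer_pubkey_hex: str) -> str:
        raise self._exc


def fake_account(signer_type: str = "bunker", prvkey=None):
    # The code under test is assumed to read only these two attributes.
    return SimpleNamespace(signer_type=signer_type, prvkey=prvkey)


async def decrypt(account, signer, ciphertext: str, peer_pubkey_hex: str) -> str:
    # Stand-in for the coroutine under test.
    return await signer.nip44_decrypt(ciphertext, peer_pubkey_hex)


def test_raising_signer_propagates():
    account = fake_account()
    signer = FakeRaisingSigner(TimeoutError("bunker timed out"))
    try:
        # asyncio.run creates an event loop, runs the coroutine, and closes it.
        asyncio.run(decrypt(account, signer, "ct", "peer"))
    except TimeoutError as exc:
        assert "timed out" in str(exc)
    else:
        raise AssertionError("expected TimeoutError")
```

Each `asyncio.run` call gets a fresh event loop, so tests written this way stay independent of one another.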

### nip44.py — docstring update

Added a "Runtime status (post lnbits PR #38, 2026-05-31)" section
documenting that runtime usage moved to `signer.nip44_*` and this
module's role narrowed to (a) the LocalSigner transitional fallback
called from `cassette_transport`, and (b) test-only fixtures in
test_nip44_v2.py for spec-vector + bitspire cross-test validation.
"Don't add new runtime call sites here. The signer abstraction is
the path."

## Verification

- 155 passed, 1 pre-existing async-plugin failure unchanged. The 19
  consumer tests cover bunker happy path + LocalSigner fallback +
  bunker error mapping + payload validation + d-tag construction.
- Live smoke against Greg's RemoteBunkerSigner-migrated account
  on the regtest container: consumer correctly resolves the bunker
  signer, fires `NIP-46 rpc -> method=nip44_decrypt`, catches the
  resulting `NsecBunkerTimeoutError` (the local nsecbunkerd is not
  responding within 15s — separate operational concern), maps to
  `CassetteEventTransientError`, logs at INFO with "will retry next
  poll", and crucially does NOT advance `state_event_id` on the
  cassette_configs rows. Retry semantics preserved.

## Outstanding

- The bunker timeout itself is an operational issue (nsecbunkerd
  config / policy / process state for kind-less nip44_decrypt RPC) —
  not a satmachineadmin code concern; surface to the nsecbunkerd /
  lnbits sessions if it persists.
- Once every operator on the instance is on RemoteBunkerSigner (S7
  fully landed), the `_nip44_*_via_signer` helpers collapse to a
  direct `await signer.nip44_*` call, the LocalSigner fallback can
  be deleted, and `nip44.py`'s runtime exports retire (test-only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Padreug 2026-05-31 09:21:43 +02:00
commit dcb7de0c27
4 changed files with 573 additions and 199 deletions


@@ -386,18 +386,37 @@ async def _handle_cassette_state_event(
     get_machine_by_atm_pubkey_hex,
     apply_bootstrap_state,
 ) -> None:
-    """Verify signature, route to the right operator's privkey, decrypt,
-    parse, upsert. Each step that fails is logged at WARNING (not ERROR)
-    so a noisy attacker can't fill the logs — this is data on a public
-    relay, garbage is expected."""
+    """Verify signature, resolve the operator's signer, decrypt via the
+    signer abstraction (bunker round-trip for RemoteBunkerSigner; direct
+    prvkey on the LocalSigner transitional fallback inside the transport
+    helper), parse, upsert.
+    Each step logs at WARNING (not ERROR) so a noisy attacker can't fill
+    the logs — this is data on a public relay, garbage is expected.
+    Two skip outcomes:
+    - Terminal (CassetteEventDecodeError / SignerUnavailable /
+      OperatorIdentityMissing / etc.): log + return. `apply_bootstrap_
+      state` is never called — `state_event_id` is not advanced —
+      same event would re-process on next poll cycle but the consumer's
+      WARN log surfaces the underlying issue immediately.
+    - Transient (CassetteEventTransientError): log at INFO (less noisy)
+      + return. Same retry-via-no-advance semantics, just less
+      alarming in the operator log feed.
+    """
     import json as _json
     from datetime import datetime as _datetime
     from datetime import timezone as _timezone
     from lnbits.core.crud.users import get_account
     from lnbits.utils.nostr import verify_event
-    from .cassette_transport import decrypt_and_parse_state_event
+    from .cassette_transport import (
+        CassetteEventDecodeError,
+        CassetteEventTransientError,
+        CassetteTransportError,
+        _resolve_operator_signer,
+        decrypt_and_parse_state_event,
+    )
     event_raw = event_message.event
     if isinstance(event_raw, str):
@@ -430,16 +449,36 @@ async def _handle_cassette_state_event(
         )
         return
-    account = await get_account(machine.operator_user_id)
-    if account is None or not account.prvkey:
+    try:
+        account, signer = await _resolve_operator_signer(
+            machine.operator_user_id
+        )
+    except CassetteTransportError as exc:
+        # OperatorIdentityMissing / SignerUnavailable — log + skip.
         logger.warning(
-            f"satmachineadmin: operator {machine.operator_user_id[:8]}... "
-            "has no privkey on file; can't decrypt cassette state event for "
-            f"machine {machine.id}. Onboard via Nostr-login."
+            f"satmachineadmin: can't resolve signer for operator "
+            f"{machine.operator_user_id[:8]}... (machine {machine.id}): "
+            f"{exc}"
         )
         return
-    payload = decrypt_and_parse_state_event(event_obj, account.prvkey)
+    try:
+        payload = await decrypt_and_parse_state_event(
+            event_obj, account, signer
+        )
+    except CassetteEventTransientError as exc:
+        logger.info(
+            f"satmachineadmin: cassette state event for machine {machine.id} "
+            f"hit a transient signer error (will retry next poll): {exc}"
+        )
+        return
+    except CassetteEventDecodeError as exc:
+        logger.warning(
+            f"satmachineadmin: cassette state event decode failed for "
+            f"machine {machine.id} (id={event_obj.get('id', '?')[:12]}...): "
+            f"{exc}"
+        )
+        return
     event_id = event_obj.get("id", "")
     created_at_unix = event_obj.get("created_at", 0)