refactor(v2): cassette transport — signer.nip44_* migration (#29 v1.1 / closes #21 partial)
Some checks failed
ci.yml / refactor(v2): cassette transport — signer.nip44_* migration (#29 v1.1 / closes #21 partial) (pull_request) Failing after 0s
Migrates the cassette transport's encrypt/decrypt paths off direct `account.prvkey` reads to `signer.nip44_encrypt` / `signer.nip44_decrypt` on the NostrSigner ABC landed by aiolabs/lnbits PR #38 (phase 2.4). Closes the operator-side regression flagged at coord-log 2026-05-31T06:50Z: Greg's RemoteBunkerSigner-migrated account had `accounts.prvkey IS NULL` post-bunker, which the old code couldn't handle; the consumer was logging WARN every poll cycle and skipping every inbound state event.

## What changed

### cassette_transport.py

- New imports: `resolve_signer`, `SignerError`, `SignerUnavailableError`, `NsecBunkerTimeoutError`, `NsecBunkerRpcError` from the post-#38 lnbits surface. (The `try: from lnbits.core.signers import SignerError` block in the old code was permanently failing because `SignerError` actually lives in `lnbits.core.signers.base`, not the package root; fixed.)
- New `_resolve_operator_signer(operator_user_id)`: single source of truth for "give me the operator's account + NostrSigner, or raise an operator-facing error." Used by both the publish path and the consumer task.
- New `_nip44_encrypt_via_signer(account, signer, plaintext, peer)` and `_nip44_decrypt_via_signer(...)`: route through `signer.nip44_*` first; on `SignerUnavailableError` from a LocalSigner stub (the post-#38 ABC has LocalSigner raise on `nip44_*` explicitly; bunker migration is required for NIP-44 v2), fall back to the hand-rolled impl against `account.prvkey`. Transitional until every operator on the instance is bunker-backed (S7).
- `_sign_as_operator` simplified: now `await signer.sign_event(event)` (the ABC is async; the old code passed `signer.sign_event` to the caller without await, returning a coroutine. That was also broken but never hit because the ImportError fallback fired first).
- `publish_to_atm` flow: `_resolve_operator_signer` → `_nip44_encrypt_via_signer` → `_sign_as_operator` → publish. Each step maps bunker / signer errors to `OperatorIdentityMissing` (400) / `SignerUnavailable` (503) / `CassetteTransportError` (500) for the API handler.
- `decrypt_and_parse_state_event` is now `async` and takes `(event, account, signer)` instead of `(event, operator_privkey_hex)`. Maps `NsecBunkerTimeoutError` → `CassetteEventTransientError` (caller should retry on next poll, NOT advance `state_event_id`). `NsecBunkerRpcError` / `SignerUnavailableError` / `Nip44Error` / etc. → `CassetteEventDecodeError` (terminal: caller logs + skips).
- New `CassetteEventTransientError` class for the bunker-timeout case. Distinct from `CassetteEventDecodeError` so the consumer can log at INFO + retry vs WARNING + advance.
- Deleted `_get_operator_privkey_hex` (no longer needed).

### tasks.py: `_handle_cassette_state_event`

- Resolves the signer via `_resolve_operator_signer(machine.operator_user_id)`. On `CassetteTransportError` (OperatorIdentityMissing / SignerUnavailable), logs + skips.
- Awaits `decrypt_and_parse_state_event(event_obj, account, signer)`. On `CassetteEventTransientError`, logs at INFO + returns (`state_event_id` NOT advanced → consumer retries on next poll cycle). On `CassetteEventDecodeError`, logs at WARNING + returns (`state_event_id` is still NOT advanced in v1; the WARN log surfaces the underlying issue for operator triage).

### tests/test_cassette_state_consumer.py (rewritten)

- Three test doubles: `_FakeBunkerSigner` (working nip44_decrypt via hand-rolled impl), `_FakeLocalSignerStub` (raises like the post-#38 LocalSigner stub), `_FakeRaisingSigner` (configurable exception).
- `_fake_account` helper using SimpleNamespace; the code under test only reads `.signer_type` + `.prvkey`.
- Five test classes covering: bunker-signer happy path (incl. multi-same-denom round-trip), LocalSigner transitional fallback, bunker-error mapping (timeout → transient, rpc reject → decode), payload validation (tamper / wrong-key / missing-fields / garbage JSON / wrong shape), d-tag construction (unchanged, kept as regression guard).
- Async coroutines driven via `asyncio.run`, which matches the existing project pattern (no pytest-asyncio plugin in CI; see test_init.py failure mode).

### nip44.py (docstring update)

Added a "Runtime status (post lnbits PR #38, 2026-05-31)" section documenting that runtime usage moved to `signer.nip44_*` and this module's role narrowed to (a) the LocalSigner transitional fallback called from `cassette_transport`, and (b) test-only fixtures in test_nip44_v2.py for spec-vector + bitspire cross-test validation. "Don't add new runtime call sites here. The signer abstraction is the path."

## Verification

- 155 passed, 1 pre-existing async-plugin failure unchanged. The 19 consumer tests cover bunker happy path + LocalSigner fallback + bunker error mapping + payload validation + d-tag construction.
- Live smoke against Greg's RemoteBunkerSigner-migrated account on the regtest container: consumer correctly resolves the bunker signer, fires `NIP-46 rpc -> method=nip44_decrypt`, catches the resulting `NsecBunkerTimeoutError` (the local nsecbunkerd is not responding within 15s, a separate operational concern), maps to `CassetteEventTransientError`, logs at INFO with "will retry next poll", and crucially does NOT advance `state_event_id` on the cassette_configs rows. Retry semantics preserved.

## Outstanding

- The bunker timeout itself is an operational issue (nsecbunkerd config / policy / process state for kind-less nip44_decrypt RPC), not a satmachineadmin code concern; surface to the nsecbunkerd / lnbits sessions if it persists.
- Once every operator on the instance is on RemoteBunkerSigner (S7 fully landed), the `_nip44_*_via_signer` helpers collapse to a direct `await signer.nip44_*` call, the LocalSigner fallback can be deleted, and `nip44.py`'s runtime exports retire (test-only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
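The signer-first path with its transitional LocalSigner fallback, as described for `_nip44_encrypt_via_signer`, might look roughly like the sketch below. It is not the code from this commit: the exception class, both signer doubles, `legacy_encrypt`, and the `nip44_encrypt(plaintext, peer_pubkey_hex)` argument order are all stand-ins assumed for illustration, and the "ciphertext" is a labelled placeholder string, not real NIP-44 output.

```python
import asyncio
from types import SimpleNamespace


class SignerUnavailableError(Exception):
    """Stand-in for the lnbits exception named in the commit message."""


class FakeLocalSignerStub:
    """Hypothetical double: raises on nip44_*, like the LocalSigner stub."""

    async def nip44_encrypt(self, plaintext: str, peer_pubkey_hex: str) -> str:
        raise SignerUnavailableError("LocalSigner has no NIP-44 v2 support")


class FakeBunkerSigner:
    """Hypothetical double for a bunker-backed signer that can encrypt."""

    async def nip44_encrypt(self, plaintext: str, peer_pubkey_hex: str) -> str:
        return f"bunker({peer_pubkey_hex}):{plaintext}"


def legacy_encrypt(prvkey_hex: str, peer_pubkey_hex: str, plaintext: str) -> str:
    """Placeholder for the hand-rolled nip44.py path. NOT real encryption."""
    return f"local({peer_pubkey_hex}):{plaintext}"


async def encrypt_via_signer(account, signer, plaintext: str, peer_pubkey_hex: str) -> str:
    """Try the signer first; fall back to account.prvkey only if it declines."""
    try:
        return await signer.nip44_encrypt(plaintext, peer_pubkey_hex)
    except SignerUnavailableError:
        if not getattr(account, "prvkey", None):
            # No local key to fall back on: surface the original error.
            raise
        return legacy_encrypt(account.prvkey, peer_pubkey_hex, plaintext)


# The accounts only need the two attributes the commit message says are read.
bunker_account = SimpleNamespace(signer_type="bunker", prvkey=None)
local_account = SimpleNamespace(signer_type="local", prvkey="00" * 32)

via_bunker = asyncio.run(
    encrypt_via_signer(bunker_account, FakeBunkerSigner(), "hi", "ab")
)
via_fallback = asyncio.run(
    encrypt_via_signer(local_account, FakeLocalSignerStub(), "hi", "ab")
)
```

Re-raising when `prvkey` is empty keeps a bunker-migrated account from silently taking the legacy path, which is the failure mode the regression above describes.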
parent 4b128ca53c
commit dcb7de0c27
4 changed files with 573 additions and 199 deletions
63
tasks.py
@@ -386,18 +386,37 @@ async def _handle_cassette_state_event(
     get_machine_by_atm_pubkey_hex,
     apply_bootstrap_state,
 ) -> None:
-    """Verify signature, route to the right operator's privkey, decrypt,
-    parse, upsert. Each step that fails is logged at WARNING (not ERROR)
-    so a noisy attacker can't fill the logs — this is data on a public
-    relay, garbage is expected."""
+    """Verify signature, resolve the operator's signer, decrypt via the
+    signer abstraction (bunker round-trip for RemoteBunkerSigner; direct
+    prvkey on the LocalSigner transitional fallback inside the transport
+    helper), parse, upsert.
+
+    Each step logs at WARNING (not ERROR) so a noisy attacker can't fill
+    the logs — this is data on a public relay, garbage is expected.
+
+    Two skip outcomes:
+    - Terminal (CassetteEventDecodeError / SignerUnavailable /
+      OperatorIdentityMissing / etc.): log + return. `apply_bootstrap_
+      state` is never called → `state_event_id` is not advanced →
+      same event would re-process on next poll cycle but the consumer's
+      WARN log surfaces the underlying issue immediately.
+    - Transient (CassetteEventTransientError): log at INFO (less noisy)
+      + return. Same retry-via-no-advance semantics, just less
+      alarming in the operator log feed.
+    """
     import json as _json
     from datetime import datetime as _datetime
     from datetime import timezone as _timezone

     from lnbits.core.crud.users import get_account
     from lnbits.utils.nostr import verify_event

-    from .cassette_transport import decrypt_and_parse_state_event
+    from .cassette_transport import (
+        CassetteEventDecodeError,
+        CassetteEventTransientError,
+        CassetteTransportError,
+        _resolve_operator_signer,
+        decrypt_and_parse_state_event,
+    )

     event_raw = event_message.event
     if isinstance(event_raw, str):
@@ -430,16 +449,36 @@ async def _handle_cassette_state_event(
         )
         return

-    account = await get_account(machine.operator_user_id)
-    if account is None or not account.prvkey:
+    try:
+        account, signer = await _resolve_operator_signer(
+            machine.operator_user_id
+        )
+    except CassetteTransportError as exc:
+        # OperatorIdentityMissing / SignerUnavailable — log + skip.
         logger.warning(
-            f"satmachineadmin: operator {machine.operator_user_id[:8]}... "
-            "has no privkey on file; can't decrypt cassette state event for "
-            f"machine {machine.id}. Onboard via Nostr-login."
+            f"satmachineadmin: can't resolve signer for operator "
+            f"{machine.operator_user_id[:8]}... (machine {machine.id}): "
+            f"{exc}"
         )
         return

-    payload = decrypt_and_parse_state_event(event_obj, account.prvkey)
+    try:
+        payload = await decrypt_and_parse_state_event(
+            event_obj, account, signer
+        )
+    except CassetteEventTransientError as exc:
+        logger.info(
+            f"satmachineadmin: cassette state event for machine {machine.id} "
+            f"hit a transient signer error (will retry next poll): {exc}"
+        )
+        return
+    except CassetteEventDecodeError as exc:
+        logger.warning(
+            f"satmachineadmin: cassette state event decode failed for "
+            f"machine {machine.id} (id={event_obj.get('id', '?')[:12]}...): "
+            f"{exc}"
+        )
+        return

     event_id = event_obj.get("id", "")
     created_at_unix = event_obj.get("created_at", 0)
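The two skip outcomes in the hunk above (transient → INFO and retry, terminal → WARNING and skip, neither advancing `state_event_id`) can be exercised in isolation with a sketch like the following. Everything here is a hypothetical stand-in: the four exception classes mirror the names in the commit message but are defined locally, the signer doubles and `handle_event` are illustrative, and the `state` dict stands in for whatever row `apply_bootstrap_state` would update.

```python
import asyncio
import logging

logger = logging.getLogger("cassette_sketch")


# Local stand-ins for the lnbits / cassette_transport exception names.
class NsecBunkerTimeoutError(Exception): ...
class NsecBunkerRpcError(Exception): ...
class CassetteEventDecodeError(Exception): ...
class CassetteEventTransientError(Exception): ...


class FakeRaisingSigner:
    """Hypothetical double that raises a configured exception on decrypt."""

    def __init__(self, exc: Exception) -> None:
        self._exc = exc

    async def nip44_decrypt(self, ciphertext: str, peer: str) -> str:
        raise self._exc


class FakeWorkingSigner:
    """Hypothetical double; 'decrypts' by upper-casing (placeholder only)."""

    async def nip44_decrypt(self, ciphertext: str, peer: str) -> str:
        return ciphertext.upper()


async def decrypt_state_event(signer, ciphertext: str, peer: str) -> str:
    """Map bunker errors: timeout is transient, RPC rejection is terminal."""
    try:
        return await signer.nip44_decrypt(ciphertext, peer)
    except NsecBunkerTimeoutError as exc:
        raise CassetteEventTransientError(str(exc)) from exc
    except NsecBunkerRpcError as exc:
        raise CassetteEventDecodeError(str(exc)) from exc


async def handle_event(signer, ciphertext: str, state: dict) -> str:
    """Advance state only on success; both failure kinds leave it untouched."""
    try:
        plaintext = await decrypt_state_event(signer, ciphertext, "peer")
    except CassetteEventTransientError as exc:
        logger.info("transient signer error (will retry next poll): %s", exc)
        return "retry"
    except CassetteEventDecodeError as exc:
        logger.warning("decode failed: %s", exc)
        return "skipped"
    state["state_event_id"] = "evt-1"
    state["payload"] = plaintext
    return "applied"


state = {"state_event_id": None}
timeout_result = asyncio.run(
    handle_event(FakeRaisingSigner(NsecBunkerTimeoutError("15s")), "abc", state)
)
rpc_result = asyncio.run(
    handle_event(FakeRaisingSigner(NsecBunkerRpcError("denied")), "abc", state)
)
id_after_failures = state["state_event_id"]
ok_result = asyncio.run(handle_event(FakeWorkingSigner(), "abc", state))
```

Keeping the transient and terminal cases as separate exception types is what lets the handler pick a log level per case while sharing the same no-advance behavior.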