feat(v2): abandoned-tx queue + force-reset for stuck settlements (P3f)

Completes the P3 operator-UX cluster. Surfaces settlements that didn't
process cleanly as a queryable worklist so operators can investigate +
retry without scanning the full settlement history.

New endpoints:

  GET    /api/v1/dca/settlements/stuck?threshold_minutes=30
    Returns StuckSettlementsResponse with three buckets:
      - errored: distribution failed; existing /retry endpoint handles
      - stuck_pending: landed but never picked up (listener crashed
        before invoking process_settlement)
      - stuck_processing: claim taken but no completion within
        threshold_minutes; the processor likely crashed mid-flight, so
        processing_claim is set but no terminal state landed
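
The bucketing rules above can be sketched roughly as follows. This is an
illustration only, not the shipped query: the dict-shaped rows and the
`status` field name are assumptions, and the real code does the filtering
in SQL rather than in Python.

```python
from datetime import datetime, timedelta, timezone


def bucket_settlements(rows, threshold_minutes=30, now=None):
    """Split settlement rows into the three worklist buckets.

    Hypothetical helper: each row is assumed to be a dict with
    'status' and a timezone-aware 'created_at'.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=threshold_minutes)
    buckets = {"errored": [], "stuck_pending": [], "stuck_processing": []}
    for row in rows:
        status = row["status"]
        if status == "errored":
            # No age filter: errored rows are always surfaced.
            buckets["errored"].append(row)
        elif status == "pending" and row["created_at"] < cutoff:
            buckets["stuck_pending"].append(row)
        elif status == "processing" and row["created_at"] < cutoff:
            buckets["stuck_processing"].append(row)
    return buckets
```

Rows in any other status, or pending/processing rows younger than the
threshold, fall into no bucket.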

  POST   /api/v1/dca/settlements/{id}/force-reset
    Operator escape hatch for genuinely stuck settlements. Flips
    'pending'/'processing' → 'errored' so the /retry endpoint can take
    over. Refuses unless the settlement is older than threshold_minutes
    (default 30) so operators can't accidentally interrupt a
    slow-but-running settlement. The age check uses created_at as a
    proxy for time spent in the current state.
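
A minimal sketch of the guard described above, assuming the check is a
plain comparison of created_at against the threshold. The function name,
the return shape and the reason strings are hypothetical; only the
pending/processing restriction and the 30-minute default come from this
commit.

```python
from datetime import datetime, timedelta, timezone

# Statuses that force-reset is allowed to flip to 'errored'.
RESETTABLE = {"pending", "processing"}


def check_force_reset(status, created_at, threshold_minutes=30, now=None):
    """Return (allowed, reason) for a force-reset request (illustrative)."""
    now = now or datetime.now(timezone.utc)
    if status not in RESETTABLE:
        return False, f"status {status!r} is not pending/processing"
    # created_at stands in for "how long has this been stuck".
    if now - created_at <= timedelta(minutes=threshold_minutes):
        return False, "settlement is younger than the threshold"
    return True, "ok"
```

Because created_at is only a proxy, a settlement that was created long ago
but claimed recently would still pass this check.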

CRUD:
- get_stuck_settlements_for_operator(uid, threshold_minutes) joins
  dca_settlements → dca_machines and returns the three lists
  scoped per operator. No age filter on 'errored' (operators always
  want to see those); age filter applies to 'pending'/'processing'.
- force_reset_stuck_settlement(id) UPDATEs 'pending'/'processing' to
  'errored', clears processing_claim, sets a marker error_message.
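
The UPDATE described above might look roughly like this against SQLite.
It is a sketch under assumptions: the `status` column name, the marker
text and the function name are made up for illustration, and the real
CRUD layer uses its own database wrapper rather than the stdlib sqlite3
module.

```python
import sqlite3


def force_reset_sketch(conn, settlement_id):
    """Flip a pending/processing settlement to errored (illustrative).

    Returns True if a row was changed, False if the settlement was
    missing or not in a resettable status.
    """
    cur = conn.execute(
        """
        UPDATE dca_settlements
           SET status = 'errored',
               processing_claim = NULL,
               error_message = 'force-reset by operator'
         WHERE id = ?
           AND status IN ('pending', 'processing')
        """,
        (settlement_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```

Keeping the status condition in the WHERE clause means a settlement that
reached a terminal state in the meantime is left untouched.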

The retry endpoint shipped in fix bundle 1 (commit 3ede66f) is the
intended downstream step: the operator sees a stuck_processing row, hits
force-reset (flips to errored), then hits retry (flips to pending, voids
failed legs, re-runs process_settlement via the claim path).
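
The recovery flow can be read as a small state machine. The sketch below
encodes only the transitions named in this message; the action names and
the assumption that retry accepts nothing but 'errored' are illustrative.

```python
# (current status, operator action) -> next status
TRANSITIONS = {
    ("pending", "force_reset"): "errored",
    ("processing", "force_reset"): "errored",
    ("errored", "retry"): "pending",
}


def apply_action(status, action):
    """Return the next status, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a settlement in status {status!r}")


# Operator recovering a stuck-processing settlement:
status = "processing"
status = apply_action(status, "force_reset")  # now 'errored'
status = apply_action(status, "retry")        # now 'pending'
```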

34 routes registered. 72/72 tests pass.

Refs: aiolabs/satmachineadmin#9 — completes P3 operator-UX cluster

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Padreug 2026-05-14 17:43:20 +02:00
commit 578f2c142d
3 changed files with 191 additions and 0 deletions

@@ -407,6 +407,28 @@ class PartialDispenseData(BaseModel):
        return v


class StuckSettlementsResponse(BaseModel):
    """Operator worklist surfacing settlements that didn't process cleanly.

    Three categories, segregated so the UI can render them with appropriate
    affordances (retry / investigate / force-error):

    - errored: distribution failed; one or more legs reported a payment
      error. Operator retry endpoint handles these directly.
    - stuck_pending: landed but never picked up by the processor (listener
      crashed before invoking process_settlement, or the claim was lost).
      Older than `threshold_minutes`.
    - stuck_processing: claim was taken but no completion in
      `threshold_minutes`. The processor likely crashed mid-flight.
      Operator can force-recover via POST .../force-reset.
    """

    threshold_minutes: int
    errored: list  # list[DcaSettlement]
    stuck_pending: list
    stuck_processing: list


class AppendSettlementNoteData(BaseModel):
    """Operator-authored free-form note on a settlement.