feat(v2): hand-rolled NIP-44 v2 crypto + reference-vector tests (#29 v1)

LNbits ships only NIP-04 (AES-CBC) in lnbits.utils.nostr.encrypt_content,
but the locked design at #29 (paired with lamassu-next#56) wires kind-30078
cassette config with NIP-44 v2 content per the privacy-by-default
architecture (dcd0874). Hand-rolling rather than adding a Python lib dep
per the plan-approval (option A) — keeps the impl auditable inline and
avoids pulling in a non-trivial dep tree.

nip44.py covers the full envelope:
  - get_conversation_key — ECDH x-coord + HKDF-extract with salt b"nip44-v2"
  - encrypt_with_conversation_key / decrypt_with_conversation_key — low-level,
    nonce-controllable for testing pinned vectors
  - encrypt_for / decrypt_from — high-level pair-keyed API (the shape app
    code reaches for)
  - _pad / _unpad — NIP-44 v2 length-prefixed padding scheme
  - HMAC-SHA256 verification on nonce || ciphertext, constant-time compare
    via hmac.compare_digest
  - Typed errors (Nip44VersionError / Nip44MacError / Nip44LengthError)
    so callers can distinguish tamper from corruption from spec mismatch
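
For reviewers, a minimal sketch of the get_conversation_key derivation listed
above (ECDH x-coordinate, then HKDF-extract with salt b"nip44-v2"). Assumptions:
nip44.py does the ECDH through coincurve; the pure-Python secp256k1 helpers
below (_add, _mul, _lift_x, conversation_key_sketch, pub_hex) are hypothetical
stand-ins that keep the sketch dependency-free. They are not constant-time and
are not meant for production use.

```python
import hashlib
import hmac

# secp256k1 field prime and generator point (standard curve constants).
P = 2**256 - 2**32 - 977
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def _add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def _mul(k, point):
    """Double-and-add scalar multiplication (illustrative, not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = _add(result, point)
        point = _add(point, point)
        k >>= 1
    return result


def _lift_x(x):
    """Recover a curve point from an x-only pubkey, choosing the even y."""
    y_sq = (pow(x, 3, P) + 7) % P
    y = pow(y_sq, (P + 1) // 4, P)
    if y * y % P != y_sq:
        raise ValueError("x is not on the curve")
    return x, (y if y % 2 == 0 else P - y)


def pub_hex(sec_hex: str) -> str:
    """x-only pubkey hex for a secret key hex (hypothetical helper)."""
    return _mul(int(sec_hex, 16), G)[0].to_bytes(32, "big").hex()


def conversation_key_sketch(sec_hex: str, pub_hex_x: str) -> bytes:
    """ECDH shared x-coordinate fed through HKDF-extract."""
    shared = _mul(int(sec_hex, 16), _lift_x(int(pub_hex_x, 16)))
    shared_x = shared[0].to_bytes(32, "big")
    # HKDF-extract(salt, ikm) is HMAC-SHA256 keyed with the salt.
    return hmac.new(b"nip44-v2", shared_x, hashlib.sha256).digest()
```

The sign of the lifted y does not matter here: negating the peer point only
negates the shared point, so the x-coordinate (and therefore the key) is the
same on both sides.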

Stack: coincurve for ECDH (already a transitive lnbits dep), cryptography
for ChaCha20 + HKDF-expand (also already there). No new pyproject deps.
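
A rough sketch of how the HKDF-expand and HMAC steps fit together, using only
the standard library. Assumptions: the 76-byte split into chacha_key (32),
chacha_nonce (12) and hmac_key (32) follows my reading of the NIP-44 v2 spec;
the function names are hypothetical and nip44.py may organise this
differently. The ChaCha20 step itself is omitted because it needs the
cryptography package.

```python
import hashlib
import hmac


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-expand (RFC 5869) with SHA-256."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def message_keys(conversation_key: bytes, nonce: bytes):
    """Derive per-message keys, using the 32-byte nonce as HKDF info."""
    if len(conversation_key) != 32 or len(nonce) != 32:
        raise ValueError("conversation key and nonce must both be 32 bytes")
    keys = hkdf_expand(conversation_key, nonce, 76)
    chacha_key, chacha_nonce, hmac_key = keys[:32], keys[32:44], keys[44:76]
    return chacha_key, chacha_nonce, hmac_key


def compute_mac(hmac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA256 over nonce || ciphertext."""
    return hmac.new(hmac_key, nonce + ciphertext, hashlib.sha256).digest()


def verify_mac(hmac_key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Constant-time comparison, so timing does not leak a partial match."""
    return hmac.compare_digest(compute_mac(hmac_key, nonce, ciphertext), tag)
```

Covering the nonce in the MAC is what makes a flipped nonce byte fail
verification rather than decrypt to garbage.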

34 tests in tests/test_nip44_v2.py, three layers:
  1. Pinned reference vector — conversation_key for (sec=1, sec=2) matches
     the canonical paulmillr/nip44 published value
     (c41c775356fd92eadc63ff5a0dc1da211b268cbea22316767095b2871ea1412d).
     Regression-fails loudly if key derivation drifts.
  2. Round-trip + tamper detection — encrypt/decrypt across plaintext
     lengths (1, 32, 33, 1000, 5000, 65535 bytes); flipped MAC byte;
     flipped ciphertext byte; flipped nonce byte; wrong recipient privkey;
     version-byte rejection; padding-formula spot checks.
  3. Cross-impl byte-compat — placeholder test_decrypts_bitspire_sample
     marked @pytest.mark.skip, pending bitspire posting a sample event
     encrypted on their nostr-tools side to the coord log (per the
     2026-05-30T15:55Z entry). Wire that fixture and unskip when posted.
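
The padding-formula spot checks in layer 2 can be reproduced with the sketch
below. Assumptions: this follows my reading of the NIP-44 v2 padding rule
(2-byte big-endian length prefix, zero padding to a chunked length); the names
calc_padded_len, pad and unpad are illustrative and are not copied from
nip44.py.

```python
def calc_padded_len(unpadded_len: int) -> int:
    """Padded length for a plaintext of unpadded_len bytes."""
    if unpadded_len < 1:
        raise ValueError("plaintext must be at least 1 byte")
    if unpadded_len <= 32:
        return 32
    # Smallest power of two strictly greater than unpadded_len - 1.
    next_power = 1 << (unpadded_len - 1).bit_length()
    chunk = 32 if next_power <= 256 else next_power // 8
    return chunk * ((unpadded_len - 1) // chunk + 1)


def pad(plaintext: str) -> bytes:
    """Length prefix + UTF-8 bytes + zero padding."""
    raw = plaintext.encode("utf-8")
    if not 1 <= len(raw) <= 65535:
        raise ValueError("plaintext must be 1..65535 bytes")
    prefix = len(raw).to_bytes(2, "big")
    return prefix + raw + bytes(calc_padded_len(len(raw)) - len(raw))


def unpad(padded: bytes) -> str:
    """Inverse of pad(); rejects inconsistent lengths."""
    n = int.from_bytes(padded[:2], "big")
    raw = padded[2 : 2 + n]
    if n == 0 or len(raw) != n or len(padded) != 2 + calc_padded_len(n):
        raise ValueError("invalid padding")
    return raw.decode("utf-8")
```

Because the chunk size grows with the message, the padded length reveals only
a coarse size bucket rather than the exact plaintext length.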

Total: 132 passed, 1 skipped (cross-test fixture pending), 1 pre-existing
async-plugin failure unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Padreug 2026-05-30 18:10:30 +02:00
commit da07bae554
2 changed files with 543 additions and 0 deletions

tests/test_nip44_v2.py Normal file

@@ -0,0 +1,272 @@
"""
Tests for the hand-rolled NIP-44 v2 implementation in `nip44.py`.

Three layers of validation, ordered by trust:

  1. Pinned reference vector from the canonical paulmillr/nip44 test suite:
     the conversation_key for (sec=1, sec=2) is widely published as
     c41c775356fd92eadc63ff5a0dc1da211b268cbea22316767095b2871ea1412d. If
     our get_conversation_key() ever drifts from that value, the impl is
     broken at the key-derivation layer.

  2. Round-trip + tamper detection: verifies the encrypt/decrypt loop
     under random nonces, catches HMAC + version + padding tampering.

  3. Cross-test (TBD): bitspire will post one sample event encrypted on
     their nostr-tools side to the coord log;
     test_decrypts_bitspire_sample_event_from_coord_log wires it as a
     fixture and asserts byte-compatibility with the nostr-tools NIP-44 v2
     impl. Placeholder stub until the sample lands.
"""

import base64

import coincurve
import pytest

from ..nip44 import (
    Nip44LengthError,
    Nip44MacError,
    Nip44VersionError,
    _calc_padded_len,
    decrypt_from,
    decrypt_with_conversation_key,
    encrypt_for,
    encrypt_with_conversation_key,
    get_conversation_key,
)


# Helper: derive an x-only (32-byte) pubkey hex string from a secret hex by
# dropping the parity prefix byte of the compressed encoding.
def _pub_hex(sec_hex: str) -> str:
    return (
        coincurve.PrivateKey(bytes.fromhex(sec_hex))
        .public_key.format(compressed=True)[1:]
        .hex()
    )


# Canonical test keys widely used across NIP-44 reference vectors.
_SEC_ONE = "00" * 31 + "01"  # integer 1
_SEC_TWO = "00" * 31 + "02"  # integer 2
_PUB_ONE = _pub_hex(_SEC_ONE)
_PUB_TWO = _pub_hex(_SEC_TWO)


# =============================================================================
# Layer 1 — pinned reference vector (paulmillr/nip44)
# =============================================================================


class TestConversationKeyReferenceVector:
    """Pinned reference vector from the canonical NIP-44 v2 test suite
    (paulmillr/nip44). If get_conversation_key drifts from this value we
    have a key-derivation regression: fail loudly."""

    REFERENCE_CK_HEX = (
        "c41c775356fd92eadc63ff5a0dc1da211b268cbea22316767095b2871ea1412d"
    )

    def test_sec_one_pub_two(self):
        ck = get_conversation_key(_SEC_ONE, _PUB_TWO)
        assert ck.hex() == self.REFERENCE_CK_HEX

    def test_sec_two_pub_one_is_symmetric(self):
        """Conversation key is symmetric: ck(privA, pubB) == ck(privB, pubA).
        Both sides of a NIP-44 conversation derive the identical PRK; this
        is what lets the recipient decrypt with their own privkey + the
        sender's pubkey."""
        ck_ab = get_conversation_key(_SEC_ONE, _PUB_TWO)
        ck_ba = get_conversation_key(_SEC_TWO, _PUB_ONE)
        assert ck_ab == ck_ba


# =============================================================================
# Layer 2 — round-trip + tamper detection
# =============================================================================


class TestRoundTrip:
    """Encrypt then decrypt under the high-level pair-keyed API."""

    @pytest.mark.parametrize(
        "plaintext",
        [
            "a",  # 1 byte (minimum)
            "hello, nip44 v2",  # short
            "x" * 32,  # exactly the small-payload boundary
            "x" * 33,  # just over
            "y" * 1000,  # medium
            "z" * 5000,  # large
            '{"denominations": {"20": {"position": 1, "count": 49}}}',  # realistic
        ],
    )
    def test_round_trip_various_lengths(self, plaintext):
        payload = encrypt_for(plaintext, _SEC_ONE, _PUB_TWO)
        recovered = decrypt_from(payload, _SEC_TWO, _PUB_ONE)
        assert recovered == plaintext

    def test_payloads_are_unique_under_random_nonce(self):
        """Same plaintext + same key pair should produce different payloads
        each time because the nonce is fresh CSPRNG bytes. Catches a
        regression where the nonce is accidentally pinned."""
        plaintext = "the same message"
        p1 = encrypt_for(plaintext, _SEC_ONE, _PUB_TWO)
        p2 = encrypt_for(plaintext, _SEC_ONE, _PUB_TWO)
        assert p1 != p2
        assert decrypt_from(p1, _SEC_TWO, _PUB_ONE) == plaintext
        assert decrypt_from(p2, _SEC_TWO, _PUB_ONE) == plaintext

    def test_pinned_nonce_is_deterministic(self):
        """Same plaintext + same key pair + same nonce = byte-identical
        payload. Regression-locks the chacha20 + hmac chain."""
        ck = get_conversation_key(_SEC_ONE, _PUB_TWO)
        nonce = bytes(32)  # 32 zero bytes
        p1 = encrypt_with_conversation_key("a", ck, nonce=nonce)
        p2 = encrypt_with_conversation_key("a", ck, nonce=nonce)
        assert p1 == p2
        assert decrypt_with_conversation_key(p1, ck) == "a"


class TestTamperDetection:
    """HMAC-SHA256 verification catches tampered envelopes. The cryptographic
    construction depends on this: if HMAC verification ever no-ops, a
    relay-MITM could forge ATM state events."""

    def _payload(self) -> str:
        return encrypt_for("important message", _SEC_ONE, _PUB_TWO)

    def test_flipped_mac_byte_rejected(self):
        raw = bytearray(base64.b64decode(self._payload()))
        raw[-1] ^= 0x01
        tampered = base64.b64encode(bytes(raw)).decode("ascii")
        with pytest.raises(Nip44MacError):
            decrypt_from(tampered, _SEC_TWO, _PUB_ONE)

    def test_flipped_ciphertext_byte_rejected(self):
        raw = bytearray(base64.b64decode(self._payload()))
        # Flip a byte inside the ciphertext segment. Envelope layout:
        # version[0] + nonce[1:33] + ciphertext[33:-32] + mac[-32:]
        ct_start = 1 + 32
        raw[ct_start + 5] ^= 0x01
        tampered = base64.b64encode(bytes(raw)).decode("ascii")
        with pytest.raises(Nip44MacError):
            decrypt_from(tampered, _SEC_TWO, _PUB_ONE)

    def test_flipped_nonce_byte_rejected(self):
        raw = bytearray(base64.b64decode(self._payload()))
        # Nonce starts at byte 1 (after version)
        raw[1] ^= 0x01
        tampered = base64.b64encode(bytes(raw)).decode("ascii")
        with pytest.raises(Nip44MacError):
            decrypt_from(tampered, _SEC_TWO, _PUB_ONE)

    def test_wrong_recipient_privkey_rejected(self):
        """The MAC is derived from the conversation key, so a wrong
        recipient privkey produces a different conversation key →
        different hmac_key → MAC verification fails. (Doesn't decrypt
        to garbage; fails fast.)"""
        sec_three = "00" * 31 + "03"
        with pytest.raises(Nip44MacError):
            decrypt_from(self._payload(), sec_three, _PUB_ONE)


class TestVersionRejection:
    def test_v1_byte_rejected(self):
        raw = bytearray(base64.b64decode(encrypt_for("x", _SEC_ONE, _PUB_TWO)))
        raw[0] = 0x01
        bad = base64.b64encode(bytes(raw)).decode("ascii")
        with pytest.raises(Nip44VersionError):
            decrypt_from(bad, _SEC_TWO, _PUB_ONE)

    def test_unknown_version_byte_rejected(self):
        raw = bytearray(base64.b64decode(encrypt_for("x", _SEC_ONE, _PUB_TWO)))
        raw[0] = 0xFF
        bad = base64.b64encode(bytes(raw)).decode("ascii")
        with pytest.raises(Nip44VersionError):
            decrypt_from(bad, _SEC_TWO, _PUB_ONE)


class TestLengthGuards:
    def test_empty_plaintext_rejected(self):
        with pytest.raises(Nip44LengthError):
            encrypt_for("", _SEC_ONE, _PUB_TWO)

    def test_plaintext_at_max_length_accepted(self):
        plaintext = "x" * 65535
        payload = encrypt_for(plaintext, _SEC_ONE, _PUB_TWO)
        assert decrypt_from(payload, _SEC_TWO, _PUB_ONE) == plaintext

    def test_plaintext_over_max_rejected(self):
        with pytest.raises(Nip44LengthError):
            encrypt_for("x" * 65536, _SEC_ONE, _PUB_TWO)

    def test_invalid_base64_payload_rejected(self):
        with pytest.raises(Nip44LengthError):
            decrypt_from("not!!!base64@@@", _SEC_TWO, _PUB_ONE)

    def test_payload_too_short_rejected(self):
        # 50 bytes is well under the 99-byte minimum
        # (1 version + 32 nonce + 34 padded ciphertext + 32 MAC).
        too_short = base64.b64encode(b"\x02" + b"\x00" * 49).decode("ascii")
        with pytest.raises(Nip44LengthError):
            decrypt_from(too_short, _SEC_TWO, _PUB_ONE)


class TestPaddingFormula:
    """Spot-check the _calc_padded_len formula against hand-computed cases.
    Locks in the NIP-44 v2 padding scheme so a refactor can't silently
    break wire compatibility (which would only surface as cross-impl
    decryption failures, exactly what
    test_decrypts_bitspire_sample_event_from_coord_log is meant to catch
    end-to-end, but a unit test here is cheaper)."""

    @pytest.mark.parametrize(
        "plaintext_len,expected_padded",
        [
            (1, 32),  # <= 32 → 32
            (16, 32),
            (32, 32),
            (33, 64),  # > 32 → next chunk
            (64, 64),
            (65, 96),  # chunk = 32 for L=65 (next_power(64) = 128; 128//8 = 16; max(32, 16) = 32)
            (100, 128),
            (128, 128),
            # L=129: next_power(128) = 1<<8 = 256; chunk = max(32, 256//8) = 32;
            # padded = 32 * (128//32 + 1) = 32 * 5 = 160.
            (129, 160),
            (256, 256),  # chunk = 32 for L=256 (next_power(255)=256; max(32, 32) = 32)
            (257, 320),  # chunk = 64 for L=257 (next_power(256)=512; max(32, 64) = 64)
            (1000, 1024),  # chunk = 128 for L=1000 (next_power(999)=1024; max(32, 128) = 128)
        ],
    )
    def test_calc_padded_len(self, plaintext_len, expected_padded):
        assert _calc_padded_len(plaintext_len) == expected_padded


# =============================================================================
# Layer 3 — byte-compat cross-test against nostr-tools (bitspire's impl)
# =============================================================================


@pytest.mark.skip(
    reason=(
        "Waiting on bitspire to post one sample encrypted event to "
        "~/dev/coordination/log.md per the 2026-05-30T15:55Z entry. Once "
        "posted, hardcode the (event_id, content, recipient_privkey, "
        "expected_plaintext) fixture here and remove the skip. This test "
        "is the byte-compat cross-test between our hand-rolled NIP-44 v2 "
        "and the nostr-tools impl the ATM uses."
    )
)
def test_decrypts_bitspire_sample_event_from_coord_log():
    """Cross-impl byte-compatibility test. Bitspire generates one event on
    their side (nostr-tools NIP-44 v2 impl), posts the raw event JSON +
    a known throwaway recipient privkey to the coord log, and we assert
    our `decrypt_from` recovers the expected `{"denominations": {...}}`
    plaintext.

    If this passes, both impls produce byte-identical wire format. If it
    fails, the spec ambiguity surfaces before either side ships, which is
    exactly what bitspire flagged in the plan review (`07:55Z`).
    """
    # event_b64_content = "..."  # paste from coord log
    # sender_pubkey_hex = "..."
    # recipient_privkey_hex = "..."
    # expected_plaintext = '{"denominations": {"20": {"position": 1, "count": 49}}}'
    # recovered = decrypt_from(event_b64_content, recipient_privkey_hex, sender_pubkey_hex)
    # assert recovered == expected_plaintext
    raise NotImplementedError("fixture pending, see skip reason")