Two-layer fix for the issue where a fresh client chaining
create_new_key + NIP-46 connect on the same target key would
time out — bunker had no subscription registered for the new
key by the time the connect event arrived at the relay.
Layer 1 — run.ts: loadNsec and unlockKey were synchronous and
fire-and-forgot the async startKey promise. create_new_key.ts:35
already awaited loadNsec, but the await was a no-op against a sync
return. Promoted both to async and properly awaited startKey, so
backend.start() at least gets a chance to run before the caller's
response goes out.
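
To illustrate why the old await was a no-op, here is a minimal sketch. The function bodies are hypothetical stand-ins, not the code in run.ts; only the names loadNsec and startKey come from the change itself.

```typescript
// Records the order in which things happen, for illustration only.
const events: string[] = [];

// Hypothetical stand-in for the bunker's async startKey.
async function startKey(name: string): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  events.push(`started:${name}`);
}

// Before: synchronous wrapper that never hands the promise to the caller.
function loadNsecSync(name: string): void {
  void startKey(name); // fire-and-forget
}

// After: async wrapper, so the caller's await covers startKey too.
async function loadNsecAsync(name: string): Promise<void> {
  await startKey(name);
}

async function demo(): Promise<string[]> {
  await loadNsecSync("a"); // awaiting a sync return: startKey is still pending
  events.push("responded:a");
  await loadNsecAsync("b"); // resolves only after startKey has finished
  events.push("responded:b");
  return events;
}
```

In the sync variant the "response" for key `a` is recorded before its key has started; in the async variant the order is reversed.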
Layer 2 — backend/index.ts: NDKNip46Backend.start() registers the
kind-24133 subscription via this.ndk.subscribe(...) but returns
immediately, before the relay's EOSE confirms it has the
subscription on file. Override start() in our Backend subclass to
await EOSE before resolving. This is the actual race-closer —
layer 1's await alone wasn't enough because start() was still
returning before the relay registered the subscription.
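
A minimal sketch of the await-EOSE idea follows. `SubscriptionLike`, `FakeSubscription`, and the timeout are assumptions made for illustration; the real override works against NDK's subscription object, whose API is not reproduced here.

```typescript
// Hypothetical minimal shape of a relay subscription.
interface SubscriptionLike {
  on(event: "eose", listener: () => void): void;
}

// Resolve once the subscription reports EOSE, or reject after timeoutMs.
// The timeout is an assumption; the commit only says start() awaits EOSE.
function waitForEose(sub: SubscriptionLike, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`no EOSE within ${timeoutMs}ms`)),
      timeoutMs,
    );
    sub.on("eose", () => {
      clearTimeout(timer);
      resolve();
    });
  });
}

// Stand-in for a relay that acknowledges the subscription after a delay.
class FakeSubscription implements SubscriptionLike {
  private listeners: Array<() => void> = [];

  constructor(ackDelayMs: number) {
    setTimeout(() => this.listeners.forEach((l) => l()), ackDelayMs);
  }

  on(_event: "eose", listener: () => void): void {
    this.listeners.push(listener);
  }
}

// Sketch of a start() that resolves only after the relay has acknowledged.
async function startAndAwaitEose(ackDelayMs: number): Promise<string> {
  const sub = new FakeSubscription(ackDelayMs);
  await waitForEose(sub, 1000);
  return "ready";
}
```

Bounding the wait means a relay that never acknowledges cannot hang key loading forever; whether to fail or proceed on timeout is a separate policy choice.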
Surfaced by aiolabs/lnbits#33's eager-bind chain, which publishes
a NIP-46 connect event in the same HTTP round-trip as
create_new_key. Pre-fix, lnbits deferred the connect to first
sign_event (minutes-to-hours after provisioning), so the race
window was hidden.
Verified end-to-end on bohm regtest: demo account creation through
the webapp now completes cleanly, with bunker logs showing
connect + sign_event for the freshly-provisioned key.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Calls `relayConnectionWatchdog` (introduced in the previous commit) at
the end of admin-interface connect(). Gated by NSEC_BUNKER_DISABLE_WATCHDOG=1
for operators who run external liveness checks (Prometheus probes, k8s
readiness, etc.) and don't want the daemon to self-terminate.
This restores the watchdog behavior that was commented out in commit
42dbbd7 (the emergency stopgap for the old self-echo false positives),
but on top of the now-reliable pool-status mechanism.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original watchdog published a kind-24133 event to its own pubkey
every 20s and exited if no echo arrived within 50s. On a single private
relay setup (LNbits's nostrrelay extension channel), NDK 2.8.1's outbox
model doesn't reliably route self-publishes back through the matching
subscription, so the watchdog fires false positives and exits every 50s
even though admin RPCs over the same channel still work fine. The
upstream patches we landed previously (commit 42dbbd7) commented the
call out as an emergency stopgap; this commit replaces the mechanism
with one that actually answers the right question.
Pool-status watchdog: poll `ndk.pool.connectedRelays().length` every
10s, track the most recent moment any relay was connected, exit if no
relay has been connected for 60s. Uses NDK's own connection-lifecycle
tracking which works reliably across all relay configurations — no
self-publish, no subscription dependency, no relay traffic. Same intent
as pingOrDie (detect partition from relay layer and let the supervisor
restart us), but with a reliable signal.
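
The polling logic can be sketched roughly as below. This is not the shipped `relayConnectionWatchdog`: `connectedCount` stands in for `ndk.pool.connectedRelays().length`, and the clock and exit action are injected (an assumption) so the logic can be exercised without a relay.

```typescript
interface WatchdogOptions {
  connectedCount: () => number; // stand-in for ndk.pool.connectedRelays().length
  now: () => number;            // milliseconds; Date.now in production
  onDead: () => void;           // e.g. exit so the supervisor restarts the daemon
  maxDisconnectedMs: number;    // 60_000 per the description above
}

// Returns a tick function meant to be called on a fixed interval.
function createWatchdogTick(opts: WatchdogOptions): () => void {
  let lastConnectedAt = opts.now();
  return () => {
    const t = opts.now();
    if (opts.connectedCount() > 0) {
      lastConnectedAt = t; // remember the most recent moment any relay was up
      return;
    }
    if (t - lastConnectedAt >= opts.maxDisconnectedMs) {
      opts.onDead();
    }
  };
}
```

In production the tick would be driven by something like `setInterval(tick, 10_000)`. Because the check only reads pool state, it generates no relay traffic.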
Call site re-enable + env-flag opt-out follow in the next commit.
Drops the now-unused NostrEvent import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three correctness fixes to the nix derivation, mirroring the Dockerfile
fixes:
1. Drop `pnpm prune --prod --ignore-scripts` from the build phase. The
   prune step removed the prisma CLI (devDependency) from the output,
   so the runtime invocation of `prisma migrate deploy` had nothing to
   exec. Same trap the upstream Dockerfile fell into via `--prod` install.
2. Copy `scripts/` into `$out/share/nsecbunkerd/` alongside dist,
   node_modules, prisma, templates. Without it the launcher script
   (which contains the migration step) wasn't present.
3. Switch the makeWrapper target from `dist/index.js` to
   `scripts/start.js`. Same change the Dockerfile ENTRYPOINT got in
   the previous commit. Also adds nodejs_20 to PATH so `npm` is
   resolvable from inside start.js, and drops `--chdir` so the caller
   (systemd, docker compose) controls cwd; start.js now resolves
   sibling paths from `__dirname` (committed separately).
The `patchNdk` substitution narrows from the old `workspace:*` form
(no longer in the package.json after fork commit 06272c8) to the
current `"2.8.1"` → `"^2.8.1"` rewrite needed to align package.json
with the lockfile under --frozen-lockfile.
Remaining known gap: nixpkgs ships prisma-engines 7.7.0 while the
JS prisma CLI in node_modules is 5.4.1, an RPC vocabulary mismatch
that breaks the migrate step at runtime (`Method not found:
listMigrationDirectories`). Either bump prisma JS to ^7.x or overlay
prisma-engines to 5.4.1. Out of scope for this commit; docker build
unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The launcher previously assumed cwd was the package root: `mkdir config`
in cwd, `npm run prisma:migrate` in cwd, `node ./dist/index.js`. Works
under docker (WORKDIR /app, writable) but breaks anywhere cwd differs
from the package root — e.g. a nix-built bunker invoked from a systemd
unit whose WorkingDirectory is the state dir (/var/lib/nsecbunkerd) and
not the nix store path that holds dist/, scripts/, prisma/.
Resolve sibling paths via `path.resolve(__dirname, '..')` so the
package-internal layout is robust to cwd. Use `path.join(pkgRoot, 'dist/index.js')`
for the daemon spawn and `{ cwd: pkgRoot }` for the npm migrate exec.
Switch `mkdir config` (which only works in writable cwd) to
`fs.mkdirSync(configDir, { recursive: true })` where configDir defaults
to `./config` relative to cwd, overrideable via NSEC_BUNKER_CONFIG_DIR.
This lets the nix package install the launcher into the read-only store
while the systemd unit still does its config/state work in
/var/lib/nsecbunkerd with no shell wrapping.
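
A rough sketch of the path resolution described above. `resolveLayout` and its return shape are hypothetical, and the real start.js is plain JavaScript; `scriptDir` stands in for `__dirname`. Resolving a relative NSEC_BUNKER_CONFIG_DIR against cwd is an assumption.

```typescript
import * as path from "node:path";

interface LauncherLayout {
  pkgRoot: string;     // directory holding dist/, scripts/, prisma/
  daemonEntry: string; // absolute path to the daemon entry point
  configDir: string;   // writable config directory
}

function resolveLayout(
  scriptDir: string,
  cwd: string,
  env: { [name: string]: string | undefined },
): LauncherLayout {
  // Package-internal paths hang off the script location, never cwd.
  const pkgRoot = path.resolve(scriptDir, "..");
  // Config defaults to ./config under cwd unless the env var overrides it.
  const configSetting = env["NSEC_BUNKER_CONFIG_DIR"] ?? "config";
  return {
    pkgRoot,
    daemonEntry: path.join(pkgRoot, "dist/index.js"),
    configDir: path.resolve(cwd, configSetting),
  };
}
```

With a read-only store path as `scriptDir` and /var/lib/nsecbunkerd as `cwd`, the daemon entry resolves inside the store while the config directory stays under the state dir.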
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream Dockerfile sets `ENTRYPOINT [ "node", "./dist/index.js" ]`,
which boots the daemon directly and silently bypasses `scripts/start.js`
— the only place that runs `prisma migrate deploy`. On a clean install,
the SQLite db file at $DATABASE_URL is created empty (0 bytes) and
every Policy / KeyUser / Token / SigningCondition operation throws
"table does not exist." `ping` / `get_keys` / `create_new_key` happen
to survive because they only touch the JSON config, not the db.
Two changes:
1. ENTRYPOINT switches to `node ./scripts/start.js`. The CMD arg
   (`start`) and any additional argv pass through to the daemon
   unchanged via process.argv.
2. Runtime pnpm install drops `--prod`. The prisma CLI lives in
   devDependencies; with `--prod`, `npx prisma migrate deploy` tries to
   download prisma@latest at runtime, which OOMs in modest containers.
   Including devDeps at runtime trades a somewhat larger image for
   correctness.
Validated end-to-end against the local regtest stack — after the
rebuild the SQLite db boots populated with 22 migrations, and the
lnbits-side admin spike harness passes all 9 steps including NIP-46
sign_event with Schnorr-valid signatures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wire-level `create_new_token` RPC carries `policyId` as a string
(every NDK RPC param is a string). The handler correctly
parseInts it for the `findUnique({where:{id:parseInt(policyId)}})` call
but then forwards the unparsed string straight into the Prisma
`token.create({data:{...policyId}})` payload. Prisma rejects with
"Argument `policyId`: Invalid value provided. Expected Int or Null,
provided String" because `Token.policyId` is declared `Int` per the
schema (references `Policy.id`, which is autoincrement Int).
Hoist `policyIdInt = parseInt(policyId)` and use it for both the
findUnique lookup and the create payload. Latent upstream bug — no one
would have seen it before because the wrong-kind error response (fixed
in the previous commit) made the symptom look like a transport timeout
rather than a Prisma validation error.
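
The hoist can be sketched as follows, without Prisma itself. `buildTokenRequest` and the `clientName` field are hypothetical, and the NaN guard is an addition for illustration rather than part of the fix.

```typescript
// Parse the wire-level string once and reject garbage early.
function parsePolicyId(raw: string): number {
  const id = Number.parseInt(raw, 10);
  if (Number.isNaN(id)) {
    throw new Error(`invalid policyId: ${raw}`);
  }
  return id;
}

// Builds the arguments for both database calls from the same parsed value,
// so the lookup and the create payload cannot disagree on the type.
function buildTokenRequest(rawPolicyId: string, clientName: string) {
  const policyIdInt = parsePolicyId(rawPolicyId);
  return {
    lookup: { where: { id: policyIdInt } },
    create: { data: { clientName, policyId: policyIdInt } },
  };
}
```

Parsing once at the boundary keeps the string form from leaking into any later payload.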
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The catch block in handleRequest and both response paths in create_account
pass `NDKKind.NostrConnectAdmin` as the response kind. That constant does
NOT exist in NDK 2.8.1 — only `NostrConnect = 24133` is exported — so it
resolves to `undefined` and NDKNostrRpc.sendResponse falls through to its
own default of `NDKKind.NostrConnect = 24133`. Net effect: any error
response to an admin-channel (kind 24134) request is published on the
NIP-46 signing channel (24133) instead, which clients subscribed for
24134 never see. Looks like a transport-layer NDK-echo / silent-drop
issue from the client's perspective, but the bunker IS publishing
reliably — just on the wrong kind.
Mirror `req.event.kind` so the error response goes back on the same
channel the request came in on. Same pattern the unknown-method path
and create_account's validation-error path already used; just propagate
it to the remaining sites. Drops the now-unused NDKKind import from
create_account.ts.
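
The failure mode can be reproduced in isolation. In this sketch `KindTable` and `sendResponse` are hypothetical stand-ins for NDKKind and NDKNostrRpc.sendResponse; it assumes only what is described above, namely that a missing constant evaluates to `undefined` and that the kind parameter defaults to 24133.

```typescript
const NOSTR_CONNECT = 24133;       // NIP-46 signing channel
const NOSTR_CONNECT_ADMIN = 24134; // admin channel

// Mimics an enum-like object that exports NostrConnect but no admin kind.
const KindTable: { [name: string]: number | undefined } = {
  NostrConnect: NOSTR_CONNECT,
};

interface SentResponse { kind: number; body: string }
interface IncomingRequest { event: { kind: number } }

// Stand-in for a sendResponse whose kind parameter defaults to 24133.
function sendResponse(body: string, kind: number = NOSTR_CONNECT): SentResponse {
  return { kind, body };
}

// Before: the missing constant is undefined, so the default kind wins.
function respondBuggy(_req: IncomingRequest, error: string): SentResponse {
  return sendResponse(error, KindTable["NostrConnectAdmin"]);
}

// After: mirror the kind the request arrived on.
function respondFixed(req: IncomingRequest, error: string): SentResponse {
  return sendResponse(error, req.event.kind);
}
```

For a request on kind 24134, `respondBuggy` answers on 24133 while `respondFixed` answers on 24134.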
Validated end-to-end against the local bunker via the lnbits-side admin
spike harness — after this fix + the migration entrypoint fix + the
policyId type fix, all 9 spike steps including NIP-46 sign_event pass
with Schnorr-valid signatures. See coordination log entry 2026-05-27T14:30Z.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add NSEC_BUNKER_DEBUG_TRANSPORT=1 opt-in logging that emits REQUEST_IN
on inbound NIP-46 RPCs, RESPONSE_SENT around NDKNostrRpc.sendResponse,
and PUBLISHED / PUBLISH_FAILED per-relay on the bunker's pool. Surfaces
the diagnostic signal NDKNostrRpc itself discards: sendResponse calls
`event.publish(this.relaySet)` and throws away the Set<NDKRelay> it
returns, so silent outbox-drops and wrong-kind responses are invisible
without hooking the pool's per-relay events directly.
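
A minimal sketch of the opt-in gate. The event names and the env flag come from the description above; `makeTransportLogger`, the injected sink, and the line format are assumptions for illustration.

```typescript
type TransportEvent =
  | "REQUEST_IN"
  | "RESPONSE_SENT"
  | "PUBLISHED"
  | "PUBLISH_FAILED";

type TransportLogger = (event: TransportEvent, detail: object) => void;

// Returns a logger that is silent unless NSEC_BUNKER_DEBUG_TRANSPORT=1.
function makeTransportLogger(
  env: { [name: string]: string | undefined },
  sink: (line: string) => void,
): TransportLogger {
  const enabled = env["NSEC_BUNKER_DEBUG_TRANSPORT"] === "1";
  return (event, detail) => {
    if (!enabled) return; // flag unset: stay as quiet as before
    sink(`${event} ${JSON.stringify(detail)}`);
  };
}
```

Logging the kind alongside each RESPONSE_SENT is the sort of detail that makes a wrong-kind response visible.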
Validated against the local bunker via the lnbits-side admin spike
harness (~/dev/lnbits/misc-aio/bunker_admin_spike.py): the instrumentation
made the 9-step harness reveal a wrong-kind error response path (separate
fix in the next commit) that had been masquerading as an NDK echo issue
for a week. With the env flag unset the daemon stays as quiet as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NDK 2.8.1's NDKPrivateKeySigner constructor forwards its arg straight
to nostr-tools getPublicKey() which requires 32-byte hex/bytes/bigint
and throws on bech32 input. Every key loaded through startKey (i.e.
every key created via create_new_key, plus boot-time reloads of any
plain-nsec entries in the config) was failing silently with the
nostr-tools type error. The try/catch caught the throw and returned
without loading the key, so the bunker would happily report
create_new_key as successful, the key would persist encrypted on
disk, but the runtime keystore would not have a signer for it.
NIP-46 connect / sign_event against any admin-provisioned target
therefore silently timed out from the client side — blocking
essentially every signing flow.
Sister bug to #5 (getKeys iterator) in a different code path. The
fix matches the existing pattern in create_new_key.ts:16:
    hexpk = nip19.decode(nsec).data as string;
Verified against the local spike harness: create_new_key now loads
the target into runtime; get_keys returns the new entry (assuming
#5 is patched separately for the iterator path).
Fixes #8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NDK 2.8.1's outbox model doesn't reliably deliver self-published
events back through subscriptions when the configured relay set is
a single custom (non-public) relay. The pingOrDie self-watchdog
publishes a kind-24133 event to its own pubkey every 20s and exits
the bunker if it doesn't see the echo within 50s — which means on
a private relay channel (e.g. LNbits's nostrrelay extension), the
bunker exits cleanly every 50s even though admin RPCs over that
same channel are working fine.
Plain-WebSocket round-trips to the same relay echo correctly in
<1s, so the issue is on NDK's side, not the relay's.
Commenting out the watchdog is the minimum patch to keep the
daemon alive. Real fix is either an env-flag opt-out, a simpler
connectivity check that doesn't depend on self-echo, or an NDK
upgrade that fixes the outbox-vs-subscribe race.
Fixes #4. See also #7 for the underlying NDK echo investigation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two upstream-rot issues fixed in one commit (same root cause: the
upstream Dockerfile predates the move to pnpm and the lockfile has
drifted):
- npm install can't resolve workspace:* deps (which package.json used
  to declare for @nostr-dev-kit/ndk; see prior commit for the pin).
  Switching to pnpm@9 matches the lockfile that ships in-repo.
- pnpm-lock.yaml is out of date vs package.json (likely from
  generation-time vs commit-time drift), so --frozen-lockfile fails
  with ERR_PNPM_OUTDATED_LOCKFILE. Drop the flag in both build and
  runtime stages to let pnpm resolve fresh, at the cost of giving up
  determinism until the lockfile is regenerated.
Also reorders the build stage to COPY lockfile + manifest before the
source, so the install layer caches across source-only edits.
Fixes #1, #2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream declares the dependency as workspace:*, but the repo has no
pnpm-workspace.yaml and no sibling @nostr-dev-kit/ndk package — so
pnpm install fails with ERR_PNPM_WORKSPACE_PKG_NOT_FOUND on a clean
clone. The shipped pnpm-lock.yaml was resolving to ndk 2.8.1, so pin
to that exact version to match what the lockfile already expects.
Fixes #3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devShell: nodejs_20, pnpm_8, prisma + prisma-engines, sqlite, openssl,
plus the env wiring so prisma uses nix-provided engines instead of
fetching from binaries.prisma.sh.
packages.default: full native build via pnpm_8.fetchDeps + configHook.
Patches the workspace:* ndk spec to the lockfile-resolved ^2.8.1 so
--frozen-lockfile accepts it, then re-runs install with scripts to
trigger bcrypt's node-pre-gyp fallback-to-build (uses python311 since
node-gyp 9.4.1 bundled with pnpm 8 still imports distutils).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add .dockerignore
- Replace .env with .env.example
- Add migrations service
- Clean up Dockerfile: simpler setup, simpler copy, no migrations inside the image
- Update README to match new instructions