add_account duplicate-detection regex isn't anchored → false 409s and silent double-writes #48

Closed
opened 2026-06-16 21:58:15 +00:00 by padreug · 1 comment
Owner

Found in the high-effort code review of PR #46.

fava_client.py:1652:

re.search(rf"open {re.escape(account_name)}(?:\s|$)", source, re.MULTILINE)

The pattern has no leading anchor, so it matches open <name> anywhere in the chart file — including inside another account's description metadata or a comment — and its (?:\s|$) boundary misses an inline ;comment.

Empirically verified both failure directions:

# False POSITIVE — name appears only in a prior account's description:
src = '2020-01-01 open Expenses:Notes\n  description: "remember to open Expenses:Vehicle:Gas next month"\n'
exists("Expenses:Vehicle:Gas", src)   # → True  (a later legit add 409s)

# False NEGATIVE — a real Open with an inline comment, no space before ';' (valid Beancount):
src = '2020-01-01 open Expenses:Vehicle:Gas;legacy-import\n'
exists("Expenses:Vehicle:Gas", src)   # → False (add_account appends a DUPLICATE Open)

The false-negative is the dangerous one: a duplicate open makes bean-check reject the file, breaking every subsequent /api/source write.

Suggested fix

Anchor to an actual Open directive and use an account-boundary negative-lookahead so a prefix can't match a longer sibling either:

re.search(rf"^\d{{4}}-\d{{2}}-\d{{2}} open {re.escape(account_name)}(?![\w:-])", source, re.MULTILINE)

(Or, better, replace text-grepping with a query against Fava's parsed accounts / libra's account DB — see the centralization issue.)

Found in the high-effort code review of PR #46. `fava_client.py:1652`: ```python re.search(rf"open {re.escape(account_name)}(?:\s|$)", source, re.MULTILINE) ``` The pattern has no leading anchor, so it matches `open <name>` **anywhere** in the chart file — including inside another account's `description` metadata or a comment — and its `(?:\s|$)` boundary misses an inline `;comment`. Empirically verified both failure directions: ```python # False POSITIVE — name appears only in a prior account's description: src = '2020-01-01 open Expenses:Notes\n description: "remember to open Expenses:Vehicle:Gas next month"\n' exists("Expenses:Vehicle:Gas", src) # → True (a later legit add 409s) # False NEGATIVE — a real Open with an inline comment, no space before ';' (valid Beancount): src = '2020-01-01 open Expenses:Vehicle:Gas;legacy-import\n' exists("Expenses:Vehicle:Gas", src) # → False (add_account appends a DUPLICATE Open) ``` The false-negative is the dangerous one: a duplicate `open` makes `bean-check` reject the file, breaking **every** subsequent `/api/source` write. ### Suggested fix Anchor to an actual Open directive and use an account-boundary negative-lookahead so a prefix can't match a longer sibling either: ```python re.search(rf"^\d{{4}}-\d{{2}}-\d{{2}} open {re.escape(account_name)}(?![\w:-])", source, re.MULTILINE) ``` (Or, better, replace text-grepping with a query against Fava's parsed accounts / libra's account DB — see the centralization issue.)
Author
Owner

Fixed and shipped in v0.4.0 (PR #46, catalog updated).

  • 0ea96cd — extracted the check into a pure _open_directive_exists() anchored to a real <date> open <name> directive with an account-boundary negative-lookahead.
  • 3adb3d3 — hardened the date/whitespace to match Beancount's actual DATE token ((17|18|19|20)[0-9]{2}[-/][0-9]+[-/][0-9]+) and ignored inter-token whitespace, after a coherence pass over the Beancount source.

Covered by 12 unit tests (both false directions + prefix/child non-matches + slash/single-digit/multi-space/tab date variants).

Fixed and shipped in **v0.4.0** (PR #46, catalog updated). - `0ea96cd` — extracted the check into a pure `_open_directive_exists()` anchored to a real `<date> open <name>` directive with an account-boundary negative-lookahead. - `3adb3d3` — hardened the date/whitespace to match Beancount's actual DATE token (`(17|18|19|20)[0-9]{2}[-/][0-9]+[-/][0-9]+`) and ignored inter-token whitespace, after a coherence pass over the Beancount source. Covered by 12 unit tests (both false directions + prefix/child non-matches + slash/single-digit/multi-space/tab date variants).
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
aiolabs/libra#48
No description provided.