libra/docs/BQL-BALANCE-QUERIES.md
Padreug c174cda48d Rename Castle Accounting extension to Libra
Full identifier rename: module path lnbits.extensions.castle →
lnbits.extensions.libra, DB ext_castle → ext_libra, URL prefix
/castle/ → /libra/, manifest id castle → libra, fava ledger slug
default castle-ledger → libra-ledger, Beancount source metadata
castle-api → libra-api and link prefixes castle-{entry,tx}- →
libra-{entry,tx}-, column castle_wallet_id → libra_wallet_id, all
Python/JS/HTML identifiers (castle_ext, CastleSettings,
castle_reference, castleWalletConfigured, etc.).

Display name "Castle Accounting" → "Libra" (the scales/balance
metaphor — fits double-entry bookkeeping).

No backward compat: production hosts will be force-updated. Old
castle-prefixed Beancount metadata in existing Fava ledgers is
historical; new entries use libra-* prefixes going forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:24:46 +02:00

643 lines
18 KiB
Markdown

# BQL Balance Queries Implementation
**Date**: November 10, 2025
**Status**: In Progress
**Context**: Replace manual aggregation with Beancount Query Language (BQL)
---
## Problem
Current `get_user_balance()` implementation:
- **115 lines** of manual aggregation logic
- Fetches **ALL** journal entries (inefficient)
- Manual regex parsing of amounts
- Manual looping through entries/postings
- O(n) complexity for every balance query
**Performance Impact**:
- Every balance check fetches entire ledger
- No database-level filtering
- CPU-intensive parsing and aggregation
- Scales poorly as ledger grows
---
## Solution: Use Beancount Query Language (BQL)
Beancount has a built-in query language that can efficiently:
- Filter accounts (regex patterns)
- Sum positions (balances)
- Exclude transactions by flag
- Group and aggregate
- All processing done by Beancount engine (optimized C code)
---
## BQL Query Design
### Query 1: Get User Balance (SATS + Fiat)
```sql
SELECT
account,
sum(position) as balance
WHERE
account ~ ':User-{user_id[:8]}'
AND (account ~ 'Payable' OR account ~ 'Receivable')
AND flag != '!'
GROUP BY account
```
**What this does**:
- `account ~ ':User-abc12345'` - Match user's accounts (regex)
- `account ~ 'Payable' OR account ~ 'Receivable'` - Only payable/receivable accounts
- `flag != '!'` - Exclude pending transactions
- `sum(position)` - Aggregate balances
- `GROUP BY account` - Separate totals per account
**Result Format** (from Fava API):
```json
{
"data": {
"rows": [
["Liabilities:Payable:User-abc12345", {"SATS": "150000", "EUR": "145.50"}],
["Assets:Receivable:User-abc12345", {"SATS": "50000", "EUR": "48.00"}]
],
"types": [
{"name": "account", "type": "str"},
{"name": "balance", "type": "Position"}
]
}
}
```
### Query 2: Get All User Balances (Admin View)
```sql
SELECT
account,
sum(position) as balance
WHERE
(account ~ 'Payable:User-' OR account ~ 'Receivable:User-')
AND flag != '!'
GROUP BY account
```
**What this does**:
- Match ALL user accounts (not just one user)
- Aggregate balances per account
- Extract user_id from account name in post-processing
---
## Implementation Plan
### Step 1: Add General BQL Query Method
Add to `fava_client.py`:
```python
async def query_bql(self, query_string: str) -> Dict[str, Any]:
"""
Execute arbitrary Beancount Query Language (BQL) query.
Args:
query_string: BQL query (e.g., "SELECT account, sum(position) WHERE ...")
Returns:
{
"rows": [[col1, col2, ...], ...],
"types": [{"name": "col1", "type": "str"}, ...],
"column_names": ["col1", "col2", ...]
}
Example:
result = await fava.query_bql("SELECT account, sum(position) WHERE account ~ 'User-abc'")
for row in result["rows"]:
account, balance = row
print(f"{account}: {balance}")
"""
try:
async with httpx.AsyncClient(timeout=self.timeout) as client:
response = await client.get(
f"{self.base_url}/query",
params={"query_string": query_string}
)
response.raise_for_status()
result = response.json()
# Fava returns: {"data": {"rows": [...], "types": [...]}}
data = result.get("data", {})
rows = data.get("rows", [])
types = data.get("types", [])
column_names = [t.get("name") for t in types]
return {
"rows": rows,
"types": types,
"column_names": column_names
}
except httpx.HTTPStatusError as e:
logger.error(f"BQL query error: {e.response.status_code} - {e.response.text}")
logger.error(f"Query was: {query_string}")
raise
except httpx.RequestError as e:
logger.error(f"Fava connection error: {e}")
raise
```
### Step 2: Implement BQL-Based Balance Query
Add to `fava_client.py`:
```python
async def get_user_balance_bql(self, user_id: str) -> Dict[str, Any]:
"""
Get user balance using BQL (efficient, ~10 lines vs 115 lines manual).
Args:
user_id: User ID
Returns:
{
"balance": int (sats),
"fiat_balances": {"EUR": Decimal("100.50")},
"accounts": [{"account": "...", "sats": 150000}]
}
"""
# Build BQL query for this user's Payable/Receivable accounts
user_id_prefix = user_id[:8]
query = f"""
SELECT account, sum(position) as balance
WHERE account ~ ':User-{user_id_prefix}'
AND (account ~ 'Payable' OR account ~ 'Receivable')
AND flag != '!'
GROUP BY account
"""
result = await self.query_bql(query)
# Process results
total_sats = 0
fiat_balances = {}
accounts = []
for row in result["rows"]:
account_name, position = row
# Position is a dict like {"SATS": "150000", "EUR": "145.50"}
# or a string for single-currency
if isinstance(position, dict):
# Extract SATS
sats_str = position.get("SATS", "0")
sats_amount = int(sats_str) if sats_str else 0
total_sats += sats_amount
accounts.append({
"account": account_name,
"sats": sats_amount
})
# Extract fiat currencies
for currency in ["EUR", "USD", "GBP"]:
if currency in position:
fiat_str = position[currency]
fiat_amount = Decimal(fiat_str) if fiat_str else Decimal(0)
if currency not in fiat_balances:
fiat_balances[currency] = Decimal(0)
fiat_balances[currency] += fiat_amount
elif isinstance(position, str):
# Single currency (parse "150000 SATS" or "145.50 EUR")
import re
sats_match = re.match(r'^(-?\d+)\s+SATS$', position)
if sats_match:
sats_amount = int(sats_match.group(1))
total_sats += sats_amount
accounts.append({
"account": account_name,
"sats": sats_amount
})
else:
fiat_match = re.match(r'^(-?[\d.]+)\s+([A-Z]{3})$', position)
if fiat_match and fiat_match.group(2) in ('EUR', 'USD', 'GBP'):
fiat_amount = Decimal(fiat_match.group(1))
currency = fiat_match.group(2)
if currency not in fiat_balances:
fiat_balances[currency] = Decimal(0)
fiat_balances[currency] += fiat_amount
logger.info(f"User {user_id[:8]} balance (BQL): {total_sats} sats, fiat: {dict(fiat_balances)}")
return {
"balance": total_sats,
"fiat_balances": fiat_balances,
"accounts": accounts
}
```
### Step 3: Implement BQL-Based All Users Balance
```python
async def get_all_user_balances_bql(self) -> List[Dict[str, Any]]:
"""
Get balances for all users using BQL (efficient admin view).
Returns:
[
{
"user_id": "abc123",
"balance": 100000,
"fiat_balances": {"EUR": Decimal("100.50")},
"accounts": [...]
},
...
]
"""
query = """
SELECT account, sum(position) as balance
WHERE (account ~ 'Payable:User-' OR account ~ 'Receivable:User-')
AND flag != '!'
GROUP BY account
"""
result = await self.query_bql(query)
# Group by user_id
user_data = {}
for row in result["rows"]:
account_name, position = row
# Extract user_id from account name
# Format: "Liabilities:Payable:User-abc12345" or "Assets:Receivable:User-abc12345"
if ":User-" not in account_name:
continue
user_id_with_prefix = account_name.split(":User-")[1]
# User ID is the first 8 chars (our standard)
user_id = user_id_with_prefix[:8]
if user_id not in user_data:
user_data[user_id] = {
"user_id": user_id,
"balance": 0,
"fiat_balances": {},
"accounts": []
}
# Process position (same logic as single-user query)
if isinstance(position, dict):
sats_str = position.get("SATS", "0")
sats_amount = int(sats_str) if sats_str else 0
user_data[user_id]["balance"] += sats_amount
user_data[user_id]["accounts"].append({
"account": account_name,
"sats": sats_amount
})
for currency in ["EUR", "USD", "GBP"]:
if currency in position:
fiat_str = position[currency]
fiat_amount = Decimal(fiat_str) if fiat_str else Decimal(0)
if currency not in user_data[user_id]["fiat_balances"]:
user_data[user_id]["fiat_balances"][currency] = Decimal(0)
user_data[user_id]["fiat_balances"][currency] += fiat_amount
# (Handle string format similarly...)
return list(user_data.values())
```
---
## Testing Strategy
### Unit Tests
```python
# tests/test_fava_client_bql.py
async def test_query_bql():
"""Test general BQL query method."""
fava = get_fava_client()
result = await fava.query_bql("SELECT account WHERE account ~ 'Assets'")
assert "rows" in result
assert "column_names" in result
assert len(result["rows"]) > 0
async def test_get_user_balance_bql():
"""Test BQL-based user balance query."""
fava = get_fava_client()
balance = await fava.get_user_balance_bql("test_user_id")
assert "balance" in balance
assert "fiat_balances" in balance
assert "accounts" in balance
assert isinstance(balance["balance"], int)
async def test_bql_matches_manual():
"""Verify BQL results match manual aggregation (for migration)."""
fava = get_fava_client()
user_id = "test_user_id"
# Get balance both ways
bql_balance = await fava.get_user_balance_bql(user_id)
manual_balance = await fava.get_user_balance(user_id)
# Should match
assert bql_balance["balance"] == manual_balance["balance"]
assert bql_balance["fiat_balances"] == manual_balance["fiat_balances"]
```
### Integration Tests
```python
async def test_bql_performance():
"""BQL should be significantly faster than manual aggregation."""
import time
fava = get_fava_client()
user_id = "test_user_id"
# Time BQL approach
start = time.time()
bql_result = await fava.get_user_balance_bql(user_id)
bql_time = time.time() - start
# Time manual approach
start = time.time()
manual_result = await fava.get_user_balance(user_id)
manual_time = time.time() - start
logger.info(f"BQL: {bql_time:.3f}s, Manual: {manual_time:.3f}s")
# BQL should be faster (or at least not slower)
# With large ledgers, BQL should be 2-10x faster
assert bql_time <= manual_time * 2 # Allow some variance
```
---
## Migration Strategy
### Phase 1: Add BQL Methods (Non-Breaking)
1. Add `query_bql()` method
2. Add `get_user_balance_bql()` method
3. Add `get_all_user_balances_bql()` method
4. Keep existing methods unchanged
**Benefit**: Can test BQL in parallel without breaking existing code.
### Phase 2: Switch to BQL (Breaking Change)
1. Rename old methods:
- `get_user_balance()``get_user_balance_manual()` (deprecated)
- `get_all_user_balances()``get_all_user_balances_manual()` (deprecated)
2. Rename new methods:
- `get_user_balance_bql()``get_user_balance()`
- `get_all_user_balances_bql()``get_all_user_balances()`
3. Update all call sites
4. Test thoroughly
5. Remove deprecated manual methods after 1-2 sprints
---
## Expected Performance Improvements
### Before (Manual Aggregation)
```
User balance query:
- Fetch ALL entries: ~100-500ms (depends on ledger size)
- Manual parsing: ~50-200ms (CPU-bound)
- Total: 150-700ms
```
### After (BQL)
```
User balance query:
- BQL query (filtered at source): ~20-50ms
- Minimal parsing: ~5-10ms
- Total: 25-60ms
Improvement: 5-10x faster
```
### Scalability
**Manual approach**:
- O(n) where n = total number of entries
- Gets slower as ledger grows
- Fetches entire ledger every time
**BQL approach**:
- O(log n) with indexing (Beancount internal optimization)
- Filtered at source (only user's accounts)
- Constant time as ledger grows (for single user)
---
## Code Reduction
- **Before**: `get_user_balance()` = 115 lines
- **After**: `get_user_balance_bql()` = ~60 lines (with comments and error handling)
- **Net reduction**: 55 lines (~48%)
- **Before**: `get_all_user_balances()` = ~100 lines
- **After**: `get_all_user_balances_bql()` = ~70 lines
- **Net reduction**: 30 lines (~30%)
**Total code reduction**: ~85 lines across balance query methods
---
## Risks and Mitigation
### Risk 1: BQL Query Syntax Errors
**Mitigation**:
- Test queries manually in Fava UI first
- Add comprehensive error logging
- Validate query results format
### Risk 2: Position Format Variations
**Mitigation**:
- Handle both dict and string position formats
- Add fallback parsing
- Log unexpected formats for investigation
### Risk 3: Regression in Balance Calculations
**Mitigation**:
- Run both methods in parallel during transition
- Compare results and log discrepancies
- Comprehensive test suite
---
## Test Results and Findings
**Date**: November 10, 2025
**Status**: ⚠️ **NOT FEASIBLE for Libra's Current Data Structure**
### Implementation Completed
1. ✅ Analyze current implementation
2. ✅ Design BQL queries
3. ✅ Implement `query_bql()` method (fava_client.py:494-547)
4. ✅ Implement `get_user_balance_bql()` method (fava_client.py:549-644)
5. ✅ Implement `get_all_user_balances_bql()` method (fava_client.py:646-747)
6. ✅ Test against real data
### Test Results
**✅ BQL query execution works perfectly:**
- Successfully queries Fava's `/query` endpoint
- Returns structured results (rows, types, column_names)
- Can filter accounts by regex patterns
- Can aggregate positions using `sum(position)`
**❌ Cannot access SATS balances:**
- BQL returns EUR/USD positions correctly
- BQL **CANNOT** access posting metadata
- SATS values stored in `posting.meta["sats-equivalent"]`
- No BQL syntax to query metadata fields
### Root Cause: Architecture Limitation
**Current Libra Ledger Structure:**
```
Posting format:
Amount: -360.00 EUR ← Position (BQL can query this)
Metadata:
sats-equivalent: 337096 ← Metadata (BQL CANNOT query this)
```
**Test Data:**
- User 375ec158 has 82 EUR postings
- ALL postings have `sats-equivalent` metadata
- ZERO postings have SATS as position amount
- Manual method: -7,694,356 sats (from metadata)
- BQL method: 0 sats (cannot access metadata)
**BQL Limitation:**
```sql
-- ✅ This works (queries position):
SELECT account, sum(position) WHERE account ~ 'User-'
-- ❌ This is NOT possible (metadata access):
SELECT account, sum(meta["sats-equivalent"]) WHERE account ~ 'User-'
```
### Why Manual Aggregation is Necessary
1. **SATS are Libra's primary currency** for balance tracking
2. **SATS values are in metadata**, not positions
3. **BQL has no metadata query capability**
4. **Must iterate through postings** to read `meta["sats-equivalent"]`
### Performance: Cache Optimization is the Solution
**Phase 1 Caching (Already Implemented)** provides the performance boost:
- ✅ Account lookups cached (5min TTL)
- ✅ Permission lookups cached (1min TTL)
- ✅ 60-80% reduction in DB queries
- ✅ Addresses the actual bottleneck (database queries, not aggregation)
**BQL would not improve performance** because:
- Still need to fetch all postings to read metadata
- Aggregation is not the bottleneck (it's fast)
- Database queries are the bottleneck (solved by caching)
---
## Conclusion
**Status**: ⚠️ **BQL Implementation Not Feasible**
**Recommendation**: **Keep manual aggregation method with Phase 1 caching**
**Rationale:**
1. ✅ Caching already provides 60-80% performance improvement
2. ✅ SATS metadata requires posting iteration regardless of query method
3. ✅ BQL cannot access the data we need (metadata)
4. ✅ Manual aggregation is well-tested and working correctly
**BQL Methods Status**:
- ✅ Implemented and committed as reference code
- ⚠️ NOT used in production (cannot query SATS from metadata)
- 📝 Kept for future consideration if ledger format changes
---
## Future Consideration: Ledger Format Change
**If** Libra's ledger format changes to use SATS as position amounts:
```beancount
; Current format (EUR position, SATS in metadata):
2025-11-10 * "Groceries"
Expenses:Food -360.00 EUR
sats-equivalent: 337096
Liabilities:Payable:User-abc 360.00 EUR
sats-equivalent: 337096
; Hypothetical future format (SATS position, EUR as cost):
2025-11-10 * "Groceries"
Expenses:Food -337096 SATS {360.00 EUR}
Liabilities:Payable:User-abc 337096 SATS {360.00 EUR}
```
**Then** BQL would become feasible:
```sql
-- Would work with SATS as position:
SELECT account, sum(position) as balance
WHERE account ~ 'User-' AND currency = 'SATS'
```
**Trade-offs of format change:**
- ✅ Would enable BQL optimization
- ✅ Aligns with "Bitcoin-first" philosophy
- ⚠️ Requires ledger migration
- ⚠️ Changes reporting currency (impacts existing workflows)
- ⚠️ Beancount cost syntax has precision limitations
**Recommendation**: Consider during major version upgrade or architectural redesign.
---
## Next Steps
1. ✅ Analyze current implementation
2. ✅ Design BQL queries
3. ✅ Implement `query_bql()` method
4. ✅ Implement `get_user_balance_bql()` method
5. ✅ Test against real data
6. ✅ Implement `get_all_user_balances_bql()` method
7. ✅ Document findings and limitations
8. ❌ Update call sites (NOT APPLICABLE - BQL not feasible)
9. ❌ Remove manual methods (NOT APPLICABLE - manual method is correct approach)
---
**Implementation By**: Claude Code
**Date**: November 10, 2025
**Status**: ✅ **Tested and Documented** | ⚠️ **Not Feasible for Production Use**