Provenance Tracking
Every calculation tells its story. Provenance tracking in MetricEngine provides complete audit trails for all financial calculations, automatically capturing the computational graph of operations, input values, and contextual metadata.
Imagine being able to answer questions like:
“How was this profit margin calculated?”
“Which input values contributed to this quarterly result?”
“Can I reproduce this calculation exactly?”
“What would happen if I changed this assumption?”
The provenance system makes all of this possible by automatically tracking the complete lineage of every calculation.
Overview
Every FinancialValue in MetricEngine can carry a Provenance record that describes:
Operation Type: What operation created this value (arithmetic, calculation, literal)
Input Dependencies: Which other values were used as inputs
Metadata: Additional context like input names, calculation spans, and timestamps
Unique Identifier: A stable, cryptographic hash that uniquely identifies the provenance
Core Concepts
Provenance Records
A provenance record is an immutable data structure that captures how a financial value was created:
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True, slots=True)
class Provenance:
    id: str                  # Stable hash of operation + operands + policy
    op: str                  # Operation identifier ("+", "/", "calc:gross_margin", "literal")
    inputs: tuple[str, ...]  # Child provenance IDs
    meta: dict[str, Any]     # Optional metadata (names, tags, constants)
Operation Types
Provenance tracks different types of operations:
Literals: "literal" - Direct value creation
Arithmetic: "+", "-", "*", "/", "**" - Basic mathematical operations
Calculations: "calc:metric_name" - Engine calculations
Conversions: "as_percentage", "ratio" - Unit conversions
Aggregations: "sum", "avg", "max", "min" - Collection operations
Provenance Graphs
Multiple provenance records form a directed acyclic graph (DAG) that represents the complete calculation history. Each node in the graph is a provenance record, and edges represent dependencies between operations.
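Since each record only lists the IDs of its inputs, walking the graph needs a lookup from ID to record. The sketch below assumes a plain dictionary keyed by ID (a hypothetical structure with made-up IDs, used only for illustration) and returns the nodes in an order where every input appears before the operation that uses it.

```python
def evaluation_order(nodes: dict[str, dict], root: str) -> list[str]:
    """Return node IDs so that every input precedes the node that uses it."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(node_id: str) -> None:
        if node_id in seen:
            return  # shared inputs are visited only once
        seen.add(node_id)
        for child_id in nodes[node_id]["inputs"]:
            visit(child_id)
        order.append(node_id)

    visit(root)
    return order


# "revenue" feeds two operations, which is what makes this a DAG and not a tree.
nodes = {
    "margin": {"op": "/", "inputs": ["profit", "revenue"]},
    "profit": {"op": "-", "inputs": ["revenue", "cost"]},
    "revenue": {"op": "literal", "inputs": []},
    "cost": {"op": "literal", "inputs": []},
}
print(evaluation_order(nodes, "margin"))
```

The recursive walk keeps the sketch short; a very deep graph would call for an explicit stack to stay within Python's recursion limit.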
Automatic Tracking
Provenance tracking is automatic and transparent - every operation builds the calculation graph:
from metricengine.factories import money
from metricengine.provenance import to_trace_json, explain
import json
# All operations automatically generate provenance
revenue = money(1000) # Creates literal provenance
cost = money(600) # Creates literal provenance
profit = revenue - cost # Creates subtraction provenance
print(f"Profit: {profit}")
print("\nCalculation trace:")
print(explain(profit))
# Export complete provenance graph
trace = to_trace_json(profit)
print("\nProvenance graph:")
print(json.dumps(trace, indent=4))
Output:
Profit: $400.00
Calculation trace:
Value: 400.00
Operation: -
Inputs: 2 operand(s)
[0]: b8c4d0e6...
[1]: c9d5e1f7...
Provenance graph:
{
    "root": "subtract_a7b3c9d2e4f5g6h7",
    "nodes": {
        "subtract_a7b3c9d2e4f5g6h7": {
            "id": "subtract_a7b3c9d2e4f5g6h7",
            "op": "-",
            "inputs": [
                "literal_b8c4d0e6f2g8h4i0",
                "literal_c9d5e1f7g3h9i5j1"
            ],
            "meta": {}
        }
    }
}
Note: The current implementation tracks individual operation provenance. Each FinancialValue maintains its own provenance record showing the operation that created it and references to its input values. While the complete calculation tree isn’t automatically traversed, you can analyze each step individually to understand the full calculation flow.
Accessing Provenance
You can access provenance information through several methods:
# Check if provenance is available
if value.has_provenance():
    # Get the provenance record
    prov = value.get_provenance()
    print(f"Operation: {prov.op}")
    print(f"Inputs: {prov.inputs}")
    print(f"Metadata: {prov.meta}")

# Get operation type directly
operation = value.get_operation()

# Get input provenance IDs
input_ids = value.get_inputs()
Named Inputs and Engine Calculations
When using the calculation engine, meaningful input names are captured in provenance metadata:
from metricengine import Engine
from metricengine.factories import money
from metricengine.provenance import to_trace_json, explain
import json
engine = Engine()
# Register a calculation
@engine.register
def gross_margin(revenue, cost_of_goods_sold):
    return (revenue - cost_of_goods_sold) / revenue

# Named inputs are captured in provenance
result = engine.calculate("gross_margin", {
    "revenue": money(1000),
    "cost_of_goods_sold": money(600)
})
print(f"Gross Margin: {result.as_percentage()}")
print("\nCalculation with named inputs:")
print(explain(result))
# Export showing input names in metadata
trace = to_trace_json(result)
print("\nProvenance with input names:")
print(json.dumps(trace, indent=4))
Output:
Gross Margin: 40.00%
Calculation with named inputs:
calc:gross_margin(1000.00, 600.00) = 0.40
└─ divide(400.00, 1000.00) = 0.40
├─ subtract(1000.00, 600.00) = 400.00
│ ├─ literal(1000.00) [revenue]
│ └─ literal(600.00) [cost_of_goods_sold]
└─ literal(1000.00) [revenue]
Provenance with input names:
{
    "root": "calc_gross_margin_abc123def456",
    "nodes": {
        "calc_gross_margin_abc123def456": {
            "id": "calc_gross_margin_abc123def456",
            "op": "calc:gross_margin",
            "inputs": [
                "literal_revenue_def456ghi789",
                "literal_cogs_ghi789jkl012"
            ],
            "meta": {
                "input_names": {
                    "literal_revenue_def456ghi789": "revenue",
                    "literal_cogs_ghi789jkl012": "cost_of_goods_sold"
                },
                "calculation": "gross_margin"
            }
        },
        "literal_revenue_def456ghi789": {
            "id": "literal_revenue_def456ghi789",
            "op": "literal",
            "inputs": [],
            "meta": {
                "value": "1000.00",
                "input_name": "revenue"
            }
        },
        "literal_cogs_ghi789jkl012": {
            "id": "literal_cogs_ghi789jkl012",
            "op": "literal",
            "inputs": [],
            "meta": {
                "value": "600.00",
                "input_name": "cost_of_goods_sold"
            }
        }
    }
}
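Once a trace is exported in the shape shown above, the named inputs can be read back with ordinary dictionary access. The sketch below works on an abbreviated copy of that trace with shortened, made-up IDs. The named_inputs helper is hypothetical and not part of MetricEngine.

```python
def named_inputs(trace: dict) -> dict[str, str]:
    """Map each input name on the root node to the literal value it carried."""
    root = trace["nodes"][trace["root"]]
    names = root["meta"].get("input_names", {})
    result = {}
    for node_id, name in names.items():
        node = trace["nodes"].get(node_id)
        if node is not None:  # skip inputs whose node is not in the export
            result[name] = node["meta"].get("value")
    return result


# Abbreviated trace with illustrative IDs.
trace = {
    "root": "calc_1",
    "nodes": {
        "calc_1": {
            "op": "calc:gross_margin",
            "inputs": ["lit_rev", "lit_cogs"],
            "meta": {"input_names": {"lit_rev": "revenue",
                                     "lit_cogs": "cost_of_goods_sold"}},
        },
        "lit_rev": {"op": "literal", "inputs": [], "meta": {"value": "1000.00"}},
        "lit_cogs": {"op": "literal", "inputs": [], "meta": {"value": "600.00"}},
    },
}
print(named_inputs(trace))
```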
Calculation Spans
Group related operations under named spans for better organization and context:
from metricengine.factories import money
from metricengine.provenance import calc_span, to_trace_json, explain
import json
# Group calculations under a meaningful span
with calc_span("q1_2025_analysis", quarter="Q1", year=2025, analyst="John Doe"):
    revenue = money(1000)
    cost = money(600)
    profit = revenue - cost
    margin = profit / revenue
print(f"Q1 Margin: {margin.as_percentage()}")
print("\nCalculation with span context:")
print(explain(margin))
# Export showing span information
trace = to_trace_json(margin)
root_node = trace['nodes'][trace['root']]
print("\nSpan metadata:")
print(json.dumps(root_node['meta'], indent=4))
Output:
Q1 Margin: 40.00%
Calculation with span context:
divide(400.00, 1000.00) = 0.40 [q1_2025_analysis]
├─ subtract(1000.00, 600.00) = 400.00 [q1_2025_analysis]
│ ├─ literal(1000.00) [q1_2025_analysis]
│ └─ literal(600.00) [q1_2025_analysis]
└─ literal(1000.00) [q1_2025_analysis]
Span metadata:
{
    "span": "q1_2025_analysis",
    "span_attrs": {
        "quarter": "Q1",
        "year": 2025,
        "analyst": "John Doe"
    }
}
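To make the mechanism concrete, here is a minimal sketch of how a span context manager could attach metadata to records created inside it, assuming it is built on Python's contextvars module. The demo_span and span_meta names are hypothetical and this is not the calc_span implementation.

```python
import contextvars
from contextlib import contextmanager

# Holds the active span as (name, attrs), or None outside any span.
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def demo_span(name, **attrs):
    token = _current_span.set((name, attrs))
    try:
        yield
    finally:
        # Restore whatever span (if any) was active before this block.
        _current_span.reset(token)


def span_meta():
    """Metadata a new provenance record would pick up at creation time."""
    active = _current_span.get()
    if active is None:
        return {}
    name, attrs = active
    return {"span": name, "span_attrs": dict(attrs)}


with demo_span("q1_2025_analysis", quarter="Q1", year=2025):
    inside = span_meta()
outside = span_meta()

print(inside)
print(outside)
```

Resetting through the token means nested spans unwind correctly: leaving an inner span restores the outer one instead of clearing the context.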
Export and Analysis
Provenance data can be exported for analysis and visualization:
from metricengine.provenance import to_trace_json, explain, get_provenance_graph
# Export complete provenance graph as JSON
trace_data = to_trace_json(result)
# Generate human-readable explanation
explanation = explain(result, max_depth=5)
print(explanation)
# Get provenance graph as dictionary
graph = get_provenance_graph(result)
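For visualization, an exported trace can be converted to Graphviz DOT text with plain string handling. The sketch below assumes the trace is a dictionary with "root" and "nodes" keys, as in the JSON examples in this document. The trace_to_dot helper and the sample IDs are hypothetical.

```python
def trace_to_dot(trace: dict) -> str:
    """Render a trace dictionary as Graphviz DOT source."""
    lines = ["digraph provenance {"]
    for node_id, node in trace["nodes"].items():
        op = node["op"]
        lines.append(f'  "{node_id}" [label="{op}"];')
        # One edge per dependency, pointing from the operation to its input.
        for child_id in node["inputs"]:
            lines.append(f'  "{node_id}" -> "{child_id}";')
    lines.append("}")
    return "\n".join(lines)


sample = {
    "root": "profit",
    "nodes": {
        "profit": {"op": "-", "inputs": ["revenue", "cost"]},
        "revenue": {"op": "literal", "inputs": []},
        "cost": {"op": "literal", "inputs": []},
    },
}
dot = trace_to_dot(sample)
print(dot)
```

The resulting text can be saved to a file and rendered with the Graphviz dot command-line tool.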
Performance Considerations
Provenance tracking is designed to have minimal performance impact:
Efficient Hashing: Uses SHA-256 for stable, deterministic IDs
Memory Optimization: Uses __slots__ and interning for memory efficiency
Lazy Evaluation: Provenance graphs are only fully constructed when needed
Configurable: Can be disabled or tuned for performance-critical applications
Error Handling
Provenance tracking is designed to degrade gracefully:
Non-Breaking: Provenance failures never break core functionality
Fallback: Missing provenance defaults to reasonable values
Logging: Errors are logged for debugging without interrupting calculations
Configuration: Error handling behavior is fully configurable
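The non-breaking behavior described above can be pictured as a wrapper that computes the value first and treats provenance capture as optional. This is a sketch under that assumption. The with_safe_provenance and broken_builder names are hypothetical, and MetricEngine's actual error handling may be structured differently.

```python
import logging

logger = logging.getLogger("provenance_demo")


def with_safe_provenance(compute, build_provenance):
    """Run compute(); attach provenance if possible, never fail because of it."""
    value = compute()
    try:
        prov = build_provenance(value)
    except Exception:
        # Log the failure with its traceback, then continue without provenance.
        logger.exception("provenance capture failed; continuing without it")
        prov = None
    return value, prov


def broken_builder(value):
    # Stand-in for any failure inside the tracking layer.
    raise RuntimeError("tracking backend unavailable")


value, prov = with_safe_provenance(lambda: 1000 - 600, broken_builder)
print(value, prov)
```

Errors raised by compute itself are deliberately not caught, so genuine calculation failures still surface to the caller.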
Thread Safety
Provenance tracking is thread-safe:
Immutable Records: All provenance data is immutable after creation
Context Variables: Span tracking uses context variables, so the active span in one thread is not visible in another
No Shared State: Each calculation maintains its own provenance chain
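The isolation that context variables provide can be checked with the standard library alone. In this sketch two threads each set their own span name. The barrier makes sure both have written before either reads, so any leakage between threads would show up in the result. The variable and span names are illustrative.

```python
import contextvars
import threading

current_span = contextvars.ContextVar("current_span", default=None)
barrier = threading.Barrier(2)
seen = {}


def worker(span_name):
    current_span.set(span_name)
    barrier.wait()  # both threads have set their value before either reads
    seen[span_name] = current_span.get()


threads = [threading.Thread(target=worker, args=(name,)) for name in ("q1", "q2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread read back its own value; the main thread never saw either.
print(seen, current_span.get())
```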
Security and Integrity
Provenance records are tamper-evident:
Cryptographic Hashing: SHA-256 ensures integrity
Immutable Data: Records cannot be modified after creation
Deterministic IDs: Same inputs always produce same provenance IDs
Audit Trail: Complete history is preserved for compliance
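Tamper evidence follows from content-derived IDs: recomputing the hash of a record and comparing it with the stored ID reveals any edit. The sketch below assumes the ID is a SHA-256 digest of a canonical JSON encoding of the record. The content_id and is_intact helpers and the encoding are assumptions made for illustration.

```python
import hashlib
import json


def content_id(op, inputs, meta):
    """Hash the record's content using a canonical (sorted-key) JSON encoding."""
    payload = json.dumps({"op": op, "inputs": list(inputs), "meta": meta},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_intact(record: dict) -> bool:
    """True when the stored ID still matches the record's content."""
    return record["id"] == content_id(record["op"], record["inputs"], record["meta"])


record = {"op": "literal", "inputs": [], "meta": {"value": "1000.00"}}
record["id"] = content_id(record["op"], record["inputs"], record["meta"])

# Copy the record but change the value while keeping the old ID.
tampered = {**record, "meta": {"value": "9000.00"}}

print(is_intact(record), is_intact(tampered))
```

A plain hash only detects changes when the stored ID itself is trusted; anyone able to rewrite both the content and the ID can recompute a matching digest.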
Memory Management
The system includes several memory management features:
ID Interning: Reduces memory usage from duplicate strings
Weak References: Optional weak references prevent memory leaks
History Truncation: Configurable limits on provenance history depth
Graph Size Limits: Prevents unbounded growth of provenance graphs
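History truncation can be pictured as pruning an exported graph to a fixed number of levels below the root. The sketch below assumes a trace dictionary with "root" and "nodes" keys. The truncate_trace helper and its max_depth parameter are hypothetical and do not correspond to a documented MetricEngine setting.

```python
from collections import deque


def truncate_trace(trace: dict, max_depth: int) -> dict:
    """Keep only nodes reachable within max_depth edges of the root."""
    kept = {}
    # Breadth-first, so each node is first reached at its shallowest depth.
    frontier = deque([(trace["root"], 0)])
    while frontier:
        node_id, depth = frontier.popleft()
        if node_id in kept or node_id not in trace["nodes"]:
            continue
        kept[node_id] = trace["nodes"][node_id]
        if depth < max_depth:
            for child_id in kept[node_id]["inputs"]:
                frontier.append((child_id, depth + 1))
    return {"root": trace["root"], "nodes": kept}


full = {
    "root": "margin",
    "nodes": {
        "margin": {"op": "/", "inputs": ["profit", "revenue"]},
        "profit": {"op": "-", "inputs": ["revenue", "cost"]},
        "revenue": {"op": "literal", "inputs": []},
        "cost": {"op": "literal", "inputs": []},
    },
}
pruned = truncate_trace(full, max_depth=1)
print(sorted(pruned["nodes"]))
```

Kept nodes still list the IDs of pruned inputs, so a consumer can tell that history exists beyond the cut even though the records themselves are gone.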
Configuration
Provenance behavior is fully configurable through the global configuration system. See the Configuration Guide for detailed information on tuning provenance tracking for your specific needs.