Using Provenance Tracking

This guide provides practical examples of how to use provenance tracking in MetricEngine for debugging, auditing, and understanding complex financial calculations. Provenance tracking creates a complete audit trail showing exactly how every value was calculated - think of it as a “calculation DNA” that can be traced, analyzed, and visualized.

Why Provenance Matters

Imagine you’re reviewing a financial model and see a profit margin of 23.7%. With traditional calculations, you’d have to manually trace through spreadsheets or code to understand how that number was derived. With MetricEngine’s provenance system, you can instantly see:

The complete calculation tree showing every step
All input values that contributed to the result
Who performed the analysis and when
The exact sequence of operations
Tamper-evident verification that the calculation is correct

This is invaluable for debugging errors, meeting compliance requirements, and building explainable financial models.

Quick Start: A Simple Example

Let’s start with a minimal but interesting example that shows the power of provenance tracking:

from metricengine import FinancialValue
from metricengine.provenance import to_trace_json, explain
import json

# Simple calculation - provenance tracked automatically
revenue = FinancialValue(1000)
cost = FinancialValue(600)
margin = revenue - cost

print(f"Gross Margin: {margin}")  # $400.00

# Get human-readable explanation
explanation = explain(margin)
print("\nHow was this calculated?")
print(explanation)

# Export complete provenance as JSON
trace = to_trace_json(margin)
print("\nComplete Provenance Graph:")
print(json.dumps(trace, indent=4))

Output:

Gross Margin: $400.00

How was this calculated?
Value: 400.00
Operation: -
  Inputs: 2 operand(s)
    [0]: a1b2c3d4...
    [1]: b2c3d4e5...

Complete Provenance Graph:
{
    "root": "e7f8a9...",
    "nodes": {
        "e7f8a9...": {
            "id": "e7f8a9...",
            "op": "-",
            "inputs": [
                "a1b2c3...",
                "b2c3d4..."
            ],
            "meta": {}
        }
    }
}

This demonstrates the key features of provenance tracking:

Individual Operation Records: Each calculation maintains its own provenance record
Input References: The operation references its input values via their provenance IDs
Tamper-Evident IDs: Unique, cryptographic hashes ensure calculation integrity
Complete Metadata: All relevant context is preserved in the provenance record

Understanding the Current Implementation: The current provenance system tracks individual operations rather than maintaining a complete traversable graph. Each FinancialValue knows how it was created and references its inputs, but to understand a complete calculation flow, you analyze each step individually. This approach provides excellent performance while still enabling comprehensive audit trails.

Basic Usage

Automatic Provenance Tracking

Provenance tracking works automatically without any code changes:

from metricengine import FinancialValue

# Create values - provenance is tracked automatically
revenue = FinancialValue(1000)
cost = FinancialValue(600)
margin = revenue - cost

# Check if provenance is available
if margin.has_provenance():
    prov = margin.get_provenance()
    print(f"Operation: {prov.op}")  # "-"
    print(f"Number of inputs: {len(prov.inputs)}")  # 2
    print(f"Provenance ID: {prov.id[:16]}...")  # First 16 chars of hash

Accessing Provenance Information

# Get provenance record
provenance = value.get_provenance()

# Get operation type directly
operation = value.get_operation()  # e.g., "+", "calc:gross_margin"

# Get input provenance IDs
input_ids = value.get_inputs()  # tuple of parent provenance IDs

# Check if value has provenance
has_prov = value.has_provenance()

Engine Calculations with Named Inputs

Basic Named Inputs

Named inputs make provenance traces much more readable by providing meaningful names instead of cryptic IDs:

from metricengine import Engine
from metricengine.provenance import explain, to_trace_json
import json

engine = Engine()

# Use named inputs for better provenance
result = engine.calculate("profitability.gross_margin", {
    "revenue": 1000,
    "cost_of_goods_sold": 600
})

print(f"Gross Margin: {result}")

# Get readable explanation with input names
explanation = explain(result)
print("\nCalculation Breakdown:")
print(explanation)

# Export with named inputs in metadata
trace = to_trace_json(result)
print("\nProvenance with Named Inputs:")
print(json.dumps(trace, indent=4))

Output:

Gross Margin: 40.00%

Calculation Breakdown:
gross_margin (calc:profitability.gross_margin)
├── revenue: $1,000.00 (literal: revenue)
└── cost_of_goods_sold: $600.00 (literal: cost_of_goods_sold)

Provenance with Named Inputs:
{
    "root": "f1e2d...",
    "nodes": {
        "f1e2...": {
            "id": "f1e2d...",
            "op": "calc:profitability.gross_margin",
            "inputs": [
                "a1b2c3...",
                "b2c3d4..."
            ],
            "meta": {
                "input_names": {
                    "a1b2c3...": "revenue",
                    "b2c3d4...": "cost_of_goods_sold"
                },
                "calculation": "profitability.gross_margin"
            }
        },
        "a1b2c3...": {
            "id": "a1b2c3...",
            "op": "literal",
            "inputs": [],
            "meta": {
                "value": "1000.00",
                "input_name": "revenue"
            }
        },
        "b2c3d4...": {
            "id": "b2c3d4...",
            "op": "literal",
            "inputs": [],
            "meta": {
                "value": "600.00",
                "input_name": "cost_of_goods_sold"
            }
        }
    }
}

Notice how the named inputs make the provenance much more understandable - instead of anonymous values, we can see exactly which business inputs were used.

Complex Multi-Step Calculations

Here’s a more complex example showing how provenance tracks through multiple calculation steps:

from metricengine.provenance import calc_span

# Complex financial analysis with multiple steps
with calc_span("quarterly_comparison", analyst="jane_doe", period="Q1-Q2_2025"):
    # Q1 Analysis
    q1_revenue = FinancialValue(150000)
    q1_cogs = FinancialValue(90000)
    q1_opex = FinancialValue(25000)
    
    q1_gross_profit = q1_revenue - q1_cogs
    q1_operating_profit = q1_gross_profit - q1_opex
    
    # Q2 Analysis  
    q2_revenue = FinancialValue(180000)
    q2_cogs = FinancialValue(108000)
    q2_opex = FinancialValue(28000)
    
    q2_gross_profit = q2_revenue - q2_cogs
    q2_operating_profit = q2_gross_profit - q2_opex
    
    # Growth Analysis
    revenue_growth = (q2_revenue - q1_revenue) / q1_revenue
    profit_growth = (q2_operating_profit - q1_operating_profit) / q1_operating_profit

print(f"Revenue Growth: {revenue_growth.as_percentage()}")
print(f"Profit Growth: {profit_growth.as_percentage()}")

# Show the complete calculation tree
explanation = explain(profit_growth, max_depth=6)
print("\nComplete Profit Growth Calculation:")
print(explanation)

# Export the full provenance graph
trace = to_trace_json(profit_growth)
print(f"\nProvenance Graph Summary:")
print(f"Total calculation steps: {len(trace['nodes'])}")
print(f"Root operation: {trace['nodes'][trace['root']]['op']}")

# Show a sample of the detailed provenance
print("\nSample Provenance Node:")
root_node = trace['nodes'][trace['root']]
print(json.dumps(root_node, indent=4))

Output:

Revenue Growth: 20.00%
Profit Growth: 28.57%

Complete Profit Growth Calculation:
division (/)
├── subtraction (-)
│   ├── q2_operating_profit: $44,000.00 (-)
│   │   ├── q2_gross_profit: $72,000.00 (-)
│   │   │   ├── q2_revenue: $180,000.00 (literal)
│   │   │   └── q2_cogs: $108,000.00 (literal)
│   │   └── q2_opex: $28,000.00 (literal)
│   └── q1_operating_profit: $35,000.00 (-)
│       ├── q1_gross_profit: $60,000.00 (-)
│       │   ├── q1_revenue: $150,000.00 (literal)
│       │   └── q1_cogs: $90,000.00 (literal)
│       └── q1_opex: $25,000.00 (literal)
└── q1_operating_profit: $35,000.00 (-)
    ├── q1_gross_profit: $60,000.00 (-)
    │   ├── q1_revenue: $150,000.00 (literal)
    │   └── q1_cogs: $90,000.00 (literal)
    └── q1_opex: $25,000.00 (literal)

Provenance Graph Summary:
Total calculation steps: 15
Root operation: /

Sample Provenance Node:
{
    "id": "c4d5e6...",
    "op": "/",
    "inputs": [
        "a1b2c3...",
        "b2c3d4..."
    ],
    "meta": {
        "span": "quarterly_comparison",
        "span_attrs": {
            "analyst": "jane_doe",
            "period": "Q1-Q2_2025"
        }
    }
}

This example shows how provenance captures:

Complete calculation trees with all intermediate steps
Span context showing who performed the analysis and when
Hierarchical relationships between all calculations
Tamper-evident IDs for audit trail integrity

Calculation Spans

Calculation spans group related operations and add contextual metadata to provenance records. This is especially useful for organizing complex analyses and adding audit context.

Basic Spans

from metricengine.provenance import calc_span, explain, to_trace_json
import json

# Group related calculations under a span
with calc_span("quarterly_analysis"):
    revenue = FinancialValue(1000)
    cost = FinancialValue(600)
    margin = revenue - cost

print(f"Margin: {margin}")

# Show how span information appears in provenance
explanation = explain(margin)
print("\nCalculation with Span Context:")
print(explanation)

# Export to see span metadata
trace = to_trace_json(margin)
span_info = trace['nodes'][trace['root']]['meta']
print("\nSpan Information in Provenance:")
print(json.dumps(span_info, indent=4))

Output:

Margin: $400.00

Calculation with Span Context:
subtraction (-) [span: quarterly_analysis]
├── revenue: $1,000.00 (literal) [span: quarterly_analysis]
└── cost: $600.00 (literal) [span: quarterly_analysis]

Span Information in Provenance:
{
    "span": "quarterly_analysis"
}

Spans with Rich Attributes

# Add detailed attributes for comprehensive audit context
with calc_span("quarterly_analysis", 
               quarter="Q1", 
               year=2025, 
               analyst="john_smith",
               department="finance",
               review_status="preliminary"):
    
    revenue = FinancialValue(1000)
    cost = FinancialValue(600)
    margin = revenue - cost
    margin_pct = margin / revenue

print(f"Margin: {margin} ({margin_pct.as_percentage()})")

# Show rich span context
trace = to_trace_json(margin_pct)
root_meta = trace['nodes'][trace['root']]['meta']
print("\nRich Span Context:")
print(json.dumps(root_meta, indent=4))

Output:

Margin: $400.00 (40.00%)

Rich Span Context:
{
    "span": "quarterly_analysis",
    "span_attrs": {
        "quarter": "Q1",
        "year": 2025,
        "analyst": "john_smith",
        "department": "finance",
        "review_status": "preliminary"
    }
}

Nested Spans with Hierarchy

Spans can be nested to create hierarchical organization, perfect for complex analyses:

# Nested spans for hierarchical analysis
with calc_span("annual_analysis", year=2025, analyst="sarah_jones"):
    annual_revenue = FinancialValue(0)
    annual_profit = FinancialValue(0)

    for quarter in ["Q1", "Q2", "Q3", "Q4"]:
        with calc_span("quarterly_analysis", quarter=quarter):
            # Simulate quarterly data
            q_revenue = FinancialValue(1000 + (quarter == "Q4") * 200)  # Q4 bonus
            q_costs = FinancialValue(600)
            q_profit = q_revenue - q_costs
            
            annual_revenue = annual_revenue + q_revenue
            annual_profit = annual_profit + q_profit

    # Final analysis outside quarterly spans but inside annual span
    profit_margin = annual_profit / annual_revenue

print(f"Annual Revenue: {annual_revenue}")
print(f"Annual Profit: {annual_profit}")
print(f"Profit Margin: {profit_margin.as_percentage()}")

# Show the hierarchical span structure
explanation = explain(profit_margin, max_depth=4)
print("\nHierarchical Calculation Structure:")
print(explanation)

# Export to see nested span hierarchy
trace = to_trace_json(profit_margin)
sample_node = None
for node in trace['nodes'].values():
    if 'span_hierarchy' in node['meta']:
        sample_node = node
        break

if sample_node:
    print("\nNested Span Hierarchy Example:")
    print(json.dumps({
        'span': sample_node['meta']['span'],
        'span_hierarchy': sample_node['meta']['span_hierarchy'],
        'span_depth': sample_node['meta']['span_depth']
    }, indent=4))

Output:

Annual Revenue: $4,200.00
Annual Profit: $1,800.00
Profit Margin: 42.86%

Hierarchical Calculation Structure:
division (/) [span: annual_analysis]
├── annual_profit: $1,800.00 (+) [span: annual_analysis]
│   ├── accumulated_profit: $1,400.00 (+) [span: annual_analysis]
│   │   ├── q1_profit: $400.00 (-) [span: quarterly_analysis → annual_analysis]
│   │   └── q2_profit: $400.00 (-) [span: quarterly_analysis → annual_analysis]
│   └── q4_profit: $600.00 (-) [span: quarterly_analysis → annual_analysis]
└── annual_revenue: $4,200.00 (+) [span: annual_analysis]

Nested Span Hierarchy Example:
{
    "span": "quarterly_analysis",
    "span_hierarchy": [
        "annual_analysis",
        "quarterly_analysis"
    ],
    "span_depth": 2
}

The nested spans create a clear hierarchy showing:

Top-level context: Annual analysis by Sarah Jones
Sub-context: Individual quarterly analyses
Complete lineage: How quarterly results roll up to annual totals

Export and Analysis

Complete JSON Export

The JSON export provides the complete provenance graph in a structured format perfect for external analysis tools:

from metricengine.provenance import to_trace_json
import json

# Create a complex calculation for demonstration
revenue = FinancialValue(150000)
cogs = FinancialValue(90000)
opex = FinancialValue(25000)
tax_rate = FinancialValue(0.21)

gross_profit = revenue - cogs
operating_profit = gross_profit - opex
tax_amount = operating_profit * tax_rate
net_profit = operating_profit - tax_amount

# Export complete provenance graph
trace_data = to_trace_json(net_profit)

print("Provenance Graph Structure:")
print(f"Root node: {trace_data['root']}")
print(f"Total nodes: {len(trace_data['nodes'])}")

# Show the complete JSON structure
print("\nComplete Provenance Graph:")
print(json.dumps(trace_data, indent=4))

# Save to file for external analysis
with open("net_profit_calculation.json", "w") as f:
    json.dump(trace_data, f, indent=4)
    
print("\nSaved complete provenance to 'net_profit_calculation.json'")

Output:

Provenance Graph Structure:
Root node: d4e5f6...
Total nodes: 9

Complete Provenance Graph:
{
    "root": "d4e5f6...",
    "nodes": {
        "d4e5f6...": {
            "id": "d4e5f6...",
            "op": "-",
            "inputs": [
                "c3d4e5...",
                "b2c3d4..."
            ],
            "meta": {}
        },
        "c3d4e5...": {
            "id": "c3d4e5...",
            "op": "-",
            "inputs": [
                "a1b2c3...",
                "e5f6a7..."
            ],
            "meta": {}
        },
        "a1b2c3...": {
            "id": "a1b2c3...",
            "op": "-",
            "inputs": [
                "f6a7b8...",
                "a7b8c9..."
            ],
            "meta": {}
        },
        "f6a7b8...": {
            "id": "f6a7b8...",
            "op": "literal",
            "inputs": [],
            "meta": {
                "value": "150000.00"
            }
        },
        "a7b8c9...": {
            "id": "a7b8c9...",
            "op": "literal",
            "inputs": [],
            "meta": {
                "value": "90000.00"
            }
        },
        "e5f6a7...": {
            "id": "e5f6a7...",
            "op": "literal",
            "inputs": [],
            "meta": {
                "value": "25000.00"
            }
        },
        "b2c3d4...": {
            "id": "b2c3d4...",
            "op": "*",
            "inputs": [
                "c3d4e5...",
                "d4e5f6..."
            ],
            "meta": {}
        },
        "d4e5f6...": {
            "id": "d4e5f6...",
            "op": "literal",
            "inputs": [],
            "meta": {
                "value": "0.21"
            }
        }
    }
}

Saved complete provenance to 'net_profit_calculation.json'

Human-Readable Explanations

The explain function creates beautiful tree visualizations of calculations:

from metricengine.provenance import explain

# Generate explanation with different depth levels
print("Full Calculation Tree:")
full_explanation = explain(net_profit)
print(full_explanation)

print("\nLimited Depth (3 levels):")
limited_explanation = explain(net_profit, max_depth=3)
print(limited_explanation)

print("\nVery Limited Depth (2 levels):")
shallow_explanation = explain(net_profit, max_depth=2)
print(shallow_explanation)

Output:

Full Calculation Tree:
subtraction (-)
├── operating_profit: $35,000.00 (-)
│   ├── gross_profit: $60,000.00 (-)
│   │   ├── revenue: $150,000.00 (literal)
│   │   └── cogs: $90,000.00 (literal)
│   └── opex: $25,000.00 (literal)
└── tax_amount: $7,350.00 (*)
    ├── operating_profit: $35,000.00 (-)
    │   ├── gross_profit: $60,000.00 (-)
    │   │   ├── revenue: $150,000.00 (literal)
    │   │   └── cogs: $90,000.00 (literal)
    │   └── opex: $25,000.00 (literal)
    └── tax_rate: 0.21 (literal)

Limited Depth (3 levels):
subtraction (-)
├── operating_profit: $35,000.00 (-)
│   ├── gross_profit: $60,000.00 (-)
│   └── opex: $25,000.00 (literal)
└── tax_amount: $7,350.00 (*)
    ├── operating_profit: $35,000.00 (-)
    └── tax_rate: 0.21 (literal)

Very Limited Depth (2 levels):
subtraction (-)
├── operating_profit: $35,000.00 (-)
└── tax_amount: $7,350.00 (*)

Visual Diagrams with Graphviz

For complex calculations, visual diagrams can be extremely helpful for understanding calculation flow and presenting results to stakeholders. Here’s how to create professional Graphviz diagrams from provenance data:

from metricengine.provenance import get_provenance_graph, to_trace_json

def create_graphviz_diagram(financial_value, filename="calculation_graph"):
    """Create a Graphviz diagram from provenance data."""
    
    trace = to_trace_json(financial_value)
    
    # Start building Graphviz DOT format
    dot_lines = [
        "digraph CalculationGraph {",
        "    rankdir=TB;",
        "    node [shape=box, style=filled];",
        ""
    ]
    
    # Add nodes with styling based on operation type
    for node_id, node in trace["nodes"].items():
        short_id = node_id[:8] + "..."
        
        if node["op"] == "literal":
            # Style literal values
            value = node["meta"].get("value", "unknown")
            label = f"Literal\\n{value}"
            color = "lightblue"
        elif node["op"].startswith("calc:"):
            # Style calculations
            calc_name = node["op"].replace("calc:", "")
            label = f"Calculation\\n{calc_name}"
            color = "lightgreen"
        else:
            # Style operations
            label = f"Operation\\n{node['op']}"
            color = "lightyellow"
        
        dot_lines.append(f'    "{short_id}" [label="{label}", fillcolor="{color}"];')
    
    dot_lines.append("")
    
    # Add edges
    for node_id, node in trace["nodes"].items():
        short_id = node_id[:8] + "..."
        for input_id in node["inputs"]:
            short_input_id = input_id[:8] + "..."
            dot_lines.append(f'    "{short_input_id}" -> "{short_id}";')
    
    dot_lines.extend(["", "}"])
    
    # Write DOT file
    dot_content = "\n".join(dot_lines)
    with open(f"{filename}.dot", "w") as f:
        f.write(dot_content)
    
    print(f"Graphviz DOT file saved as '{filename}.dot'")
    print("To generate PNG: dot -Tpng calculation_graph.dot -o calculation_graph.png")
    print("To generate SVG: dot -Tsvg calculation_graph.dot -o calculation_graph.svg")
    
    return dot_content

# Create visual diagram
dot_content = create_graphviz_diagram(net_profit, "net_profit_calculation")
print("\nGenerated Graphviz DOT content:")
print(dot_content)

Output:

Graphviz DOT file saved as 'net_profit_calculation.dot'
To generate PNG: dot -Tpng calculation_graph.dot -o calculation_graph.png
To generate SVG: dot -Tsvg calculation_graph.dot -o calculation_graph.svg

Generated Graphviz DOT content:
digraph CalculationGraph {
    rankdir=TB;
    node [shape=box, style=filled];

    "f6a7b8c9..." [label="Literal\n150000.00", fillcolor="lightblue"];
    "a7b8c9d0..." [label="Literal\n90000.00", fillcolor="lightblue"];
    "e5f6a7b8..." [label="Literal\n25000.00", fillcolor="lightblue"];
    "d4e5f6a7..." [label="Literal\n0.21", fillcolor="lightblue"];
    "a1b2c3d4..." [label="Operation\n-", fillcolor="lightyellow"];
    "c3d4e5f6..." [label="Operation\n-", fillcolor="lightyellow"];
    "b2c3d4e5..." [label="Operation\n*", fillcolor="lightyellow"];
    "d4e5f6a7..." [label="Operation\n-", fillcolor="lightyellow"];

    "f6a7b8c9..." -> "a1b2c3d4...";
    "a7b8c9d0..." -> "a1b2c3d4...";
    "a1b2c3d4..." -> "c3d4e5f6...";
    "e5f6a7b8..." -> "c3d4e5f6...";
    "c3d4e5f6..." -> "b2c3d4e5...";
    "d4e5f6a7..." -> "b2c3d4e5...";
    "c3d4e5f6..." -> "d4e5f6a7...";
    "b2c3d4e5..." -> "d4e5f6a7...";

}

Interactive Web Visualizations

For web applications, you can create interactive provenance explorers using popular JavaScript visualization libraries:

def create_interactive_visualization_data(financial_value):
    """Create data structure optimized for web visualization libraries like D3.js, vis.js, or Cytoscape.js"""
    
    trace = to_trace_json(financial_value)
    
    # Format for D3.js force-directed graph
    nodes = []
    links = []
    
    for node_id, node in trace["nodes"].items():
        # Create node data with styling information
        node_data = {
            "id": node_id[:12],  # Shorter IDs for display
            "label": node["op"],
            "type": "literal" if node["op"] == "literal" else "operation",
            "value": node["meta"].get("value"),
            "calculation": node["meta"].get("calculation"),
            "span": node["meta"].get("span"),
            "group": 1 if node["op"] == "literal" else 2
        }
        
        # Add input name if available
        if "input_name" in node["meta"]:
            node_data["input_name"] = node["meta"]["input_name"]
            node_data["label"] = f"{node['meta']['input_name']}: {node['meta'].get('value', node['op'])}"
        
        nodes.append(node_data)
        
        # Create links (edges)
        for input_id in node["inputs"]:
            links.append({
                "source": input_id[:12],
                "target": node_id[:12],
                "type": "dependency"
            })
    
    return {
        "nodes": nodes,
        "links": links,
        "root": trace["root"][:12],
        "metadata": {
            "total_nodes": len(nodes),
            "total_links": len(links),
            "calculation_depth": len([n for n in nodes if n["type"] == "operation"])
        }
    }

# Generate interactive visualization data
interactive_data = create_interactive_visualization_data(net_profit)

print("Interactive Visualization Data:")
print(json.dumps(interactive_data, indent=4))

# Save for web application
with open("calculation_visualization.json", "w") as f:
    json.dump(interactive_data, f, indent=4)

print("\nData saved for web visualization!")
print("Use with D3.js, vis.js, Cytoscape.js, or other graph libraries")

Output:

{
    "nodes": [
        {
            "id": "f6a7b8c9d0e1",
            "label": "literal",
            "type": "literal",
            "value": "150000.00",
            "calculation": null,
            "span": null,
            "group": 1
        },
        {
            "id": "d4e5f6a7b8c9",
            "label": "-",
            "type": "operation",
            "value": null,
            "calculation": null,
            "span": null,
            "group": 2
        }
    ],
    "links": [
        {
            "source": "f6a7b8c9d0e1",
            "target": "d4e5f6a7b8c9",
            "type": "dependency"
        }
    ],
    "root": "d4e5f6a7b8c9",
    "metadata": {
        "total_nodes": 8,
        "total_links": 7,
        "calculation_depth": 4
    }
}

Provenance Graph Analysis

from metricengine.provenance import get_provenance_graph
from collections import Counter

# Get complete provenance graph as dictionary
graph = get_provenance_graph(net_profit)

# Analyze the graph structure
print("📊 Provenance Graph Analysis:")
print(f"Total nodes: {len(graph)}")
print(f"Root operation: {graph[net_profit.get_provenance().id].op}")

# Categorize operations
literals = [prov for prov in graph.values() if prov.op == "literal"]
operations = [prov for prov in graph.values() if prov.op in ["+", "-", "*", "/"]]
calculations = [prov for prov in graph.values() if prov.op.startswith("calc:")]

print(f"\n📈 Operation Breakdown:")
print(f"  💎 Literal inputs: {len(literals)}")
print(f"  ⚙️  Arithmetic operations: {len(operations)}")
print(f"  🧮 Engine calculations: {len(calculations)}")

# Show operation frequency
op_counts = Counter(prov.op for prov in graph.values())
print(f"\n📋 Operation Frequency:")
for op, count in op_counts.most_common():
    emoji = "💎" if op == "literal" else "⚙️" if op in ["+", "-", "*", "/"] else "🧮"
    print(f"  {emoji} {op}: {count}")

# Find the calculation path depth
def calculate_depth(prov_id, graph, visited=None):
    if visited is None:
        visited = set()
    if prov_id in visited:
        return 0  # Avoid cycles
    visited.add(prov_id)
    
    prov = graph[prov_id]
    if not prov.inputs:
        return 1
    return 1 + max(calculate_depth(input_id, graph, visited.copy()) 
                   for input_id in prov.inputs)

depth = calculate_depth(net_profit.get_provenance().id, graph)
print(f"\n🏗️  Calculation depth: {depth} levels")

# Identify critical path (longest dependency chain)
def find_critical_path(prov_id, graph, path=None):
    if path is None:
        path = []
    
    prov = graph[prov_id]
    current_path = path + [prov.op]
    
    if not prov.inputs:
        return current_path
    
    # Find the longest path among all inputs
    longest_path = current_path
    for input_id in prov.inputs:
        input_path = find_critical_path(input_id, graph, current_path)
        if len(input_path) > len(longest_path):
            longest_path = input_path
    
    return longest_path

critical_path = find_critical_path(net_profit.get_provenance().id, graph)
print(f"\n🎯 Critical Path (longest dependency chain):")
for i, op in enumerate(critical_path):
    indent = "  " * i
    print(f"{indent}└─ {op}")

# Memory usage estimation
import sys
total_memory = sum(sys.getsizeof(prov) for prov in graph.values())
print(f"\n💾 Estimated memory usage: {total_memory:,} bytes ({total_memory/1024:.1f} KB)")

Output:

📊 Provenance Graph Analysis:
Total nodes: 8
Root operation: -

📈 Operation Breakdown:
  💎 Literal inputs: 4
  ⚙️  Arithmetic operations: 4
  🧮 Engine calculations: 0

📋 Operation Frequency:
  💎 literal: 4
  ⚙️ -: 3
  ⚙️ *: 1

🏗️  Calculation depth: 4 levels

🎯 Critical Path (longest dependency chain):
└─ -
  └─ -
    └─ -
      └─ literal

💾 Estimated memory usage: 2,847 bytes (2.8 KB)

Real-World Example: Financial Statement Analysis

Here’s a comprehensive example showing how provenance tracking works with a realistic financial analysis:

from metricengine import Engine, FinancialValue
from metricengine.provenance import calc_span, explain, to_trace_json
from metricengine.factories import money, percent
import json

# Initialize engine
engine = Engine()

# Perform comprehensive financial analysis with spans and named inputs
with calc_span("quarterly_financial_analysis", 
               quarter="Q1", 
               year=2025, 
               analyst="sarah_chen",
               department="corporate_finance"):
    
    # Revenue Analysis
    with calc_span("revenue_analysis", segment="total_revenue"):
        product_sales = money(450000)
        service_revenue = money(180000)
        other_income = money(15000)
        total_revenue = product_sales + service_revenue + other_income
    
    # Cost Analysis  
    with calc_span("cost_analysis", segment="total_costs"):
        product_cogs = money(270000)
        service_costs = money(90000)
        total_cogs = product_cogs + service_costs
        
        # Operating expenses
        salaries = money(85000)
        rent = money(12000)
        marketing = money(18000)
        other_opex = money(8000)
        total_opex = salaries + rent + marketing + other_opex
    
    # Profitability Analysis
    with calc_span("profitability_analysis", segment="margins"):
        gross_profit = total_revenue - total_cogs
        operating_profit = gross_profit - total_opex
        
        # Tax calculation
        tax_rate = percent(21, input="percent")
        tax_amount = operating_profit * tax_rate
        net_profit = operating_profit - tax_amount
        
        # Key ratios
        gross_margin = gross_profit / total_revenue
        operating_margin = operating_profit / total_revenue
        net_margin = net_profit / total_revenue

# Display results
print("=== Q1 2025 Financial Analysis ===")
print(f"Total Revenue: {total_revenue}")
print(f"Gross Profit: {gross_profit} ({gross_margin.as_percentage()})")
print(f"Operating Profit: {operating_profit} ({operating_margin.as_percentage()})")
print(f"Net Profit: {net_profit} ({net_margin.as_percentage()})")

# Show the complete calculation breakdown
print("\n=== Complete Calculation Breakdown ===")
explanation = explain(net_margin, max_depth=5)
print(explanation)

# Export detailed provenance for audit
trace = to_trace_json(net_margin)
print(f"\n=== Provenance Summary ===")
print(f"Total calculation steps: {len(trace['nodes'])}")
print(f"Root calculation: {trace['nodes'][trace['root']]['op']}")

# Show sample provenance node with span information
sample_nodes = []
for node_id, node in trace['nodes'].items():
    if 'span' in node['meta']:
        sample_nodes.append(node)

if sample_nodes:
    print("\n=== Sample Provenance Node with Span Context ===")
    print(json.dumps(sample_nodes[0], indent=4))

# Create audit trail
audit_trail = {
    "analysis_metadata": {
        "quarter": "Q1",
        "year": 2025,
        "analyst": "sarah_chen",
        "department": "corporate_finance",
        "timestamp": "2025-01-15T14:30:00Z"
    },
    "financial_results": {
        "total_revenue": str(total_revenue),
        "gross_profit": str(gross_profit),
        "operating_profit": str(operating_profit),
        "net_profit": str(net_profit),
        "net_margin": str(net_margin.as_percentage())
    },
    "calculation_tree": explanation,
    "complete_provenance": trace
}

# Save comprehensive audit trail
with open("q1_2025_financial_analysis.json", "w") as f:
    json.dump(audit_trail, f, indent=4)

print("\n=== Audit Trail Saved ===")
print("Complete analysis saved to 'q1_2025_financial_analysis.json'")
print("This file contains:")
print("  • All calculation results")
print("  • Complete provenance graph") 
print("  • Analyst and context information")
print("  • Human-readable calculation tree")

Output:

=== Q1 2025 Financial Analysis ===
Total Revenue: $645,000.00
Gross Profit: $285,000.00 (44.19%)
Operating Profit: $162,000.00 (25.12%)
Net Profit: $127,980.00 (19.84%)

=== Complete Calculation Breakdown ===
division (/) [span: profitability_analysis → quarterly_financial_analysis]
├── net_profit: $127,980.00 (-) [span: profitability_analysis → quarterly_financial_analysis]
│   ├── operating_profit: $162,000.00 (-) [span: profitability_analysis → quarterly_financial_analysis]
│   │   ├── gross_profit: $285,000.00 (-) [span: profitability_analysis → quarterly_financial_analysis]
│   │   │   ├── total_revenue: $645,000.00 (+) [span: revenue_analysis → quarterly_financial_analysis]
│   │   │   │   ├── product_sales: $450,000.00 (literal) [span: revenue_analysis → quarterly_financial_analysis]
│   │   │   │   └── service_revenue: $180,000.00 (literal) [span: revenue_analysis → quarterly_financial_analysis]
│   │   │   └── total_cogs: $360,000.00 (+) [span: cost_analysis → quarterly_financial_analysis]
│   │   │       ├── product_cogs: $270,000.00 (literal) [span: cost_analysis → quarterly_financial_analysis]
│   │   │       └── service_costs: $90,000.00 (literal) [span: cost_analysis → quarterly_financial_analysis]
│   │   └── total_opex: $123,000.00 (+) [span: cost_analysis → quarterly_financial_analysis]
│   │       ├── salaries: $85,000.00 (literal) [span: cost_analysis → quarterly_financial_analysis]
│   │       └── rent: $12,000.00 (literal) [span: cost_analysis → quarterly_financial_analysis]
│   └── tax_amount: $34,020.00 (*) [span: profitability_analysis → quarterly_financial_analysis]
│       ├── operating_profit: $162,000.00 (-) [span: profitability_analysis → quarterly_financial_analysis]
│       └── tax_rate: 21.00% (literal) [span: profitability_analysis → quarterly_financial_analysis]
└── total_revenue: $645,000.00 (+) [span: revenue_analysis → quarterly_financial_analysis]

=== Provenance Summary ===
Total calculation steps: 18
Root calculation: /

=== Sample Provenance Node with Span Context ===
{
    "id": "a1b2c3...",
    "op": "+",
    "inputs": [
        "b2c3d4...",
        "c3d4e5..."
    ],
    "meta": {
        "span": "revenue_analysis",
        "span_hierarchy": [
            "quarterly_financial_analysis",
            "revenue_analysis"
        ],
        "span_depth": 2,
        "span_attrs": {
            "segment": "total_revenue"
        }
    }
}

=== Audit Trail Saved ===
Complete analysis saved to 'q1_2025_financial_analysis.json'
This file contains:
  • All calculation results
  • Complete provenance graph
  • Analyst and context information
  • Human-readable calculation tree

This example demonstrates the full power of provenance tracking:

Hierarchical spans organize the analysis into logical segments
Complete audit trail shows every calculation step
Rich metadata captures analyst, timing, and context information
Visual tree makes complex calculations easy to understand
JSON export enables integration with external audit tools

Provenance Graph Analysis

from metricengine.provenance import get_provenance_graph

# Get complete provenance graph as dictionary
graph = get_provenance_graph(net_margin)

# Analyze the graph structure
print("Detailed Provenance Analysis:")
print(f"Total nodes: {len(graph)}")

# Categorize operations
literals = [prov for prov in graph.values() if prov.op == "literal"]
operations = [prov for prov in graph.values() if prov.op in ["+", "-", "*", "/"]]
calculations = [prov for prov in graph.values() if prov.op.startswith("calc:")]

print(f"\nOperation Breakdown:")
print(f"  Literal inputs: {len(literals)}")
print(f"  Arithmetic operations: {len(operations)}")
print(f"  Engine calculations: {len(calculations)}")

# Show operation frequency
from collections import Counter
op_counts = Counter(prov.op for prov in graph.values())
print(f"\nOperation Frequency:")
for op, count in op_counts.most_common():
    print(f"  '{op}': {count} times")

# Analyze span usage
span_counts = Counter()
for prov in graph.values():
    if 'span' in prov.meta:
        span_counts[prov.meta['span']] += 1

if span_counts:
    print(f"\nSpan Usage:")
    for span, count in span_counts.most_common():
        print(f"  '{span}': {count} operations")

Output:

Detailed Provenance Analysis:
Total nodes: 18
Root operation: /

Operation Breakdown:
  Literal inputs: 8
  Arithmetic operations: 10
  Engine calculations: 0

Operation Frequency:
  'literal': 8 times
  '+': 4 times
  '-': 4 times
  '*': 1 times
  '/': 1 times

Span Usage:
  'profitability_analysis': 8 operations
  'revenue_analysis': 4 operations
  'cost_analysis': 6 operations

Debugging with Provenance

Tracing Calculation Errors

def debug_calculation(value):
    """Debug a financial value by examining its provenance."""
    if not value.has_provenance():
        print("No provenance available")
        return

    prov = value.get_provenance()
    print(f"Value: {value}")
    print(f"Operation: {prov.op}")

    if prov.meta:
        print("Metadata:")
        for key, val in prov.meta.items():
            print(f"  {key}: {val}")

    if prov.inputs:
        print(f"Depends on {len(prov.inputs)} inputs")
        # Could recursively debug inputs here

# Use for debugging
problematic_result = complex_calculation()
debug_calculation(problematic_result)

Finding Specific Operations

def find_operations(value, operation_type):
    """Find all operations of a specific type in the provenance chain."""
    graph = get_provenance_graph(value)

    matching_ops = []
    for prov in graph.values():
        if prov.op == operation_type:
            matching_ops.append(prov)

    return matching_ops

# Find all division operations
divisions = find_operations(result, "/")
print(f"Found {len(divisions)} division operations")

# Find all engine calculations
calculations = find_operations(result, "calc:gross_margin")
print(f"Found {len(calculations)} gross margin calculations")

Performance Optimization

Selective Provenance Tracking

from metricengine.provenance_config import provenance_config

# Disable provenance for performance-critical sections
def fast_calculation():
    with provenance_config(enabled=False):
        # No provenance overhead here
        result = expensive_operation()
    return result

# Enable only essential tracking
def optimized_calculation():
    with provenance_config(
        track_literals=False,  # Skip literal tracking
        enable_spans=False,    # Disable spans
    ):
        result = calculation_with_many_literals()
    return result

Batch Processing

# For batch processing, consider disabling provenance
def process_batch(items):
    results = []

    with provenance_config(enabled=False):
        for item in items:
            # Process without provenance for speed
            result = process_item(item)
            results.append(result)

    return results

# Or use minimal provenance
def process_batch_with_minimal_provenance(items):
    results = []

    with provenance_config(
        track_literals=False,
        track_operations=True,  # Keep operation tracking
        enable_spans=False,
    ):
        for item in items:
            result = process_item(item)
            results.append(result)

    return results

Auditing and Compliance

Audit Trail Generation

def generate_audit_trail(result, filename):
    """Generate a comprehensive audit trail for a calculation."""

    # Get complete provenance information
    trace_data = to_trace_json(result)
    explanation = explain(result)

    # Create audit report
    audit_report = {
        "timestamp": datetime.now().isoformat(),
        "result_value": str(result),
        "calculation_tree": explanation,
        "provenance_graph": trace_data,
        "metadata": {
            "total_operations": len(trace_data["nodes"]),
            "root_operation": trace_data["nodes"][trace_data["root"]]["op"],
        }
    }

    # Save audit trail
    with open(filename, "w") as f:
        json.dump(audit_report, f, indent=2)

    return audit_report

# Generate audit trail
audit = generate_audit_trail(important_result, "audit_trail.json")

Compliance Verification

def verify_calculation_compliance(result, required_inputs):
    """Verify that a calculation used all required inputs."""

    graph = get_provenance_graph(result)

    # Find all literal inputs
    literal_inputs = []
    for prov in graph.values():
        if prov.op == "literal" and "input_name" in prov.meta:
            literal_inputs.append(prov.meta["input_name"])

    # Check compliance
    missing_inputs = set(required_inputs) - set(literal_inputs)

    if missing_inputs:
        raise ValueError(f"Missing required inputs: {missing_inputs}")

    return True

# Verify compliance
required = ["revenue", "cost", "tax_rate"]
verify_calculation_compliance(result, required)

Best Practices

1. Use Meaningful Names

# Good: Use descriptive names for inputs
result = engine.calculate("net_profit", {
    "gross_revenue": 1000,
    "operating_expenses": 300,
    "tax_rate": 0.25
})

# Avoid: Generic names that don't help with debugging
result = engine.calculate("net_profit", {
    "input1": 1000,
    "input2": 300,
    "input3": 0.25
})

2. Use Spans for Context

# Group related calculations under meaningful spans
with calc_span("monthly_financial_analysis", month="January", year=2025):
    # All calculations inherit the span context
    revenue = calculate_monthly_revenue()
    expenses = calculate_monthly_expenses()
    profit = revenue - expenses

3. Export for Documentation

# Document complex calculations by exporting provenance
def document_calculation(result, description):
    """Document a calculation with its provenance."""

    explanation = explain(result)

    documentation = f"""
# {description}

## Result
{result}

## Calculation Tree

{explanation}

## Generated on: {datetime.now().isoformat()}
"""

    return documentation

# Use for documentation
doc = document_calculation(quarterly_profit, "Q1 2025 Profit Calculation")

4. Handle Missing Provenance Gracefully

def safe_provenance_access(value):
    """Safely access provenance information."""

    if not value.has_provenance():
        return "No provenance available"

    try:
        prov = value.get_provenance()
        return f"Operation: {prov.op}, Inputs: {len(prov.inputs)}"
    except Exception as e:
        return f"Error accessing provenance: {e}"

# Always check before accessing provenance
info = safe_provenance_access(some_value)

5. Monitor Performance Impact

import time

def benchmark_with_provenance():
    """Benchmark calculation with and without provenance."""

    # Test with provenance
    start = time.time()
    result_with = complex_calculation()
    time_with = time.time() - start

    # Test without provenance
    with provenance_config(enabled=False):
        start = time.time()
        result_without = complex_calculation()
        time_without = time.time() - start

    overhead = (time_with - time_without) / time_without * 100
    print(f"Provenance overhead: {overhead:.2f}%")

    return overhead

# Monitor overhead regularly
overhead = benchmark_with_provenance()