Code Examples

Five working examples demonstrate ACP, from basic consensus queries through domain-specific custom axioms. Each example includes complete code, expected output metrics, and interpretation guidance.

Prerequisites

  1. Python 3.11+ installed
  2. OpenRouter API key -- get one at openrouter.ai/keys
  3. ACP-PROJECT repository cloned
Setup
cd ACP-PROJECT

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt

export OPENROUTER_API_KEY="sk-or-v1-YOUR_KEY"
# Windows: set OPENROUTER_API_KEY=sk-or-v1-YOUR_KEY

Working directory

All examples must be run from the project root directory. Running from any other location will cause ModuleNotFoundError: No module named 'src'.
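If you cannot change directories first (for example, inside a notebook), one workaround is to put the project root on sys.path before importing. The path below is a placeholder for wherever you cloned the repository:

```python
import sys
from pathlib import Path

# Placeholder path -- substitute the location of your ACP-PROJECT clone.
project_root = Path("/path/to/ACP-PROJECT")

# Prepend the project root so `from src.engine import ...` resolves
# even when the script is launched from another directory.
sys.path.insert(0, str(project_root))
```

Running from the project root remains the simplest option.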

Example 1: Simple Consensus Query

The most basic ACP use case: send a simple factual question to multiple models and observe how they converge on a single answer with a near-zero D-score.

01_simple_query.py
import asyncio
import os
from src.engine import ConsensusEngine, ConsensusConfig
from src.llm.openrouter_llm import OpenRouterLLM

async def main():
    api_key = os.environ["OPENROUTER_API_KEY"]

    models = [
        OpenRouterLLM(api_key=api_key, model="openai/gpt-5.4-mini"),
        OpenRouterLLM(api_key=api_key, model="anthropic/claude-haiku-4-5"),
    ]

    config = ConsensusConfig(
        max_iterations=5,
        D_threshold=0.1,
    )

    engine = ConsensusEngine(models=models, config=config)
    result = await engine.run("What is 2 + 2?")

    print(f"Consensus reached: {result.consensus_reached}")
    print(f"D-score: {result.final_D:.4f}")
    print(f"Iterations: {result.iterations_used}")
    print(f"Answer: {result.final_answer}")

asyncio.run(main())

Expected output

Output
Consensus reached: True
D-score: 0.0000
Iterations: 1
Answer: 4

A D-score of 0.0 indicates perfect agreement. For a trivially verifiable fact like "2 + 2 = 4", all models converge immediately in a single iteration.
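ACP's actual divergence metric is internal to the engine; purely as an illustration of why identical answers score 0.0, a naive D-score could be computed as the fraction of model pairs whose answers differ:

```python
from itertools import combinations

def naive_d_score(answers: list[str]) -> float:
    """Illustrative only: fraction of model pairs whose answers differ.

    ACP's real metric is more sophisticated; this just shows why
    identical answers like "4" yield D = 0.0.
    """
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 0.0
    disagreements = sum(1 for a, b in pairs if a.strip() != b.strip())
    return disagreements / len(pairs)

print(naive_d_score(["4", "4"]))     # 0.0 -> perfect agreement
print(naive_d_score(["4", "four"]))  # 1.0 -> total disagreement
```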

Example 2: Fact Checking

Verify factual claims across multiple domains by grounding consensus in specific axiom levels. This example checks mathematical constants, physical constants, algorithm complexity, protocol definitions, and language characteristics.

02_fact_checking.py
import asyncio
import os
from src.engine import ConsensusEngine, ConsensusConfig
from src.llm.openrouter_llm import OpenRouterLLM

QUERIES = [
    {
        "query": "What is the value of Pi to 10 decimal places?",
        "axiom_level": [1],  # Mathematical axioms
    },
    {
        "query": "What is the speed of light in vacuum in m/s?",
        "axiom_level": [2],  # Physical axioms
    },
    {
        "query": "What is the average-case time complexity of QuickSort?",
        "axiom_level": [4],  # Computable axioms
    },
    {
        "query": "What mechanisms does TCP use for reliable delivery?",
        "axiom_level": [6],  # Protocol axioms
    },
    {
        "query": "Is Python statically or dynamically typed?",
        "axiom_level": [7],  # Linguistic axioms
    },
]

async def main():
    api_key = os.environ["OPENROUTER_API_KEY"]

    models = [
        OpenRouterLLM(api_key=api_key, model="openai/gpt-5.4-mini"),
        OpenRouterLLM(api_key=api_key, model="anthropic/claude-haiku-4-5"),
        OpenRouterLLM(api_key=api_key, model="google/gemini-flash-1.5"),
    ]

    config = ConsensusConfig(max_iterations=5, D_threshold=0.05)
    engine = ConsensusEngine(models=models, config=config)

    for item in QUERIES:
        result = await engine.run(
            query=item["query"],
            axiom_level=item["axiom_level"],
        )
        print(f"Query: {item['query']}")
        print(f"  D-score: {result.final_D:.4f}")
        print(f"  Consensus: {result.consensus_reached}")
        print(f"  Answer: {result.final_answer}")
        print()

asyncio.run(main())

Expected output

Output
Query: What is the value of Pi to 10 decimal places?
  D-score: 0.0000
  Consensus: True
  Answer: 3.1415926535

Query: What is the speed of light in vacuum in m/s?
  D-score: 0.0000
  Consensus: True
  Answer: 299,792,458 m/s

Query: What is the average-case time complexity of QuickSort?
  D-score: 0.0200
  Consensus: True
  Answer: O(n log n)

Query: What mechanisms does TCP use for reliable delivery?
  D-score: 0.0100
  Consensus: True
  Answer: TCP uses acknowledgments and retransmission for reliable delivery

Query: Is Python statically or dynamically typed?
  D-score: 0.0300
  Consensus: True
  Answer: Python is dynamically typed

All queries achieve consensus well below the 0.05 D-threshold. Mathematical and physical constants achieve perfect 0.0 scores because the answers are unambiguous. Slightly higher D-scores on descriptive questions reflect minor wording variation between models, not disagreement.
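When running batches like this in practice, it helps to separate confident results from ones that need human review. A hypothetical helper, assuming the engine and result fields shown in the example above:

```python
async def check_facts(engine, queries):
    """Split fact-check results into confident answers and ones to review.

    Assumes the ConsensusEngine.run signature and result fields used in
    the example above (consensus_reached, final_D, final_answer).
    """
    confident, needs_review = [], []
    for item in queries:
        result = await engine.run(
            query=item["query"], axiom_level=item["axiom_level"]
        )
        bucket = confident if result.consensus_reached else needs_review
        bucket.append((item["query"], result.final_D, result.final_answer))
    return confident, needs_review
```

Anything in needs_review failed to converge within the configured threshold and should not be trusted as a verified fact.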

Example 3: Code Review

Use consensus to review code for bugs, edge cases, and security vulnerabilities. This example reviews correct implementations, buggy code, algorithm correctness, and security issues.

03_code_review.py
import asyncio
import os
from src.engine import ConsensusEngine, ConsensusConfig
from src.llm.openrouter_llm import OpenRouterLLM

BUGGY_CODE = """
def binary_search(arr, target):
    low, high = 0, len(arr)
    while low < high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid
        else:
            high = mid
    return -1
"""

SQL_CODE = """
def get_user(username):
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return db.execute(query)
"""

async def main():
    api_key = os.environ["OPENROUTER_API_KEY"]

    models = [
        OpenRouterLLM(api_key=api_key, model="openai/gpt-5.4-mini"),
        OpenRouterLLM(api_key=api_key, model="anthropic/claude-haiku-4-5"),
        OpenRouterLLM(api_key=api_key, model="google/gemini-flash-1.5"),
    ]

    config = ConsensusConfig(max_iterations=7, D_threshold=0.15)
    engine = ConsensusEngine(models=models, config=config)

    # Review buggy binary search
    result = await engine.run(
        query=f"Review this binary search for bugs:\n{BUGGY_CODE}"
    )
    print("Binary Search Review:")
    print(f"  D-score: {result.final_D:.4f}")
    print(f"  Finding: {result.final_answer}")
    print()

    # Review SQL injection vulnerability
    result = await engine.run(
        query=f"Review this code for security vulnerabilities:\n{SQL_CODE}"
    )
    print("SQL Code Review:")
    print(f"  D-score: {result.final_D:.4f}")
    print(f"  Finding: {result.final_answer}")

asyncio.run(main())

Expected output

Output
Binary Search Review:
  D-score: 0.0800
  Finding: Bug found: low = mid should be low = mid + 1 to avoid
  infinite loop when target is greater than arr[mid]. The current
  implementation will loop forever when low + 1 == high.

SQL Code Review:
  D-score: 0.0400
  Finding: Critical SQL injection vulnerability. User input is
  directly interpolated into the query string via f-string.
  Use parameterized queries instead:
  db.execute("SELECT * FROM users WHERE name = ?", (username,))

The lower D-score on the SQL injection finding reflects stronger consensus -- SQL injection is a well-known vulnerability that all models identify with high confidence. The binary search bug has a slightly higher D-score because models may phrase the diagnosis differently.
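A common way to consume such reviews is as a merge gate. A minimal sketch, assuming the result fields used above; the 0.15 limit mirrors the D_threshold configured in this example:

```python
def review_gate(result, d_limit: float = 0.15) -> bool:
    """Treat a consensus review as actionable only when models agreed.

    A finding with consensus_reached=False or a D-score above d_limit
    means the models diverged too much -- surface it as advisory
    feedback rather than failing the build on it.
    """
    return result.consensus_reached and result.final_D <= d_limit
```

For example, a CI job might block the merge only when review_gate(result) is True and the finding reports a bug or vulnerability.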

Example 4: Research Synthesis

Synthesize information from multiple AI perspectives on architectural decisions, best practices, and technology comparisons. Uses higher iteration limits to allow models to refine and converge on nuanced topics.

04_research_synthesis.py
import asyncio
import os
from src.engine import ConsensusEngine, ConsensusConfig
from src.llm.openrouter_llm import OpenRouterLLM

RESEARCH_QUERIES = [
    "Microservices vs monolith: when should you migrate?",
    "What are current best practices for password hashing in 2025?",
    "WebSockets vs Server-Sent Events: trade-offs for real-time apps?",
    "What scaling strategies work best for read-heavy workloads?",
]

async def main():
    api_key = os.environ["OPENROUTER_API_KEY"]

    models = [
        OpenRouterLLM(api_key=api_key, model="openai/gpt-5.4-mini"),
        OpenRouterLLM(api_key=api_key, model="anthropic/claude-haiku-4-5"),
        OpenRouterLLM(api_key=api_key, model="google/gemini-flash-1.5"),
    ]

    config = ConsensusConfig(max_iterations=12, D_threshold=0.20)
    engine = ConsensusEngine(models=models, config=config)

    for query in RESEARCH_QUERIES:
        result = await engine.run(query=query)
        print(f"Query: {query}")
        print(f"  D-score: {result.final_D:.4f}")
        print(f"  Iterations: {result.iterations_used}")
        print(f"  Consensus: {result.consensus_reached}")
        print(f"  Summary: {result.final_answer[:150]}...")
        print()

asyncio.run(main())

Expected output

Output
Query: Microservices vs monolith: when should you migrate?
  D-score: 0.1800
  Iterations: 8
  Consensus: True
  Summary: Migrate to microservices when your team exceeds ~15 engineers,
  deployment frequency is blocked by monolith coupling, and you need
  independent scaling...

Query: What are current best practices for password hashing in 2025?
  D-score: 0.0600
  Iterations: 3
  Consensus: True
  Summary: Use Argon2id as the primary recommendation (OWASP). bcrypt
  remains acceptable. Key parameters: memory >= 19 MiB, iterations >= 2...

Query: WebSockets vs Server-Sent Events: trade-offs for real-time apps?
  D-score: 0.1500
  Iterations: 6
  Consensus: True
  Summary: WebSockets for bidirectional communication (chat, gaming).
  SSE for server-to-client streaming (notifications, feeds). SSE is
  simpler and works through HTTP proxies...

Query: What scaling strategies work best for read-heavy workloads?
  D-score: 0.1200
  Iterations: 5
  Consensus: True
  Summary: Read replicas as the primary strategy, combined with
  application-level caching (Redis/Memcached) and CDN for static
  assets. Consider CQRS for complex domains...

Research queries typically require more iterations (5-8) and produce moderate D-scores (0.10-0.20) because the topics involve nuanced trade-offs. Password hashing converges faster because established standards exist.
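The loop above runs queries one at a time. If your OpenRouter rate limits allow it, independent queries could be dispatched concurrently; this sketch assumes ConsensusEngine.run is safe to call concurrently, which you should verify for your setup:

```python
import asyncio

async def run_all(engine, queries):
    """Run independent research queries concurrently.

    asyncio.gather returns results in the same order as `queries`.
    Serialize instead if you hit provider rate limits.
    """
    return await asyncio.gather(*(engine.run(query=q) for q in queries))
```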

Example 5: Custom Axioms

Define domain-specific axioms to constrain consensus within your organization's requirements. This is the most powerful ACP pattern for enterprise use cases -- compliance validation, API design review, and policy enforcement.

05_custom_axioms.py
import asyncio
import os
from src.engine import ConsensusEngine, ConsensusConfig
from src.llm.openrouter_llm import OpenRouterLLM

async def main():
    api_key = os.environ["OPENROUTER_API_KEY"]

    models = [
        OpenRouterLLM(api_key=api_key, model="openai/gpt-5.4-mini"),
        OpenRouterLLM(api_key=api_key, model="anthropic/claude-haiku-4-5"),
        OpenRouterLLM(api_key=api_key, model="google/gemini-flash-1.5"),
    ]

    config = ConsensusConfig(max_iterations=7, D_threshold=0.15)
    engine = ConsensusEngine(models=models, config=config)

    # --- API Design with Performance Constraints ---

    api_axioms = [
        "All API endpoints must follow RESTful conventions",
        "Response time must be under 200ms at P99",
        "All endpoints require JWT authentication",
        "Pagination is required for list endpoints",
    ]

    result = await engine.run(
        query="Evaluate this API design: GET /api/users returns "
              "all users without pagination or authentication",
        relevant_axioms=api_axioms,
    )
    print("API Design Review:")
    print(f"  D-score: {result.final_D:.4f}")
    print(f"  Finding: {result.final_answer}")
    print()

    # --- Security with Compliance Requirements ---

    security_axioms = [
        "All data at rest must be encrypted (AES-256)",
        "PII must not appear in logs",
        "API keys must be rotated every 90 days",
    ]

    result = await engine.run(
        query="Our application logs full request bodies including "
              "user email and phone number. Is this compliant?",
        relevant_axioms=security_axioms,
    )
    print("Security Compliance Review:")
    print(f"  D-score: {result.final_D:.4f}")
    print(f"  Finding: {result.final_answer}")

asyncio.run(main())

Expected output

Output
API Design Review:
  D-score: 0.0500
  Finding: Violates 3 of 4 axioms: (1) Missing JWT authentication,
  (2) No pagination for list endpoint, (3) Returning all users
  will exceed 200ms P99 at scale. Only RESTful convention (GET
  for retrieval) is satisfied.

Security Compliance Review:
  D-score: 0.0300
  Finding: Non-compliant. Logging email and phone number violates
  "PII must not appear in logs" axiom. Remediation: implement
  PII redaction in the logging pipeline, use structured logging
  with field-level filtering.

Custom axiom queries achieve low D-scores because the axioms provide explicit, unambiguous criteria for evaluation. This makes ACP particularly effective for compliance and policy enforcement where rules are well-defined.
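Teams typically keep axioms under version control rather than inline. A small loader, assuming a JSON file containing a plain list of strings -- the only shape the examples above pass to relevant_axioms:

```python
import json

def load_axioms(path: str) -> list[str]:
    """Load axioms from a JSON file shaped like ["rule one", "rule two"].

    The file format is a local convention, not an ACP requirement --
    relevant_axioms only needs a list of strings.
    """
    with open(path, encoding="utf-8") as f:
        axioms = json.load(f)
    if not isinstance(axioms, list) or not all(isinstance(a, str) for a in axioms):
        raise ValueError(f"{path} must contain a JSON list of strings")
    return axioms
```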

Running All Examples

Run all examples sequentially
# From project root
for example in examples/0*.py; do
    echo "Running $example..."
    python "$example"
    echo "---"
done

Understanding the Output

All examples produce the same core metrics. The table below explains each field.

Metric | Meaning | Range
D-score | Divergence between models. Lower is better. | 0 (perfect consensus) to 1 (total disagreement)
H_total | Total harmony score. Higher is better. | 0 (discord) to 1 (perfect harmony)
consensus_reached | Whether the D-score fell below the configured threshold. | True / False
iterations_used | Number of convergence rounds performed. | 1 to max_iterations
musical_interval | Metaphorical harmony level based on D-score. | unison, octave, fifth, fourth, third, tritone
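Assuming these fields are exposed as attributes on the result object (as in the examples above), a one-line summary helper for logs or dashboards could look like this:

```python
def summarize(result) -> str:
    """Format the core ACP metrics on one line.

    Field names follow the metrics table above; H_total and
    musical_interval are assumed to be attributes alongside the ones
    already used in the examples.
    """
    return (
        f"D={result.final_D:.4f} H={result.H_total:.4f} "
        f"interval={result.musical_interval} iters={result.iterations_used} "
        f"consensus={result.consensus_reached}"
    )
```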

D-Score Interpretation Guide

The D-score is the primary consensus metric. Its meaning depends on context -- a D-score of 0.15 is excellent for a subjective technology decision but concerning for a mathematical fact.

D-Score Range | Quality | Typical Use Cases
< 0.05 | Excellent | Verifiable facts, mathematical constants, well-known algorithms
0.05 - 0.15 | Good | Code review findings, established best practices, clear standards
0.15 - 0.30 | Moderate | Technology comparisons, architectural trade-offs, research synthesis
>= 0.30 | Weak | Highly subjective topics, poorly defined questions, insufficient iterations
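The bands in the table translate directly into a small classifier, handy for annotating results in bulk runs:

```python
def d_score_quality(d: float) -> str:
    """Map a D-score to the quality bands from the table above."""
    if d < 0.05:
        return "excellent"
    if d < 0.15:
        return "good"
    if d < 0.30:
        return "moderate"
    return "weak"
```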

Interpreting high D-scores

A D-score above 0.30 does not necessarily mean the result is wrong. It may indicate a genuinely controversial topic, an ambiguous question, or that more iterations are needed. Try increasing max_iterations or rephrasing the query with more specificity.
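The retry advice above can be sketched as an escalation loop. Here engine_factory is a hypothetical callable that builds a ConsensusEngine for a given iteration budget; adapt it to your own configuration:

```python
async def run_with_escalation(engine_factory, query, budgets=(5, 10, 20)):
    """Retry a query with progressively larger max_iterations budgets.

    engine_factory(max_iterations) is an assumed helper that returns a
    configured ConsensusEngine. Stops at the first budget that reaches
    consensus; otherwise returns the last result for inspection.
    """
    result = None
    for budget in budgets:
        engine = engine_factory(budget)
        result = await engine.run(query)
        if result.consensus_reached:
            break
    return result
```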

Cost Considerations

Pricing note

Costs are approximate and based on OpenRouter pricing as of January 2026. Always check current rates at openrouter.ai/models.

Example | Typical Cost | Duration
Simple Query | ~$0.02 | 5-10s
Fact Checking (5 queries) | ~$0.15 | 30-60s
Code Review (2 queries) | ~$0.20 | 60-90s
Research Synthesis (4 queries) | ~$0.30 | 90-120s
Custom Axioms (2 queries) | ~$0.20 | 60-90s
Total for all examples | ~$0.87 | ~5 min

Common Issues

"OPENROUTER_API_KEY environment variable not set"

export OPENROUTER_API_KEY="sk-or-v1-..."
# Windows: set OPENROUTER_API_KEY=sk-or-v1-...

"ModuleNotFoundError: No module named 'src'"

You are running from the wrong directory. Always run examples from the project root:

cd /path/to/ACP-PROJECT
python examples/01_simple_query.py

"Rate limit exceeded"

OpenRouter has per-minute rate limits. Wait 60 seconds between example runs or upgrade your plan at openrouter.ai/settings/limits.
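For scripted runs, a hedged retry wrapper can absorb transient rate limits. This sketch assumes rate-limit failures surface as exceptions whose message mentions "rate limit"; match on your client's actual error type instead where possible:

```python
import asyncio

async def run_with_backoff(engine, query, retries: int = 3, base_delay: float = 60.0):
    """Retry engine.run on rate-limit errors with linear backoff.

    The string check on the exception message is an assumption about
    how rate-limit failures surface from the client library.
    """
    for attempt in range(retries):
        try:
            return await engine.run(query)
        except Exception as exc:
            if "rate limit" not in str(exc).lower() or attempt == retries - 1:
                raise
            # Wait 1x, 2x, ... base_delay before the next attempt.
            await asyncio.sleep(base_delay * (attempt + 1))
```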

Next Steps

  • Modify queries -- try your own questions in any example
  • Adjust parameters -- change max_iterations, D_threshold, and other ConsensusConfig settings
  • Add models -- try different LLMs from OpenRouter
  • Create custom examples -- use these as templates for your own use cases

Further reading

See the Use Cases guide for 10 detailed application scenarios, or the Worker API reference for the complete endpoint specification.