Infrastructure
The ACP infrastructure is designed for low-latency, globally distributed consensus execution. The production stack runs on Cloudflare's edge network with Vectorize for semantic search and KV for caching. A Python FastAPI backend provides full engine access for local development and self-hosted deployments.
Deployment Architecture
The production deployment distributes the ACP stack across two primary platforms: Vercel for the frontend and Cloudflare for the edge API. The three GitHub repositories serve as the source of truth for code, prompts, and axiom data.
+--------------------+ +--------------------+
| GitHub Repo 1 | | GitHub Repo 2 |
| ACP-PROJECT | | ACP-PROMPTS |
+--------+-----------+ +--------+-----------+
| |
| GitHub Actions | Raw URL fetch
| CI/CD |
v v
+--------------------+ +--------------------+
| Vercel | | Cloudflare |
| Frontend |<-----| Worker API |
| axiomprotocol.org| | Edge Functions |
+--------------------+ +--------+-----------+
|
+--------+----------+
| |
v v
+---------------+ +---------------+
| Vectorize | | KV Namespaces |
| (3 indexes) | | - Cache |
| - axioms | | - Datasets |
| - queries | | - Rate limits |
| - results | +---------------+
+-------+-------+
|
+-------+--------+
| GitHub Repo 3 |
| ACP-DATASETS |
+----------------+

Cloudflare Workers (Edge API)
The Cloudflare Worker is the production API gateway for ACP. It runs on Cloudflare's global edge network, providing sub-100ms latency worldwide. The Worker handles the complete consensus pipeline: authentication, prompt loading, axiom retrieval, LLM orchestration, D-score calculation, and result caching.
| Property | Value |
|---|---|
| Runtime | Cloudflare Workers (V8 isolate) |
| Language | JavaScript / TypeScript |
| Deployment | Global edge -- 300+ data centers |
| Latency | < 100ms to nearest edge node |
| Scaling | Automatic -- serverless, zero cold starts |
| Local dev | wrangler dev at http://localhost:8787 |
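The Worker reaches Vectorize, KV, and Workers AI through bindings on the `env` object, declared in `wrangler.toml`. A minimal sketch of how those bindings might be configured; the binding names match this page, but the index names and namespace IDs are placeholders, not the repository's actual values:

```toml
name = "acp-worker"
main = "src/index.js"
compatibility_date = "2024-01-01"

# Workers AI binding used for bge-base-en-v1.5 embeddings
[ai]
binding = "AI"

# Vectorize indexes (768 dimensions, cosine metric)
[[vectorize]]
binding = "VECTORIZE_AXIOMS"
index_name = "acp-axioms"

[[vectorize]]
binding = "VECTORIZE_QUERIES"
index_name = "acp-queries"

[[vectorize]]
binding = "VECTORIZE_RESULTS"
index_name = "acp-results"

# KV namespaces for caching, datasets, and rate limiting
[[kv_namespaces]]
binding = "ACP_CACHE"
id = "<cache-namespace-id>"

[[kv_namespaces]]
binding = "ACP_DATASETS"
id = "<datasets-namespace-id>"

[[kv_namespaces]]
binding = "ACP_RATE_LIMITS"
id = "<rate-limits-namespace-id>"
```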
Worker Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /consensus-iterative | Run iterative consensus with phi-spiral through axiom levels |
| GET | /axioms/search?q= | Semantic search over axioms via Vectorize |
| POST | /cache/check | Check semantic cache for a previously computed result |
| GET | /cache/stats | Cache hit rate, size, and performance statistics |
| POST | /embeddings | Generate text embeddings via Cloudflare AI |
| GET | /similar?q= | Find similar past queries for context |
| GET | /health | Health check with Vectorize and KV status |
| POST | /seed-axioms | Bulk seed axioms from ACP-DATASETS into Vectorize |
| GET | /metrics | Prometheus-format metrics for monitoring |
Consensus Request Flow

Request:

{
  "query": "What is the fastest sorting algorithm?",
  "models": [
    "openai/gpt-5.4",
    "anthropic/claude-sonnet-4-6",
    "google/gemini-2.5-flash"
  ],
  "structure": "sonata",
  "max_iterations": 7
}

Response:

{
  "consensus_reached": true,
  "final_answer": "QuickSort has O(n log n) average complexity...",
  "final_D": 0.08,
  "iterations_used": 3,
  "convergence_path": [0.35, 0.18, 0.08],
  "axioms_used": [
    "acp-comp-quicksort-avg-v1",
    "acp-comp-timsort-python-v1"
  ],
  "proof": "Verified via MathOracle + WikidataOracle",
  "structure_used": "sonata",
  "timestamp": "2026-04-09T12:00:00Z"
}

Cloudflare Vectorize
Vectorize is Cloudflare's vector database, used by ACP for semantic search over the axiom corpus. All axioms are embedded as 768-dimensional vectors using the bge-base-en-v1.5 model via Cloudflare AI, enabling sub-millisecond approximate nearest neighbor (ANN) search.
| Index | Purpose | Dimensions | Vectors |
|---|---|---|---|
| VECTORIZE_AXIOMS | Axiom semantic search -- find relevant axioms for a query | 768 | Dynamic |
| VECTORIZE_QUERIES | Query deduplication -- find similar past queries | 768 | Dynamic |
| VECTORIZE_RESULTS | Result caching -- semantic cache for consensus outputs | 768 | Dynamic |
Embedding Model
| Property | Value |
|---|---|
| Model | bge-base-en-v1.5 (BAAI) |
| Provider | Cloudflare AI (built-in) |
| Dimensions | 768 |
| Sequence length | 512 tokens |
| Normalization | L2 normalized |
| Search metric | Cosine similarity |
Axiom Search Pipeline
When a query arrives, the Worker generates an embedding, queries the Vectorize axiom index with level filtering, and returns the top-K most relevant axioms. Results include the axiom statement, level, domain, confidence score, and semantic similarity score.
// 1. Generate embedding for the user query
const embedding = await env.AI.run(
  '@cf/baai/bge-base-en-v1.5',
  { text: [query] }
);

// 2. Query Vectorize, filtering to axiom levels 4-7
const results = await env.VECTORIZE_AXIOMS.query(
  embedding.data[0],
  {
    topK: 5,
    filter: { level: { $in: [4, 5, 6, 7] } },
    returnMetadata: true
  }
);

// 3. Return axioms with similarity scores
return results.matches.map(match => ({
  id: match.id,
  statement: match.metadata.statement,
  level: match.metadata.level,
  domain: match.metadata.domain,
  confidence: match.metadata.confidence,
  similarity: match.score
}));

KV Storage (Caching)
Cloudflare KV provides key-value storage at the edge for three primary caching functions: semantic result caching, dataset metadata storage, and rate limiting.
| Namespace | Purpose | TTL |
|---|---|---|
| ACP_CACHE | Consensus result cache -- stores full results keyed by query embedding hash | 24 hours |
| ACP_DATASETS | Axiom metadata cache -- frequently accessed axiom data | 7 days |
| ACP_RATE_LIMITS | Rate limiting counters per API key | 1-hour window |
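The rate-limit namespace lends itself to a fixed-window counter: one key per API key per window, incremented on each request and expired by TTL. A sketch of that logic; the key format and request limit are illustrative, not taken from the repository:

```javascript
// Fixed-window rate limiting: requests are counted per (apiKey, window) pair.
// The window id changes every windowSeconds, so stale counters simply expire.
const WINDOW_SECONDS = 3600; // matches the 1-hour window on ACP_RATE_LIMITS
const MAX_REQUESTS = 100;    // illustrative limit, not from the repository

// Derive the KV key for the window containing the given timestamp.
function rateLimitKey(apiKey, nowMs, windowSeconds = WINDOW_SECONDS) {
  const windowId = Math.floor(nowMs / 1000 / windowSeconds);
  return `ratelimit:${apiKey}:${windowId}`;
}

// Decide whether a request is allowed given the current counter value.
function isAllowed(currentCount, maxRequests = MAX_REQUESTS) {
  return currentCount < maxRequests;
}

// In the Worker, the counter would live in the ACP_RATE_LIMITS namespace:
//   const key = rateLimitKey(apiKey, Date.now());
//   const count = parseInt((await env.ACP_RATE_LIMITS.get(key)) ?? '0', 10);
//   if (!isAllowed(count)) return new Response('rate limited', { status: 429 });
//   await env.ACP_RATE_LIMITS.put(key, String(count + 1),
//     { expirationTtl: WINDOW_SECONDS });
```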
The semantic cache is a performance optimization. Before running a full consensus pipeline (which involves multiple LLM API calls), the Worker checks whether a semantically similar query has been resolved recently. If a cached result exists with sufficient similarity (cosine score > 0.95), it is returned immediately, saving time and API costs.
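Because the embeddings are L2-normalized, cosine similarity reduces to a plain dot product. A minimal sketch of the cache-hit decision; the 0.95 threshold comes from the text above, while the function names are illustrative:

```javascript
// Cosine similarity of two L2-normalized embeddings is their dot product.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

const CACHE_SIMILARITY_THRESHOLD = 0.95;

// A cached result is reused only if the best Vectorize match
// clears the similarity threshold.
function cacheHit(bestMatchScore, threshold = CACHE_SIMILARITY_THRESHOLD) {
  return bestMatchScore > threshold;
}
```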
Cache Hit Rates
In production, the semantic cache achieves hit rates of 15-25% for common query patterns. This is especially effective for factual queries (Level 1-4 axioms) where the same questions are asked frequently. The cache is bypassed for Conclave Mode queries, which always require fresh independent responses.
Python FastAPI Backend
The Python backend is the reference implementation of the full ACP v4.0 protocol. It is designed for local development and self-hosted deployments where you need direct access to the consensus engine, oracle verification, and metrics database.
| Component | Technology | Purpose |
|---|---|---|
| Web Framework | FastAPI | Async HTTP API with OpenAPI documentation |
| Database | PostgreSQL + SQLAlchemy | Metrics persistence, axiom registry, session history |
| Cache | Redis | In-memory caching, task queue, session state |
| LLM Adapters | httpx | Async clients for OpenAI, Anthropic, OpenRouter APIs |
| Consensus Engine | Custom Python | Phi-spiral, musical structures, D-score, H-total, C_ij |
| Oracle System | Custom Python | HashOracle, MathOracle, WikidataOracle verification |
# Install dependencies
pip install -r requirements.txt
# Start the API server
uvicorn main:app --reload
# Server runs at http://localhost:8000
# API docs at http://localhost:8000/docs (Swagger)
# Alternative docs at http://localhost:8000/redoc

Python Backend vs. Worker API
| Aspect | Worker API (Production) | Python Backend (Local) |
|---|---|---|
| Deployment | Cloudflare edge (global, serverless) | Local / self-hosted (single server) |
| Latency | < 100ms (edge) | Depends on server location |
| Caching | Vectorize + KV semantic cache | Redis in-memory cache |
| Axiom search | Vectorize ANN search | HTTP to Worker Vectorize endpoint |
| Database | KV (key-value) | PostgreSQL (relational) |
| Oracle access | Limited (external HTTP) | Full (direct oracle integration) |
| Best for | Production traffic, public API | Development, testing, full engine access |
LLM Provider Integration
ACP supports multiple LLM providers through a unified adapter interface. The Worker primarily uses OpenRouter as a single gateway to multiple models, while the Python engine supports direct connections to each provider.
| Provider | Models | Integration |
|---|---|---|
| OpenRouter | GPT-4, Claude, Gemini, Llama, Mistral, and 100+ models | Primary gateway for the Worker API |
| OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 | Direct adapter in Python engine |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku | Direct adapter in Python engine |
| Mock | Deterministic test responses | Testing adapter for CI/CD |
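The unified adapter interface can be pictured as a single async complete(prompt) contract that every provider implements; the mock adapter returns deterministic output so CI runs need no API keys. A sketch under that assumption; the class and method names are illustrative, not the engine's actual interface:

```javascript
// Every provider adapter exposes the same async complete() contract,
// so the consensus engine can treat OpenRouter, OpenAI, Anthropic,
// and the mock identically.
class MockAdapter {
  constructor(responses = {}) {
    this.responses = responses; // fixed map: prompt -> reply
  }
  async complete(prompt) {
    // Deterministic: the same prompt always yields the same reply.
    return this.responses[prompt] ?? 'mock response';
  }
}

class OpenRouterAdapter {
  constructor(apiKey, model) {
    this.apiKey = apiKey;
    this.model = model;
  }
  async complete(prompt) {
    // A real adapter POSTs to the provider's chat completions API.
    const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: this.model,
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    const data = await res.json();
    return data.choices[0].message.content;
  }
}
```

Swapping the mock for a real adapter changes nothing upstream, which is what makes deterministic CI runs cheap.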
CI/CD Pipeline
The deployment pipeline uses GitHub Actions for continuous integration and automated deployment to both Vercel and Cloudflare.
git push to main
|
v
+---------------------+
| GitHub Actions |
| |
| 1. Lint (ruff,black)|
| 2. Type check (mypy)|
| 3. Tests (pytest) |
| 4. Build check |
+----------+----------+
|
+-------+-------+
| |
v v
+----------+ +-----------+
| Vercel | | Cloudflare|
| Frontend | | Worker |
| Deploy | | Deploy |
+----------+ +-----------+

| Stage | Tool | Purpose |
|---|---|---|
| Linting | Ruff + Black | Python code style enforcement |
| Type checking | mypy | Static type analysis for Python source |
| Testing | pytest | Unit and integration test suite |
| Frontend build | Next.js build | Verify frontend compiles without errors |
| Frontend deploy | Vercel | Automatic deployment on push to main |
| Worker deploy | Wrangler | Automatic deployment to Cloudflare Workers |
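The stages above map onto a straightforward workflow file. A sketch of the CI checks job; the file paths and job names are illustrative, and the deploy steps are elided:

```yaml
name: CI
on:
  push:
    branches: [main]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - run: ruff check .     # 1. Lint
      - run: black --check .  # 1. Lint (formatting)
      - run: mypy .           # 2. Type check
      - run: pytest           # 3. Tests
```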
Self-Hosted Deployment
For organizations that need to run ACP on their own infrastructure, the Python backend can be deployed as a Docker stack with PostgreSQL and Redis.
services:
  acp-backend:
    image: acp/backend:latest
    ports:
      - "8000:8000"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
    depends_on:
      - postgres
      - redis
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=acp
      - POSTGRES_USER=acp
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  pgdata:

Axiom Seeding
When self-hosting, you will need to seed axioms into your local environment. Clone ACP-DATASETS into the same parent directory and use the seed script at scripts/vectorize/seed-all-axioms.js to populate the Vectorize index. For fully offline deployments, the Python engine can read axioms directly from JSON files in data/axioms/.
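The seeding step boils down to turning each axiom record into the vector object Vectorize stores: an id, a 768-dimensional embedding of the statement, and the metadata the search pipeline returns. A sketch of that transformation; the field names follow the search-result shape shown earlier, and the embedding is passed in so the function has no network dependency:

```javascript
// Convert one axiom record into the { id, values, metadata } shape
// that a Vectorize upsert expects. The caller supplies the embedding
// (e.g. from the Worker's /embeddings endpoint).
function toVectorizeRecord(axiom, embedding) {
  if (embedding.length !== 768) {
    throw new Error('expected a 768-dimensional bge-base-en-v1.5 embedding');
  }
  return {
    id: axiom.id,        // e.g. "acp-comp-quicksort-avg-v1"
    values: embedding,   // 768-dim vector for the statement text
    metadata: {
      statement: axiom.statement,
      level: axiom.level,
      domain: axiom.domain,
      confidence: axiom.confidence,
    },
  };
}

// The seed script would then batch these records into the axiom index,
// e.g. via env.VECTORIZE_AXIOMS.upsert(records) or POST /seed-axioms.
```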
Monitoring and Observability
The Worker exposes a /metrics endpoint in Prometheus format, and the /health endpoint reports the status of Vectorize, KV, and AI bindings. Key metrics to monitor include:
| Metric | Description |
|---|---|
| Consensus success rate | Percentage of queries that reach D < threshold |
| Average iterations | Mean number of phi-spiral iterations to consensus |
| Cache hit rate | Percentage of queries served from semantic cache |
| P95 latency | 95th percentile response time for consensus requests |
| LLM error rate | Failure rate of upstream LLM API calls |
| Vectorize query latency | Time to retrieve axioms from the vector index |
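The /metrics endpoint serves values like these in the Prometheus text exposition format: a # HELP and # TYPE line per metric followed by the sample. A sketch of the rendering; the metric names are illustrative, not the Worker's actual names:

```javascript
// Render a set of gauges in the Prometheus text exposition format.
function renderMetrics(metrics) {
  const lines = [];
  for (const { name, help, value } of metrics) {
    lines.push(`# HELP ${name} ${help}`);
    lines.push(`# TYPE ${name} gauge`);
    lines.push(`${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// Example: two of the metrics from the table above.
const body = renderMetrics([
  { name: 'acp_cache_hit_rate',
    help: 'Fraction of queries served from the semantic cache', value: 0.21 },
  { name: 'acp_consensus_avg_iterations',
    help: 'Mean phi-spiral iterations to consensus', value: 3.2 },
]);
```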