ACP-DATASETS
ACP-DATASETS is the data layer of the ACP ecosystem. It contains verified axioms distributed across all 7 hierarchical levels -- from mathematical truths to self-referential facts about AI architecture. These axioms are the anchors that ground consensus in undeniable truth.
Role in the Ecosystem
During a consensus run, the ACP engine retrieves relevant axioms from this dataset via Cloudflare Vectorize semantic search. The axioms are injected into the model context at each iteration of the φ-spiral, constraining model responses to verifiable facts and reducing disagreement.
Each axiom is stored as a structured JSON file with metadata for level, domain, confidence score, oracle verification sources, and formal representations. The axioms are embedded as 768-dimensional vectors using Cloudflare AI (bge-base-en-v1.5) for fast semantic retrieval.
Repository Structure
ACP-DATASETS/
└── official/
    ├── level-1-mathematical/    # Mathematical axioms (120 axioms)
    ├── level-2-physical/        # Physical laws (180 axioms)
    ├── level-3-ontological/     # Concept definitions (200 axioms)
    ├── level-4-computable/      # Algorithms and computations (250 axioms)
    ├── level-5-architectural/   # System architecture (150 axioms)
    ├── level-6-protocol/        # Protocols and standards (100 axioms)
    └── level-7-linguistic/      # Language constructs (59 axioms)
Level Distribution
Axioms are distributed across the seven levels with higher concentrations in the computationally verifiable middle levels (3-4) and the foundational physical level (2). The self-referential upper levels (5-7) contain fewer axioms but carry disproportionate consensus power because AI cannot deny facts about its own substrate.
| Level | Name | Category | Count | Percentage |
|---|---|---|---|---|
| 1 | Mathematical | Fundamental | 120 | 11.3% |
| 2 | Physical | Fundamental | 180 | 17.0% |
| 3 | Ontological | Verifiable | 200 | 18.9% |
| 4 | Computable | Verifiable | 250 | 23.6% |
| 5 | Architectural | Self-referential | 150 | 14.2% |
| 6 | Protocol | Self-referential | 100 | 9.4% |
| 7 | Linguistic | Self-referential | 59 | 5.6% |
| Total | - | - | 1059 | 100% |
Level 1 (Mathematical):  120 axioms ████████████
Level 2 (Physical):      180 axioms ██████████████████
Level 3 (Ontological):   200 axioms ████████████████████
Level 4 (Computable):    250 axioms █████████████████████████
Level 5 (Architectural): 150 axioms ███████████████
Level 6 (Protocol):      100 axioms ██████████
Level 7 (Linguistic):     59 axioms ██████
─────────────────────────────────────
TOTAL: see repository
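The percentages in the table follow directly from the per-level counts. As a minimal sketch, assuming the counts listed above are current, they can be recomputed as follows (the names `LEVEL_COUNTS` and `level_percentages` are illustrative and not part of the repository):

```python
# Per-level axiom counts, copied from the distribution table above.
LEVEL_COUNTS = {1: 120, 2: 180, 3: 200, 4: 250, 5: 150, 6: 100, 7: 59}


def level_percentages(counts):
    """Return each level's share of the corpus, rounded to one decimal place."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


total = sum(LEVEL_COUNTS.values())
shares = level_percentages(LEVEL_COUNTS)
for level, share in shares.items():
    print(f"Level {level}: {LEVEL_COUNTS[level]:>4} axioms ({share}%)")
print(f"Total: {total} axioms")
```

Re-running this after adding axioms is a quick way to keep the table and the directory comments in step.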
Axiom JSON Structure
Each axiom is a JSON file with a standardized schema. The schema enforces consistency across all entries and enables automated validation, oracle verification, and Vectorize ingestion.
{
  "id": "acp-comp-quicksort-avg-v1",
  "level": 4,
  "domain": "computer-science",
  "statement": "QuickSort has average time complexity O(n log n)",
  "formal": "T_avg(QuickSort) = \u0398(n log n)",
  "proof": "Mathematical analysis of expected partitioning",
  "description": "QuickSort achieves O(n log n) average-case time complexity through randomized pivot selection and expected balanced partitioning.",
  "oracles": ["WikidataOracle", "StackOverflowOracle"],
  "confidence": 1.0,
  "tags": ["sorting", "algorithms", "complexity"]
}
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier following the pattern acp-{domain}-{name}-v{version} |
| level | integer (1-7) | Axiom hierarchy level |
| domain | string | Knowledge domain (e.g., computer-science, physics, mathematics) |
| statement | string | Human-readable axiom statement -- the primary text used for embedding |
| formal | string | Formal mathematical or logical representation |
| proof | string | Description of the proof or verification method |
| description | string | Extended description with context and explanation |
| oracles | string[] | List of oracle sources that can verify this axiom |
| confidence | number (0-1) | Confidence score -- 1.0 for axioms verified by multiple oracles |
| tags | string[] | Searchable tags for categorization and filtering |
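The field table above translates naturally into an automated check. The following is a sketch of such a check using only the Python standard library. It is not the repository's actual validator: the regular expression for `id` and the names `ID_PATTERN`, `REQUIRED_FIELDS`, and `validate_axiom` are assumptions made for illustration.

```python
import re

# Assumed shape of ids such as "acp-comp-quicksort-avg-v1".
ID_PATTERN = re.compile(r"^acp-[a-z0-9]+(?:-[a-z0-9]+)+-v\d+$")

# Field name -> expected Python type(s), following the field table.
REQUIRED_FIELDS = {
    "id": str, "level": int, "domain": str, "statement": str,
    "formal": str, "proof": str, "description": str,
    "oracles": list, "confidence": (int, float), "tags": list,
}

EXAMPLE_AXIOM = {
    "id": "acp-comp-quicksort-avg-v1",
    "level": 4,
    "domain": "computer-science",
    "statement": "QuickSort has average time complexity O(n log n)",
    "formal": "T_avg(QuickSort) = \u0398(n log n)",
    "proof": "Mathematical analysis of expected partitioning",
    "description": "QuickSort achieves O(n log n) average-case time complexity.",
    "oracles": ["WikidataOracle", "StackOverflowOracle"],
    "confidence": 1.0,
    "tags": ["sorting", "algorithms", "complexity"],
}


def validate_axiom(axiom):
    """Return a list of problems found; an empty list means the axiom passed."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in axiom:
            errors.append(f"missing field: {field}")
        elif isinstance(axiom[field], bool) or not isinstance(axiom[field], expected):
            errors.append(f"wrong type for field: {field}")
    if errors:
        return errors  # range checks below need well-typed fields
    if not ID_PATTERN.match(axiom["id"]):
        errors.append("id does not match acp-{domain}-{name}-v{version}")
    if not 1 <= axiom["level"] <= 7:
        errors.append("level must be between 1 and 7")
    if not 0 <= axiom["confidence"] <= 1:
        errors.append("confidence must be between 0 and 1")
    for field in ("oracles", "tags"):
        if not all(isinstance(item, str) for item in axiom[field]):
            errors.append(f"{field} must contain only strings")
    return errors
```

Calling `validate_axiom(EXAMPLE_AXIOM)` returns an empty list. Returning all problems at once, rather than raising on the first, tends to make review feedback on a contributed axiom easier to act on.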
How Axioms Are Used in Consensus
Axioms enter the consensus pipeline through semantic retrieval. When a user submits a query, the system generates an embedding vector and searches Vectorize for the most relevant axioms. These axioms are then injected into the model context to ground responses in verifiable facts.
Retrieval via Worker (Vectorize)
// Generate an embedding for the user query
// (getEmbedding is assumed to be defined elsewhere in the Worker)
const embedding = await getEmbedding(query, env);
// Search Vectorize for relevant axioms
const axioms = await env.VECTORIZE_AXIOMS.query(embedding, {
  topK: 5,
  // Restrict results to the self-referential levels
  filter: { level: { $in: [5, 6, 7] } },
  // Metadata is not returned by default; request it so statement is available
  returnMetadata: 'all'
});
// Inject into model context
const context = `${query}\n\nRelevant verified axioms:\n${
  axioms.matches.map(a => a.metadata.statement).join('\n')
}`;
Retrieval via Python Engine
# Search relevant axioms through the Worker Vectorize endpoint
# (this snippet must run inside an async function, since it uses await)
axioms = await vectorize.search_axioms(
    query,
    top_k=5,
    level_filter=[5, 6, 7],
    min_score=0.6,
)
# Inject into consensus configuration
config.relevant_axioms = [
    {"axiom": a.axiom, "level": a.level}
    for a in axioms
]
Level Filtering
In practice, the engine applies level filtering based on the query domain. Mathematical queries prioritize Level 1-2 axioms, while software architecture questions prioritize Level 5-7 (self-referential) axioms. The self-referential levels are especially powerful because AI cannot deny facts about its own computational substrate.
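The domain-based filtering described above could be sketched as a simple lookup. Only the two mappings mentioned in the text are included. The domain labels, the fallback to all seven levels for unknown domains, and the names `DOMAIN_LEVELS` and `levels_for_domain` are assumptions for illustration and do not describe the engine's actual code.

```python
ALL_LEVELS = [1, 2, 3, 4, 5, 6, 7]

# Only these two mappings come from the text; the keys are hypothetical labels.
DOMAIN_LEVELS = {
    "mathematics": [1, 2],
    "software-architecture": [5, 6, 7],
}


def levels_for_domain(domain):
    """Pick the axiom levels to search for a given query domain.

    Unknown domains fall back to all seven levels (an assumption).
    """
    return list(DOMAIN_LEVELS.get(domain, ALL_LEVELS))


print(levels_for_domain("mathematics"))
print(levels_for_domain("software-architecture"))
```

The returned list can be passed as the `level_filter` argument in the Python retrieval snippet shown earlier.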
Vectorization
All axioms are vectorized using Cloudflare AI with the bge-base-en-v1.5 embedding model, producing 768-dimensional vectors. These vectors are stored in Cloudflare Vectorize, which provides low-latency approximate nearest-neighbor search across the entire axiom corpus.
| Property | Value |
|---|---|
| Embedding Model | bge-base-en-v1.5 (BAAI) |
| Dimensions | 768 |
| Index | VECTORIZE_AXIOMS |
| Total Vectors | Dynamic |
| Search Algorithm | Approximate nearest neighbor (ANN) |
| Metadata | statement, level, domain, confidence, tags |
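To make the retrieval semantics concrete, here is a brute-force sketch of what a filtered top-k search computes. Vectorize itself uses approximate nearest-neighbor search, so this exact scan is only an illustration. Cosine similarity as the metric, and the names `cosine_similarity` and `search`, are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, records, top_k=5, levels=None, min_score=0.0):
    """Exact top-k search over (vector, metadata) pairs.

    levels: optional collection of allowed axiom levels.
    min_score: drop matches whose similarity is below this value.
    """
    scored = []
    for vector, metadata in records:
        if levels is not None and metadata["level"] not in levels:
            continue
        score = cosine_similarity(query_vec, vector)
        if score >= min_score:
            scored.append((score, metadata))
    # Sort by score only, so metadata dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

An exact scan is linear in the corpus size. That is workable for a corpus of this size, but an ANN index trades a small amount of recall for much better scaling.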
Axioms are seeded into Vectorize using the seed script at scripts/vectorize/seed-all-axioms.js. After adding new axioms to ACP-DATASETS, run the seed script to make them available for semantic retrieval.
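The seed script itself is JavaScript and is not reproduced here. As a rough Python sketch of the kind of record seeding has to produce, the function below pairs an axiom with its embedding and keeps only the metadata fields listed in the table above. Using the axiom `id` as the vector id, and the names `to_vector_record` and `METADATA_FIELDS`, are assumptions.

```python
EMBEDDING_DIMENSIONS = 768  # output size of bge-base-en-v1.5

# Metadata fields stored alongside each vector, per the properties table.
METADATA_FIELDS = ("statement", "level", "domain", "confidence", "tags")


def to_vector_record(axiom, embedding):
    """Build an {id, values, metadata} record for one axiom."""
    if len(embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    return {
        "id": axiom["id"],  # assumption: the axiom id doubles as the vector id
        "values": list(embedding),
        "metadata": {field: axiom[field] for field in METADATA_FIELDS},
    }
```

Checking the dimension before upload catches a mismatched embedding model early, instead of failing at insert time.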
Verification and Integrity
Axiom integrity is maintained through multiple layers of verification.
Oracle Verification
Each axiom lists the oracle sources that can independently verify its truth. Oracles are external verification services organized by axiom level.
| Level | Oracle Sources |
|---|---|
| Level 1 (Mathematical) | Wolfram Alpha API, SymPy, SageMath, Coq/Lean proof checkers |
| Level 2 (Physical) | NIST Physical Constants Database, arXiv, physics simulators |
| Level 3 (Ontological) | PubChem, Wikipedia, IUPAC databases, scientific taxonomies |
| Level 4 (Computable) | Hash calculators, algorithm complexity databases, test suites |
| Level 5 (Architectural) | Intel/AMD CPU documentation, IEEE standards |
| Level 6 (Protocol) | IANA registries, RFC documents, W3C standards |
| Level 7 (Linguistic) | Language specifications, compiler/interpreter behavior, syntax validators |
SHA-256 Hash Integrity
Each axiom file can be independently verified via SHA-256 hash. The hash covers the full JSON content, ensuring that axiom statements, confidence scores, and oracle references have not been tampered with after verification. This is especially important for Level 4 (Computable) axioms, where the axiom itself may describe a hash function.
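The text does not specify whether the hash is taken over the raw file bytes or over a normalized form of the JSON, so the sketch below shows both options. Raw bytes are the simplest, while a canonical serialization (sorted keys, no extra whitespace) keeps the hash stable when a file is merely re-indented. The function names and the choice of canonical form are assumptions.

```python
import hashlib
import json


def sha256_of_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes, e.g. the contents of an axiom file."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_axiom(axiom: dict) -> str:
    """Hex SHA-256 digest of a canonical JSON serialization of the axiom.

    Sorting keys and removing optional whitespace means two files that differ
    only in formatting or key order produce the same hash.
    """
    canonical = json.dumps(
        axiom, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


axiom = {"id": "acp-comp-quicksort-avg-v1", "level": 4, "confidence": 1.0}
reordered = {"confidence": 1.0, "level": 4, "id": "acp-comp-quicksort-avg-v1"}
print(sha256_of_axiom(axiom) == sha256_of_axiom(reordered))
```

Whichever form is used, the verifier and the publisher must agree on it, otherwise a harmless reformat will look like tampering.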
Quality Standards
Axioms are categorized into quality tiers based on a composite score covering schema compliance, oracle agreement, test coverage, community rating, and usage success rate.
| Category | Quality Threshold | Oracle Requirement | Review |
|---|---|---|---|
| Official | >= 0.90 | 2+ verified sources | Core team |
| Community Approved | >= 0.75 | 1+ verified source | Community moderators |
| Community Pending | >= 0.60 | Under review | 3-7 day review period |
| Rejected | < 0.60 | Insufficient | Feedback provided with rejection |
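The tier table can be read as a small decision function over the composite score and the number of verified oracle sources. The sketch below is one possible reading. How the composite score is weighted is not given in the text, so the score is taken as an input. Letting a high-scoring axiom with too few sources fall through to the next tier is an assumption, and the human review step is not modeled.

```python
def quality_tier(score, verified_sources):
    """Map a composite quality score and oracle count to a tier name.

    score: composite quality score in [0, 1].
    verified_sources: number of oracle sources that verified the axiom.
    """
    if score >= 0.90 and verified_sources >= 2:
        return "Official"
    if score >= 0.75 and verified_sources >= 1:
        return "Community Approved"
    if score >= 0.60:
        return "Community Pending"
    return "Rejected"


print(quality_tier(0.95, 2))
print(quality_tier(0.95, 1))
print(quality_tier(0.50, 3))
```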
Contributing Axioms
To contribute a new axiom, fork the ACP-DATASETS repository, create a JSON file following the schema in the appropriate official/level-X-*/ directory, and submit a pull request. The axiom will undergo automated schema validation and manual review before inclusion.