ACP-DATASETS

ACP-DATASETS is the data layer of the ACP ecosystem. It contains verified axioms distributed across all 7 hierarchical levels -- from mathematical truths to self-referential facts about AI architecture. These axioms are the anchors that ground consensus in undeniable truth.

Role in the Ecosystem

During a consensus run, the ACP engine retrieves relevant axioms from this dataset via Cloudflare Vectorize semantic search. The axioms are injected into the model context at each iteration of the φ-spiral, constraining model responses to verifiable facts and reducing disagreement.

Each axiom is stored as a structured JSON file with metadata for level, domain, confidence score, oracle verification sources, and formal representations. The axioms are embedded as 768-dimensional vectors using Cloudflare AI (bge-base-en-v1.5) for fast semantic retrieval.

Repository Structure

ACP-DATASETS directory layout

ACP-DATASETS/
└── official/
    ├── level-1-mathematical/   # Mathematical axioms (120 axioms)
    ├── level-2-physical/       # Physical laws (180 axioms)
    ├── level-3-ontological/    # Concept definitions (200 axioms)
    ├── level-4-computable/     # Algorithms and computations (250 axioms)
    ├── level-5-architectural/  # System architecture (150 axioms)
    ├── level-6-protocol/       # Protocols and standards (100 axioms)
    └── level-7-linguistic/     # Language constructs (59 axioms)

Level Distribution

Axioms are distributed across the seven levels with higher concentrations in the computationally verifiable middle levels (3-4) and the foundational physical level (2). The self-referential upper levels (5-7) contain fewer axioms but carry disproportionate consensus power because AI cannot deny facts about its own substrate.

Level	Name	Category	Count	Percentage
1	Mathematical	Fundamental	120	11.3%
2	Physical	Fundamental	180	17.0%
3	Ontological	Verifiable	200	18.9%
4	Computable	Verifiable	250	23.6%
5	Architectural	Self-referential	150	14.2%
6	Protocol	Self-referential	100	9.4%
7	Linguistic	Self-referential	59	5.6%
Total			—	100%
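
The percentages in the table follow directly from the per-level counts. The sketch below recomputes them as a consistency check; `LEVEL_COUNTS` and `level_percentages` are illustrative names, not part of the ACP codebase, and the counts are a snapshot copied from the table rather than a live figure.

```python
# Axiom counts per level, copied from the table above (a snapshot, not a live figure).
LEVEL_COUNTS = {1: 120, 2: 180, 3: 200, 4: 250, 5: 150, 6: 100, 7: 59}


def level_percentages(counts):
    """Return (total, {level: share of total in percent, rounded to one decimal})."""
    total = sum(counts.values())
    shares = {level: round(100 * n / total, 1) for level, n in counts.items()}
    return total, shares


total, shares = level_percentages(LEVEL_COUNTS)
print(total)      # sum of the listed counts
print(shares[4])  # share of Level 4 (Computable)
```

With the counts listed here the total comes to 1,059 axioms, and the rounded shares match the Percentage column.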

Axiom Distribution by Level

Level 1 (Mathematical):   120 axioms  ████████████
Level 2 (Physical):       180 axioms  ██████████████████
Level 3 (Ontological):    200 axioms  ████████████████████
Level 4 (Computable):     250 axioms  █████████████████████████
Level 5 (Architectural):  150 axioms  ███████████████
Level 6 (Protocol):       100 axioms  ██████████
Level 7 (Linguistic):      59 axioms  ██████
─────────────────────────────────────
TOTAL:                  see repository

Axiom JSON Structure

Each axiom is a JSON file with a standardized schema. The schema enforces consistency across all entries and enables automated validation, oracle verification, and Vectorize ingestion.

Example axiom: QuickSort average complexity

{
  "id": "acp-comp-quicksort-avg-v1",
  "level": 4,
  "domain": "computer-science",
  "statement": "QuickSort has average time complexity O(n log n)",
  "formal": "T_avg(QuickSort) = \u0398(n log n)",
  "proof": "Mathematical analysis of expected partitioning",
  "description": "QuickSort achieves O(n log n) average-case time complexity through randomized pivot selection and expected balanced partitioning.",
  "oracles": ["WikidataOracle", "StackOverflowOracle"],
  "confidence": 1.0,
  "tags": ["sorting", "algorithms", "complexity"]
}

Field	Type	Description
`id`	string	Unique identifier following the pattern acp-{domain}-{name}-v{version}, where {domain} may be abbreviated (the example above uses comp for computer-science)
`level`	integer (1-7)	Axiom hierarchy level
`domain`	string	Knowledge domain (e.g., computer-science, physics, mathematics)
`statement`	string	Human-readable axiom statement -- the primary text used for embedding
`formal`	string	Formal mathematical or logical representation
`proof`	string	Description of the proof or verification method
`description`	string	Extended description with context and explanation
`oracles`	string[]	List of oracle sources that can verify this axiom
`confidence`	number (0-1)	Confidence score -- 1.0 for axioms verified by multiple oracles
`tags`	string[]	Searchable tags for categorization and filtering
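
The field table above can be turned into a lightweight validation check. The following is a minimal sketch, not the repository's actual validator: it assumes every listed field is required, and `validate_axiom`, `REQUIRED_FIELDS`, and the `ID_PATTERN` regular expression are hypothetical names and choices made for illustration.

```python
import re

# Assumed reading of the id pattern acp-{domain}-{name}-v{version}:
# lowercase alphanumeric segments separated by hyphens, ending in -v<digits>.
ID_PATTERN = re.compile(r"^acp-[a-z0-9]+(?:-[a-z0-9]+)+-v\d+$")

# Field names and types taken from the table; treating all as required is an assumption.
REQUIRED_FIELDS = {
    "id": str, "level": int, "domain": str, "statement": str,
    "formal": str, "proof": str, "description": str,
    "oracles": list, "confidence": (int, float), "tags": list,
}


def validate_axiom(axiom):
    """Return a list of human-readable problems; an empty list means the axiom passed."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in axiom:
            errors.append(f"missing field: {field}")
        elif isinstance(axiom[field], bool) or not isinstance(axiom[field], expected):
            errors.append(f"wrong type for field: {field}")
    if errors:
        return errors  # range checks below need well-typed values

    if not ID_PATTERN.match(axiom["id"]):
        errors.append("id does not match acp-{domain}-{name}-v{version}")
    if not 1 <= axiom["level"] <= 7:
        errors.append("level must be between 1 and 7")
    if not 0 <= axiom["confidence"] <= 1:
        errors.append("confidence must be between 0 and 1")
    for field in ("oracles", "tags"):
        if not all(isinstance(item, str) for item in axiom[field]):
            errors.append(f"{field} must contain only strings")
    return errors


example = {
    "id": "acp-comp-quicksort-avg-v1",
    "level": 4,
    "domain": "computer-science",
    "statement": "QuickSort has average time complexity O(n log n)",
    "formal": "T_avg(QuickSort) = \u0398(n log n)",
    "proof": "Mathematical analysis of expected partitioning",
    "description": "QuickSort achieves O(n log n) average-case time complexity.",
    "oracles": ["WikidataOracle", "StackOverflowOracle"],
    "confidence": 1.0,
    "tags": ["sorting", "algorithms", "complexity"],
}
print(validate_axiom(example))
```

Returning a list of problems rather than raising on the first one lets a reviewer see every schema issue in a single pass.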

How Axioms Are Used in Consensus

Axioms enter the consensus pipeline through semantic retrieval. When a user submits a query, the system generates an embedding vector and searches Vectorize for the most relevant axioms. These axioms are then injected into the model context to ground responses in verifiable facts.

Retrieval via Worker (Vectorize)

Semantic axiom search in the Worker

// Generate embedding for the user query
const embedding = await getEmbedding(query, env);

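// Search Vectorize for relevant axioms
const axioms = await env.VECTORIZE_AXIOMS.query(
  embedding,
  {
    topK: 5,
    returnMetadata: "all",  // Needed so each match carries metadata.statement
    // Restrict results to the self-referential levels
    // (filtering requires a metadata index on `level`)
    filter: { level: { $in: [5, 6, 7] } }
  }
);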

// Inject into model context
const context = `${query}\n\nRelevant verified axioms:\n${
  axioms.matches.map(a => a.metadata.statement).join('\n')
}`;

Retrieval via Python Engine

Axiom search in the Python engine

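# Search relevant axioms through the Worker Vectorize endpoint.
# `await` must run inside a coroutine; the wrapper name here is illustrative.
async def attach_relevant_axioms(vectorize, config, query):
    axioms = await vectorize.search_axioms(
        query,
        top_k=5,
        level_filter=[5, 6, 7],
        min_score=0.6,
    )

    # Inject into consensus configuration
    config.relevant_axioms = [
        {"axiom": a.axiom, "level": a.level}
        for a in axioms
    ]
    return config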
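
The two retrieval snippets above share the same shape: filter matches by score, keep the top few, and append their statements to the query. The sketch below shows that step in isolation. `AxiomMatch` and `build_context` are hypothetical names, and the `score` attribute is an assumption about what a search result carries; the context template mirrors the one used in the Worker snippet.

```python
from dataclasses import dataclass


@dataclass
class AxiomMatch:
    """Hypothetical search result; the real engine's result type may differ."""
    axiom: str
    level: int
    score: float


def build_context(query, matches, min_score=0.6, top_k=5):
    """Append the statements of the best-scoring axioms to the query text."""
    kept = sorted(
        (m for m in matches if m.score >= min_score),
        key=lambda m: m.score,
        reverse=True,
    )[:top_k]
    if not kept:
        return query  # nothing relevant enough: leave the query unchanged
    statements = "\n".join(m.axiom for m in kept)
    return f"{query}\n\nRelevant verified axioms:\n{statements}"


matches = [
    AxiomMatch("TCP uses a three-way handshake", level=6, score=0.82),
    AxiomMatch("Water boils at 100 C at 1 atm", level=2, score=0.41),
]
print(build_context("How does TCP open a connection?", matches))
```

In this example only the first match clears the 0.6 threshold, so only its statement is injected.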

Level Filtering

In practice, the engine chooses the level filter from the query domain. Mathematical queries prioritize axioms from Levels 1-2, while software architecture questions prioritize axioms from Levels 5-7 (self-referential). The self-referential levels are especially powerful because AI cannot deny facts about its own computational substrate.
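
A domain-to-level mapping of this kind can be sketched as a simple lookup. Only the two pairings named above come from this document; the domain keys, the `levels_for_domain` helper, and the fallback to all seven levels are assumptions made for illustration.

```python
ALL_LEVELS = [1, 2, 3, 4, 5, 6, 7]

# Only these two pairings are stated in the text; the keys are illustrative labels.
DOMAIN_LEVELS = {
    "mathematics": [1, 2],
    "software-architecture": [5, 6, 7],
}


def levels_for_domain(domain):
    """Return the level filter for a query domain (assumed fallback: search every level)."""
    return DOMAIN_LEVELS.get(domain, ALL_LEVELS)


print(levels_for_domain("mathematics"))
print(levels_for_domain("cooking"))
```

The returned list can be passed as the `level_filter` argument shown in the Python engine snippet.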

Vectorization

All axioms are vectorized using Cloudflare AI with the bge-base-en-v1.5 embedding model, producing 768-dimensional vectors. These vectors are stored in Cloudflare Vectorize and enable fast approximate nearest-neighbor search across the entire axiom corpus.

Property	Value
Embedding Model	bge-base-en-v1.5 (BAAI)
Dimensions	768
Index	VECTORIZE_AXIOMS
Total Vectors	Dynamic
Search Algorithm	Approximate nearest neighbor (ANN)
Metadata	statement, level, domain, confidence, tags

Axioms are seeded into Vectorize using the seed script at scripts/vectorize/seed-all-axioms.js. After adding new axioms to ACP-DATASETS, run the seed script to make them available for semantic retrieval.
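
The seed script itself is not reproduced here. As a rough sketch of the record it would need to produce per axiom, the function below pairs an embedding with the metadata fields from the table above, using the `id` / `values` / `metadata` shape that Vectorize accepts for vectors. `build_vector_record` is a hypothetical helper, and the embedding is assumed to have been computed already.

```python
EMBEDDING_DIMENSIONS = 768  # output size of bge-base-en-v1.5
METADATA_FIELDS = ("statement", "level", "domain", "confidence", "tags")


def build_vector_record(axiom, embedding):
    """Combine an axiom and its precomputed embedding into one vector record."""
    if len(embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    return {
        "id": axiom["id"],
        "values": list(embedding),
        # Copy only the fields listed in the Metadata row of the table.
        "metadata": {field: axiom[field] for field in METADATA_FIELDS},
    }


axiom = {
    "id": "acp-comp-quicksort-avg-v1",
    "level": 4,
    "domain": "computer-science",
    "statement": "QuickSort has average time complexity O(n log n)",
    "confidence": 1.0,
    "tags": ["sorting", "algorithms", "complexity"],
    "proof": "Mathematical analysis of expected partitioning",
}
record = build_vector_record(axiom, [0.0] * 768)  # placeholder embedding
print(sorted(record["metadata"]))
```

Checking the dimension before upload catches a mismatched embedding model early, since the index expects vectors of a fixed size.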

Verification and Integrity

Axiom integrity is maintained through multiple layers of verification.

Oracle Verification

Each axiom lists the oracle sources that can independently verify its truth. Oracles are external verification services organized by axiom level.

Level	Oracle Sources
Level 1 (Mathematical)	Wolfram Alpha API, SymPy, SageMath, Coq/Lean proof checkers
Level 2 (Physical)	NIST Physical Constants Database, ArXiv, physics simulators
Level 3 (Ontological)	PubChem, Wikipedia, IUPAC databases, scientific taxonomies
Level 4 (Computable)	Hash calculators, algorithm complexity databases, test suites
Level 5 (Architectural)	Intel/AMD CPU documentation, IEEE standards
Level 6 (Protocol)	IANA registries, RFC documents, W3C standards
Level 7 (Linguistic)	Language specifications, compiler/interpreter behavior, syntax validators

SHA-256 Hash Integrity

Each axiom file can be independently verified via SHA-256 hash. The hash covers the full JSON content, ensuring that axiom statements, confidence scores, and oracle references have not been tampered with after verification. This is especially important for Level 4 (Computable) axioms, where the axiom itself may describe a hash function.
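
A hash check of this kind can be sketched with the Python standard library. The document does not say whether the hash is taken over the raw file bytes or over a normalized form; the version below assumes a canonical serialization (sorted keys, compact separators) so that formatting differences do not change the digest. `axiom_sha256` and `verify_axiom` are illustrative names.

```python
import hashlib
import json


def axiom_sha256(axiom):
    """Hash a canonical JSON serialization of the axiom (assumed canonical form)."""
    canonical = json.dumps(
        axiom, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_axiom(axiom, expected_hex):
    """Return True if the axiom content still matches the recorded digest."""
    return axiom_sha256(axiom) == expected_hex


original = {"id": "acp-comp-quicksort-avg-v1", "level": 4, "confidence": 1.0}
digest = axiom_sha256(original)

tampered = dict(original, confidence=0.5)
print(verify_axiom(original, digest))
print(verify_axiom(tampered, digest))
```

If the project instead hashes the file exactly as stored, the same check applies to the raw bytes read from disk, with no serialization step.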

Quality Standards

Axioms are categorized into quality tiers based on a composite score covering schema compliance, oracle agreement, test coverage, community rating, and usage success rate.

Category	Quality Threshold	Oracle Requirement	Review
Official	>= 0.90	2+ verified sources	Core team
Community Approved	>= 0.75	1+ verified source	Community moderators
Community Pending	>= 0.60	Under review	3-7 day review period
Rejected	< 0.60	Insufficient	Feedback provided with rejection
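
The tier table can be read as a cascade of threshold checks. The sketch below is one possible reading, not the project's scoring code: it assumes an axiom that meets a score threshold but lacks the required number of verified sources falls through to the next tier, and it leaves the review step out. `quality_tier` is a hypothetical name.

```python
def quality_tier(score, verified_sources):
    """Map a composite quality score and verified-source count to a tier name."""
    if score >= 0.90 and verified_sources >= 2:
        return "Official"
    if score >= 0.75 and verified_sources >= 1:
        return "Community Approved"
    if score >= 0.60:
        return "Community Pending"
    return "Rejected"


print(quality_tier(0.95, 2))
print(quality_tier(0.95, 1))
print(quality_tier(0.50, 3))
```

Under this reading a high score alone does not reach the Official tier; the oracle requirement must also be met.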

Contributing Axioms

To contribute a new axiom, fork the ACP-DATASETS repository, create a JSON file following the schema in the appropriate official/level-X-*/ directory, and submit a pull request. The axiom will undergo automated schema validation and manual review before inclusion.
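
When preparing a contribution, the target directory follows from the axiom's level. The sketch below maps levels to the directory names shown in the repository layout; naming the file after the axiom `id` is an assumption, since this document does not state a filename convention, and `axiom_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Directory names copied from the repository layout above.
LEVEL_DIRS = {
    1: "level-1-mathematical",
    2: "level-2-physical",
    3: "level-3-ontological",
    4: "level-4-computable",
    5: "level-5-architectural",
    6: "level-6-protocol",
    7: "level-7-linguistic",
}


def axiom_path(axiom):
    """Return the repository-relative path for an axiom file (assumed name: <id>.json)."""
    return PurePosixPath("official") / LEVEL_DIRS[axiom["level"]] / f"{axiom['id']}.json"


print(axiom_path({"id": "acp-comp-quicksort-avg-v1", "level": 4}))
```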
