Kairos
Cortex Chronicle Compass

Upload Protocol PDF

PDF files only • Max 40 MB • Max 200 pages
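
The upload limits above can be checked before a job is created. The sketch below is a minimal pre-flight check. It assumes that "40 MB" means 40 MiB and that the caller has already obtained the page count from a PDF library. `check_upload` is a hypothetical helper, not the Kairos validator.

```python
# Limits taken from the upload panel; reading "40 MB" as 40 MiB is an assumption.
MAX_BYTES = 40 * 1024 * 1024
MAX_PAGES = 200


def check_upload(filename: str, head: bytes, size_bytes: int, page_count: int) -> list[str]:
    """Hypothetical pre-flight check. Returns a list of violations; empty means accept.

    head: the first bytes of the file, used to confirm the %PDF- signature.
    page_count: assumed to come from whatever PDF library the caller uses.
    """
    errors = []
    if not filename.lower().endswith(".pdf") or not head.startswith(b"%PDF-"):
        errors.append("not a PDF file")
    if size_bytes > MAX_BYTES:
        errors.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    if page_count > MAX_PAGES:
        errors.append(f"document has {page_count} pages; limit is {MAX_PAGES}")
    return errors


print(check_upload("protocol.pdf", b"%PDF-1.7", 5_000_000, 120))  # []
```

Checking both the extension and the `%PDF-` signature rejects renamed files that are not PDFs.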

Generate Synthetic Protocol (with Validation)

Well-formed protocol; all agents should pass cleanly.

Synthetic Protocol Archive

Previously generated synthetic protocols and uploaded source protocols available for re-processing

Recent Jobs

Job ID | Status | Step | Progress | Created
job_c4591001_clinica... | COMPLETED | DONE | 100% | 2026-03-22 07:43
job_c4591001_clinica... | COMPLETED | DONE | 100% | 2026-03-21 22:34
job_c4591001_clinica... | COMPLETED | DONE | 100% | 2026-03-21 21:42
job_c4591001_clinica... | COMPLETED | DONE | 100% | 2026-03-21 20:27
job_prot_sap_000_202... | COMPLETED | DONE | 100% | 2026-03-20 21:53

Job console (full page)

Total Jobs: --
Completed: --
Failed: --
Fully Automated: --
Protocol / Job ID | Status | Outcome | Fields | Created | Details

Cortex Configuration

Extraction Models

Primary: --
Optional: --
Verifier: --
Tier 4: --

Thresholds

Auto-approve: --
Min extractor conf: --
Min verifier conf: --
Min evidence match: --
Min AI fallback: --
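
The thresholds above decide whether a record is auto-approved or sent to human review. The sketch below shows one way such gating could work. The numeric defaults are placeholders, because the panel shows "--" until the configuration loads. `Thresholds` and `route_record` are hypothetical names. Min AI fallback is left out because its exact meaning is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values only; the real ones come from the loaded Cortex config.
    auto_approve: float = 0.90
    min_extractor_conf: float = 0.60
    min_verifier_conf: float = 0.60
    min_evidence_match: float = 0.70


def route_record(extractor_conf: float, verifier_conf: float, evidence_match: float,
                 combined: float, t: Thresholds = Thresholds()):
    """Return ("auto_approve", []) or ("review", reasons) for one record."""
    reasons = []
    if extractor_conf < t.min_extractor_conf:
        reasons.append("extractor confidence below minimum")
    if verifier_conf < t.min_verifier_conf:
        reasons.append("verifier confidence below minimum")
    if evidence_match < t.min_evidence_match:
        reasons.append("evidence match below minimum")
    if combined < t.auto_approve:
        reasons.append("combined confidence below auto-approve threshold")
    # Any single failed check sends the record to review, with every reason listed.
    return ("auto_approve", []) if not reasons else ("review", reasons)
```

Collecting every failed check, rather than stopping at the first one, gives reviewers the full explanation for a flag.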

Pipeline Configuration

Runtime Modes

Architecture: --
Contract mode: --
Downstream agents: --
Design agent only: --

API Governor Limits

T3 max API calls: --
T3 max input tokens: --
T4 max API calls: --
Max wall clock: --s
Max total cost: $--
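
The governor limits above cap API usage per job. The sketch below is a minimal budget tracker, assuming that every proposed call is checked against all limits before it runs and that a rejected call is not counted. `ApiGovernor`, `charge`, and `BudgetExceeded` are hypothetical names.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a proposed API call would break a governor limit."""


@dataclass
class ApiGovernor:
    t3_max_calls: int
    t3_max_input_tokens: int
    t4_max_calls: int
    max_wall_clock_s: float
    max_total_cost_usd: float
    t3_calls: int = 0
    t3_input_tokens: int = 0
    t4_calls: int = 0
    total_cost_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tier: int, input_tokens: int, cost_usd: float) -> None:
        """Check every limit for the proposed call first, then record it."""
        if time.monotonic() - self.started > self.max_wall_clock_s:
            raise BudgetExceeded("wall clock limit reached")
        if self.total_cost_usd + cost_usd > self.max_total_cost_usd:
            raise BudgetExceeded("total cost limit reached")
        if tier == 3:
            if self.t3_calls + 1 > self.t3_max_calls:
                raise BudgetExceeded("T3 call limit reached")
            if self.t3_input_tokens + input_tokens > self.t3_max_input_tokens:
                raise BudgetExceeded("T3 input token limit reached")
            self.t3_calls += 1
            self.t3_input_tokens += input_tokens
        elif tier == 4:
            if self.t4_calls + 1 > self.t4_max_calls:
                raise BudgetExceeded("T4 call limit reached")
            self.t4_calls += 1
        # Other tiers only contribute to the total cost in this sketch.
        self.total_cost_usd += cost_usd
```

Checking limits before recording the call keeps the counters accurate even when a call is refused.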

Aggregated Statistics

Success Rate: --
Total Jobs: --
Avg Duration: --
Avg Fields: --

Current Cortex Rules

Current Cortex field registry: 238 fields across 15 modules. Uploads run the full Cortex pipeline by default, and current rule versions are taken from the canonical rule loaders.

Canonical rules: defs:3.4 • gates:3.0
Downstream rules: agent_configs=2.1 • schema_mapping=2.1 • validation_rules=3.0 • usdm_path_registry=2.0 • gate_rules=3.0 • schema_agent_config=2.1 • mapping_agent_config=2.1 • amendment_agent_config=2.1 • validator_agent_config=2.1 • schema_field_mapping=2.1 • validation_rules_standards=3.0 • schema_validation_rules=3.0 • gate_rules_standards=3.0
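
The rule versions above use a flat list of `name:version` and `name=version` pairs. The sketch below parses that list into a dictionary, which could be used to confirm that a job ran against the expected rule set. `parse_rule_versions` and `version_mismatches` are hypothetical helpers.

```python
import re

# Matches "name:1.2" or "name=1.2" pairs as they appear in the rules listing.
_PAIR = re.compile(r"(\w+)[=:](\d+(?:\.\d+)*)")


def parse_rule_versions(banner: str) -> dict:
    """Return {rule_set_name: version_string} for every pair found in the text."""
    return dict(_PAIR.findall(banner))


def version_mismatches(actual: dict, expected: dict) -> dict:
    """Map each expected rule set to (expected, actual) when they differ or actual is missing."""
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}
```

Versions are kept as strings, so "3.0" and "3" count as different versions.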

Implementation Overview

Models: Cortex is controller-led. Tier 1 is narrow deterministic extraction (~120 fields). Tier 2 is manifest-gated learned ranking (~30 fields); it currently runs in lexical fallback mode because no trained CrossEncoder bundle is present. Tier 3 is the universal non-system LLM extractor with bounded self-critique (~82 fields); primary model: ministral3-14b (Ollama Cloud). Tier 4 is selective arbitration for hard fields; model: qwen3.5 (Ollama Cloud). No OpenAI models are used.
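
The tier split described above can be pictured as a per-field routing decision. The sketch below assumes that the field plan supplies a set of deterministic fields and a set of ranking fields, that Tier 3 handles everything else, and that Tier 4 runs only when a hard field has disagreeing candidates. `Tier`, `plan_field`, and `needs_arbitration` are hypothetical names, not the Cortex controller API.

```python
from enum import Enum


class Tier(Enum):
    T1_DETERMINISTIC = 1
    T2_LEARNED_RANKING = 2
    T3_LLM = 3
    T4_ARBITRATION = 4


def plan_field(field_id, deterministic_fields, ranking_fields, has_ranker_bundle):
    """Pick the first-pass tier and method for one field (hypothetical rules)."""
    if field_id in deterministic_fields:
        return Tier.T1_DETERMINISTIC, "rules"
    if field_id in ranking_fields:
        # Tier 2 is manifest-gated: without a trained CrossEncoder bundle,
        # it falls back to lexical ranking.
        method = "cross_encoder" if has_ranker_bundle else "lexical_fallback"
        return Tier.T2_LEARNED_RANKING, method
    return Tier.T3_LLM, "llm_with_self_critique"


def needs_arbitration(field_id, hard_fields, candidates):
    """Escalate to Tier 4 only for hard fields whose candidate values disagree."""
    return field_id in hard_fields and len(set(candidates)) > 1
```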

Data: Protocol PDFs are processed with Docling-first extraction. The active path uses Docling text and table extraction and yields an evidence pack with page and line IDs. PyMuPDF is used only as a fallback when Docling returns no usable lines. Synthetic ground truth comes from synthetic_protocols. Artifacts are stored immutably in GCS, and job state is kept in Firestore.
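
The Docling-first rule with a PyMuPDF fallback can be sketched as "use the first extractor that yields usable lines." In the sketch below, the extractors are passed in as plain callables, so no real Docling or PyMuPDF API is used. It assumes that each callable yields `(page, text)` pairs and that line IDs follow a hypothetical `p{page}.l{n}` format. `build_evidence_pack` and `EvidenceLine` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# An extractor takes raw PDF bytes and yields (page_number, line_text) pairs.
Extractor = Callable[[bytes], Iterable[Tuple[int, str]]]


@dataclass(frozen=True)
class EvidenceLine:
    line_id: str  # hypothetical format "p{page}.l{n}", counting usable lines per page
    page: int
    text: str
    source: str   # label of the extractor that produced the line


def build_evidence_pack(pdf: bytes, primary: Extractor, fallback: Extractor) -> List[EvidenceLine]:
    """Try the primary (Docling) extractor; use the fallback only if it yields no usable lines."""
    for source, extract in (("docling", primary), ("pymupdf", fallback)):
        usable = [(page, text.strip()) for page, text in extract(pdf) if text.strip()]
        if not usable:
            continue  # nothing usable from this extractor; try the next one
        per_page: dict = {}
        pack = []
        for page, text in usable:
            per_page[page] = per_page.get(page, 0) + 1
            pack.append(EvidenceLine(f"p{page}.l{per_page[page]}", page, text, source))
        return pack
    return []  # neither extractor produced usable text
```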

Process: Upload or synthetic generation creates the job. The worker runs extraction and builds the evidence pack. Cortex then applies hybrid semantic zoning and field planning, and bounded controller rounds execute Tier 1, Tier 3, and selective Tier 4 before validation emits design_output_v1. The Schema, Mapping, Amendment, and Validator agents continue downstream. Progress, step timings, token usage, and estimated cost are exposed on GET /api/v1/jobs/{id}.
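
A client can follow a job through these steps by polling GET /api/v1/jobs/{id}. The sketch below uses only the standard library. It assumes that the response is JSON with a `status` field, that `FAILED` is a terminal status alongside the `COMPLETED` value shown under Recent Jobs, and that the base URL is supplied by the caller. `wait_for_job` is a hypothetical helper.

```python
import json
import time
import urllib.parse
import urllib.request

# COMPLETED appears in the job list; FAILED as a terminal status is an assumption.
TERMINAL = {"COMPLETED", "FAILED"}


def job_url(base: str, job_id: str) -> str:
    return f"{base.rstrip('/')}/api/v1/jobs/{urllib.parse.quote(job_id, safe='')}"


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_job(base, job_id, fetch=fetch_json, interval_s=5.0, timeout_s=1800.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status ("status" is an assumed field name)."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(job_url(base, job_id))
        if job.get("status") in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r}")
        sleep(interval_s)
```

Injecting `fetch` and `sleep` keeps the polling loop testable without a running server.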

Synthetic scenarios (generator-native): BASELINE, DESIGN_CHALLENGE, SCHEMA_CHALLENGE, MAPPING_CHALLENGE, AMENDMENT_CHALLENGE, VALIDATOR_STRESS, FULL_STRESS, STATUS_CHALLENGE, NOISE_CHALLENGE. All scenarios use the short path (MODE_A) except VALIDATOR_STRESS and FULL_STRESS, which use the long path.
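
The scenario-to-path rule can be written as a lookup. The sketch below lists the nine scenarios named above. "LONG_PATH" is a placeholder label, because the long path's mode identifier is not given here, and `path_mode` is a hypothetical helper.

```python
from enum import Enum

# The nine generator-native scenarios listed above.
Scenario = Enum(
    "Scenario",
    "BASELINE DESIGN_CHALLENGE SCHEMA_CHALLENGE MAPPING_CHALLENGE AMENDMENT_CHALLENGE "
    "VALIDATOR_STRESS FULL_STRESS STATUS_CHALLENGE NOISE_CHALLENGE",
)

# Only these two scenarios take the long path; "LONG_PATH" is a placeholder label.
LONG_PATH_SCENARIOS = {Scenario.VALIDATOR_STRESS, Scenario.FULL_STRESS}


def path_mode(scenario: Scenario) -> str:
    return "LONG_PATH" if scenario in LONG_PATH_SCENARIOS else "MODE_A"


for s in Scenario:
    print(f"{s.name:20} {path_mode(s)}")
```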

Runtime Flow

Contract-first pipeline with 238 fields across 15 modules (rules defs:3.4, gates:3.0). Each agent consumes upstream artifact(s) and emits one versioned JSON output. Mapping and Amendment run in parallel after Schema. A full audit trail and cost tracking are recorded per step.

PDF Upload
→ Extraction (Docling text + table)
→ Evidence Pack (lines + page IDs)
→ Hybrid Zoning + Planning (controller prep)
→ Design Agent (bounded controller loop)
→ Schema Agent
→ Mapping + Amendment (in parallel)
→ Validator
→ Validated Output

Artifacts: _1_design.json → _2_schema.json → _3_mapping.json + _4_amendment.json → _5_validated.json
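
The contract order can be sketched as a small orchestrator. Each agent receives only upstream artifacts, and Mapping and Amendment run concurrently on a two-worker `ThreadPoolExecutor`, matching the pipeline card. The agent callables and `save` are hypothetical stand-ins. Passing the Schema output to Amendment is an assumption, since the text only says Amendment runs after Schema. Artifact names use the suffixes listed above, without any job prefix.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(pdf, design, schema, mapping, amendment, validator, save):
    """Run agents in contract order; each agent sees only upstream artifacts."""
    d = design(pdf)
    save("_1_design.json", d)
    s = schema(d)
    save("_2_schema.json", s)
    # Mapping and Amendment are independent of each other, so they run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        m_future = pool.submit(mapping, s)
        a_future = pool.submit(amendment, s)  # assumption: Amendment consumes Schema output
        m, a = m_future.result(), a_future.result()
    save("_3_mapping.json", m)
    save("_4_amendment.json", a)
    v = validator(d, s, m, a)  # the Validator merges all upstream contracts
    save("_5_validated.json", v)
    return v
```

Saving artifacts only from the orchestrator thread keeps write order deterministic even though two agents run concurrently.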

Pipeline Agents

Design Agent

Cortex processes the protocol with Docling-first extraction, builds an evidence pack, applies hybrid semantic zoning, and prepares field plans before entering bounded controller rounds. Tier 3 is the main non-system extractor, Tier 1 stays narrow, Tier 2 is manifest-gated, and Tier 4 provides selective arbitration for hard fields. Validation results and trace outputs are emitted explicitly.

PDF → Text → Evidence Pack → Hybrid Zoner + Planner → Controller Round 1 → T3 + Self-Critique → Selective T4 Arbitration → Validation → Final output
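
The bounded controller rounds can be sketched as a loop with a fixed round limit. It assumes that self-critique returns a revised value, which equals the original when the critique agrees. A disagreement on a hard field goes to Tier 4, and other disagreements are retried in the next round. All callables and names here are hypothetical.

```python
def controller_loop(fields, t3_extract, self_critique, t4_arbitrate, is_hard, max_rounds=3):
    """Hypothetical bounded controller loop.

    t3_extract(field) -> candidate value from Tier 3
    self_critique(field, value) -> revised value (equal to value when the critique agrees)
    t4_arbitrate(field, candidates) -> chosen value for hard fields that disagree
    Returns (resolved values, fields still unresolved after max_rounds).
    """
    resolved = {}
    pending = list(fields)
    for _ in range(max_rounds):  # the round limit keeps cost and latency bounded
        if not pending:
            break
        retry = []
        for f in pending:
            first = t3_extract(f)
            revised = self_critique(f, first)
            if revised == first:
                resolved[f] = first                              # T3 and its critique agree
            elif is_hard(f):
                resolved[f] = t4_arbitrate(f, [first, revised])  # selective T4
            else:
                retry.append(f)                                  # retry next round
        pending = retry
    return resolved, pending  # unresolved fields go on to validation and review
```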
Schema Agent

Deterministic normalization and CDISC controlled terminology (CT) mapping of Design records, driven by Schema workbook rules and the CDISC CT dictionary. Ownership modes: PASSTHROUGH, SINGLE-CT, UCUM, DERIVED.
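
The four ownership modes can be read as a dispatch on how a value is normalized. The sketch below is an illustration only: the CT dictionary is passed in by the caller, the UCUM synonym table is a small hypothetical sample, and `normalize` with its signature is not the Schema Agent's API.

```python
# Small illustrative UCUM synonym table; a real deployment would need a fuller mapping.
UCUM_SYNONYMS = {"hour": "h", "hours": "h", "day": "d", "days": "d",
                 "week": "wk", "weeks": "wk", "mg": "mg", "milligram": "mg"}


def normalize(mode, value, ct_dictionary=None, derive=None, record=None):
    """Dispatch on the ownership mode (hypothetical signature)."""
    if mode == "PASSTHROUGH":
        return value                       # keep the Design value unchanged
    if mode == "SINGLE-CT":
        key = value.strip().lower()
        if not ct_dictionary or key not in ct_dictionary:
            raise KeyError(f"no CT match for {value!r}")  # unmatched terms need review
        return ct_dictionary[key]
    if mode == "UCUM":
        return UCUM_SYNONYMS[value.strip().lower()]
    if mode == "DERIVED":
        return derive(record)              # computed from other fields in the record
    raise ValueError(f"unknown ownership mode {mode!r}")
```

Raising on an unmatched CT term, instead of guessing a code, keeps the step deterministic and surfaces the gap for review.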

Parallel Execution (ThreadPoolExecutor, 2 workers)
Mapping Agent (TS-only MVP)

Assembly engine: builds SDTM TS rows from schema output using the mapping matrix. Deterministic row builder (no AI). MVP: Study Definition + Study Design (31 fields).
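
A deterministic TS row builder can be sketched as a lookup through the mapping matrix. The output columns (STUDYID, DOMAIN, TSSEQ, TSPARMCD, TSPARM, TSVAL) are the standard SDTM Trial Summary variables. The record shape (`field_id`, `value`) and the matrix format are assumptions, and TITLE/TPHASE appear only as illustrative parameters.

```python
def build_ts_rows(study_id, schema_records, mapping_matrix):
    """Build SDTM TS rows; mapping_matrix maps field_id -> (TSPARMCD, TSPARM)."""
    rows = []
    seq = {}  # TSSEQ counts repeated values of the same parameter
    for rec in schema_records:
        target = mapping_matrix.get(rec["field_id"])
        if target is None:
            continue  # field is outside the TS MVP scope
        parmcd, parm = target
        seq[parmcd] = seq.get(parmcd, 0) + 1
        rows.append({
            "STUDYID": study_id,
            "DOMAIN": "TS",
            "TSSEQ": seq[parmcd],
            "TSPARMCD": parmcd,
            "TSPARM": parm,
            "TSVAL": rec["value"],
        })
    return rows


matrix = {"study_title": ("TITLE", "Trial Title"),
          "phase": ("TPHASE", "Trial Phase Classification")}
print(build_ts_rows("STUDY01", [{"field_id": "study_title", "value": "A Phase 3 Study"}], matrix))
```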

Amendment Agent

Computes a field-level diff between version N and version N-1 of the same protocol ID and tags each change with a severity. Change types: ADD/MODIFY/DELETE. Severities: MAJOR (CT code changed), MINOR (SDTM-mapped field without a CT change), COSMETIC.
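
The N vs N-1 comparison can be sketched as a keyed diff over field records. The record shape `{"value", "ct_code", "sdtm"}` is hypothetical. The severity rule is one reading of the description above: MAJOR when the CT code differs (including a code being added or removed), MINOR for other changes to SDTM-mapped fields, and COSMETIC otherwise.

```python
def severity(old, new):
    """Hypothetical severity rule based on CT code and SDTM relevance."""
    old_ct = old.get("ct_code") if old else None
    new_ct = new.get("ct_code") if new else None
    if old_ct != new_ct:
        return "MAJOR"      # CT code changed, added, or removed
    if (new or old).get("sdtm"):
        return "MINOR"      # SDTM-mapped field, no CT change
    return "COSMETIC"


def diff_versions(prev, curr):
    """prev/curr: {field_id: {"value": ..., "ct_code": ..., "sdtm": bool}} for N-1 and N."""
    changes = []
    for fid in sorted(prev.keys() | curr.keys()):
        old, new = prev.get(fid), curr.get(fid)
        if old is None:
            kind = "ADD"
        elif new is None:
            kind = "DELETE"
        elif old["value"] != new["value"] or old.get("ct_code") != new.get("ct_code"):
            kind = "MODIFY"
        else:
            continue  # unchanged fields produce no entry
        changes.append({"field_id": fid, "change": kind, "severity": severity(old, new)})
    return changes
```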

Validator Agent

Merges all upstream contracts into unified validated records with combined confidence, QC flags, and human review tasks. Two-stage confidence: verifier (design-dominant weighting) then gating (evidence + agreement + rules + extraction quality).
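
The two-stage confidence can be illustrated as a weighted blend followed by a gate. All weights below are hypothetical. The only property taken from the description is that the verifier stage is design-dominant (design weight above 0.5) and that the gating stage combines evidence, agreement, rules, and extraction quality.

```python
def verifier_stage(design_conf, verifier_conf, design_weight=0.7):
    """Stage 1: design-dominant blend (the 0.7 weight is a placeholder)."""
    return design_weight * design_conf + (1 - design_weight) * verifier_conf


# Hypothetical gating weights; each signal is expected in [0, 1].
GATE_WEIGHTS = {"evidence": 0.4, "agreement": 0.3, "rules": 0.2, "extraction_quality": 0.1}


def gating_stage(stage1, signals, weights=GATE_WEIGHTS):
    """Stage 2: scale the stage-1 score by a weighted average of the gating signals."""
    gate = sum(weights[k] * signals[k] for k in weights) / sum(weights.values())
    return stage1 * gate
```

Multiplying by the gate means a weak gating signal can only lower the verifier score, never raise it.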

Technical Details

API Surface

REST API (FastAPI + Uvicorn). All endpoints under /api/v1/.

Core Principles

Evidence Traceability

Every extracted value is tied to evidence (page, section, quote, bounding box) and preserved through all downstream contracts with full rule and model trace.
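
One way to keep evidence attached through every downstream contract is to make values immutable and let each step return a new copy that appends to a trace. In the sketch below, the bounding-box convention and the trace-step strings are assumptions. `Evidence`, `TracedValue`, and `derive` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    page: int
    section: str
    quote: str
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1); coordinate convention assumed


@dataclass(frozen=True)
class TracedValue:
    field_id: str
    value: str
    evidence: Tuple[Evidence, ...]
    trace: Tuple[str, ...] = ()  # rule and model identifiers applied, in order

    def derive(self, step: str, new_value: Optional[str] = None) -> "TracedValue":
        """Return a copy with one more trace step; evidence is carried over unchanged."""
        return replace(self,
                       value=self.value if new_value is None else new_value,
                       trace=self.trace + (step,))
```

Because both classes are frozen, a downstream agent cannot drop or edit the evidence of an upstream value. It can only produce a derived copy.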

AI + Rules Synergy

Tier 3 is the universal non-system extractor, but Cortex stays evidence-driven. Deterministic rules validate outputs, Tier 1 stays narrow, and Tier 4 arbitrates hard disagreements instead of applying a fixed winner order.

Contract Boundaries

Each agent reads upstream artifacts only. No downstream component re-reads raw text after Design Agent. Immutable artifacts in GCS; operational state in Firestore.

Human Review Safety

Low-confidence, conflicting, or blocked records are flagged with explanation and evidence for manual resolution. AI review tips assist but never auto-resolve.

Cost-Aware Extraction

Domain-aware batch sizing with parallel execution. Zone-filtered evidence reduces input tokens by 60-80%. Full cost tracking per agent and per API call.
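
Zone filtering saves tokens by sending a module only the evidence lines from zones relevant to it. The sketch below filters lines by zone and measures the saving. It uses a whitespace word count as a rough stand-in for real tokenization. Zone names, module names, and the line shape are hypothetical.

```python
def zone_filter(lines, zones_for_module, module):
    """Keep only evidence lines whose zone is relevant to the module being extracted."""
    allowed = zones_for_module.get(module, set())
    return [ln for ln in lines if ln["zone"] in allowed]


def token_reduction(all_lines, kept_lines, count_tokens=lambda s: len(s.split())):
    """Fraction of input tokens saved; the whitespace split is only a rough token proxy."""
    total = sum(count_tokens(ln["text"]) for ln in all_lines)
    kept = sum(count_tokens(ln["text"]) for ln in kept_lines)
    return 0.0 if total == 0 else 1 - kept / total
```

Passing in a real tokenizer as `count_tokens` would give model-specific figures instead of word counts.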

Full Audit Trail

Every pipeline step, review action, and status transition is logged as an audit event with timestamps, token usage, and cost breakdown.
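
An audit event stream also yields the per-step cost breakdown. The sketch below assumes an event record with a step, an action kind, token counts, cost, and a UTC timestamp. The action names and the `cost_breakdown` helper are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    job_id: str
    step: str    # pipeline step, e.g. "design" or "schema"
    action: str  # hypothetical kinds: "api_call", "status_transition", "review_action"
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def cost_breakdown(events):
    """Aggregate tokens, cost, and API call counts per pipeline step."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0, "api_calls": 0})
    for e in events:
        t = totals[e.step]
        t["input_tokens"] += e.input_tokens
        t["output_tokens"] += e.output_tokens
        t["cost_usd"] += e.cost_usd
        if e.action == "api_call":
            t["api_calls"] += 1
    return dict(totals)
```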