
Independent AI & Control research studio

We build structured AI methods to study stability, calibration, and reliable reasoning — and we share results openly.

Open math + dashboards for credit allocation (Thermo-Credit) and AI reliability.

News

Vision — Teach AI philosophy, trust by limits

“The greatest—perhaps the only—use of the philosophy of pure reason is negative: not an organon for enlarging knowledge, but a discipline (training) for the setting of limits.”
— Immanuel Kant, KrV A795/B823 (paraphrase)

We operationalize this principle as engineering to contain hallucination (high-confidence false content): a regulative, feedback-based discipline that defines boundaries first, calibrates confidence, and maintains closed-loop oversight with auditable logs. This principle underpins AuditLoop.

Note (Dec 2025): an arXiv replacement has been accepted and is scheduled to be announced on 2025-12-16 10:00 JST. The latest archived release is always available via the Zenodo concept DOI.

AuditLoop — Stability & Governance for LLMs

We commercialize a reliability & governance layer for LLM applications: closed-loop evaluation and optimization with audit-ready reports, mapped to the EU AI Act, ISO/IEC 42001, and the NIST AI RMF.

A. Reliability & Governance SaaS

Automatically measures ECE/Brier/PSI, LoopGain and variance shrinkage, citation consistency, and justified refusal F1 — feeding dashboards and audit reports.

Value: provides audit-ready evidence aligned with the EU AI Act (transparency, evaluation, record-keeping), ISO/IEC 42001, and the NIST AI RMF.
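
As a minimal sketch of what two of these calibration metrics measure (illustrative only; the function names, binning choice, and toy data are ours, not the product implementation):

```python
import numpy as np

def ece(confidence, correct, n_bins=10):
    """Expected Calibration Error: bin answers by stated confidence and
    average the |accuracy - confidence| gap, weighted by bin size."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            total += in_bin.mean() * gap
    return total

def brier(confidence, correct):
    """Brier score: mean squared error of predicted probabilities."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((confidence - correct) ** 2))

# Toy run: four graded answers with the model's stated confidences.
conf = [0.9, 0.8, 0.6, 0.3]
hits = [1, 1, 0, 0]
print(f"ECE={ece(conf, hits):.3f}  Brier={brier(conf, hits):.3f}")
```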

B. Closed-loop Optimization Middleware

Auto-corrects production inference via a Prompt–Critique–Revision loop. Maintains token-budget parity while reducing hallucinations and dispersion.

Value: stabilizes quality KPIs (variance shrinkage) while keeping cost overruns in check.
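
The loop itself has a simple shape. A minimal sketch, assuming hypothetical generate, critique, and revise callables that wrap provider calls and report token usage (the stopping rule and budget accounting are our illustration, not the shipped middleware):

```python
def closed_loop_answer(prompt, generate, critique, revise,
                       token_budget, max_rounds=2):
    """Prompt–Critique–Revision: draft an answer, self-critique it, and
    revise until the critique passes or the token budget is exhausted."""
    draft, spent = generate(prompt)             # -> (text, tokens_used)
    for _ in range(max_rounds):
        issues, cost = critique(prompt, draft)  # -> (issues or None, tokens)
        spent += cost
        if issues is None or spent >= token_budget:
            break                               # clean answer, or budget hit
        draft, cost = revise(prompt, draft, issues)
        spent += cost
    return draft, spent
```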

C. Benchmarks & Conformity Reports

Assigns “stability scores” for RAG/FAQ/procedural workloads. Delivers PDF/JSON reports usable for procurement and audits. Supports RAG evaluation (RAGAS-style metrics).

Targets & roadmap (to Dec 2026)

Nov 2025 – Jan 2026
Define a tight MVP: CSV/JSON → metrics → PDF/JSON report.
• Ship jobs + aggregation + report v0.1.
• Implement one provider connector (Bedrock or Vertex).
• Add a minimal RAG citation-consistency check.
• Recruit 1–2 pilot leads and secure compute support.
Q1 2026
Run controlled evals across 2–3 models/providers (calibration drift, variance shrinkage).
• Ship AuditLoop v0.2.
• Deliver ~2 PoCs/pilots with measurable improvements.
• Release audit-ready report templates v0.1.
• Draft the journal submission package by end of Q1.
Q2 2026
Deliver metrics-to-clause mapping alpha (EU AI Act / NIST / ISO) as a table + evidence pack.
• Add a conformity report generator (template-first).
• Expand to 3–4 pilots (incl. near-production RAG).
• Target journal submission (window: Q2–Q3, results-dependent).
Q3 2026
Harden closed-loop middleware (Prompt–Critique–Revision) with a fixed intervention set.
• Tune for cost parity and latency budget.
• Reliability dashboards in beta (pre-GA).
• Privacy + red-team review (checklist + tests).
• Revise/submit as needed.
Q4 2026
Release AuditLoop v1.0 + conformity reports v1 (template-driven).
• 6+ cumulative PoCs/pilots.
• EU AI Act procurement readiness checklist + evidence package.

Note: timelines are indicative and may shift ±1 quarter depending on compute support and pilot access.

We also explore adjacent applications where privacy meets reliability — e.g., identity-neutral recruiting — strictly as a reference implementation of our trust-engineering framework.

Why now (external demand)

  • EU AI Act: GPAI transparency obligations by ~Aug 2025; high-risk phases roll out through 2026. Strong demand for evaluation, logging, and accountability.
  • ISO/IEC 42001: AI management systems standard is live. Requires operating processes and evidence.
  • NIST AI RMF: Measurement-centric risk management is formalized — strong fit for an evaluation SaaS.

Research

We pursue two parallel strands: (1) AI reliability & Kantian feedback for LLMs, and (2) Thermo-Credit (QTC) economic theory for credit and monetary dynamics.

Theme 1 — AI reliability & Kantian feedback

Our current paper reports preliminary results. Next, we will extend to larger-scale experiments (multi-provider RAG, calibration drift, closed-loop stability under budget constraints) using Colab-based, reproducible runs.

Discuss research collaboration

Theme 1 roadmap — AI reliability & Kantian feedback (to Dec 2026)

Nov 2025 – Jan 2026
Implement the black-box metric suite (ECE/Brier/LogLoss; citation-based hallucination; refusal F1; PSI; self-consistency variance; G/A/S). Enforce token-budget parity; shakedown run. (PSI is sketched below.)
Q1 2026
Full factorial: Model × Protocol {Baseline, Critique, Critique+Retrieval} × PromptVar (30) × Seed (≥3) × Temp {0, 0.7} × Lang {JP, EN}. Auto-generate PDF/JSON with key stability and calibration metrics.
Q2 2026
Add gray-box probes: logit-lens / linear probes, activation patching, and small-perturbation sensitivity (output-KL Lipschitz approximation). Clause mapping alpha (EU AI Act / ISO/IEC 42001 / NIST AI RMF).
Q3 2026
Strengthen closed-loop middleware under cost parity. Track key metrics across interventions. Dashboard beta; privacy & red-team review.
Q4 2026
White-box PoC (selected layers): local Jacobian spectral approximation → H-Risk (spectral margin, condition number); see the spectral sketch below. Release dataset/code v1.

Key metrics: ΔECE, ΔBrier, ΔHallucination, ΔPSI, LoopGain / G·A·S, variance shrinkage; plus answer length, latency, and citation alignment.
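
From the first roadmap item above, the population stability index (PSI) flags drift between two evaluation runs. A sketch using the textbook formula (the binning and smoothing here are assumptions, not necessarily the variant we ship):

```python
import numpy as np

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions:
    PSI = sum_i (a_i - e_i) * ln(a_i / e_i), smoothed for empty bins."""
    e = np.asarray(expected, dtype=float) + eps
    a = np.asarray(actual, dtype=float) + eps
    e, a = e / e.sum(), a / a.sum()
    return float(np.sum((a - e) * np.log(a / e)))

# Toy drift check: confidence histograms from a baseline and a later run.
baseline = [30, 40, 20, 10]
current = [20, 35, 25, 20]
print(f"PSI={psi(baseline, current):.3f}")  # >0.2 is a common drift flag
```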
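And for the Q4 white-box PoC, one plausible reading of the spectral quantities (our illustration; the finite-difference Jacobian, the margin convention, and the toy layer_fn are assumptions): estimate a local Jacobian at a chosen layer, then read the spectral margin and condition number off its singular values.

```python
import numpy as np

def local_spectral_stats(layer_fn, x, eps=1e-4):
    """Finite-difference local Jacobian of layer_fn at x; report a
    spectral margin (1 - top singular value; >0 hints at local
    contraction) and the condition number of the Jacobian."""
    x = np.asarray(x, dtype=float)
    y0 = layer_fn(x)
    J = np.empty((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (layer_fn(x + dx) - y0) / eps
    s = np.linalg.svd(J, compute_uv=False)
    return {"spectral_margin": 1.0 - s[0],
            "condition_number": s[0] / max(s[-1], 1e-12)}

# Toy "layer": a fixed linear map followed by tanh.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(5, 5))
print(local_spectral_stats(lambda v: np.tanh(W @ v), rng.normal(size=5)))
```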

Theme 2 — Thermo-Credit (QTC) economic theory

We develop and test a thermodynamic analogy for credit, liquidity, and monetary aggregates (QTC). This strand remains exploratory and is documented separately for clarity.

Read the theory note

Projects

AuditLoop — Teach AI philosophy, trust by limits

Reliability & Governance for LLMs: closed-loop evaluation and optimization mapped to the EU AI Act, ISO/IEC 42001, and the NIST AI RMF.

Learn more

Thermo-Credit Monitor (QTC)

Public monthly indicators modeling credit dynamics via a thermodynamic analogy: S_M = k · M_in · H(q), T_L, loop dissipation (PLD), and X_C. Interactive charts with PNG fallbacks.
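
To make the headline indicator concrete (the report is authoritative; here we assume H(q) denotes the Shannon entropy of the normalized credit-allocation shares q, as the thermodynamic analogy suggests):

```python
import numpy as np

def s_m(m_in, q, k=1.0):
    """Monetary entropy indicator S_M = k * M_in * H(q), assuming H(q)
    is the Shannon entropy of normalized allocation shares q."""
    q = np.asarray(q, dtype=float)
    q = q / q.sum()   # normalize to a probability distribution
    q = q[q > 0]      # entropy convention: 0 * ln 0 = 0
    return k * m_in * float(-np.sum(q * np.log(q)))

# Toy month: inflow of 100 units allocated across four sectors.
print(f"S_M = {s_m(100.0, [0.4, 0.3, 0.2, 0.1]):.2f}")
```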

Open report View repo

mAI Economy — Window Guidance as Code

Concept OS for soft, transparent credit allocation guidance for central banks, supervisors, and large banks, built on calibrated AI methods and Thermo-Credit (QTC) structure.

Read concept note

Applications (Case Studies)

Case Study: Identity-Neutral Matching (Blind Screening)

Problem. Early-stage hiring decisions can be noisy and biased when personally identifiable information (PII) leaks into the loop.

Approach. Apply our closed-loop trust engineering (Prompt–Critique–Revision) with privacy controls to evaluate requirements–skills fit without exposing PII, logging provenance for audit; a minimal guard is sketched after the list below.

  • Calibration & Stability: ECE/Brier reduction and variance shrinkage across prompt sets.
  • Provenance: citation/traceability checks on requirement-evidence pairs.
  • Justified refusal: policy-guarded rejection when PII or protected attributes are requested.
  • Privacy risk: PII leakage rate ≤ threshold; data minimization by design.
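
A minimal sketch of the policy-guarded refusal in the list above (illustrative; these patterns and terms are placeholders, not a production policy):

```python
import re

# Placeholder patterns and terms; a real policy is broader and audited.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-like identifier
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]
PROTECTED_TERMS = {"age", "gender", "nationality", "religion"}

def screen_request(question: str) -> str:
    """Refuse, with a loggable reason, when a screening request contains
    PII or targets a protected attribute; otherwise pass it through."""
    if any(p.search(question) for p in PII_PATTERNS):
        return "REFUSE: request contains or solicits PII"
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    if PROTECTED_TERMS & tokens:
        return "REFUSE: request targets a protected attribute"
    return "PASS"

print(screen_request("What is the candidate's age?"))     # -> REFUSE: ...
print(screen_request("Does the CV show Python skills?"))  # -> PASS
```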

Compliance fit: EU AI Act (evaluation, logging), NIST AI RMF, ISO/IEC 42001; plus GDPR/ISO/IEC 27701 for privacy governance.

Scope: reference implementation and audit reports. We are not a recruiting agency.

Discuss this case study

Services

Research & Modeling

Control-theoretic analysis of ML systems, simulation studies, and quantitative metrics (stability, calibration, H-Risk).

Prototyping

Lightweight tools, reproducible experiments, and open-source utilities for data, figures, and evaluation.

Advisory & Workshops

Short-form advisory and internal workshops on uncertainty, evaluation, and reliability in AI.

About

We are a small, independent studio exploring the interface between feedback control and AI/ML. Our work focuses on epistemic stability, calibration, and methods that make intelligent systems more reliable. We value clear theory, reproducibility, and pragmatic prototypes.

Contact

Email: info@toppymicros.com