Independent AI & Control research studio
We build structured AI methods to study stability, calibration, and reliable reasoning — and we share results openly.
Open math + dashboards for credit allocation (Thermo-Credit) and AI reliability.
News
- Dec 2025 — Updated the Kantian stability & miscalibration preprint (arXiv replacement accepted; announcement scheduled for 2025-12-16 10:00 JST).
- Nov 2025 — Published the QTC ↔ Thermodynamics theory note.
- Nov 2025 — Released the early position note “Reasoning Tokens and the Case for Reusable Thought”.
- Oct 2025 — Posted the Kantian stability & miscalibration preprint on arXiv.
- Sep 2025 — Refreshed the public repo for the MBS / Basel Dynamics project.
Vision — Teach AI philosophy, Trust by limits
“The greatest—perhaps the only—use of the philosophy of pure reason is negative: not an organon for enlarging knowledge, but a discipline (training) for the setting of limits.”
We operationalize this principle as an engineering discipline for containing hallucination (high-confidence false content): a regulative, feedback-based practice that defines boundaries first, calibrates confidence, and maintains closed-loop oversight with auditable logs. This principle underpins AuditLoop.
Note (Dec 2025): an arXiv replacement has been accepted and is scheduled to be announced on 2025-12-16 10:00 JST. The latest archived release is always available via the Zenodo concept DOI.
AuditLoop — Stability & Governance for LLMs
We commercialize a reliability & governance layer for LLM applications: closed-loop evaluation and optimization with audit-ready reports, mapped to the EU AI Act, ISO/IEC 42001, and the NIST AI RMF.
A. Reliability & Governance SaaS
Automatically measures ECE/Brier/PSI, LoopGain and variance shrinkage, citation consistency, and justified refusal F1 — feeding dashboards and audit reports.
Value: provides audit-ready evidence aligned with the EU AI Act (transparency, evaluation, record-keeping), ISO/IEC 42001, and the NIST AI RMF.
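To make the calibration metrics concrete, here is a minimal Python sketch of ECE (with equal-width confidence bins) and the Brier score; the bin count and binning scheme are illustrative choices on our part, not the shipped implementation.

```python
import numpy as np

def brier_score(confidences: np.ndarray, correct: np.ndarray) -> float:
    """Mean squared gap between stated confidence and 0/1 correctness."""
    return float(np.mean((confidences - correct) ** 2))

def ece(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    """Expected Calibration Error with equal-width confidence bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, n = 0.0, len(confidences)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its population.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += mask.sum() / n * gap
    return float(total)

# Example: well-calibrated answers keep both numbers low.
conf = np.array([0.9, 0.8, 0.6, 0.95, 0.4])
hit = np.array([1, 1, 0, 1, 0])
print(f"ECE={ece(conf, hit):.3f}  Brier={brier_score(conf, hit):.3f}")
```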
B. Closed-loop Optimization Middleware
Auto-corrects production inference via a Prompt-Critique-Revision loop. Maintains token-budget parity while reducing hallucinations and dispersion.
Value: stabilizes quality KPIs (variance shrinkage) while keeping cost overruns in check.
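The control shape of the loop can be sketched as follows; `llm` and `critique` are hypothetical callables standing in for the model and the critic, and the word-count token accounting is a crude proxy used only for this sketch.

```python
from typing import Callable

def prompt_critique_revision(
    llm: Callable[[str], str],               # hypothetical model call: prompt -> answer
    critique: Callable[[str, str], float],   # hypothetical scorer: (prompt, answer) -> risk in [0, 1]
    prompt: str,
    max_rounds: int = 3,
    risk_threshold: float = 0.2,
    token_budget: int = 4000,
) -> str:
    """Revise an answer until its critique risk falls below threshold,
    stopping early once the round or token budget is exhausted."""
    answer = llm(prompt)
    spent = len(answer.split())  # crude token proxy for the sketch
    for _ in range(max_rounds):
        risk = critique(prompt, answer)
        if risk < risk_threshold or spent >= token_budget:
            break
        revision_prompt = (
            f"Question: {prompt}\nDraft answer: {answer}\n"
            f"Critique risk: {risk:.2f}. Revise the draft: remove unsupported "
            f"claims, cite sources, and state uncertainty explicitly."
        )
        answer = llm(revision_prompt)
        spent += len(answer.split())
    return answer
```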
C. Benchmarks & Conformity Reports
Assigns “stability scores” for RAG/FAQ/procedural workloads. Delivers PDF/JSON reports usable for procurement and audits. Supports RAG evaluation (RAGAS-style metrics).
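For illustration, a machine-readable stability report might take the shape below; every field name here is a placeholder of ours, not the shipped schema.

```python
import json

# Hypothetical shape of a JSON stability report (placeholder field names).
report = {
    "workload": "rag-faq-v1",
    "stability_score": 0.87,  # aggregate over the per-metric scores below
    "metrics": {
        "ece": 0.031,
        "brier": 0.092,
        "psi": 0.08,
        "citation_consistency": 0.94,
        "justified_refusal_f1": 0.81,
    },
    "framework_mappings": ["EU AI Act", "ISO/IEC 42001", "NIST AI RMF"],
}
print(json.dumps(report, indent=2))
```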
Targets & roadmap (to Dec 2026)
• Ship jobs + aggregation + report v0.1.
• Implement one provider connector (Bedrock or Vertex).
• Add a minimal RAG citation-consistency check.
• Recruit 1–2 pilot leads and secure compute support.
• Ship AuditLoop v0.2.
• Deliver ~2 PoCs/pilots with measurable improvements.
• Release audit-ready report templates v0.1.
• Draft the journal submission package by end of Q1.
• Add a conformity report generator (template-first).
• Expand to 3–4 pilots (incl. near-production RAG).
• Target journal submission (window: Q2–Q3, results-dependent).
• Tune for cost parity and latency budget.
• Reliability dashboards GA-beta.
• Privacy + red-team review (checklist + tests).
• Revise/submit as needed.
• 6+ cumulative PoCs/pilots.
• EU AI Act procurement readiness checklist + evidence package.
Note: timelines are indicative and may shift ±1 quarter depending on compute support and pilot access.
We also explore adjacent applications where privacy meets reliability — e.g., identity-neutral recruiting — strictly as a reference implementation of our trust-engineering framework.
Why now (external demand)
- EU AI Act: GPAI transparency obligations by ~Aug 2025; high-risk phases roll out through 2026. Strong demand for evaluation, logging, and accountability.
- ISO/IEC 42001: AI management systems standard is live. Requires operating processes and evidence.
- NIST AI RMF: Measurement-centric risk management is formalized — strong fit for an evaluation SaaS.
Research
We pursue two parallel strands: (1) AI reliability & Kantian feedback for LLMs, and (2) Thermo-Credit (QTC) economic theory for credit and monetary dynamics.
Theme 1 — AI reliability & Kantian feedback
Our current paper reports preliminary results. Next, we will extend to larger-scale experiments (multi-provider RAG, calibration drift, closed-loop stability under budget constraints) using Colab-based, reproducible runs.
Theme 1 roadmap — AI reliability & Kantian feedback (to Dec 2026)
Key metrics: ΔECE, ΔBrier, ΔHallucination, ΔPSI, LoopGain / G·A·S, and variance shrinkage; plus answer length, latency, and citation alignment.
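As one example of how drift enters these metrics, a PSI over confidence scores between a reference window and a current window can be computed as below; the quantile binning and the conventional 0.2 reading are our illustrative choices, not fixed by the paper.

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, n_bins: int = 10,
        eps: float = 1e-6) -> float:
    """Population Stability Index between two score samples.
    Bin edges are set by quantiles of the reference distribution."""
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    cur_frac = np.histogram(current, bins=edges)[0] / len(current) + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Example: drifted confidences yield a larger PSI than a stable window.
rng = np.random.default_rng(0)
ref = rng.beta(8, 2, 5000)         # reference confidence scores
cur = rng.beta(5, 3, 5000)         # drifted current window
print(f"PSI={psi(ref, cur):.3f}")  # > 0.2 is commonly read as a major shift
```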
Theme 2 — Thermo-Credit (QTC) economic theory
We develop and test a thermodynamic analogy for credit, liquidity, and monetary aggregates (QTC). This strand remains exploratory and is documented separately for clarity.
Projects
AuditLoop — Teach AI philosophy, Trust by limits
Reliability & Governance for LLMs: closed-loop evaluation and optimization mapped to the EU AI Act, ISO/IEC 42001, and the NIST AI RMF.
Thermo-Credit Monitor (QTC)
Public monthly indicators modeling credit dynamics via a thermodynamic analogy: S_M = k · M_in · H(q), T_L, loop dissipation (PLD), and X_C. Interactive charts with PNG fallbacks.
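A minimal sketch of the entropy term, assuming H(q) denotes the Shannon entropy of normalized credit-allocation shares q and k is a scaling constant (our reading for illustration; the exact definitions live in the project's published notes):

```python
import numpy as np

def monetary_entropy_S_M(M_in: float, q: np.ndarray, k: float = 1.0) -> float:
    """S_M = k * M_in * H(q), reading H(q) as the Shannon entropy of the
    credit-allocation shares q (an assumption made for this sketch)."""
    q = np.asarray(q, dtype=float)
    q = q / q.sum()           # normalize shares to a probability distribution
    nz = q[q > 0]             # convention: 0 * log 0 := 0
    H = -np.sum(nz * np.log(nz))
    return k * M_in * float(H)

# Example: credit spread evenly across sectors maximizes H(q), hence S_M.
even = monetary_entropy_S_M(M_in=100.0, q=np.ones(4) / 4)
skewed = monetary_entropy_S_M(M_in=100.0, q=np.array([0.97, 0.01, 0.01, 0.01]))
print(f"even={even:.2f}  skewed={skewed:.2f}")
```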
mAI Economy — Window Guidance as Code
A concept OS for soft, transparent credit-allocation guidance aimed at central banks, supervisors, and large banks, built on calibrated AI methods and the Thermo-Credit (QTC) structure.
Applications (Case Studies)
Case Study: Identity-Neutral Matching (Blind Screening)
Problem. Early-stage hiring decisions can be noisy and biased when personally identifiable information (PII) leaks into the loop.
Approach. Apply our closed-loop trust engineering (Prompt-Critique-Revision) with privacy controls to evaluate requirement-skill fit without exposing PII, logging provenance for audit (a minimal redaction sketch follows this case study).
- Calibration & Stability: ECE/Brier reduction and variance shrinkage across prompt sets.
- Provenance: citation/traceability checks on requirement-evidence pairs.
- Justified refusal: policy-guarded rejection when PII or protected attributes are requested.
- Privacy risk: PII leakage rate ≤ threshold; data minimization by design.
Compliance fit: EU AI Act (evaluation, logging), NIST AI RMF, ISO/IEC 42001; plus GDPR/ISO/IEC 27701 for privacy governance.
Scope: reference implementation and audit reports. We are not a recruiting agency.
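To illustrate the privacy controls above, here is a toy redaction and refusal guard; the regex patterns and the protected-attribute list are deliberately simplistic placeholders, and a real deployment would use vetted NER-based redaction rather than regexes.

```python
import re

# Toy PII patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}
PROTECTED_ATTRIBUTES = {"age", "gender", "nationality", "religion"}

def redact_pii(text: str) -> tuple[str, int]:
    """Mask PII spans; the leak count feeds the
    'PII leakage rate <= threshold' check."""
    leaks = 0
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label.upper()}]", text)
        leaks += n
    return text, leaks

def justified_refusal(query: str) -> str | None:
    """Return a policy-grounded refusal when a protected attribute is
    requested; None means the query may proceed to blind screening."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    hits = sorted(PROTECTED_ATTRIBUTES & words)
    return f"Refused: query touches protected attributes {hits}." if hits else None

resume = "Senior Rust engineer, 10 years of distributed systems. jane@example.com, +1 555 010 9999"
clean, leaks = redact_pii(resume)
print(clean, f"(PII spans masked: {leaks})")
print(justified_refusal("What is the candidate's age?") or "proceed")
```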
Services
Research & Modeling
Control-theoretic analysis of ML systems, simulation studies, and quantitative metrics (stability, calibration, H-Risk).
Prototyping
Lightweight tools, reproducible experiments, and open-source utilities for data, figures, and evaluation.
Advisory & Workshops
Short-form advisory and internal workshops on uncertainty, evaluation, and reliability in AI.
About
We are a small, independent studio exploring the interface between feedback control and AI/ML. Our work focuses on epistemic stability, calibration, and methods that make intelligent systems more reliable. We value clear theory, reproducibility, and pragmatic prototypes.
Contact
Email: info@toppymicros.com