Heard of Google DeepMind's Aletheia? Not many have. Here's why it should raise alarm bells for any business leader reading this...
By Mark Sutter
An AI just solved six out of ten research-level maths problems that professional mathematicians couldn't, then wrote a publishable paper on its solutions, without a human touching the keyboard. This isn't a lab curiosity. It's a signal that AI has crossed a threshold that your business strategy, governance policies, and insurance contracts most certainly haven't caught up with yet.
What This Means for Your Business
Imagine hiring a contractor who, instead of sending you drafts for approval, just completes the work, checks it themselves, fixes any mistakes they find, and sends you the final output. No check-ins. No "does this look right?" Just a finished product landing in your inbox.
That is, functionally, what Google DeepMind's Aletheia system has just demonstrated. Aletheia operates through a three-part agentic loop: a Generator that proposes solutions, a Verifier that checks for flaws, and a Reviser that corrects errors, cycling autonomously until it produces a clean output. It doesn't wait for a human to sign off between steps. It uses Google Search and web browsing to navigate research literature, reducing hallucinations and fictitious citations that persist in standard AI models.
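For readers who think in code, here is a minimal sketch of that generator-verifier-reviser pattern. It is an illustration under assumptions, not DeepMind's implementation: the function names, the stub logic, and the round budget are all invented for this example.

```python
# Minimal sketch of a generator-verifier-reviser loop.
# Illustrative only: these stubs stand in for LLM calls and are
# not DeepMind's actual Aletheia code.

def generate(problem: str) -> str:
    """Propose a candidate solution (a real system would call a model)."""
    return f"draft solution to: {problem}"

def verify(candidate: str) -> list[str]:
    """Check the candidate for flaws; return a list of issues found."""
    return []  # an empty list means the verifier found no problems

def revise(candidate: str, issues: list[str]) -> str:
    """Correct the flagged issues and return an improved candidate."""
    return candidate + " [revised]"

def solve(problem: str, max_rounds: int = 10) -> str:
    candidate = generate(problem)
    for _ in range(max_rounds):
        issues = verify(candidate)
        if not issues:              # clean output: no human sign-off in the loop
            return candidate
        candidate = revise(candidate, issues)
    raise RuntimeError("no verified solution within the round budget")

print(solve("an open combinatorics problem"))
```

The point to notice is structural: the human appears nowhere between the first prompt and the final output.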
The result? An AI agent that solved four previously open problems from the famous Erdős problem database and produced a publishable research paper without human intervention, in a workflow DeepMind itself classifies as "essentially autonomous".
Now, you might be thinking: that's impressive, but I'm not running a maths research lab. Fair point.
But here's why you should care: the architecture powering Aletheia (autonomous looping, self-verification, iterative revision, live internet access) is rapidly becoming standard in the AI tools your business is probably already using or evaluating. Microsoft Copilot Agents, ChatGPT Tasks, Claude Projects, Gemini for Workspace: all of them are moving in this direction.
The Rules You Need to Know About
Three regulatory frameworks are directly relevant here, and none of them are getting the attention they deserve in boardrooms.
The EU AI Act (Regulation (EU) 2024/1689) is the one with the sharpest teeth. It entered into force in 2024 and obligations are phasing in now. Under Article 14, any AI system operating in a high-risk context (and that includes recruitment, credit decisions, access to essential services, and a growing list of professional applications) must have human oversight mechanisms in place. This means a real human must be capable of understanding, monitoring, and intervening in what the AI is doing. If your business has deployed AI that loops, auto-corrects, and produces final outputs without a human reviewing each step, you may already be in scope of these obligations without realising it. Fines run up to €15 million or 3% of global annual turnover, whichever is higher.
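What an oversight mechanism in this spirit can look like is easier to see in code. The sketch below is a hypothetical approval gate, not language from the Act or any vendor's API: the agent may draft autonomously, but nothing reaches a client until a named human reviews it.

```python
# Hypothetical human-oversight gate (illustrative names throughout):
# the agent drafts autonomously, but a named person must approve the
# output before it is released to anyone who relies on it.

def draft_output(task: str) -> str:
    """Stand-in for an agentic pipeline that produces a finished draft."""
    return f"final draft for: {task}"

def human_approves(output: str, reviewer: str) -> bool:
    """The intervention point: a real person who can understand,
    monitor, and stop the output. Stubbed here as a console prompt."""
    answer = input(f"[{reviewer}] approve this output? (y/n)\n{output}\n> ")
    return answer.strip().lower() == "y"

def release(task: str, reviewer: str) -> str:
    output = draft_output(task)
    if not human_approves(output, reviewer):
        raise PermissionError("output rejected: held for human revision")
    return output

if __name__ == "__main__":
    print(release("client risk assessment", reviewer="compliance lead"))
```

The gate is trivial to write and trivial to remove, which is exactly why regulators want it documented rather than assumed.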
ISO/IEC 42001:2023 is the AI equivalent of an ISO 9001 quality management standard. Clause 6.1 requires you to assess the risks your AI systems create, and Clause 8.4 requires you to manage the AI lifecycle, meaning you need to know what your AI is actually doing, document it, and be able to show that to an auditor, a client, or an insurer. Most SMEs using AI tools have never done this exercise. The Aletheia paper is useful precisely because it models what good AI documentation looks like: transparent, auditable, with prompts and outputs publicly available. Could you say the same about how your business uses AI?
The NIST AI Risk Management Framework (NIST AI 100-1, 2023) asks a simple question under its GOVERN function: who in your organisation is accountable for the outputs your AI produces? Google DeepMind has helpfully proposed a classification system for AI autonomy levels, ranging from "human with secondary AI input" through "human-AI collaboration" to "essentially autonomous." Where does your AI sit on that spectrum? If you don't know, that's your first governance gap.
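One way to make that spectrum usable is to encode it. The sketch below is a hypothetical encoding: the three labels are the ones quoted above, but the numeric ordering and the review rule are this example's assumptions, not DeepMind's or NIST's.

```python
from enum import Enum

# Hypothetical encoding of the autonomy spectrum described above.
class AutonomyLevel(Enum):
    HUMAN_WITH_SECONDARY_AI_INPUT = 1   # a person does the work, AI assists
    HUMAN_AI_COLLABORATION = 2          # work is shared, a person signs off
    ESSENTIALLY_AUTONOMOUS = 3          # the AI completes and checks the work

def needs_documented_oversight(level: AutonomyLevel) -> bool:
    """First governance question: anything at or above collaboration
    should carry a documented human-oversight mechanism (this sketch's
    rule of thumb, not a regulatory threshold)."""
    return level.value >= AutonomyLevel.HUMAN_AI_COLLABORATION.value

print(needs_documented_oversight(AutonomyLevel.ESSENTIALLY_AUTONOMOUS))  # True
```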
What Happens If You Do Nothing
The risk isn't abstract. It's practical, commercial, and arriving faster than most businesses expect.
If you're selling professional services and your AI is producing outputs that a client relies on, you need clarity on who is liable when something goes wrong. Most professional indemnity policies were written before agentic AI existed. If your AI autonomously produces a document, an analysis, or a recommendation and it contains a material error, your insurer may reasonably argue that an autonomous system was making decisions your policy doesn't cover.
If you're bidding for public sector or enterprise contracts, AI governance questions are already appearing in procurement frameworks. Being unable to demonstrate that you've assessed your AI's autonomy level, documented your oversight mechanisms, and aligned with relevant standards will cost you bids. Not eventually. Now.
And if a regulator comes knocking, particularly after the EU AI Act's high-risk provisions fully bite, "we didn't realise our AI was operating autonomously" is not a defence. Ignorance of the system's behaviour is itself evidence of a governance failure.
Three Things to Do This Week
- Map your AI tools against an autonomy scale. For every AI tool your business uses, ask one question: does a human review and approve its output before it reaches a client, a customer, or a consequential internal decision? If the answer is no, or "sometimes", you have an oversight gap that your governance documentation needs to address. You don't need a consultant to do this first pass. You need 90 minutes and a whiteboard, and the register sketch after this list doubles as a worksheet.
- Check your professional indemnity policy wording. Call your broker this week and ask specifically whether your PI cover extends to errors in AI-generated outputs, including outputs produced by agentic or autonomous AI systems. If they can't answer with confidence, you need a rider or a review. This is a two-sentence email that could save you a six-figure dispute.
- Start an AI register. A simple document (a spreadsheet is fine) listing every AI tool your business uses, what it does, who is accountable for its outputs, and whether it operates autonomously; a minimal sketch follows this list. This is the foundation of ISO 42001 compliance and the first thing an auditor, an insurer, or an enterprise client will ask for. Businesses that have this document are visibly different from those that don't. It takes an afternoon to create and signals to anyone who sees it that you're running AI like a grown-up organisation.
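To make the first and third items concrete, here is a minimal register sketch that also flags oversight gaps. Everything in it is illustrative: the column names, the example tools, and the gap rules are suggestions for a first pass, not an ISO/IEC 42001 schema.

```python
import csv

# Hypothetical AI register: columns, tools, and gap rules are
# suggestions, not a prescribed ISO/IEC 42001 format.
COLUMNS = ["tool", "what_it_does", "accountable_owner",
           "autonomy_level", "human_reviews_output"]

register = [
    {"tool": "example copilot", "what_it_does": "drafts client emails",
     "accountable_owner": "Head of Ops",
     "autonomy_level": "human-AI collaboration",
     "human_reviews_output": "yes"},
    {"tool": "example agent", "what_it_does": "compiles weekly reports",
     "accountable_owner": "unassigned",
     "autonomy_level": "essentially autonomous",
     "human_reviews_output": "no"},
]

# Write the register as a spreadsheet-friendly CSV file.
with open("ai_register.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(register)

# Flag the two gaps the first-pass mapping exercise looks for:
# no accountable owner, or no human review before outputs are used.
for row in register:
    if row["accountable_owner"] == "unassigned" or row["human_reviews_output"] != "yes":
        print(f"governance gap: {row['tool']}")
```

Ninety minutes with this file open and your tool list in front of you is the whole first-pass exercise.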
Complete Your AI Governance Framework
Our clients have become increasingly aware that they need an AI governance framework, but often don't know where to start. So we built a tool to help organisations create their own: aiframework.3peat.ai. It's free to use and designed for leadership teams to put the building blocks in place first, then refine it over successive iterations into an operational tool that employees can use in their daily practice.
Ready to create your own AI Framework?
Use our guided framework builder to list your AI systems, classify risk, and generate a practical governance framework your team can implement immediately.