Every year, NHS radiology departments conduct thousands of clinical audits. Most are never finished. Of those that are, the majority change nothing. This is not a failure of intent – it is a failure of infrastructure. And for too long, it has meant that one of the most data-rich environments in healthcare has been one of the least capable of learning from itself.

That is changing. AI-powered clinical audit tools are now making it possible to analyse tens of thousands of radiology reports overnight – turning what was previously a months-long, manually intensive process into something that happens while you sleep. In this post, we explain what AI clinical audit is, why traditional approaches have consistently fallen short, and what the shift to AI-enabled continuous audit means for patient care, NHS efficiency, and clinical practice.

In this article: What is AI clinical audit? | Why traditional audit fails | How NLP and LLMs are changing the picture | What continuous audit looks like in practice | The role of governance | FAQs


What Is AI Clinical Audit – and Why Does It Matter?

Clinical audit is the process by which healthcare teams measure their practice against agreed standards, identify where gaps exist, and make changes to improve outcomes. In radiology, this means regularly reviewing how imaging studies are requested, reported, and acted upon – checking whether diagnostic thresholds are appropriate, whether report quality meets professional standards, and whether patient pathways are delivering the outcomes they should.

Done well, clinical audit is one of the most powerful quality improvement tools available to the NHS. The problem is that it has rarely been done well – not because clinicians don’t care, but because the tools available have been wholly inadequate for the scale of the task.

AI clinical audit uses large language models (LLMs) and natural language processing (NLP) to automatically extract, categorise, and analyse the content of radiology reports at scale. Rather than asking a junior doctor to read 100 reports by hand, AI systems can process tens of thousands of reports overnight – identifying patterns, flagging outliers, and generating insights that would be statistically impossible to reach through manual methods.

“You press enter in the evening and in the morning you have a lovely audit with tens of thousands of patients categorised.” – Consultant Radiologist, NHS Trust

Why Traditional Clinical Audit Has Failed NHS Radiology

To understand why AI audit matters, it helps to understand why the traditional approach is so consistently inadequate. There are four structural problems that have made manual audit almost useless as a driver of meaningful change.

1. Sample sizes are too small to detect what matters

A typical manual audit reviews somewhere between 50 and 200 cases. For common outcomes, this may be sufficient to spot obvious problems. For rare but clinically significant events – a complication occurring in one in five hundred patients, for example – a sample of this size is statistically powerless. The absence of evidence in a small sample is not evidence of absence. But in practice, it has too often been treated as such, allowing problems to persist because no one could prove they existed.
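The arithmetic behind this is straightforward. A minimal sketch, using the figures from the paragraph above (a one-in-five-hundred event rate and typical audit sample sizes), shows how likely a manual audit is to see no cases at all:

```python
# Probability that an audit sample contains zero cases of a rare event.
# Illustrative only: the event rate and sample sizes are those quoted above.

def p_zero_events(rate: float, sample_size: int) -> float:
    """Binomial probability of observing no events in the sample."""
    return (1 - rate) ** sample_size

rate = 1 / 500  # a complication occurring in one in five hundred patients

for n in (50, 200, 20_000):
    print(f"n={n:>6}: P(zero events) = {p_zero_events(rate, n):.2%}")
```

With a sample of 200, there is roughly a two-in-three chance of seeing no events at all, even though the problem is real; at tens of thousands of cases, the chance of missing it entirely is vanishingly small.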

2. Audit cycles are routinely left incomplete

Clinical audit in the NHS is frequently assigned to junior doctors or registrars as a training activity. The problem is that audit cycles take months to complete – and in a system where clinical staff rotate between departments every four to six months, the person who starts an audit is often not the person who finishes it. The result is a graveyard of half-completed audit spreadsheets and presentations that were delivered once and never acted upon.

3. Data access is fragmented and slow

Radiology data is held across multiple systems – RIS (Radiology Information Systems), PACS (Picture Archiving and Communication Systems), and electronic patient records – that do not always speak to each other cleanly. Extracting a meaningful dataset for audit purposes has traditionally required significant manual effort from both clinicians and informatics teams, creating a bottleneck that discourages even well-intentioned audit activity.

4. Text data has been inaccessible at scale

The most valuable data in radiology is not structured. It is the free text of the radiology report itself – the consultant’s description of what they found, what they suspected, and what they recommended. Until the arrival of modern large language models, extracting reliable, structured information from this text at scale was not technically feasible. Earlier NLP approaches could work for narrow, well-defined tasks, but were too brittle to handle the natural variation in how different radiologists write.

“Data is the new oil – but the really valuable stuff is in text and voice, which have previously been inaccessible. With the arrival of LLMs, that has changed entirely.” – Roger Marlow, CTO, 33N

How Large Language Models Are Transforming Radiology Audit

Large language models – the same class of AI technology that underpins tools like Claude and GPT-4 – understand language contextually, in the way that a trained clinician does. They can handle double negatives, abbreviations, uncertainty language, and the natural variation in how different radiologists describe similar findings. This makes them fundamentally different from earlier text-mining approaches, which required rigid rule sets and broke down easily.

Applied to radiology audit, LLMs can reliably perform tasks that were previously beyond automated systems, including identifying whether a specific diagnosis was confirmed, suspected, or excluded; categorising report quality against professional standards; flagging cases where recommended follow-up was not clearly documented; and detecting changes in diagnostic patterns over time across large patient cohorts.
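To make the first of those tasks concrete, here is a hypothetical sketch of how a report might be categorised as confirmed, suspected, or excluded. The prompt wording, label set, and `call_llm` stub are illustrative assumptions, not the actual CLEAR implementation; in a deployed system the stub would be replaced by a real model call.

```python
import json

# Illustrative labels for diagnostic outcome; a real system's schema may differ.
LABELS = ("confirmed", "suspected", "excluded")

def build_prompt(report_text: str, diagnosis: str) -> str:
    # Assumed prompt format: ask for a single JSON field to make parsing robust.
    return (
        f"Read the radiology report below and decide whether the diagnosis "
        f"'{diagnosis}' is confirmed, suspected, or excluded. "
        f'Reply with JSON: {{"category": "<confirmed|suspected|excluded>"}}.\n\n'
        f"Report:\n{report_text}"
    )

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (stubbed so the sketch is runnable).
    return '{"category": "excluded"}'

def categorise(report_text: str, diagnosis: str) -> str:
    raw = call_llm(build_prompt(report_text, diagnosis))
    category = json.loads(raw)["category"]
    if category not in LABELS:
        raise ValueError(f"unexpected label: {category}")
    return category

report = "CTPA: No filling defect identified. No evidence of pulmonary embolism."
print(categorise(report, "pulmonary embolism"))  # with the stub above: excluded
```

The key design point is constrained output: asking the model for a fixed label set in a machine-readable format, then validating the answer, is what turns free-text understanding into auditable structured data.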

The scale this enables is not incremental – it is transformational. A department that was previously auditing 100 cases per year can now audit its entire caseload continuously. A trust that spent weeks gathering data for a quarterly review can run that analysis overnight. The question shifts from ‘do we have enough data to see anything?’ to ‘what do we want to understand?’

Key figure: British radiology consumes approximately £3.5 billion in NHS funding annually. Individual large departments may spend £25 million or more per year. AI-enabled audit provides, for the first time, the tools to understand whether that investment is being deployed appropriately.


What Continuous AI Audit Looks Like in Practice

The CLEAR programme – developed by 33N in partnership with NHS trusts – is one of the most advanced implementations of AI-powered continuous audit currently operating in the NHS. It provides a worked example of what this approach looks like when deployed at clinical scale.

Using natural language processing applied to radiology report text, CLEAR analyses imaging data across multiple clinical pathways – including CT pulmonary angiography (CTPA) for suspected pulmonary embolism, one of the most commonly requested and resource-intensive investigations in NHS radiology.

The system automatically categorises reports by diagnostic outcome, tracks how positive detection rates change over time, and benchmarks performance against evidence-based guidelines. This creates a continuously updated picture of how the pathway is performing – not a snapshot taken once a year, but a living dataset that reflects current practice.
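Once reports carry a diagnostic category, tracking the positive detection rate over time is a simple aggregation. A minimal sketch, with invented field names and example data standing in for real categorised reports:

```python
from collections import defaultdict
from datetime import date

# Illustrative categorised reports; field names and dates are assumptions.
reports = [
    {"reported": date(2024, 1, 12), "category": "confirmed"},
    {"reported": date(2024, 1, 20), "category": "excluded"},
    {"reported": date(2024, 2, 3),  "category": "excluded"},
    {"reported": date(2024, 2, 17), "category": "confirmed"},
    {"reported": date(2024, 2, 25), "category": "excluded"},
]

def monthly_positive_rate(reports):
    """Return {YYYY-MM: positive rate} across the cohort."""
    counts = defaultdict(lambda: [0, 0])  # month -> [positives, total]
    for r in reports:
        month = r["reported"].strftime("%Y-%m")
        counts[month][0] += r["category"] == "confirmed"
        counts[month][1] += 1
    return {m: pos / total for m, (pos, total) in sorted(counts.items())}

print(monthly_positive_rate(reports))
```

A continuously updated series like this is what allows a falling detection rate to be benchmarked against guideline ranges, rather than discovered once a year.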

The clinical impact of this capability has been significant. In one participating trust, AI audit of the CTPA pathway revealed that the positive PE detection rate had fallen below levels consistent with optimal pathway calibration. This provided the statistical foundation for the trust’s VTE committee to adjust its D-dimer screening thresholds – a change that clinicians had long considered evidence-based but had been unable to justify without population-level data. The audit did not create the desire for change. It created the evidence that made change possible.

The Hidden Cost of Over-Diagnosis: Why Getting Thresholds Right Matters

One of the most important – and underappreciated – insights that AI audit can surface is the harm caused by diagnostic over-sensitivity. In NHS radiology, there is an institutional tendency to err on the side of investigation: when in doubt, scan. This feels like the safe option. In many individual cases, it is. But at population level, it carries real costs.

For pulmonary embolism specifically, the consequences of over-diagnosis are well documented. Patients who are incorrectly diagnosed with PE and started on anticoagulation therapy face elevated risks of gastrointestinal bleeding and intracranial haemorrhage. The radiation exposure from unnecessary CT pulmonary angiograms carries a small but meaningful long-term cancer risk. Over-investigated patients may experience anxiety, treatment side effects, and unnecessary clinical follow-up – all of which consume NHS resource that could be deployed elsewhere.

These harms are real. They are simply not counted in the same way as missed diagnoses. Because the counterfactual – ‘this patient would not have had this complication if we had not scanned them’ – is rarely visible at the individual patient level, the system does not naturally self-correct. AI audit, operating at population scale, can make the aggregate picture visible in a way that individual clinical judgment cannot.

Governance: Why Getting This Right Is Non-Negotiable

The potential of AI clinical audit is only realisable within a robust governance framework. This is not a caveat to be managed around – it is a prerequisite for the technology to function as intended and to maintain the clinical trust that is essential for its findings to be acted upon.

Data governance in this context means ensuring that appropriate data sharing agreements are in place between participating trusts; that data protection impact assessments have been completed and approved; that patient data is processed in accordance with UK GDPR; and that access to identifiable data is restricted to authorised personnel for specific, documented purposes. These requirements are well understood and achievable – but they require genuine engagement and cannot be bypassed in the name of speed.

Clinical validation is equally important. Before an AI audit tool is used to inform clinical governance decisions, its performance on the specific task for which it is being used needs to be demonstrated against a clinician-reviewed ground truth. This does not require perfection – no human reader of radiology reports achieves perfection. It requires evidence that the system performs at least comparably to a trained clinician on the relevant classification task.

AI audit is designed to improve the quality of clinical decision-making, not to replace it. The appropriate response to audit findings is always a clinical governance conversation. The technology provides the evidence; clinicians and governance teams determine what to do with it.

Frequently Asked Questions About AI Clinical Audit

What is AI clinical audit in the NHS?

AI clinical audit uses large language models and natural language processing to automatically analyse clinical records – such as radiology reports – at scale. It enables NHS departments to review thousands of patient cases overnight, identify patterns in clinical practice, and benchmark performance against evidence-based guidelines, replacing the slow and small-scale manual audit processes that have historically been the norm.

How is AI audit different from traditional clinical audit?

Traditional clinical audit reviews small samples – typically 50 to 200 cases – manually, over a period of weeks or months. AI audit can process tens of thousands of cases overnight, continuously, and across multiple sites simultaneously. It can detect patterns that are statistically invisible to small-sample audit, and it eliminates the bottleneck of manual data extraction and analysis.

Is AI audit safe to use in clinical settings?

Yes, when implemented with appropriate governance. AI audit tools used in the NHS should be validated against clinician-reviewed ground truth data before deployment, operate within robust data governance frameworks, and be used to inform – not replace – clinical decision-making. The findings of AI audit are inputs to clinical governance processes, not autonomous directives.

What clinical areas can AI audit be applied to?

Radiology is the most mature application, given the high volume of structured text data in radiology reports. However, the underlying approach – using LLMs to extract and categorise information from clinical free text – is applicable to any specialty that generates free-text records at scale, including endoscopy, cardiology, pathology, and outpatient clinic letters.

How does AI audit support NHS radiology pathway improvement?

AI audit provides the population-level evidence that pathway improvement decisions require. By analysing large cohorts of patient data, it can reveal whether diagnostic thresholds are appropriately calibrated, whether detection rates are consistent with clinical guidelines, and whether there is unwarranted variation between sites or over time. This evidence base supports clinical governance teams in making changes that would be difficult to justify on the basis of small-sample data alone.

What is the CLEAR programme?

The National CLEAR Programme supports clinicians and organisations to deliver transformation and workforce redesign projects to enhance patient care.

Clinicians from the CLEAR national team equip frontline staff with skills in data analysis, innovation and leadership to drive change, or carry out innovation projects on behalf of organisations with extensive clinical engagement.


Listen to the CLEAR Talk Podcast

This article draws on insights from the inaugural CLEAR Talk podcast, in which the 33N team and NHS clinical leaders discuss how AI is transforming radiology audit, the evidence behind pathway improvement, and what continuous audit means for the future of NHS quality improvement. Listen to the full episode here:

CLEARtalk Podcast


Want to learn more about deploying AI clinical audit at your trust? Get in touch with the CLEAR team.