Using a standardized assessment, UK researchers compared the performance of a commercially available artificial intelligence (AI) algorithm with that of human readers of screening mammograms. Their findings were published in Radiology, a journal of the Radiological Society of North America (RSNA).
Mammography screening does not detect every breast cancer, and false-positive interpretations can lead to women without cancer undergoing unnecessary imaging and biopsy. One strategy for improving the sensitivity and specificity of screening mammography is to have two readers interpret each mammogram.
According to the researchers, double reading increases cancer detection rates by 6 to 15% while keeping recall rates low. However, the strategy is time-consuming and difficult to sustain where readers are in short supply.
“There is a lot of pressure to quickly deploy AI to solve these problems, but we need to get it right to protect women’s health.”
Yan Chen, PhD, Professor of Digital Screening, University of Nottingham, UK
Prof. Chen and her research team used test sets from the Personal Performance in Mammographic Screening, or PERFORMS, scheme, a quality assurance assessment used by the UK National Health Service Breast Screening Program (NHSBSP), to compare the performance of human readers with AI. Each PERFORMS test consists of 60 challenging cases from the NHSBSP with abnormal, benign and normal findings. For each test mammogram, a reader’s score is compared with the ground-truth result; in this study, the AI algorithm was scored against the same ground truth.
“It’s really important that human readers working in breast cancer screening demonstrate satisfactory performance,” she said. “The same will be true for AI once it enters clinical practice.”
The research team used data from two consecutive PERFORMS test sets, or 120 screening mammograms, and used the same two sets to evaluate the performance of the AI algorithm. The researchers compared the AI results with those of 552 human readers, including 315 (57%) board-certified radiologists and 237 non-radiologist readers, consisting of 206 radiographers and 31 breast clinicians.
“The 552 readers in our study represent 68% of readers in the NHSBSP, so this provides a robust performance comparison between human readers and AI,” said Prof. Chen.
Treating each breast separately, there were 161/240 (67%) normal breasts, 70/240 (29%) malignant breasts, and 9/240 (4%) benign breasts. Masses were the most common malignant mammographic feature (45/70, or 64.3%), followed by calcifications (9/70, or 12.9%), asymmetries (8/70, or 11.4%), and architectural distortions (8/70, or 11.4%). The mean size of the malignant lesions was 15.5 mm.
No difference in breast cancer detection performance was observed between the AI and the human readers across the 120 examinations. Human readers demonstrated an average sensitivity of 90% and specificity of 76%; the AI was comparable, with 91% sensitivity and 77% specificity.
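For readers less familiar with these metrics, the short sketch below shows how sensitivity and specificity are derived when per-case recall decisions are scored against ground truth, as in a PERFORMS-style assessment. The function and the toy counts are illustrative assumptions, not the study's data or methodology.

# Illustrative sketch: computing sensitivity and specificity from
# per-case recall decisions compared against ground truth.
# The example data below are hypothetical, not taken from the study.

def sensitivity_specificity(decisions, truth):
    """decisions/truth: lists of booleans (True = case recalled / cancer present)."""
    tp = sum(d and t for d, t in zip(decisions, truth))          # cancers correctly recalled
    tn = sum(not d and not t for d, t in zip(decisions, truth))  # non-cancers correctly cleared
    fn = sum(not d and t for d, t in zip(decisions, truth))      # cancers missed
    fp = sum(d and not t for d, t in zip(decisions, truth))      # false alarms
    sensitivity = tp / (tp + fn)   # share of cancers detected
    specificity = tn / (tn + fp)   # share of non-cancers correctly not recalled
    return sensitivity, specificity

# Toy example: 10 cases, 4 with cancer; one cancer missed, one false recall.
truth     = [True, True, True, True, False, False, False, False, False, False]
decisions = [True, True, True, False, False, False, False, False, True, False]
print(sensitivity_specificity(decisions, truth))  # (0.75, 0.8333...)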
“The results of this study provide strong supporting evidence that artificial intelligence for breast cancer screening can perform as well as human readers,” said Prof. Chen.
Prof. Chen said more research is needed before AI can be used as a second reader in clinical practice.
“I think it’s too early to say exactly how we’ll end up using AI in breast screening,” she said. “The large prospective clinical trials that are underway will tell us more. But no matter how we use AI, the ability to provide ongoing performance monitoring will be critical to its success.”
Prof. Chen said it is important to recognize that AI performance can change over time and that algorithms can be affected by changes in the operating environment.
“It is vital that imaging centers have a process in place to ensure ongoing monitoring of AI once it becomes part of clinical practice,” she said. “There are no other studies to date that compare the performance of such a large number of human readers on routine quality assurance test sets with AI, so this study can provide a model for evaluating AI performance in a real-world setting.”
Source:
Radiological Society of North America
Journal reference:
Chen, Y., et al. (2023). Performance of an AI algorithm for breast cancer detection using the Personal Performance in Mammographic Screening scheme. Radiology. doi.org/10.1148/radiol.223299