This is a randomized, controlled, parallel-group, non-inferiority, single-blind trial conducted within the Swedish national screening program. Women eligible for mammography screening are recruited from four screening sites in southwest Sweden (Malmö, Lund, Landskrona, and Trelleborg) and randomly assigned (1:1) either to a group receiving artificial intelligence (AI)-assisted screening or to a group screened with standard double reading.
The primary endpoint of the MASAI trial is the rate of interval cancer*, which will be assessed once all study participants have had at least two years of follow-up (estimated for December 2024, plus six months to ensure all events are recorded in the cancer registry). Secondary outcome measures include women’s recall rates, cancer detection rates, false positive rates (suspected cancers not confirmed after further testing), the type and stage of the cancers detected, and the screen-reading workload.
*Interval cancers are cancers diagnosed between two screening mammograms, after a normal mammogram.
They are either occult cancers, not visible and missed because they are too diffuse or hidden in very dense breast tissue, or cancers that did not yet exist at the time of the examination and then developed very quickly. Cancers indeed evolve through very different mechanisms: some are slow or very slow forms that will never affect a woman’s health, while others are rapid and aggressive, growing too fast for screening to catch them. These interval cancers are screening failures. Read about it: https://cancer-rose.fr/en/2023/06/26/what-is-the-natural-history-of-cancer/
and see: https://cancer-rose.fr/en/2024/05/30/mechanisms-of-breast-cancer-2/
Context
Since 2019, articles have been published praising the merits of AI as an innovative method that can “predict” breast tumors. This method, it is claimed, would revolutionize mammographic screening and improve cancer detection while reducing radiologists’ screen-reading workload. Still, its clinical impact needed to be better understood.
In a meta-analysis published in the BMJ (2021, a review of 12 studies), the authors concluded that there is currently a lack of good-quality evidence for replacing human radiologists with AI technology in breast cancer screening. More precisely: “current data on AI do not yet allow us to judge its accuracy in breast cancer screening programs, and it is not clear at what stage of the clinical pathway AI could be most useful. AI systems are not specific enough to replace the radiologist’s double reading in screening programs. The promising results of small studies are not reproduced in large studies. Prospective studies are needed to measure the effect of AI in clinical practice.”
Hence the MASAI trial, which assesses the clinical impact of AI in mammographic screening.
MASAI trial results and discussion
The results suggest that AI contributes to the early detection of clinically relevant breast cancer and reduces radiologists’ screen-reading workload without increasing the number of false positives.
More precisely, the authors report that the use of AI as a detection aid in mammography screening resulted in a significant 29% increase in cancer detection compared with standard double reading without AI (6.4 versus 5.0 per 1000 participants screened), with a similar false positive rate and a substantial 44% reduction in the screen-reading workload.
The increase in detection mainly concerned small invasive cancers without lymph node involvement. We will see later (in the Cancer Rose comments section) what this 29% really means.
According to the authors, AI-assisted screening detected more invasive cancers of a molecular subtype with a poorer prognosis, including more triple-negative cancers. The authors write: “The large increase in the number of small, node-negative invasive cancers detected suggests that it is possible to downstage cancers by detecting them earlier using AI.”
Since the primary outcome of the MASAI trial is the interval cancer rate, the authors speculate that the increased detection of small, node-negative invasive cancers, some with poorer prognostic features, could lead to a subsequent decrease in interval cancers and high-grade cancers in the next screening round. However, they express a first reservation: to verify this hypothesis and its real clinical impact, the supposition will have to be confirmed by long-term follow-up, showing whether the interval cancer rate is actually reduced compared with screening procedures without AI.
The authors have other reservations.
AI-supported screening has also led to a relative increase in the detection of in situ cancers, although their number is lower than that of invasive cancers. These in situ carcinomas considerably fuel overdiagnosis, that is, the reservoir of non-invasive cancers, a substantial proportion of which would never have threatened the woman’s life or health had they not been found. These are unnecessary diagnoses, followed by equally unnecessary yet aggressive treatments.
The authors note, however, that there was no increase in the number of low-grade ductal carcinomas in situ, which would have further increased the burden of overdiagnosis in breast cancer screening. Approximately half of the additional ductal carcinomas in situ detected were nuclear grade III, which is considered clinically relevant early detection because their biological profile is more aggressive and their probability of progressing to invasive cancer is higher. However, the other half of the in situ lesions found were of intermediate grade, and this portion could therefore potentially aggravate screening-related overdiagnosis.
According to the authors, the reported 29% increase in cancers detected would indicate a real gain in detection with AI. Still, it could also partly result from a learning curve among the participating radiologists, who became accustomed to reading on screen with AI.
For them, radiologists must retain the final decision because AI can err in both directions; it is therefore crucial that radiologists remain in the loop and can rule out potentially inaccurate results reported by the AI, in order to maintain a low false positive rate.
The authors found a non-significant 8% increase in the recall rate in the intervention group (screening with AI) compared with the control group (screening without AI). Still, a significantly greater proportion of the recalls in the AI group were true positives, resulting in only seven additional false positives compared with the usual screening group. Again, further follow-up will show the net effect of this approach on false positives.
There are, therefore, still many uncertainties, and the future evaluation of interval cancers in the MASAI trial will be instructive. The authors also stress the need to conduct additional randomized trials to study the real effect of the radiologist’s work with AI and its influence on medical decision-making.
The workload
The authors suggest that the substantial reduction in radiologists’ workload (nearly half) enabled by AI-assisted image reading would allow breast radiologists to devote more time to more complex, patient-centered tasks.
Given that breast cancer treatment and associated costs increase with stage, downstaging cancer through earlier detection using AI would reduce morbidity and treatment costs.
The limitations of this study concern the generalizability of the AI-supported procedure.
The trial was conducted in a Swedish screening program that starts at age 40, with low baseline recall rates, a single mammography vendor, and a single AI vendor. Race and ethnicity were not recorded, as these data are not routinely collected in clinics for confidentiality reasons; still, up to 35% of the target population has an immigrant background, according to official statistics. Breast cancer types can vary with ethnicity, a factor that must be taken into account to avoid bias when interpreting the results.
Authors’ Conclusion
AI-assisted screening mainly contributed to an increase in the detection of small invasive cancers without lymph node involvement, which included not only good-prognosis cancers but also poorer-prognosis cancers and triple-negative invasive cancers.
To a lesser extent, there was also an increase in the detection of in situ cancers: more high-grade lesions were detected, with no increase in low-grade lesions but an increase in intermediate-grade in situ carcinomas.
Overall, the study concludes, the use of AI has the potential to increase the early detection of clinically relevant breast cancer without unduly increasing harm from false positives and overdiagnosis of low-grade in situ cancer.
However, further monitoring of interval cancer rates and cost-effectiveness is needed, and the study duration needs to be extended.
More nuanced conclusions from an editorial accompanying the study
The two Australian authors of the accompanying editorial point out that the increased detection with the AI-assisted screening strategy included invasive breast cancers, generally with favourable prognostic features including the absence of lymph node metastases, but also in situ forms. For them, given that in situ disease in particular can be clinically non-progressive, this necessarily has implications for overdiagnosis and consequent overtreatment. Reassurance about early detection of clinically progressive cancers based on descriptive features does not negate the fact that 30% of the differential detection in AI-assisted screening was in situ malignancy. Overdiagnosis should not be an overlooked or downplayed concern.
Therefore, as the trial authors themselves acknowledge, it will be necessary to await the primary endpoint of this randomized trial, namely interval cancer rates, to see whether the increased cancer detection reported in MASAI actually reduces the rates of the most aggressive breast cancers.
It is likely that, in time, standard double reading of mammograms will be phased out of organized breast cancer screening programs, provided that additional trials or implementation studies demonstrate that the most threatening interval cancers are indeed being caught.
Another condition is that human readers remain a central element of screening, in line with women’s expectations and because AI is not infallible.
Cancer Rose Comments
First, in this article, the increase in detection is presented in relative terms, which makes it look significant.
Where does the figure of 29% more cancers detected thanks to AI come from? As explained in the article, the cancer detection rates were, in absolute values, 6.4 per 1,000 women screened in the group with AI and 5.0 per 1,000 in the group without AI.
The relative difference (i.e., comparing the two groups) is obtained by dividing the rate in the test group (with AI) by the rate in the reference group (without AI). Here 6.4 / 5.0 = 1.28, so the relative increase in cancer detection with AI is 1.28 − 1 = 0.28, i.e., roughly the 29% that the authors put forward (the exact figure comes from the unrounded rates).
But the absolute difference in cancer detection with AI is only 6.4/1000 − 5.0/1000 = 1.4 per 1,000 women, or 0.14%. This is already a much less impressive result.
In summary: in the group where artificial intelligence was used, 6.4 cancers were detected per 1,000 women; where artificial intelligence was not used, 5.0 cancers were detected per 1,000 women. Announcing this difference as “an increase of 29%” is an artificially inflated way of presenting things that embellishes the situation.
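To make the two presentations of the same result concrete, here is a minimal illustrative sketch in Python (ours, not the trial’s), using the rounded detection rates quoted above; it simply computes the headline relative increase and the absolute difference side by side.

```python
# Illustrative calculation using the rounded MASAI detection rates.
rate_with_ai = 6.4 / 1000      # AI-supported screening arm, per woman screened
rate_without_ai = 5.0 / 1000   # standard double-reading arm, per woman screened

# Relative increase: ratio of the two rates, minus 1.
relative_increase = rate_with_ai / rate_without_ai - 1   # ≈ 0.28 with rounded rates; reported as 29%

# Absolute increase: simple difference between the two rates.
absolute_increase = rate_with_ai - rate_without_ai       # 0.0014, i.e. 1.4 extra detections per 1,000 women

print(f"Relative increase: {relative_increase:.0%}")
print(f"Absolute increase: {absolute_increase * 1000:.1f} per 1,000 women ({absolute_increase:.2%})")
```

Both figures describe exactly the same data; only the framing differs, and the relative figure inevitably looks far more impressive than the absolute one.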
Of these 1.4 additional cancers detected per 1,000 women thanks to AI, it remains to be seen how many are invasive cancers and how many are in situ lesions that feed overdiagnosis.
Overdiagnosis remains a problem: it can only be tolerated if it is minimal and if screening demonstrates substantial effectiveness in detecting the most serious cancers, which remains entirely to be proven.
A concern that is rarely highlighted is the human factor. On the one hand, radiologists must be careful not to place too much confidence in this technology and, in the name of saving time, stop checking its results and end up trusting it blindly.
On the other hand, there is the concern that radiologists who are assisted may lose detection skills and abilities, which are acquired mainly over time and through the repeated exercise of reading images.
As the authors point out, humans remain indispensable because other capacities come into play in an imaging specialist’s diagnosis, namely experience and the ability to exercise critical judgment on an AI result that seems questionable.
AI is not infallible, and as far as screening is concerned, it remains to be demonstrated that this innovation truly means progress.
🛈 We are a French non-profit organization of health care professionals. We operate without advertising, conflicts of interest, or subsidies. Thank you for supporting our work on HelloAsso.