Effect of screening by clinical breast examination on breast cancer incidence and mortality after 20 years: prospective, cluster randomised controlled trial in Mumbai

https://www.bmj.com/content/372/bmj.n256

Dr C. Bour, with:

– the contribution of Dr M. Gourmelon on decoding the presentation of mortality risks (Appendix b)

– the contribution of Dr V. Robert on the statistical analysis (Appendix c)

27/02/2021

Three randomized controlled trials were conducted in Russia, China and the United Kingdom [appendix a], involving a total of almost 400,000 women. They showed neither a decrease in breast cancer mortality nor a decrease in mastectomies. False positives, on the other hand, increased, leading to additional examinations and biopsies with normal results, and they also increased women’s anxiety.

Screening by trained professionals was also considered, but reproducibility from one practitioner to another is poor. In short, clinical breast examination has not been adopted as a routine screening method, just as breast self-examination has not proven to be an effective method for the early detection of breast cancer. And it is still unclear whether screening by clinical breast examination can actually reduce breast cancer mortality.

An Indian trial

In a new study using clinical breast examination as mass screening, a team of researchers in Mumbai is now publishing the results of its 20-year randomized controlled trial.

The objective of the study was to test the effectiveness of breast cancer screening by clinical examination in reducing mortality from the disease and also in reducing the stage of cancer at diagnosis, compared with no screening.

151,538 women aged 35 to 64 years with no history of breast cancer participated in the study.
Women in the screening group (75,360) received four rounds of clinical breast examination screening (performed by trained primary care professionals) and cancer awareness information every two years. Women in the control group (76,178) received information on cancer awareness and eight rounds of active surveillance every two years.

The trial examined whether cancers in the screening group were diagnosed at a lower stage than in unscreened women and, of course, whether mortality from the disease decreased.

The main results of the study are as follows:

-Clinical breast examination performed every two years by primary care professionals significantly reduced the stage of breast cancer at diagnosis.
-Clinical breast examination led to a non-significant 15% reduction in overall breast cancer mortality; BUT this is a relative reduction, i.e., for the screened group compared to the unscreened group (control group). See: [appendix b]
-The authors conclude that there was a significant reduction of nearly 30% in mortality in women aged ≥ 50 years. BUT this is a post hoc subgroup analysis, not planned in the original study protocol and therefore carried out after the data were known; again, this is not an absolute reduction but a relative reduction in risk, obtained by comparing two groups. See: [appendix b]
-No significant reduction in mortality was observed in women under 50.

From the authors’ perspective, what the Mumbai trial provides:

-In this 20-year study, clinical breast examination by trained health workers in Mumbai led to a reduction in the stage of breast cancer at diagnosis and to a reduction of nearly 30% in mortality from the disease in women aged 50 years and older, but no reduction in mortality was observed in women under 50 years.
-A 5% reduction in all-cause mortality was observed in the screening arm compared with the control arm, but this was not statistically significant.
-Clinical breast examination should be considered for breast cancer screening in low- and middle-income countries.

Review of the Study for Robustness

We present here the analysis of Dr Robert, statistician of the Cancer Rose group; the full version is in the appendix at the end of the article [Appendix c].

In his opinion, the post-hoc analyses [1], which are included in the abstract and in the conclusion, are problematic.

We summarize his main conclusions about the study:

1-Post-hoc analyses raise suspicion of either a lack of scientific rigor or a lack of objectivity: a propensity to demonstrate at any price, a posteriori, the effectiveness of screening by clinical breast examination, through an analysis that was not foreseen in the study protocol and was designed on the basis of the data already available.

2-The study is presented as randomized but in fact it is a cluster randomization (by groups of individuals and not by individuals). The authors do not give any information on the size of the clusters, nor on their characteristics. It is therefore impossible to know whether or not the randomization is sufficient to make the ‘screened’ and ‘control’ groups comparable.

3-The way deaths are attributed (breast cancer or another cause?) is debatable. When the two physicians consulted agree on the cause, that cause is retained; if they disagree, a third opinion is sought and the majority prevails. A rigorous process would exclude the disputed cases.

Other remarks

Of course, a certain rate of overdiagnosis dilutes the number of more advanced cancers within the total number of cancers found. Because the results are expressed as percentages, this gives the impression that there are more early-stage cancers in the screened group. It is surprising that the BMJ accepts this presentation in percentages, which biases the results. And indeed, overall there are more women with breast cancer in the screening arm (198) than in the control arm (151).

(It is known that the presentation in percentages embellishes the data and gives a fictitious perception of reality.)

For example, because the citizens had asked for an honest presentation of the data, Appendix 4 of the report of the citizens’ consultation [Appendix d] (page 155) proposes graphics showing what the percentages represent “in real life”.

The main point is that the overall result is not encouraging, since there is no statistically significant decrease in breast cancer mortality in the screened population as a whole.
And, as Dr Robert points out, it is always problematic to present the results of analyses carried out afterwards, once all the data are in hand: this allows one to “choose” what to put forward and leaves doubt as to the admissibility of the results.

Bernard Duperray[2], in his book “Breast Cancer Screening, the Great Illusion”, mentions the Shanghai study, whose results are almost the opposite of those of the Mumbai trial: “In a trial carried out in Shanghai from October 1989 to October 1991 on nearly 270,000 women, 130,000 were trained in breast self-examination under medical supervision and compared with a control group (not screened). Cumulative breast cancer mortality rates after 10 to 11 years of follow-up were similar in both groups…” [3].

To conclude

It is important to keep in mind that this study took place in a setting very different from our Western populations. In a setting like Mumbai, where women may arrive too late for care, initiatives in which trained personnel clinically examine women’s breasts are likely to reduce morbidity and the stage of cancers at diagnosis.
The authors describe the difficulties[4] encountered in carrying out this trial, particularly with regard to funding, while Europe is at the same time capable of spending 12 million euros on a study that failed before it even began and which will provide no usable information either on overdiagnosis or on the usefulness of mammographic screening.[5]

While this article lacks scientific robustness and the benefit of organized screening by clinical breast examination cannot be formally affirmed, the absence of benefit of this type of screening has not been demonstrated either!
A more robust three-arm trial, including a ‘no screening’ arm, could be envisaged.

A ‘rapid response’ to the publication of this study[6] (Ismail Jatoi, Professor and Chief, Division of Surgical Oncology and Endocrine Surgery, University of Texas Health Science Center, San Antonio) is quoted here: “…. the risk of overdiagnosis will increase with the use of more modern screening technology (i.e., tomosynthesis, magnetic resonance imaging), which increases the rate of detection of more occult (non palpable) cancers.”

… “Taken together, the results of the Mumbai trial and the CNBSS[7] suggest that a clinical trial randomizing women aged 50 years and older to mammography screening versus clinical breast examination (CBE) screening is now warranted. If such a trial demonstrates that there is no additional benefit to mammography screening beyond what is achievable with CBE screening, then CBE screening should replace screening mammography as the optimal method of breast cancer screening.”

Ethically, a randomized clinical trial comparing mammographic screening with clinical screening would be justified, but in France it is unfortunately not possible, since the argument often put forward is that it is unethical to exclude women from mammographic screening. It is apparently considered more ethical to summon women insistently, even coercively, to a radiological screening that has failed…

But the pandemic and future problems of allocating health resources may lead us to rethink this kind of screening around the world, especially for underprivileged populations, rather than pursuing disappointing mammographic screening with technologies familiar in the Western world, which increasingly lead to overdiagnosis.

References

[1] In a scientific study, post hoc analysis (from the Latin post hoc, “after this”) consists of statistical analyses that have been specified after the data have been accessed. This usually creates a multiple testing problem because each potential analysis is in fact a statistical test. Multiple testing procedures are sometimes used to compensate, but this is often difficult or impossible to do accurately. Post-hoc analysis that is conducted and interpreted without adequate consideration of this problem is sometimes called “data dredging” by critics because the statistical associations it finds are often wrong.
(Wikipedia: https://en.wikipedia.org/wiki/Post_hoc_analysis)

[2] Bernard Duperray, “Dépistage du cancer du sein, la grande illusion”, éditions Thierry Souccar.

[3] Thomas DB, Gao DL, et al. Randomized trial of breast self-examination in Shanghai: final results. Journal of the National Cancer Institute. 2002 Oct 2;94(19):1445-57.

Cox B. Variation in the effectiveness of breast screening by year of follow-up. Journal of the National Cancer Institute. Monographs. 1997;(22):69-72.

Retsky M. New concepts in breast cancer emerge from analyzing clinical data using numerical algorithms. International Journal of Environmental Research and Public Health. 2009 Jan;6(1):329-48.

[4] https://blogs.bmj.com/bmj/2021/02/24/the-story-of-the-mumbai-breast-screening-study/

[5] https://cancer-rose.fr/my-pebs/2019/06/13/argument-english/

[6] https://www.bmj.com/content/372/bmj.n256/rr-1

[7] Miller AB, Wall C, Baines CJ, Sun P, To T, Narod SA. Twenty five year follow-up for breast cancer incidence and mortality of the Canadian National Breast Screening Study: randomised screening trial. BMJ. 2014;348:g366.

Annexes

[a]

[b] Explanation of the misleading nature of risk presentations in terms of relative percentage reduction in risk of death. Dr M. Gourmelon

1-Clinical breast examination led to a non-significant 15% reduction in overall breast cancer mortality

The 15% comes from the following relative calculation:

Breast cancer deaths: 213 in the screening group, 251 in the control group.

251 - 213 = 38

38 / 251 × 100 = 15.14%

This is how we obtain a 15% relative reduction in breast cancer mortality.

But what is it in absolute terms?

213 deaths among 75,360 women in the screening group.

213 / 75,360 × 100 = 0.2826%: this is the absolute percentage of women in the screening group who died of breast cancer.

251 deaths among 76,178 women in the control group.

251 / 76,178 × 100 = 0.3295%: this is the absolute percentage of women in the control group who died of breast cancer.

We therefore have 0.3295% - 0.2826% = 0.0469%, rounded to 0.05% fewer women dying of breast cancer.

The relative percentage is therefore 15%, and the absolute percentage is 0.05%.
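The arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the death counts and group sizes quoted in this appendix; the function name `mortality_comparison` is our own. Besides the count-based relative reduction used above (38 / 251), it also computes the relative reduction from each arm's risk (deaths / women), which allows for the two arms being slightly different sizes. Like the text, it ignores differences in follow-up time, which an analysis in person-years would take into account.

```python
def mortality_comparison(deaths_screen, n_screen, deaths_control, n_control):
    """Compare deaths between the screening arm and the control arm.

    All values are returned in percent:
    - risk_*: share of women in each arm who died (deaths / women)
    - relative_from_counts: (deaths_control - deaths_screen) / deaths_control,
      the calculation used in the text, which ignores the arms' sizes
    - relative_from_risks: the same ratio computed on each arm's risk,
      which allows for the arms having slightly different sizes
    - absolute: control risk minus screening risk (percentage points)
    """
    risk_screen = deaths_screen / n_screen
    risk_control = deaths_control / n_control
    return {
        "risk_screen": 100 * risk_screen,
        "risk_control": 100 * risk_control,
        "relative_from_counts": 100 * (deaths_control - deaths_screen) / deaths_control,
        "relative_from_risks": 100 * (risk_control - risk_screen) / risk_control,
        "absolute": 100 * (risk_control - risk_screen),
    }


# Figures quoted above: 213 breast cancer deaths among 75,360 screened women,
# 251 among 76,178 women in the control group.
result = mortality_comparison(213, 75360, 251, 76178)
for name, value in result.items():
    print(f"{name}: {value:.4f} %")
```

The output gives risks of about 0.28% and 0.33%, a relative reduction of about 15% from the counts (about 14% from the risks), and an absolute reduction of only about 0.05 percentage points.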

The absolute percentage represents the reduction in the proportion of women dying of breast cancer between the screening group and the control group.

The relative percentage expresses the difference in the number of deaths between the screening group and the control group, as a share of the deaths in the control group: a reduction for a group, not for individuals.

However, what women need to know to be properly informed is the reduction they can expect from screening, not how much screening reduces deaths in the screened group compared with the unscreened group.

But to “promote” screening, it is better to put forward a figure of 15% reduction, which does not concern women directly but which they will interpret as such, than 0.05%, which is the real reduction they can expect by undergoing screening.

This is the whole subtlety of how figures are presented: most readers of the studies will not be aware of it, but the authors certainly are. The French citizens’ consultation of 2015 had asked that such presentations no longer be accepted.

2-Similarly, for the second result:

The authors conclude that there is a significant reduction of almost 30% in mortality in women aged ≥ 50 years.

The 30% comes from the following calculation, in the same way:

64 women over 50 died of breast cancer in the screening group, 93 in the control group.

There is therefore a reduction in breast cancer deaths of 93 - 64 = 29,

so 29 / 93 × 100 = 31.18% relative reduction in mortality in women over 50.

But what about the absolute reduction?

64 women died out of 20,965 women over 50 in the screening group.

64 / 20,965 × 100 = 0.3053% absolute breast cancer mortality in the screening group.

93 women died out of 21,909 women over 50 in the control group.

93 / 21,909 × 100 = 0.4245% absolute breast cancer mortality in the control group,

hence 0.4245% - 0.3053% = 0.1192%.

Therefore, in absolute terms, the reduction in mortality of women from breast cancer is 0.12%.

As before, the absolute percentage expresses the mortality risk of women over 50, and therefore its reduction, whereas the relative percentage only expresses the reduction in one group compared with another (screened versus unscreened women), and in no way the reduction in risk for the women over 50 themselves.
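The two kinds of calculation used in this appendix can be written generally. In the notation below (ours, not the authors’), $d_s$ and $n_s$ are the deaths and the number of women in the screening arm, $d_c$ and $n_c$ those in the control arm, $r = d/n$ is each arm’s risk, ARR is the absolute risk reduction and RRR the relative risk reduction. The count-based ratio used in the text, $(d_c - d_s)/d_c = 1 - d_s/d_c$, coincides with the risk-based one, $1 - r_s/r_c = 1 - (d_s/d_c)(n_c/n_s)$, only when the two arms are the same size; here the control arm is slightly larger, so the risk-based figure is a little lower. For the women over 50:

```latex
\begin{aligned}
r_s &= \frac{d_s}{n_s} = \frac{64}{20\,965} \approx 0.3053\,\%,
\qquad
r_c = \frac{d_c}{n_c} = \frac{93}{21\,909} \approx 0.4245\,\% \\
\mathrm{ARR} &= r_c - r_s \approx 0.4245\,\% - 0.3053\,\% = 0.1192\,\% \\
\mathrm{RRR}_{\text{counts}} &= \frac{d_c - d_s}{d_c} = \frac{29}{93} \approx 31.2\,\% \\
\mathrm{RRR}_{\text{risks}} &= \frac{r_c - r_s}{r_c} \approx \frac{0.1192}{0.4245} \approx 28.1\,\%
\end{aligned}
```

Either way, the absolute reduction remains about 0.12%.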

3-Finally, the same applies to the third result:

A 5% reduction in all-cause mortality was observed in the screening arm compared with the control arm, but it was not statistically significant:

11,853 all-cause deaths in the control arm

11,261 all-cause deaths in the screening arm

11,853 - 11,261 = 592 fewer deaths.

592 / 11,853 × 100 = 4.99%, so 5% fewer deaths in the screening group than in the control group, but not 5% fewer women who died: for that, the absolute percentage must be calculated.

This gives:

11,261 / 75,360 × 100 = 14.94% of women died in the screening arm

11,853 / 76,178 × 100 = 15.56% of women died in the control arm

15.56% - 14.94% = 0.62%
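The absolute figures can also be read as natural frequencies, closer to what the percentages represent “in real life”: deaths per 10,000 women in each arm. This is a minimal sketch using only the counts quoted in this appendix; the 10,000 scale and the names `COMPARISONS` and `per_women` are our own choices, not the authors’.

```python
# Deaths and number of women per arm, as quoted in this appendix.
# Each entry: (label, deaths_screen, n_screen, deaths_control, n_control)
COMPARISONS = [
    ("Breast cancer deaths, all ages", 213, 75360, 251, 76178),
    ("Breast cancer deaths, over 50", 64, 20965, 93, 21909),
    ("All-cause deaths, all ages", 11261, 75360, 11853, 76178),
]

PER = 10_000  # express results as deaths per 10,000 women


def per_women(deaths, women, per=PER):
    """Deaths per `per` women in one arm."""
    return deaths / women * per


for label, d_s, n_s, d_c, n_c in COMPARISONS:
    screened = per_women(d_s, n_s)
    control = per_women(d_c, n_c)
    print(f"{label}: {control:.1f} per {PER:,} women without screening, "
          f"{screened:.1f} with screening, i.e. {control - screened:.1f} fewer")
```

For all-cause mortality, for instance, this gives about 1,556 deaths per 10,000 women in the control arm versus about 1,494 in the screening arm.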

[c] Dr Robert’s analysis

1. With each post hoc analysis (each subgroup comparison), we give ourselves an additional chance of arriving at a statistically significant result by chance.

Thus, to arrive almost certainly at a statistically significant result, it would be enough to form the subgroups at random 100 times in a row. With the usual significance level of 0.05, there would be slightly more than 99 chances out of 100 that at least 1 of the 100 subgroup comparisons would give (by chance, since the subgroups were formed at random) a p-value < 0.05 (in other words, a statistically significant result).
(Editor’s note: one could then keep the one analysis that appears positive by the criteria the authors used, and ignore the other 99, which appear negative.)

Of course, the authors do not report 100 post hoc analyses but only 2 (one for the under-50s and one for the over-50s). But the 2 post hoc analyses plus the main analysis (the one without subgroups) still give 3 “tickets” to try to obtain at least one statistically significant result. The risk of error in the conclusion is therefore no longer 0.05 but about 0.143 (1 - 0.95³).

More importantly, it is not known how many post hoc analyses were actually performed. The authors show results for the subgroups under 50 and over 50, but we do not know how many subgroups they tried before arriving at one with a statistically significant result. Nothing proves that they did not try “over 36 years”: no success; then “over 37 years”: no success; then “over 38 years”: no success; …; then “over 50 years”: p-value = 0.02, we can publish.

The problem is that, in doing so, they would have given themselves 15 “tickets” (one per age threshold from 36 to 50) to obtain a statistically significant result. The risk of error in the conclusion would therefore no longer be 0.05 but 0.537 (in other words, even if screening had no effect at all, there would be more than a 1 in 2 chance of finding a “significant” decrease in mortality in some age subgroup, purely through chance and questionable statistical methodology).

The fact that the authors did post-hoc analyses (no matter how many) proves either:
– a lack of statistical rigor or knowledge,
– or a lack of objectivity, with a strong desire to demonstrate at all costs the effectiveness of screening by palpation (in which case one is entitled to question the honesty of the study).
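The figures 0.143 and 0.537 quoted in point 1 follow from the usual formula for the chance of at least one false positive among k tests when there is no real effect, 1 - (1 - 0.05)^k. This minimal sketch assumes, as that formula does, that the tests are independent. Overlapping age subgroups (over 36, over 37, …) are positively correlated, so for them the true figure would generally lie between 0.05 and the value computed here.

```python
ALPHA = 0.05  # usual significance threshold


def familywise_error(n_tests, alpha=ALPHA):
    """Probability of at least one p-value < alpha among n_tests independent
    tests when there is no real effect at all: 1 - (1 - alpha) ** n_tests."""
    return 1 - (1 - alpha) ** n_tests


# 1 = main analysis only; 3 = main analysis + the 2 reported subgroups;
# 15 = age thresholds 36 to 50; 100 = 100 random subgroup splits.
for n in (1, 3, 15, 100):
    print(f"{n:>3} tests: {familywise_error(n):.3f}")
```

This prints 0.050, 0.143, 0.537 and 0.994, the last being the “slightly more than 99 chances out of 100” of the random-subgroup example.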

2. The study is presented as randomized but in fact it is a cluster randomization (by groups of individuals and not by individuals). The authors state that there are 20 clusters but give no information on the size of the clusters and the heterogeneity of risk factors between the different clusters. It is therefore impossible to know whether or not randomization is sufficient to make the screened and control groups comparable.

To understand the importance of the problem, let’s take some extreme situations:

If there are only 2 clusters, one high-risk and the other low-risk, randomization does not change the comparability of the groups: one group will receive the high-risk cluster and the other the low-risk cluster. Randomized or not, the 2 groups will not be comparable.
If the clusters are perfectly identical in terms of risk factors, randomization is unnecessary: however the clusters are assigned to each group, the groups will be comparable, since all clusters are identical in terms of risk factors.
If there is an infinite number of clusters, it does not matter whether they are identical in terms of risk or not: randomization will balance the distribution of high- and low-risk clusters between the 2 groups.

In practice, we are always in an intermediate situation, with a number of clusters > 2 but not infinite and clusters that are not perfectly identical in terms of risk. In order to judge whether or not cluster randomization is likely to produce comparable groups, it is therefore necessary to know both the number of clusters and the heterogeneity between clusters.
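The effect of the number of clusters can be illustrated with a small enumeration. This is a hypothetical sketch: the cluster risks (0.2 and 0.8) are invented for illustration and are not the trial’s, each arm is assumed to receive the same number of clusters, clusters are weighted equally whatever their size, and every possible allocation is enumerated instead of drawing one at random. The function `mean_imbalance` (our own name) reports the average gap in mean risk between the two arms over all allocations.

```python
from itertools import combinations
from statistics import mean


def mean_imbalance(risks):
    """Average |mean risk (screened) - mean risk (control)| over every way of
    giving half of the clusters to the screened arm (equal numbers of clusters
    per arm, each cluster weighted equally)."""
    n = len(risks)
    if n % 2:
        raise ValueError("this sketch needs an even number of clusters")
    half = n // 2
    total = sum(risks)
    gaps = []
    for screened in combinations(range(n), half):
        s = sum(risks[i] for i in screened)
        gaps.append(abs(s / half - (total - s) / half))
    return mean(gaps)


# Hypothetical cluster-level risks (not from the trial).
two_clusters = [0.2, 0.8]            # one low-risk, one high-risk cluster
identical = [0.5] * 20               # 20 identical clusters
mixed = [0.2] * 10 + [0.8] * 10      # 20 clusters, half low-risk, half high-risk

for label, risks in [("2 heterogeneous", two_clusters),
                     ("20 identical", identical),
                     ("20 heterogeneous", mixed)]:
    print(f"{label}: average gap {mean_imbalance(risks):.3f}")
```

With 2 heterogeneous clusters the gap is always 0.6; with 20 identical clusters it is 0; with 20 heterogeneous clusters it falls to about 0.10 on average, but not to zero. Whether such a residual gap matters depends on how different the real clusters are, which is precisely the information Dr Robert says is missing.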

3. The way to decide whether or not a death is attributable to breast cancer is rather curious.

Two doctors give their opinion. If the two opinions converge, these identical opinions determine the attribution (OK with that).

If the two opinions differ, a 3rd opinion is requested and the majority opinion determines the attribution. And there is a problem. The majority decision is democratic and well adapted to politics but it is not scientific. If the opinions of the first two doctors differ, the situation is ambiguous.

And it would be more honest not to take these cases into consideration than to want to remove the ambiguity at all costs by a 3rd opinion not necessarily more reliable than the first two.

It is difficult to know how many cases of disagreement there were. The cause of death could not be attributed in 17% of deaths in one group and 10% in the other, but no figures can be found on how often the attribution was made despite a contrary opinion from at least one of the physicians. This is crucial information.

At the very least, the robustness of the conclusions should have been checked by a sensitivity analysis taking into account non-attributions and ambiguities in the attribution of the cause of death. Curiously, this was done for the analysis of staging (the stage of the cancer at the time of diagnosis) but not for the analysis of cancer deaths; for deaths, this omission removes all reliability from the conclusions.

[d] Consultation report, see page 155

🛈 We are a French non-profit organization of health care professionals. We carry out our activity without advertising, conflicts of interest or subsidies. Thank you for supporting our work on HelloAsso.