Assessing the applicability and interobserver variability of tumor budding and poorly differentiated clusters in colorectal cancer

Colorectal cancer (CRC) was the third most lethal cancer worldwide in 2022. Tumor budding (TB) and poorly differentiated clusters (PDC) are prognostic factors. However, the lack of standardization in the assessment and reporting of TB and PDC can hinder their application in the pathologist's daily practice. This study addresses these challenges by determining the interobserver variability and the applicability of TB and PDC assessment in CRC. In a series of 93 patients, two independent pathologists assessed both variables according to the ITBCC guidelines on H&E and AE1/AE3 slides. On H&E, the overall concordance rate and kappa coefficient were 89.2% and 0.81 for both variables; on IHC, they were 69.9% and 0.55 for TB and 88.2% and 0.81 for PDC. In conclusion, H&E analysis showed excellent agreement for TB and PDC, indicating that both are reproducible and applicable in the pathologist's daily practice, while AE1/AE3 IHC can still be used in specific situations.


Introduction
Colorectal cancer (CRC) is the second most common malignancy in both sexes and was the third most lethal cancer worldwide in 2022 (Estimate 2023: Cancer Incidence in Brazil 2023; Sistema de Informações Sobre Mortalidade (SIM) 2019). The TNM staging system is widely recognized as the gold standard for staging and prognostic assessment, but recent findings show that, in the era of precision medicine, better patient stratification is needed because of differences in biological behavior and clinical outcome (Dawson et al. 2019).
In response, the World Health Organization (WHO) and the National Comprehensive Cancer Network (NCCN) have listed several histological prognostic factors, including poorly differentiated/undifferentiated histology, perineural invasion, intramural and extramural vascular invasion, lymphatic invasion, positive margins, examination of fewer than 12 lymph nodes, and tumor budding (NCCN Clinical Practice Guidelines in Oncology: Colon Cancer 2022; WHO Classification of Tumours Editorial Board 2019). These factors are considered high-risk indicators for recurrence and may inform the decision to administer adjuvant treatment (NCCN Clinical Practice Guidelines in Oncology: Colon Cancer 2022).
Tumor budding (TB) is defined as single tumor cells or small clusters of up to four cells at the invasive margin of colorectal cancer. Similarly, poorly differentiated clusters (PDC) are groups of five or more cells without glandular differentiation; they are emerging as a promising prognostic factor not only in CRC but also in other tumor types (Shivji et al. 2020; Lugli et al. 2021; Shimizu et al. 2018).
In recent years, evidence has accumulated linking TB and PDC to poor outcome (Shivji et al. 2020; Lugli et al. 2021; Rieger et al. 2017). The College of American Pathologists (CAP) and the International Collaboration on Cancer Reporting (ICCR) have emphasized the prognostic value of TB and recommended its inclusion in colorectal cancer reporting protocols (Lj et al. 2021; Loughrey et al. 2020).
However, the lack of standardization in the assessment and reporting of TB and PDC can hinder their application in the pathologist's daily practice. This is particularly true for TB, which, despite being standardized by the International Tumor Budding Consensus Conference (ITBCC) in 2016, is still not assessed homogeneously (Lugli et al. 2017). Approaches to TB differ in the stain used (hematoxylin and eosin [H&E] vs immunohistochemistry [IHC]), the type of TB analyzed (peritumoral, intratumoral, or combined scores), and the cut-off values for risk stratification (Shimizu et al. 2018; Rieger et al. 2017; Slik et al. 2019; Dawson et al. 2020). These disparities can negatively affect cancer reporting.
This study aims to address these challenges by determining and comparing TB and PDC on both H&E and IHC (AE1/AE3) slides according to the ITBCC guidelines. The objective is to assess interobserver variability and reproducibility between pathologists with both techniques and to evaluate the applicability of these methods in the pathologist's daily practice.

Patient cohort
Ninety-three patients with primary colorectal cancer diagnosed and treated by upfront surgical resection at Barretos Cancer Hospital between 2008 and 2016 were retrospectively selected for this study (dos Santos et al. 2019). Sociodemographic and clinical variables had previously been collected from the patients' medical records held by the medical archive service. Patients who required postoperative therapy received adjuvant chemotherapy according to the applicable clinical guidelines. Patients who received neoadjuvant therapy were excluded.
The pathology review was performed by M.T.R. according to the TNM 8th edition and the WHO Classification of Tumours: Digestive System Tumours, 5th edition (WHO Classification of Tumours Editorial Board 2019). The assessed features are summarized in Table 1.

H&E and immunohistochemistry
The original H&E slides and formalin-fixed paraffin-embedded (FFPE) tissue blocks were requested from the Department of Pathology, and for each patient the one slide and corresponding tissue block showing the most pronounced TB and PDC at the invasive front were selected. Two new 3-µm sections were cut from each selected block: one was stained with H&E on the automated Dako CoverStainer® (Agilent, USA), and the other was immunostained for the pan-keratin cocktail AE1/AE3 (PCK26, Roche Diagnostics, ready-to-use, ref. 760-2595) on the automated BenchMark Ventana ULTRA IHC/ISH System® (Roche Diagnostics, Switzerland). The new H&E and IHC (AE1/AE3) slides from each patient were used for TB and PDC analysis.

Tumor budding (TB) and poorly differentiated clusters (PDC) assessment
The analysis followed the ITBCC recommendations for both H&E and IHC slides. In brief, each slide was scanned at medium power (10× objective) to identify the hotspot at the invasive front, and counting was performed with a 20× objective. Cases were stratified by count as low (0-4 buds), intermediate (5-9 buds), or high budding (10 or more buds) (Lugli et al. 2017). PDC were also analyzed following the ITBCC recommendations, including stratification into low, intermediate, and high cluster categories. All analyses were performed under a binocular microscope (Novel BM2100®, WF 10x/20 eyepiece), with absolute counts per 0.785 mm² field and a normalization factor of 1.000.
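The field-area normalization and three-tier stratification described above can be sketched as follows. This is a minimal Python illustration, not the authors' workflow: the function names are hypothetical, the field diameter is assumed to equal the eyepiece field number divided by the objective magnification, and a 1.0-mm-diameter field (about 0.785 mm²) is assumed as the reference area. Under those assumptions, a 20-mm field number with a 20× objective gives a normalization factor of 1.000, matching the setup reported here. The cut-offs encoded are the ITBCC tumor-budding tiers; PDC-specific thresholds are not stated in this section and are not encoded.

```python
import math

# Assumed reference: a 1.0-mm-diameter field, i.e. pi * 0.5**2 (about 0.785 mm^2).
REFERENCE_AREA_MM2 = math.pi * 0.5 ** 2


def field_area_mm2(field_number_mm: float, objective_mag: float) -> float:
    """Area of one field, assuming diameter = eyepiece field number / objective magnification."""
    diameter_mm = field_number_mm / objective_mag
    return math.pi * (diameter_mm / 2) ** 2


def normalization_factor(field_number_mm: float, objective_mag: float) -> float:
    """Ratio of this microscope's field area to the reference field area."""
    return field_area_mm2(field_number_mm, objective_mag) / REFERENCE_AREA_MM2


def normalized_count(raw_count: int, factor: float) -> float:
    """Convert a raw hotspot count into a count per reference field."""
    if raw_count < 0 or factor <= 0:
        raise ValueError("count must be >= 0 and factor must be > 0")
    return raw_count / factor


def budding_stratum(count: float) -> str:
    """ITBCC budding tiers: 0-4 low, 5-9 intermediate, 10 or more high."""
    if count < 0:
        raise ValueError("count must be >= 0")
    if count < 5:
        return "low"
    if count < 10:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # Setup reported in this study: WF 10x/20 eyepiece (field number 20 mm), 20x objective.
    factor = normalization_factor(field_number_mm=20, objective_mag=20)
    print(f"normalization factor: {factor:.3f}")  # 1.000
    for raw in (3, 7, 12):
        print(raw, budding_stratum(normalized_count(raw, factor)))
```

With a hypothetical 22-mm field number and the same 20× objective, the factor becomes about 1.21, so a raw count of 12 normalizes to about 9.9 and falls in the intermediate tier rather than the high one; this is why the normalization step matters when counts are compared across microscopes.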
Two gastrointestinal (GI) pathologists (M.T.R. and M.M.M.) independently assessed all cases using the same H&E and IHC (AE1/AE3) slides. In case of discrepancy, both pathologists reviewed the slide together to discuss the reasons for disagreement and reach a consensus on the best assessment for each discordant case. The data from each independent analysis and from the consensus are presented in the Results section.

Statistical analysis
All data were stored in REDCap® (Research Electronic Data Capture) and exported to SPSS for Windows® version 27 (IBM SPSS Statistics v27.0, USA). The kappa coefficient (κ) and the overall concordance rate (%) were used as measures of interobserver reproducibility.
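The two agreement measures named above can be computed from paired stratum labels as in the following minimal Python sketch. It assumes unweighted Cohen's kappa, since the text does not state whether a weighted variant was used; the function names are illustrative, and the code is not a reproduction of the SPSS procedure. As an arithmetic check, the reported concordance rates of 89.2%, 69.9%, and 88.2% correspond to 83, 65, and 82 concordant cases out of 93, respectively; 65 concordant cases leave the 28 discordant TB cases on AE1/AE3.

```python
from collections import Counter
from typing import Sequence


def overall_concordance(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Proportion of cases assigned the same stratum by both raters."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    p_o = overall_concordance(r1, r2)
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from each rater's marginal category frequencies.
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    if p_e == 1.0:
        # Both raters used one identical category for every case; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    rater1 = ["low", "low", "high", "high"]
    rater2 = ["low", "high", "high", "high"]
    print(overall_concordance(rater1, rater2))  # 0.75
    print(cohen_kappa(rater1, rater2))          # 0.5
    print(round(100 * 83 / 93, 1))              # 89.2
```

Kappa corrects the raw concordance for agreement expected by chance from each rater's category frequencies, which is why a study can report a high concordance rate alongside a noticeably lower kappa.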

Results
For both pathologist 1 and pathologist 2, the TB stratum assessed on H&E and on AE1/AE3 IHC was concordant in most cases, although discrepancies were observed in some of them (Fig. 1C and D, Table 2).
For TB on AE1/AE3 slides, the reasons for the 28 interobserver disagreements varied: in 19 (20.4%) cases the counts were close to a stratum cutoff, in 7 (7.5%) cases each pathologist counted in a different area, and in 2 (2.1%) cases the pathologists disagreed on cell viability. Of the 13 (14.0%) discordant cases between intermediate and high budding, 10 (10.8%) were due to counts close to the cutoff and 3 (3.2%) to counting in different areas. The 2 (2.1%) discordant cases between low and high budding were counted in different areas.
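The classification of discordant cases by the pair of strata involved (low vs intermediate, intermediate vs high, low vs high) can be reproduced with a short tabulation. The Python sketch below is illustrative only: the stratum labels and function names are hypothetical, and percentages are expressed relative to the whole cohort of 93 cases, as in the text.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

STRATA = ("low", "intermediate", "high")


def discordant_pairs(r1: Sequence[str], r2: Sequence[str]) -> Counter:
    """Count disagreements by the unordered pair of strata involved."""
    if len(r1) != len(r2):
        raise ValueError("ratings must be of equal length")
    pairs: Counter = Counter()
    for a, b in zip(r1, r2):
        if a != b:
            # Order each pair from lower to higher stratum so (high, low) == (low, high).
            pairs[tuple(sorted((a, b), key=STRATA.index))] += 1
    return pairs


def percent_of_cohort(counts: Counter, n_cases: int) -> Dict[Tuple[str, str], float]:
    """Express each count as a percentage of all cases, rounded to one decimal."""
    return {pair: round(100 * c / n_cases, 1) for pair, c in counts.items()}


if __name__ == "__main__":
    p1 = ["low", "intermediate", "high", "low", "high"]
    p2 = ["intermediate", "high", "high", "high", "intermediate"]
    counts = discordant_pairs(p1, p2)
    print(counts)
    print(percent_of_cohort(counts, n_cases=93))
```

Treating each pair as unordered matches how the text reports disagreements, without regard to which pathologist assigned the higher stratum.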
Overall, these findings suggest that using H&E slides may be more reliable for TB analysis than AE1/AE3 slides.
Concerning PDC analysis on H&E slides, the two pathologists achieved a kappa coefficient (κ) of 0.81 and an overall concordance rate of 89.2% (Table 3). Ten (10.8%) cases were discordant: 5 (5.4%) between low and intermediate clusters and 5 (5.4%) between intermediate and high clusters. Of these 10 discordant cases, 2 (20.0%) were stage T2 and 8 (80.0%) were stage T3 or higher. In 9 (9.6%) cases the counts were near the cutoff between strata, and in 1 (1.1%) case the discrepancy arose because each pathologist counted in a different area. All 5 (5.4%) discordant cases between intermediate and high clusters were due to counts close to the cutoff.
For PDC on AE1/AE3 slides, the reasons for disagreement were as follows: in 10 (10.8%) cases the counts were close to a stratum cutoff, and in 1 (1.1%) case each pathologist counted in a different area. The discordant case between intermediate and high clusters was due to counting close to the cutoff, and the only case discordant between low and high clusters was due to counting in a different area.
Overall, PDC analysis on AE1/AE3 slides showed strong agreement between the two pathologists (κ = 0.81), with an overall concordance rate (88.2%) only slightly lower than that obtained on H&E slides (89.2%).

Discussion
The assessment of tumor budding (TB) and poorly differentiated clusters (PDC) is important for determining tumor aggressiveness and prognosis (Dawson et al. 2019; Lee and Chan 2018; Konishi et al. 2017). In this study, we compared the traditional H&E method with the immunohistochemical (IHC) method using the AE1/AE3 antibody cocktail for TB and PDC analysis.
For both pathologists, TB and PDC analysis assigned cases to the higher strata more frequently on AE1/AE3 than on H&E, owing to the better, and possibly faster, visualization of tumor cells with IHC; this has prompted discussion of new assessment methodologies and different stratification thresholds for AE1/AE3 (Rieger et al. 2017). IHC has also been used to develop digital image analysis programs (Slik et al. 2019; Jiang et al. 2021; Caie et al. 2014), which offer a promising avenue for future research. However, IHC can also highlight fragments of ruptured glands and cellular debris or necrosis, which may act as confounders when counting.
To evaluate the reproducibility of the analyses performed by both pathologists, we used the kappa coefficient (κ) and the overall concordance rate. The H&E analysis yielded excellent results for TB and PDC (κ = 0.81 and 89.2% for both), in line with the literature (Lino-Silva et al. 2018), indicating that the ITBCC criteria (Lugli et al. 2017) are reproducible when applied as in the pathologist's daily practice.
On the other hand, TB analysis on AE1/AE3 yielded lower kappa values and overall concordance rates than H&E analysis, highlighting the need for standardized metrics and hotspot selection. Of note, AE1/AE3 is not routinely used in clinical practice. There were 28 (of 93) discordant cases, and the reasons for disagreement were counting close to the cutoff (20.4%), counting in a different area (7.5%), and divergence regarding cell viability (2.1%). Disagreements due to counts close to the cutoff can be mitigated by having the evaluator repeat the analysis or by requesting a second opinion from a colleague in the field. Counting in different areas points to a step that precedes the count itself: hotspot selection. For this purpose, the entire invasive front (or at least ten fields) must be examined, and at this step it would be worth adopting a standardized metric, taught as a tutorial before the analysis, for both residents in training and practicing pathologists, to improve reproducibility. The disagreements on cell viability observed in our study show that IHC staining, which highlights not only budding cells but also other cellular fragments, requires greater attention during counting. Nonetheless, AE1/AE3 staining remains an indispensable tool when buds and clusters are difficult to discern on H&E-stained slides because of factors such as a marked peritumoral inflammatory infiltrate and reactive stromal cells (Rieger et al. 2017; Lugli et al. 2017; Konishi et al. 2017). Pathologists should therefore consider AE1/AE3 staining in these specific scenarios to ensure accurate and reliable analysis.
PDC analysis on AE1/AE3 yielded kappa and overall concordance values similar to those obtained on H&E, showing that, although AE1/AE3 is not routinely used in daily practice, it is reproducible for PDC. Features such as peritumoral inflammatory infiltrate and heterogeneous PDC density seem to have a smaller impact on analysis and reproducibility for PDC than for TB. This is probably because PDC are larger than buds (five or more cells, without lumen formation) and therefore easier to see on microscopy, with less obscuring of the invasive front by the peritumoral inflammatory infiltrate (Shivji et al. 2020).
Interestingly, on H&E we observed fewer interobserver discordances in T2 tumors than in higher T stage tumors for both TB and PDC, probably owing to the nature of the invasive front and to the association between higher T stage and higher TB and PDC counts (Hong et al. 2017). On AE1/AE3, discordant cases were more evenly distributed between lower and higher T stages, suggesting that AE1/AE3 may help identify the invasive front.

Conclusions
In summary, the present study demonstrated that tumor budding and poorly differentiated clusters assessed on H&E are reproducible, accessible, and applicable to the pathologist's daily practice. H&E assessment according to the ITBCC criteria is an excellent choice for standardization and can improve cancer reporting. AE1/AE3 IHC remains a valuable tool in specific situations.
However, further studies with larger samples, standardized metrics, and standardized hotspot selection are needed to establish the reproducibility and accuracy of AE1/AE3 for TB and PDC analysis.

Table 1
Demographic and clinicopathological characteristics of colorectal cancer patients (n = 93)