
imbalance mitigation offers a robust, clinically relevant
solution for upper GI cancer detection. Although limited
to a single hospital and showing some mid-range
probability miscalibration, the models demonstrate
strong clinical potential. Future work could expand the
dataset to cover more years, apply calibration correction,
and incorporate structured clinical data to further
improve sensitivity and robustness.
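The mid-range miscalibration and the proposed calibration correction can be made concrete with a reliability table. The following is a minimal sketch, not the study's evaluation code: the function names `reliability_table` and `expected_calibration_error`, the choice of five equal-width bins, and the toy scores and labels are all illustrative assumptions.

```python
def reliability_table(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins and compare the
    mean predicted probability with the observed positive rate per bin.

    Returns one tuple per non-empty bin:
    (bin_low, bin_high, count, mean_predicted, observed_rate).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue  # skip empty bins rather than dividing by zero
        mean_p = sum(p for p, _ in items) / len(items)
        pos_rate = sum(y for _, y in items) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, len(items), mean_p, pos_rate))
    return rows


def expected_calibration_error(probs, labels, n_bins=5):
    """Count-weighted average of |mean predicted - observed rate| over bins."""
    n = len(probs)
    return sum(
        count / n * abs(mean_p - pos_rate)
        for _, _, count, mean_p, pos_rate in reliability_table(probs, labels, n_bins)
    )


if __name__ == "__main__":
    # Hypothetical toy scores and labels, NOT results from the study.
    toy_probs = [0.05, 0.1, 0.45, 0.5, 0.55, 0.9, 0.95]
    toy_labels = [0, 0, 0, 0, 1, 1, 1]
    for row in reliability_table(toy_probs, toy_labels):
        print(row)
    print(expected_calibration_error(toy_probs, toy_labels))
```

A large gap between the mean predicted probability and the observed rate in the middle bins would correspond to the mid-range miscalibration described here. A correction such as Platt scaling or isotonic regression (Niculescu-Mizil and Caruana, 2005) could then be fitted on held-out data and the table recomputed to check whether the gap narrows.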
Ethical Considerations
The study was approved by the NHS Trust and University
ethics committees (REC 22/PR/1559). All patient data
were pseudonymised prior to analysis to ensure
confidentiality.
ACKNOWLEDGEMENTS
This work used the ADA High Performance Computing (HPC)
cluster at the University of East Anglia.
We thank the HPC support team for their assistance.
REFERENCES
Alexandre, L., Tsilegeridis-Legeris, T., and Lam, S. (2022).
Clinical and endoscopic characteristics associated with
post-endoscopy upper gastrointestinal cancers: a systematic
review and meta-analysis. Gastroenterology,
162(4):1123–1135.
Alsentzer, E., Murphy, J. R., Boag, W., Weng, W.-H., Jin,
D., Naumann, T., and McDermott, M. (2019). Publicly
available clinical BERT embeddings. arXiv preprint
arXiv:1904.03323.
Beaton, D. R., Sharp, L., Lu, L., Trudgill, N. J., Thoufeeq,
M., Nicholson, B. D., Rogers, P., Docherty, J., Jenkins,
A., Morris, A. J., Rösch, T., and Rutter, M. D. (2024).
Diagnostic yield from symptomatic gastroscopy in the
UK. Gut, 73(9):1421–1430.
Beg, S., Ragunath, K., Wyman, A., Banks, M., Markar,
S., Hawkey, C., Sanders, D., Mönkemüller, K., Kaye,
P., and Fothergill, L. (2017). Quality standards in upper
gastrointestinal endoscopy: a position statement of
the British Society of Gastroenterology (BSG) and
Association of Upper Gastrointestinal Surgeons of Great Britain
and Ireland (AUGIS). Gut, 66(11):1886–1899.
Cancer Research UK (2024a). Survival for oesophageal
cancer. https://www.cancerresearchuk.org/about-cancer/oesophageal-cancer/survival,
accessed: 2025-07-20.
Cancer Research UK (2024b). Survival for stomach
cancer. https://www.cancerresearchuk.org/about-cancer/stomach-cancer/survival,
accessed: 2025-07-20.
Cancer Research UK (2025). Common cancers compared.
https://www.cancerresearchuk.org/health-professional/cancer-statistics/survival/common-cancers-compared,
accessed: 2025-07-20.
Cheng, J. (2022). Neural network assisted pathology case
identification. J. Pathol. Inform., 13:100008.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2019). BERT: Pre-training of deep bidirectional
transformers for language understanding. In Proceedings of
NAACL-HLT 2019, pages 4171–4186.
Iyer, P. G., Sachdeva, K., Leggett, C. L., Willis, B. C.,
and Rubin, D. L. (2023). Development of electronic
health record-based machine learning models to predict
Barrett’s esophagus and esophageal adenocarcinoma risk.
Clin. Transl. Gastroenterol., 14(10):e00637.
Johnson, J. M. and Khoshgoftaar, T. M. (2019). Survey on
deep learning with class imbalance. J. Big Data, 6(1):1–54.
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., and
Kang, J. (2020). BioBERT: a pre-trained biomedical
language representation model for biomedical text mining.
Bioinformatics, 36(4):1234–1240.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P.
(2017). Focal loss for dense object detection. In
Proceedings of ICCV 2017, pages 2980–2988.
Loshchilov, I. and Hutter, F. (2017). Decoupled weight
decay regularization. arXiv preprint arXiv:1711.05101.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach
to interpreting model predictions. Adv. Neural Inf.
Process. Syst., 30.
Niculescu-Mizil, A. and Caruana, R. (2005). Predicting
good probabilities with supervised learning. In
Proceedings of ICML 2005, pages 625–632.
Oliwa, T., Maron, S. B., Chase, L. M., Fiehn, O., and
Altman, R. B. (2019). Obtaining knowledge in pathology
reports through a natural language processing
approach with classification, named-entity recognition, and
relation-extraction heuristics. JCO Clin. Cancer Inform.,
3:1–8.
Pan, J., Ding, S., Yang, S., Li, G., and Liu, X. (2020).
Endoscopy report mining for intelligent gastric cancer
screening. Expert Syst., 37(3):e12504.
Si, Y., Wang, J., Roberts, K., and Xu, H. (2022).
Benchmarking transformers on clinical notes classification. J.
Biomed. Inform., 127:104008.
Syed, S., Angel, A. J., Syeda, H. B., Jackson, T., and
Patel, R. (2022). The h-ANN model: comprehensive
colonoscopy concept compilation using combined
contextual embeddings. In Proceedings of BIOSTEC 2022,
volume 5, page 189.
Wang, Z., Zheng, X., Zhang, J., and Zhang, M. (2024).
Three-branch BERT-based text classification network for
gastroscopy diagnosis text. Int. J. Crowd Sci., 8(1):56–63.
Wani, S., Yadlapati, R., Singh, S., Sawas, T., and Katzka,
D. A. (2022). Post-endoscopy esophageal neoplasia in
Barrett’s esophagus: consensus statements from an
international expert panel. Gastroenterology, 162(2):366–372.
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue,
C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz,
M., and Brew, J. (2020). Transformers: State-of-the-art
natural language processing. In Proceedings of EMNLP
2020: System Demonstrations, pages 38–45.
World Health Organization (WHO) (2025). Cancer.
https://www.who.int/news-room/fact-sheets/detail/cancer,
accessed: 2025-07-20.
KDIR 2025 - 17th International Conference on Knowledge Discovery and Information Retrieval