Authors: Kazhan Misri (1), Leo Alexandre (2) and Beatriz De La Iglesia (1)

Affiliations: (1) University of East Anglia, School of Computing Science, U.K.; (2) University of East Anglia, Norwich Medical School, U.K.

Keyword(s): Transformer Models, Upper GI Cancer, Clinical Text Classification.

Abstract: Clinical free-text reports from endoscopy and histology are a valuable yet underexploited source of information for supporting upper gastrointestinal (GI) cancer diagnosis. Our initial learning task was to classify procedures as cancer-positive or cancer-negative based on downstream registry-confirmed diagnoses. For this, we developed a patient-level dataset of 63,040 endoscopy reports linked with histology data and cancer registry outcomes, allowing supervised learning on real-world clinical data. We fine-tuned two transformer-based models, general-purpose BERT and domain-specific BioClinicalBERT, and evaluated methods to address severe class imbalance, including random minority upsampling and class weighting. BioClinicalBERT combined with upsampling achieved the best recall (sensitivity), 85%, and produced fewer false negatives than BERT, whose recall was 78%. Calibration analysis indicated that predicted probabilities were broadly reliable. We also applied SHapley Additive exPlanations (SHAP) to interpret model decisions by highlighting influential clinical terms, fostering transparency and trust. Our findings demonstrate the potential of scalable, interpretable natural language processing models to extract clinically meaningful insights from unstructured narratives, providing a foundation for future retrospective review of cancer diagnosis and clinical decision support tools.
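The two imbalance-handling strategies named in the abstract, random minority upsampling and class weighting, together with the recall metric it reports, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names (`upsample_minority`, `balanced_class_weights`, `recall`), the balancing target (every class duplicated up to the majority count), and the inverse-frequency weighting formula are all assumptions chosen for illustration, and the paper may have used different settings.

```python
import random
from collections import Counter


def upsample_minority(texts, labels, seed=0):
    """Randomly duplicate examples of smaller classes until every class
    matches the majority class size (hypothetical balancing target)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        # Pool of original examples for this class to sample from.
        pool = [t for t, y in zip(texts, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(majority_size - n)]
        out_texts.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_texts, out_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).
    Such weights are typically passed to a weighted loss function."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def recall(y_true, y_pred, positive=1):
    """Recall (sensitivity) = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data: 8 cancer-negative (0) and 2 cancer-positive (1) reports.
toy_labels = [0] * 8 + [1] * 2
toy_texts = [f"report {i}" for i in range(len(toy_labels))]
balanced_texts, balanced_labels = upsample_minority(toy_texts, toy_labels)
weights = balanced_class_weights(toy_labels)
```

Upsampling should be applied to the training split only, so that duplicated minority reports never leak into validation or test data. Class weighting leaves the data unchanged and instead makes each minority-class error count more in the loss.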

CC BY-NC-ND 4.0

Paper citation in several formats:
Misri, K., Alexandre, L. and De La Iglesia, B. (2025). From Free Text to Upper Gastrointestinal Cancer Diagnosis: Fine-Tuning Language Models on Endoscopy and Histology Narratives. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR; ISBN ; ISSN 2184-3228, SciTePress, pages 501-508. DOI: 10.5220/0013836200004000

@conference{kdir25,
author={Kazhan Misri and Leo Alexandre and Beatriz {De La Iglesia}},
title={From Free Text to Upper Gastrointestinal Cancer Diagnosis: Fine-Tuning Language Models on Endoscopy and Histology Narratives},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR},
year={2025},
pages={501-508},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013836200004000},
isbn={},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR
TI - From Free Text to Upper Gastrointestinal Cancer Diagnosis: Fine-Tuning Language Models on Endoscopy and Histology Narratives
SN -
IS - 2184-3228
AU - Misri, K.
AU - Alexandre, L.
AU - De La Iglesia, B.
PY - 2025
SP - 501
EP - 508
DO - 10.5220/0013836200004000
PB - SciTePress