According to our results, our model outperforms the
four single-modal networks used for comparison in
terms of accuracy, sensitivity, and specificity. The
clinical information about patients and the geometric
features of the images both help to improve the
classification of thyroid tumors, which supports the
advantage of the multimodal model.
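For reference, the three metrics named above can be computed from binary predictions as in the following minimal sketch. The function names and the label convention (1 = malignant, 0 = benign) are assumptions made for illustration and are not taken from our experimental code.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels.

    Assumed convention (illustrative only): 1 = malignant, 0 = benign.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of malignant samples detected (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of benign samples correctly ruled out (true negative rate).
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can hide poor performance on one class when the benign and malignant classes are imbalanced.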
Our study also has some limitations: 
(1) The multimodal dataset collected and collated so
far is relatively small, and the model may perform
better if more samples become available in the
future.
(2) In the feature fusion part, we use early fusion,
which directly concatenates the three output feature
vectors; other feature fusion methods remain to be
explored.
(3) The objective of this study is to classify
thyroid tumors as benign or malignant. The images
input to the model are the parts of the ultrasound
images that contain only the lesions, so the images
must still be segmented according to the doctor's
annotations when the data are collected and
organized in the preliminary stage.
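The early fusion mentioned in limitation (2) amounts to concatenating the per-sample feature vectors before classification. The sketch below illustrates this under stated assumptions: the three vectors are taken to be image, geometric, and clinical features, and all function and parameter names are hypothetical. The second function shows one possible alternative (an element-wise weighted sum) purely as an example of a different strategy; it is not a method evaluated in this study.

```python
def early_fusion(image_feat, geometric_feat, clinical_feat):
    """Early fusion: concatenate three feature vectors into one.

    The argument names are illustrative assumptions about what the
    three output feature vectors represent.
    """
    return list(image_feat) + list(geometric_feat) + list(clinical_feat)


def weighted_sum_fusion(features, weights):
    """Hypothetical alternative: element-wise weighted sum.

    Requires all vectors to share the same length, unlike concatenation.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature vector")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")
    return [
        sum(w * f[i] for f, w in zip(features, weights))
        for i in range(length)
    ]


# Example: vectors of different lengths can be concatenated directly.
fused = early_fusion([0.1, 0.2], [3.0], [45.0, 1.0])
```

Concatenation places no constraint on the lengths of the individual vectors, which is one reason it is a common first choice; strategies such as the weighted sum require the vectors to be projected to a common dimension first.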
Previous studies have shown that deep learning
algorithms can outperform medical professionals on
certain clinical outcomes; however, deep learning
approaches alone are not applicable in clinical
settings (Ko et al., 2019). Therefore, the main
objective of this study is to assist physicians in
diagnosis and to reduce overdiagnosis and
overtreatment. In future studies, the multimodal
model will be further improved by expanding the
dataset used in the experiments and by adding more
types of clinical data as features in the clinical
information. In the feature fusion part, different
fusion strategies will be compared to determine their
effects on model performance and to select the
strategy that performs best.
REFERENCES 
Changfa, X., Xuesi, D., He, L., Maomao, C., Dianqin, S., 
Siyi, H., . . . Wanqing, C. (2022). Cancer statistics in 
China  and  United  States,  2022:  profiles,  trends,  and 
determinants. Chinese Medical Journal.   
Chen, J., & Jiang, L. (2017). Accurate pathological
diagnosis of thyroid cancer in the era of precision
medicine. Chinese Journal of Clinical Oncology, 44
(04), 181-185.
Durante, C., Grani, G., Lamartina, L., Filetti, S., Mandel, S. 
J.,  &  Cooper,  D.  S.  (2018).  The  diagnosis  and 
management of thyroid nodules: a review. JAMA, 319
(9), 914-924.   
Fujita,  H.  (2020).  AI-based  computer-aided  diagnosis 
(AI-CAD): the latest review to read first. Radiological 
physics and technology, 13 (1), 6-19.   
Gong, R. (2013). Thyroid tumor classification
based on multi-mode ultrasound image
[Harbin Institute of Technology].
Greff,  K.,  Srivastava,  R.  K.,  Koutník,  J.,  Steunebrink,  B. 
R., & Schmidhuber,  J. (2016). LSTM: A search space 
odyssey.  IEEE transactions on neural networks and 
learning systems, 28 (10), 2222-2232.   
Guang-Yuan,  Z.,  Xia-Bi,  L.,  &  Guang-Hui,  H.  (2018). 
Survey on  Medical  Image  Computer Aided  Detection 
and  Diagnosis  Systems.  Journal of software, 29  (05), 
1471-1514.   
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. 
Q. (2017). Densely connected convolutional networks. 
Proceedings  of  the  IEEE  conference  on  computer 
vision and pattern recognition,   
Huang, Y., Du, C., Xue, Z., Chen, X., Zhao, H., & Huang, 
L.  (2021).  What  makes  multi-modal  learning  better 
than  single  (provably).  Advances in Neural 
Information Processing Systems, 34, 10944-10956.   
Jeong, E. Y., Kim, H. L., Ha, E. J., Park, S. Y., Cho, Y. J., 
&  Han,  M.  (2019).  Computer-aided  diagnosis  system 
for  thyroid  nodules  on  ultrasonography:  diagnostic 
performance  and  reproducibility  based  on  the 
experience level of operators. European radiology, 29 
(4), 1978-1985.   
Juan-Xiu, T., Guo-Cai, L.,  Shan-Shan,  G.,  Zhong-Jian, J., 
Jin-Guang,  L.,  &  Dong-Dong,  G.  (2018).  Deep 
Learning  in  Medical  Image  Analysis  and  Its 
Challenges. Acta automatica Sinica, 44 (03), 401-424.   
Ko, S. Y., Lee, J. H., Yoon, J. H., Na, H., Hong, E.,  Han, 
K., . . . Park, V. Y. (2019). Deep convolutional neural 
network  for  the  diagnosis  of  thyroid  nodules  on 
ultrasound. Head & neck, 41 (4), 885-891.   
Kootte, R. S., Levin, E., Salojärvi, J., Smits, L. P., Hartstra, 
A.  V.,  Udayappan,  S.  D.,  .  .  .  Holst,  J.  J.  (2017). 
Improvement  of  insulin  sensitivity  after  lean  donor 
feces  in  metabolic  syndrome  is  driven  by  baseline 
intestinal microbiota composition. Cell metabolism, 26
(4), 611-619.e616.
Li, X., Zhang, S., Zhang, Q., Wei, X., Pan, Y., Zhao, J., . . . 
Li,  J.  (2019).  Diagnosis  of  thyroid  cancer  using  deep 
convolutional  neural  network  models  applied  to 
sonographic  images:  a  retrospective,  multicohort, 
diagnostic  study.  The Lancet Oncology, 20 (2), 
193-201.   
Peng, S., Liu, Y., Lv, W., Liu, L., Zhou, Q., Yang, H., . . . 
Zhang,  X.  (2021).  Deep  learning-based  artificial