Guidotti,  Riccardo,  Anna  Monreale,  Salvatore  Ruggieri, 
Franco Turini, Fosca Giannotti, and Dino Pedreschi. "A 
survey  of  methods  for  explaining  black  box 
models." ACM computing surveys  (CSUR) 51,  no.  5 
(2018): 1-42. 
Gundersen,  O.  E.  (2020).  The  Reproducibility  Crisis  Is 
Real. AI Magazine, 41(3), 103-106. 
Johnson,  A.  E.,  Pollard,  T.  J.,  &  Mark,  R.  G.  (2017, 
November). Reproducibility in critical care: a mortality 
prediction  case  study.  In Machine Learning for 
Healthcare Conference (pp. 361-376). 
Kim, A. A., Zaim, S. R., & Subbian, V. (2020). Assessing 
Reproducibility and Veracity across Machine Learning 
Techniques in Biomedicine: A Case Study using TCGA 
Data.  International Journal of Medical Informatics, 
104148. 
Liaw,  A.,  &  Wiener,  M.  (2002).  Classification  and 
regression by Random Forest. R news, 2(3), 18-22. 
Lipton, Z. C. (2018). The mythos of model interpretability. 
Queue, 16(3), 31-57. 
Liu, Y., Chen, P. H. C., Krause, J., & Peng, L. (2019). How 
to read articles that use machine learning: users’ guides 
to the medical literature. Jama, 322(18), 1806-1816. 
Liu,  X.,  Rivera,  S.  C.,  Moher,  D.,  Calvert,  M.  J.,  & 
Denniston,  A.  K.  (2020).  Reporting  guidelines  for 
clinical trial reports for  interventions  involving 
artificial  intelligence:  the  CONSORT-AI  extension. 
bmj, 370. 
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to 
interpreting  model  predictions. In Advances in neural 
information processing systems (pp. 4765-4774). 
Luo,  Yen-Fu,  and  Anna  Rumshisky.  "Interpretable  topic 
features  for  post-icu  mortality  prediction."  In AMIA 
Annual  Symposium  Proceedings,  vol.  2016,  p.  827. 
American Medical Informatics Association, 2016. 
Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, 
Shilton  A,  Yearwood  J,  Dimitrova  N,  Ho  TB, 
Venkatesh S. Guidelines for developing and reporting 
machine  learning  predictive  models  in  biomedical 
research: a multidisciplinary view. Journal of medical 
Internet research. 2016;18(12):e323. 
Michalski, R. S., "A Theory and Methodology of Inductive 
Learning," Chapter in the book, Machine Learning: An 
Artificial Intelligence Approach, R. S. Michalski, T. J. 
Carbonell  and  T.  M.  Mitchell  (Eds.),  pp.  83-134, 
TIOGA Publishing Co., Palo Alto, 1983. 
Michalski,  R.  S., "ATTRIBUTIONAL  CALCULUS:  A 
Logic  and  Representation  Language  for  Natural 
Induction," Reports of the Machine Learning and 
Inference Laboratory,  MLI  04-2,  George  Mason 
University, Fairfax, VA, April, 2004. 
Michalski, R. S. and Wojtusiak, J., "Semantic and Syntactic 
Attribute  Types  in  AQ  Learning," Reports of the 
Machine Learning and Inference Laboratory, MLI 07-
1, George Mason University, Fairfax, VA, 2007. 
Moher,  D.,  Hopewell,  S.,  Schulz,  K.  F.,  &  Montori,  V. 
(2010).  G?  tzsche,  PC;  Devereaux,  PJ;  Elbourne,  D; 
Egger, M; Altman, DG; CONSORT 2010 explanation 
and  elaboration:  updated  guidelines  for  reporting 
parallel group randomised trials. BMJ, 340, c869. 
Morgan, S.L. and Winship C., Counterfactuals and Causal 
Inference: Methods and Principles for Social Research, 
2nd Edition, Cambridge University Press, 2015. 
Pearl, J. Causality, Cambridge University Press, 2000. 
Pearl, J. (2019). The seven tools of causal inference, with 
reflections on machine learning. Communications of the 
ACM, 62(3), 54–60. https://doi.org/10.1145/3241036 
Pineau,  J.,  Vincent-Lamarre, P.,  Sinha,  K.,  Larivière,  V., 
Beygelzimer, A., d'Alché-Buc, F., ... & Larochelle, H. 
(2020).  Improving  Reproducibility  in  Machine 
Learning Research (A Report from the NeurIPS 2019 
Reproducibility  Program).  arXiv preprint 
arXiv:2003.12206. 
Renard,  F.,  Guedria,  S.,  De  Palma,  N.,  &  Vuillerme,  N. 
(2020). Variability and reproducibility in deep learning 
for  medical  image  segmentation.  Scientific Reports, 
10(1), 1-16. 
Ribeiro,  Marco  Tulio,  Singh,  Sameer,  and  Guestrin, 
Carlos.“why  should  I  trust  you?”:  Explaining  the 
predictions  of  any  classifier.  In Knowledge discovery 
and Data Mining (KDD), 2016. 
Sciikit-learn website, Probability Calibration: https://scikit-
learn.org/stable/modules/calibration.html 
Stevens, L. M., Mortazavi, B. J., Deo, R. C., Curtis, L., & 
Kao,  D.  P.  (2020).  Recommendations  for  reporting 
machine  learning  analyses  in  clinical  research. 
Circulation: Cardiovascular Quality and Outcomes, 
CIRCOUTCOMES-120. 
Tonekaboni,  S.,  Joshi,  S.,  McCradden,  M.  D.,  & 
Goldenberg,  A.  (2019).  What  clinicians  want: 
contextualizing  explainable  machine  learning  for 
clinical end use. arXiv preprint arXiv:1905.05134. 
Vollmer, S., Mateen, B. A., Bohner, G., Király, F. J., Ghani, 
R.,  Jonsson,  P.,  ...  &  Granger,  D.  (2020).  Machine 
learning and artificial intelligence research for patient 
benefit:  20  critical  questions  on  transparency, 
replicability, ethics, and effectiveness. bmj, 368. 
Wicks, P., Liu, X., & Denniston, A. K. (2020). Going on up 
to the SPIRIT in AI: will new reporting guidelines for 
clinical trials of AI interventions improve their rigour?. 
BMC medicine, 18(1), 1-3. 
Wojtusiak,  J.,  Michalski,  R.  S.,  Kaufman,  K.  and 
Pietrzykowski,  J., "Multitype  Pattern  Discovery  Via 
AQ21: A Brief Description of the Method and Its Novel 
Features," Reports of the Machine Learning and 
Inference Laboratory,  MLI  06-2,  George  Mason 
University, Fairfax, VA, June, 2006. 
Wojtusiak,  J.,  Elashkar,  E.  and  Mogharab  Nia,  R., "C-
LACE2: computational risk assessment tool for 30-day 
post  hospital  discharge  mortality," Health and 
Technology, Springer, 2018. 
Yu, K. H., Lee, T. L. M., Yen, M. H., Kou, S. C., Rosen, 
B., Chiang, J. H., & Kohane, I. S. (2020). Reproducible 
Machine Learning Methods for Lung Cancer Detection 
Using  Computed  Tomography  Images:  Algorithm 
Development  and  Validation.  Journal of medical 
Internet research, 22(8), e16709.