Authors:
Md Abu Ahammed Babu (1, 2); Sushant Kumar Pandey (3); Darko Durisic (2); Ashok Chaitanya Koppisetty (2) and Miroslaw Staron (1)
Affiliations:
(1) Department of Computer Science and Engineering, University of Gothenburg and Chalmers University of Technology, Gothenburg, Sweden
(2) Research and Development, Volvo Car Corporation, Gothenburg, Sweden
(3) Computer Science and Artificial Intelligence, University of Groningen, Groningen, The Netherlands
Keyword(s):
Data Leakage Detection, Object Detection, YOLOv7, Cirrus, Kitti, Automotive Perception Systems.
Abstract:
Data leakage is a common problem that is often overlooked when splitting data into train and test sets before training an ML/DL model. When leakage is present, model performance is artificially inflated during the evaluation phase, which often leads to erroneous predictions once the model is deployed in real time. However, detecting such leakage is challenging, particularly in the object detection context of perception systems, where the model must be trained on image data. In this study, we conduct a computational experiment to develop a method for detecting data leakage. As a first step, we then conduct an initial evaluation of the method on a public dataset, “Kitti”, a popular and widely accepted benchmark in the automotive domain. The evaluation results show that our proposed D-LeDe method is able to successfully detect potential data leakage caused by image similarity. We further validate the evaluation outcome through a pair-wise image similarity analysis using perceptual hash (pHash) distance.
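The pair-wise pHash analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the D-LeDe method itself, which the abstract does not describe; it only shows the general idea of comparing perceptual hashes across a train/test split. The function names (`phash`, `hamming`, `find_cross_split_duplicates`), the 32x32 grayscale input, and the `max_distance=10` threshold are all assumptions made for illustration and are not taken from the paper.

```python
import math
from itertools import product


def phash(gray, hash_size=8):
    """pHash-style fingerprint of a square grayscale image.

    `gray` is a list of rows of pixel intensities, assumed to be already
    resized to a small square (e.g. 32x32). Returns an integer holding
    hash_size * hash_size bits.
    """
    n = len(gray)
    # Precompute the 1D DCT-II basis for the lowest `hash_size` frequencies.
    cos = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
           for u in range(hash_size)]
    # Low-frequency block of the 2D DCT (unnormalised; scaling does not
    # matter here because only comparisons against the median are used).
    coeffs = []
    for u, v in product(range(hash_size), repeat=2):
        coeffs.append(sum(gray[y][x] * cos[u][y] * cos[v][x]
                          for y in range(n) for x in range(n)))
    # Threshold every coefficient against the median of the AC terms
    # (the DC term at index 0 is excluded from the median).
    ac = sorted(coeffs[1:])
    median = ac[len(ac) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | int(c > median)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def find_cross_split_duplicates(train, test, max_distance=10):
    """Return (train_name, test_name, distance) for near-duplicate pairs.

    `train` and `test` map image names to grayscale pixel grids.
    `max_distance` is a hypothetical threshold, not a value from the paper.
    """
    train_hashes = {name: phash(img) for name, img in train.items()}
    test_hashes = {name: phash(img) for name, img in test.items()}
    pairs = []
    for (tr, h1), (te, h2) in product(train_hashes.items(), test_hashes.items()):
        d = hamming(h1, h2)
        if d <= max_distance:
            pairs.append((tr, te, d))
    return pairs
```

A small Hamming distance between a training image and a test image suggests the two are visually near-identical, which is the kind of image similarity the abstract identifies as a source of leakage. In practice the threshold would need to be tuned on the dataset at hand.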