Double Trouble? Impact and Detection of Duplicates in Face Image Datasets

Torsten Schlett; Christian Rathgeb; Juan Tapia; Christoph Busch

Topics: Biometrics; Image and Video Analysis and Understanding

In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods ICPRAM - Volume 1, 801-808, 2024, Rome, Italy

Authors: Torsten Schlett; Christian Rathgeb; Juan Tapia; Christoph Busch

Affiliation: da/sec - Biometrics and Security Research Group, Hochschule Darmstadt, Germany

Keyword(s): Biometrics, Face Images, Dataset Cleaning, Mislabeling, Image Hash, Face Recognition, Quality Assessment.

Abstract: Various face image datasets intended for facial biometrics research were created via web-scraping, i.e. the collection of images publicly available on the internet. This work presents an approach to detect both exactly and nearly identical face image duplicates, using file and image hashes. The approach is extended through the use of face image preprocessing. Additional steps based on face recognition and face image quality assessment models reduce false positives, and facilitate the deduplication of the face images both for intra- and inter-subject duplicate sets. The presented approach is applied to five datasets, namely LFW, TinyFace, Adience, CASIA-WebFace, and C-MS-Celeb (a cleaned MS-Celeb-1M variant). Duplicates are detected within every dataset, with hundreds to hundreds of thousands of duplicates for all except LFW. Face recognition and quality assessment experiments indicate a minor impact on the results through the duplicate removal. The final deduplication data is made available at https://github.com/dasec/dataset-duplicates.
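The two hash-based detection stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of SHA-256 as the file hash, the toy average hash, the function names, and the `max_distance` parameter are all assumptions made for illustration, and images are represented as small grayscale grids so that no image library is needed. The preprocessing, face recognition, and quality assessment steps are not covered.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files):
    """Group file names whose byte content has the same SHA-256 digest.

    `files` maps a (hypothetical) file name to its raw bytes.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def average_hash(pixels):
    """Toy average hash: one bit per pixel, set if the pixel exceeds the mean.

    `pixels` is a small 2D grid of grayscale values (assumed already resized).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(value > mean for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def near_duplicate_pairs(images, max_distance=0):
    """Return name pairs whose image hashes differ in at most `max_distance` bits."""
    hashes = {name: average_hash(pixels) for name, pixels in images.items()}
    names = sorted(hashes)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming_distance(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs
```

A file hash only catches byte-identical copies, whereas an image hash can also match images whose pixel values differ slightly, for example after re-encoding. The pairwise comparison above is quadratic in the number of images and is only meant to show the idea on small inputs.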

CC BY-NC-ND 4.0

Paper citation in several formats:

Schlett, T., Rathgeb, C., Tapia, J., and Busch, C. (2024). Double Trouble? Impact and Detection of Duplicates in Face Image Datasets. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-684-2; ISSN 2184-4313, SciTePress, pages 801-808. DOI: 10.5220/0012422500003654

@conference{icpram24,
author={Torsten Schlett and Christian Rathgeb and Juan Tapia and Christoph Busch},
title={Double Trouble? Impact and Detection of Duplicates in Face Image Datasets},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2024},
pages={801-808},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012422500003654},
isbn={978-989-758-684-2},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Double Trouble? Impact and Detection of Duplicates in Face Image Datasets
SN - 978-989-758-684-2
IS - 2184-4313
AU - Schlett, T.
AU - Rathgeb, C.
AU - Tapia, J.
AU - Busch, C.
PY - 2024
SP - 801
EP - 808
DO - 10.5220/0012422500003654
PB - SciTePress