BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets

Kilho Shin, Chris Liu, Katsuyuki Maeda, Hiroaki Ohshima

2024

Abstract

Feature selection poses two primary challenges: devising effective indices for evaluating selected feature subsets, and designing scalable algorithms based on those indices. Our study addresses both. In addition to the size and class relevance of the selected features, we introduce a new index, nuisance, which captures class-uncorrelated information that can degrade subsequent processing. Our experiments confirm that a good balance between class relevance and nuisance improves classification accuracy. To achieve this balance, we present the Balance-Optimized Relevance and Nuisance Feature Selection (BornFS) algorithm. It scales to large datasets and outperforms traditional methods by achieving a better balance among the introduced indices. Notably, when applied to a dataset of 800,000 Windows executables, with LCC as a preprocessing filter, BornFS reduces the feature count from 10 million to under 200 while maintaining high accuracy in malware detection. Our findings shed light on the complexities of feature selection and point the way forward.
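
To make the two indices concrete, the sketch below scores a feature subset on a toy dataset. The abstract does not give formulas, so the definitions used here are assumptions for illustration only: class relevance is taken as the mutual information I(F; C) between the selected features F and the class C, and nuisance as the conditional entropy H(F | C), i.e. the information in F that the class does not explain. The function names are hypothetical, and this is not the BornFS algorithm itself.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def relevance_and_nuisance(rows, labels):
    """Score a selected feature subset on discrete data.

    rows   -- one tuple of selected feature values per instance
    labels -- one class label per instance

    Assumed definitions (not taken from the paper):
      relevance = I(F; C) = H(F) + H(C) - H(F, C)
      nuisance  = H(F | C) = H(F, C) - H(C)
    """
    features = [tuple(r) for r in rows]
    h_f = entropy(features)
    h_c = entropy(list(labels))
    h_fc = entropy(list(zip(features, labels)))
    return h_f + h_c - h_fc, h_fc - h_c


# Toy data: one feature copies the class, the other is independent of it.
labels = [0, 0, 1, 1]
copy_only = [(0,), (0,), (1,), (1,)]
noise_only = [(0,), (1,), (0,), (1,)]
both = [(0, 0), (0, 1), (1, 0), (1, 1)]

for name, rows in [("copy", copy_only), ("noise", noise_only), ("both", both)]:
    rel, nui = relevance_and_nuisance(rows, labels)
    print(f"{name}: relevance={rel:.2f} bits, nuisance={nui:.2f} bits")
```

Under these assumed definitions, adding the independent feature to the class-copying one leaves relevance unchanged but raises nuisance from 0 to 1 bit, which is the kind of imbalance the abstract says can hurt subsequent processing.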

Paper Citation


in Harvard Style

Shin K., Liu C., Maeda K. and Ohshima H. (2024). BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4, SciTePress, pages 1100-1107. DOI: 10.5220/0012436000003636


in Bibtex Style

@conference{icaart24,
author={Kilho Shin and Chris Liu and Katsuyuki Maeda and Hiroaki Ohshima},
title={BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={1100-1107},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012436000003636},
isbn={978-989-758-680-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets
SN - 978-989-758-680-4
AU - Shin K.
AU - Liu C.
AU - Maeda K.
AU - Ohshima H.
PY - 2024
SP - 1100
EP - 1107
DO - 10.5220/0012436000003636
PB - SciTePress