INFORMATION GAIN FEATURE SELECTION BASED ON FEATURE INTERACTIONS

Date

2013-12

Abstract

Analyzing high-dimensional data is a major challenge in machine learning. To deal with the curse of dimensionality, many effective and efficient feature-selection algorithms have been developed in recent years. However, most of these algorithms assume that features are independent: they identify relevant features mainly by each feature's individual correlation with the target concept. Such algorithms perform well when the independence assumption holds, but they may perform poorly in domains where feature interactions exist. Because of interactions, a feature that is only weakly correlated with the target concept on its own can in fact be highly correlated with it when considered together with other features; removing such features can severely harm the performance of the classification model.
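The classic illustration of this point is the XOR concept: each feature alone has zero information gain with respect to the class, yet the two features together determine it completely. A minimal sketch (the `entropy` and `info_gain` helpers below are illustrative, not the thesis's implementation):

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(feature, labels):
    """IG(C; X) = H(C) - H(C | X) for a discrete feature."""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        idx = [i for i, x in enumerate(feature) if x == v]
        cond += len(idx) / n * entropy([labels[i] for i in idx])
    return entropy(labels) - cond


# XOR: neither feature is individually informative about the class,
# but jointly they determine it completely.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]
joint = list(zip(x1, x2))

print(info_gain(x1, y))     # 0.0
print(info_gain(x2, y))     # 0.0
print(info_gain(joint, y))  # 1.0
```

A univariate filter ranking features by individual information gain would discard both `x1` and `x2` here, even though together they predict the class perfectly.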

In this thesis, we first present a general view of feature interaction and formally define it in terms of information theory. We then propose a practical algorithm that identifies feature interactions and performs feature selection based on the interactions it finds. Finally, we compare our algorithm against several well-known feature-selection algorithms that assume feature independence, and show that by taking feature interactions into account, our algorithm achieves better performance on datasets where interactions abound.
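One standard information-theoretic formalization of pairwise feature interaction is interaction information, I(X1; X2; C) = I(X1, X2; C) - I(X1; C) - I(X2; C), where a positive value indicates synergy between the two features with respect to the class. The sketch below assumes discrete features; the thesis's exact definition and algorithm may differ:

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_info(x1, x2, c):
    """I(X1; X2; C) = I(X1, X2; C) - I(X1; C) - I(X2; C).

    Positive values indicate synergy: the pair carries more
    information about the class than the two features separately.
    """
    joint = list(zip(x1, x2))
    return mutual_info(joint, c) - mutual_info(x1, c) - mutual_info(x2, c)


# On the XOR concept the interaction is maximal (1 bit of pure synergy).
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
c = [a ^ b for a, b in zip(x1, x2)]
print(interaction_info(x1, x2, c))  # 1.0
```

An interaction-aware selector could score candidate pairs this way and retain individually weak features whose interaction information with an already-selected feature is large, rather than ranking features in isolation.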

Keywords

Feature selection, Machine learning, Feature interaction, Information gain
