Spectral Feature Selection for Data Mining

Chapman and Hall/CRC Press, 2011

by

Work in progress

go here for the old version: https://web.archive.org/web/20190523234113/http://dmml.asu.edu/sfs/



If you need help email: (faalataw at asu . edu)

About the Book

Spectral Feature Selection for Data Mining introduces a novel feature selection technique that establishes a general platform for studying existing feature selection algorithms and developing new algorithms for emerging problems in real-world applications. This technique represents a unified framework for supervised, unsupervised, and semisupervised feature selection.

The book explores the latest research achievements, sheds light on new research directions, and stimulates readers to make the next creative breakthroughs. It presents the intrinsic ideas behind spectral feature selection, its theoretical foundations, its connections to other algorithms, and its use in handling both large-scale data sets and small sample problems. The authors also cover feature selection and feature extraction, including basic concepts, popular existing algorithms, and applications.

A timely introduction to spectral feature selection, this book illustrates the potential of this powerful dimensionality reduction technique in high-dimensional data processing. Readers learn how to use spectral feature selection to solve challenging problems in real-life applications and discover how general feature selection and extraction are connected to spectral feature selection.

Features

  • Presents the principles of spectral feature selection, a new technique that addresses the challenges of high-dimensional data

  • Describes new techniques for high-performance parallel feature selection and multisource feature selection

  • Covers existing dimensionality reduction methods

  • Requires only some basic knowledge of linear algebra, probability theory, and convex optimization

  • Includes concrete examples and figures in each chapter

Table of content

Preface

Authors

Symbol Description


1. Data of High Dimensionality and Challenges

1.1 Dimensionality Reduction Techniques

1.2 Feature Selection for Data Mining

1.2.1 A General Formulation for Feature Selection

1.2.2 Feature Selection in a Learning Process

1.2.3 Categories of Feature Selection Algorithms

1.2.4 Challenges in Feature Selection Research

1.3 Spectral Feature Selection

1.4 Organization of the Book


2. Univariate Formulations for Spectral Feature Selection

2.1 Modeling Target Concept via Similarity Matrix

2.2 The Laplacian Matrix of a Graph

2.3 Evaluating Features on the Graph

2.4 An Extension for Feature Ranking Functions

2.5 Spectral Feature Selection via Ranking

2.5.1 SPEC for Unsupervised Learning

2.5.2 SPEC for Supervised Learning

2.5.3 SPEC for Semi-Supervised Learning

2.5.4 Time Complexity of SPEC

2.6 Robustness Analysis for SPEC

2.7 Discussions


3. Multivariate Formulations for Spectral Feature Selection

3.1 The Similarity Preserving Nature of SPEC

3.2 A Sparse Multi-Output Regression Formulation

3.3 Solving the L21-regularized Regression Problem

3.3.1 The Coordinate Gradient Descent Method (CGD)

3.3.2 The Accelerated Gradient Descent Method (AGD)

3.4 Efficient Multivariate Spectral Feature Selection

3.5 A Formulation Based on Matrix Comparison

3.6 Feature Selection with Proposed Formulations


4. Connections to Existing Algorithms

4.1 Connections to Existing Feature Selection Algorithms

4.1.1 Laplacian Score

4.1.2 Fisher Score

4.1.3 Relief & ReliefF

4.1.4 Trace Ratio Criterion

4.1.5 Hilbert-Schmidt Independence Criterion (HSIC)

4.1.6 A Summary of the Equivalence Relationships

4.2 Connections to Other Learning Models

4.2.1 Linear Discriminant Analysis

4.2.2 Least Square Support Vector Machine

4.2.3 Principal Component Analysis

4.2.4 Simultaneous Feature Selection and Feature Extraction

4.3 An Experimental Study of the Algorithms

4.3.1 A Study of the Supervised Case

4.3.2 A Study of the Unsupervised Case

4.4 Discussions


5. Large-Scale Spectral Feature Selection

5.1 Data Partitioning for Parallel Processing

5.2 MPI for Distributed Parallel Computing

5.2.1 MPI_BCAST

5.2.2 MPI_SCATTER

5.2.3 MPI_REDUCE

5.3 Parallel Spectral Feature Selection

5.3.1 Major Computation Steps of Univariate Formulations

5.3.2 Major Computation Steps of Multivariate Formulations

5.4 Computing the Similarity Matrix in Parallel

5.4.1 Computing the Sample Similarity

5.4.2 Inducing Sparsity

5.4.3 Enforcing Symmetry

5.5 Parallelization of the Univariate Formulations

5.6 Parallel MRSF

5.6.1 Initializing the Active Set

5.6.2 Computing the Tentative Solution

5.6.3 Computing the Optimal Solution

5.6.4 Checking the Global Optimality

5.6.5 Summary

5.7 Parallel MCSF

5.8 Discussions


6. Multi-source Spectral Feature Selection

6.1 Categorization of Different Types of Knowledge

6.2 A Framework Based on Combining Similarity Matrices

6.2.1 Knowledge Conversion

6.2.2 MSFS: The Framework

6.3 A Framework Based on Rank Aggregation

6.3.1 Handling Knowledge in KOFS

6.3.2 Ranking Using Internal Knowledge

6.3.3 Aggregating Feature Ranking Lists

6.4 Experimental Results

6.4.1 Data and Knowledge Sources

6.4.2 Experiment Setup

6.4.3 Performance Evaluation

6.4.4 Empirical Findings

6.4.5 Discussion of Biological Relevance

6.5 Discussions


Bibliography

Index

Getting the Book

Please feel free to contact the authors if you find any good, bad or ugly in the book.

Supplementary Materials

No Supplementary Materials provided by the author

Errata

This file lists the errata for the book. Please feel free to contact me if you find any error in the book.

Further Reading and Related Books

  • This file lists the references mentioned in the book.

  • Huan Liu's edited book: Computational Methods of Feature Selection, introducing a collection of the most successful algorithms for feature selection

  • Huan Liu's book: Feature Selection for Knowledge Discovery and Data Mining, presenting the fundamentals of feature selection

  • Oleg Okun's edited book: Feature Selection and Ensemble Methods for Bioinformatics: Algorithmic Classification and Implementations, focusing more on applying feature selection for gene selection