Spectral Feature Selection for Data Mining

Chapman and Hall/CRC Press, 2011

by

Zheng Zhao, SAS Institute

Huan Liu, Arizona State University

About the Book

Spectral Feature Selection for Data Mining introduces a novel feature selection technique that establishes a general platform for studying existing feature selection algorithms and developing new algorithms for emerging problems in real-world applications. This technique represents a unified framework for supervised, unsupervised, and semisupervised feature selection.

The book explores the latest research achievements, sheds light on new research directions, and stimulates readers to make the next creative breakthroughs. It presents the intrinsic ideas behind spectral feature selection, its theoretical foundations, its connections to other algorithms, and its use in handling both large-scale data sets and small sample problems. The authors also cover feature selection and feature extraction, including basic concepts, popular existing algorithms, and applications.

A timely introduction to spectral feature selection, this book illustrates the potential of this powerful dimensionality reduction technique in high-dimensional data processing. Readers learn how to use spectral feature selection to solve challenging problems in real-life applications and discover how general feature selection and extraction are connected to spectral feature selection.

Features

Presents the principles of spectral feature selection, a new technique that addresses the challenges of high-dimensional data

Describes new techniques for high-performance parallel feature selection and multisource feature selection

Covers existing dimensionality reduction methods

Requires only some basic knowledge of linear algebra, probability theory, and convex optimization

Includes concrete examples and figures in each chapter


Getting the Book

  • CRC Press
    Use promotion code 457GM to save 20% on your purchase today
  • Amazon
Please feel free to contact us about anything "good", "bad", or "ugly" you have found in the book.

Supplementary Materials

This section provides supplementary materials, including figures, source code for the algorithms, and some toy data sets in MATLAB format.

 

Chapter                                                       Resource
1. Data of High Dimensionality and Challenges                 ch1
2. Univariate Formulations for Spectral Feature Selection     ch2
3. Multivariate Formulations for Spectral Feature Selection   ch3
4. Connections to Existing Algorithms                         ch4
5. Large-Scale Spectral Feature Selection                     ch5
6. Multi-source Spectral Feature Selection                    ch6
Bibliography                                                  references


Errata

This file lists the errata for the book. Please feel free to contact me if you find any errors in the book.


Further Reading and Related Books


Table of Contents

Preface
Authors
Symbol Description

1. Data of High Dimensionality and Challenges
    1.1 Dimensionality Reduction Techniques
    1.2 Feature Selection for Data Mining
         1.2.1 A General Formulation for Feature Selection
         1.2.2 Feature Selection in a Learning Process
         1.2.3 Categories of Feature Selection Algorithms
         1.2.4 Challenges in Feature Selection Research
    1.3 Spectral Feature Selection
    1.4 Organization of the Book

2. Univariate Formulations for Spectral Feature Selection
    2.1 Modeling Target Concept via Similarity Matrix
    2.2 The Laplacian Matrix of a Graph
    2.3 Evaluating Features on the Graph
    2.4 An Extension for Feature Ranking Functions
    2.5 Spectral Feature Selection via Ranking
         2.5.1 SPEC for Unsupervised Learning
         2.5.2 SPEC for Supervised Learning
         2.5.3 SPEC for Semi-Supervised Learning
         2.5.4 Time Complexity of SPEC
    2.6 Robustness Analysis for SPEC
    2.7 Discussions

3. Multivariate Formulations for Spectral Feature Selection
    3.1 The Similarity Preserving Nature of SPEC
    3.2 A Sparse Multi-Output Regression Formulation
    3.3 Solving the L21-regularized Regression Problem
         3.3.1 The Coordinate Gradient Descent Method (CGD)
         3.3.2 The Accelerated Gradient Descent Method (AGD)
    3.4 Efficient Multivariate Spectral Feature Selection
    3.5 A Formulation Based on Matrix Comparison
    3.6 Feature Selection with Proposed Formulations

4. Connections to Existing Algorithms
    4.1 Connections to Existing Feature Selection Algorithms
         4.1.1 Laplacian Score
         4.1.2 Fisher Score
         4.1.3 Relief & ReliefF
         4.1.4 Trace Ratio Criterion
         4.1.5 Hilbert-Schmidt Independence Criterion (HSIC)
         4.1.6 A Summary of the Equivalence Relationships
    4.2 Connections to Other Learning Models
         4.2.1 Linear Discriminant Analysis
         4.2.2 Least Square Support Vector Machine
         4.2.3 Principal Component Analysis
         4.2.4 Simultaneous Feature Selection and Feature Extraction
    4.3 An Experimental Study of the Algorithms
         4.3.1 A Study of the Supervised Case
         4.3.2 A Study of the Unsupervised Case
    4.4 Discussions

5. Large-Scale Spectral Feature Selection
    5.1 Data Partitioning for Parallel Processing
    5.2 MPI for Distributed Parallel Computing
         5.2.1 MPI_BCAST
         5.2.2 MPI_SCATTER
         5.2.3 MPI_REDUCE
    5.3 Parallel Spectral Feature Selection
         5.3.1 Major Computation Steps of Univariate Formulations
         5.3.2 Major Computation Steps of Multivariate Formulations
    5.4 Computing the Similarity Matrix in Parallel
         5.4.1 Computing the Sample Similarity
         5.4.2 Inducing Sparsity
         5.4.3 Enforcing Symmetry
    5.5 Parallelization of the Univariate Formulations
    5.6 Parallel MRSF
         5.6.1 Initializing the Active Set
         5.6.2 Computing the Tentative Solution
         5.6.3 Computing the Optimal Solution
         5.6.4 Checking the Global Optimality
         5.6.5 Summary
    5.7 Parallel MCSF
    5.8 Discussions

6. Multi-source Spectral Feature Selection
    6.1 Categorization of Different Types of Knowledge
    6.2 A Framework Based on Combining Similarity Matrices
         6.2.1 Knowledge Conversion
         6.2.2 MSFS: The Framework
    6.3 A Framework Based on Rank Aggregation
         6.3.1 Handling Knowledge in KOFS
         6.3.2 Ranking Using Internal Knowledge
         6.3.3 Aggregating Feature Ranking Lists
    6.4 Experimental Results
         6.4.1 Data and Knowledge Sources
         6.4.2 Experiment Setup
         6.4.3 Performance Evaluation
         6.4.4 Empirical Findings
         6.4.5 Discussion of Biological Relevance
    6.5 Discussions

Bibliography
Index

