Energy Distance Correlation with Extended Bayesian Information Criteria for Feature Selection in High Dimensional Models

Author : Isaac Xoese Ocloo
Total Pages : 61
Release : 2021
OCLC : 1269407857

Book Synopsis: Energy Distance Correlation with Extended Bayesian Information Criteria for Feature Selection in High Dimensional Models, by Isaac Xoese Ocloo

Written by Isaac Xoese Ocloo and released in 2021; 61 pages. Book excerpt:

In this research, we investigate the sequential lasso method for feature selection in sparse high dimensional linear models, recently proposed by Luo and Chen (2014). In this project, we propose a new method that uses the energy distance correlation of Szekely et al. (2007) in place of the ordinary correlation in Luo and Chen's algorithm. We continue to adopt the extended Bayesian Information Criteria (EBIC) as the stopping criterion in the computing algorithm. The advantage of energy distance correlation is that it can detect both linear and non-linear association between two variables, while the ordinary correlation can detect only the linear part of the association. As a result, the new method appears to be more powerful than Luo and Chen's method for feature selection. This is demonstrated by simulation studies and illustrated by two real-life examples. It is shown that the proposed new algorithm is also selection consistent.

In the first part of our research, we examine through simulations the model size selected by Adaptive Lasso and SCAD after the data are first screened with the sure screening method of Li et al. (2012), which uses distance correlation. We observe that the average model size selected was quite high.

In the second part, we describe the new sequential variable selection method, which we call energy distance correlation with extended Bayesian Information Criteria (Edc+EBIC). At each stage of the sequential procedure, we maximize the energy distance correlation between the response and each of the predictor variables. This maximization is done such that if a variable was selected in a previous stage, its contribution to the response is removed so that it has no chance of being selected again. The active set of selected variables is updated once a variable is selected, and the EBIC of the set is calculated. The process stops if the EBIC for the current active set is greater than the EBIC of the previous active set. We compare the performance of Edc+EBIC with sequential Lasso, Adaptive Lasso, SCAD and SIS+SCAD. We observed that our proposed method on average has a positive discovery rate close to 100%, a low false discovery rate, and an average model size as expected in our simulation set-up.
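The sequential procedure described in the excerpt can be sketched roughly as follows. This is an illustrative reading of the description, not the author's implementation. Several details are assumptions, because the excerpt does not spell them out: "removing the contribution" of selected variables is taken to mean working with the least squares residual of the response on the active set, the EBIC is taken in the commonly used form n log(RSS/n) + k log n + 2γ log C(p, k) with γ = 1, and the search starts from an intercept-only model. All function names (`distance_correlation`, `ols_residual`, `ebic`, `edc_ebic_select`) are hypothetical.

```python
import numpy as np
from math import lgamma, log


def centered_distances(v):
    """Double-centered matrix of pairwise absolute differences of a 1-D sample."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (Szekely et al., 2007)."""
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = max(float((a * b).mean()), 0.0)  # guard against rounding below zero
    dvar_prod = float((a * a).mean()) * float((b * b).mean())
    if dvar_prod == 0.0:
        return 0.0
    return float(np.sqrt(dcov2 / np.sqrt(dvar_prod)))


def ols_residual(y, X, active):
    """Residual of y after least squares on an intercept plus the active columns."""
    cols = np.asarray(active, dtype=int)
    design = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def ebic(y, X, active, gamma=1.0):
    """Assumed EBIC form: n*log(RSS/n) + k*log(n) + 2*gamma*log(C(p, k))."""
    n, p = X.shape
    k = len(active)
    rss = float(np.sum(ols_residual(y, X, active) ** 2))
    log_binom = lgamma(p + 1) - lgamma(k + 1) - lgamma(p - k + 1)
    return n * log(rss / n) + k * log(n) + 2.0 * gamma * log_binom


def edc_ebic_select(X, y, gamma=1.0, max_steps=None):
    """Greedy selection by distance correlation, stopped when the EBIC increases."""
    n, p = X.shape
    limit = min(p, n - 2) if max_steps is None else min(p, n - 2, max_steps)
    active = []
    best_ebic = ebic(y, X, active, gamma)  # assumption: start from intercept-only model
    resid = ols_residual(y, X, active)
    for _ in range(limit):
        candidates = [j for j in range(p) if j not in active]
        # Score each remaining predictor against the current residual.
        scores = [distance_correlation(X[:, j], resid) for j in candidates]
        trial = active + [candidates[int(np.argmax(scores))]]
        trial_ebic = ebic(y, X, trial, gamma)
        if trial_ebic > best_ebic:  # stopping rule from the excerpt
            break
        active, best_ebic = trial, trial_ebic
        resid = ols_residual(y, X, active)  # remove selected variables' contribution
    return active
```

Calling `edc_ebic_select(X, y)` with an `n × p` NumPy design matrix `X` and a length-`n` response `y` returns the column indices in the order they were selected. Each step costs on the order of `p · n²` operations because every candidate needs an `n × n` distance matrix, so a faster distance correlation routine would be worth considering for large `n`.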


Related Books

Energy Distance Correlation with Extended Bayesian Information Criteria for Feature Selection in High Dimensional Models
Language: en
Pages: 61
Authors: Isaac Xoese Ocloo
Categories: Mathematical statistics
Type: BOOK - Published: 2021

In this research, we investigate the sequential lasso method for feature selection in sparse high dimensional linear models. It was recently proposed by Luo and

Statistical Inference from High Dimensional Data
Language: en
Pages: 314
Authors: Carlos Fernandez-Lozano
Categories: Science
Type: BOOK - Published: 2021-04-28 - Publisher: MDPI

• Real-world problems can be high-dimensional, complex, and noisy • More data does not imply more information • Different approaches deal with the so-call

Feature Selection for High-Dimensional Data
Language: en
Authors: Verónica Bolón-Canedo
Type: BOOK - Published: 2015

This book offers a coherent and comprehensive approach to feature subset selection in the scope of classification problems, explaining the foundations, real app

Statistical Foundations of Data Science
Language: en
Pages: 942
Authors: Jianqing Fan
Categories: Mathematics
Type: BOOK - Published: 2020-09-21 - Publisher: CRC Press

Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques

Regression and Time Series Model Selection
Language: en
Pages: 479
Authors: Allan D. R. McQuarrie
Categories: Mathematics
Type: BOOK - Published: 1998 - Publisher: World Scientific

This important book describes procedures for selecting a model from a large set of competing statistical models. It includes model selection techniques for univ