Energy Distance Correlation with Extended Bayesian Information Criteria for Feature Selection in High Dimensional Models

Author : Isaac Xoese Ocloo
Total Pages : 61
Release : 2021
OCLC : 1269407857

Book Synopsis Energy Distance Correlation with Extended Bayesian Information Criteria for Feature Selection in High Dimensional Models by : Isaac Xoese Ocloo

Written by Isaac Xoese Ocloo and released in 2021, with 61 pages. Book excerpt:

In this research, we investigate the sequential lasso method for feature selection in sparse high dimensional linear models, which was proposed by Luo and Chen (2014). We propose a new method that replaces the ordinary correlation in Luo and Chen's algorithm with the energy distance correlation of Szekely et al. (2007). We continue to adopt the extended Bayesian Information Criteria (EBIC) as the stopping criterion in the computing algorithm. The advantage of energy distance correlation is that it can detect both linear and non-linear association between two variables, while the ordinary correlation can detect only the linear part of the association. As a result, the new method appears to be more powerful than Luo and Chen's method for feature selection. This is demonstrated by simulation studies and illustrated by two real-life examples. It is also shown that the proposed algorithm is selection consistent.

In the first part of our research, we examine through simulations the model size selected by Adaptive Lasso and SCAD after first applying to the data the sure screening method based on distance correlation proposed by Li et al. (2012). We observe that the average model size selected was quite high.

In the second part, we describe the new sequential variable selection method, which we call energy distance correlation with extended Bayesian Information Criteria (Edc+EBIC). At each stage of the sequential procedure, we maximize the energy distance correlation between the response and each of the predictor variables. This maximization is done such that once a variable is selected in a previous stage, its contribution to the response is removed so that it cannot be selected again. The active set of selected variables is updated once a variable is selected, and the EBIC of the set is calculated. The process stops if the EBIC for the current active set is greater than the EBIC of the previous active set. We compare the performance of Edc+EBIC with sequential Lasso, Adaptive Lasso, SCAD and SIS+SCAD. We observed that our proposed method on average has a positive discovery rate close to 100%, a low false discovery rate and an average model size as expected in our simulation set-up.
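The sequential procedure described in the excerpt can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (distance_correlation, ebic, edc_ebic_select) are hypothetical, "removing the contribution" of selected variables is assumed to mean taking least-squares residuals of the response, the EBIC is assumed to take the common linear-model form n log(RSS/n) + k log(n) + 2 gamma log C(p, k), and the first step is assumed to be compared against the empty model.

```python
import math
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (V-statistic form)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max(float((A * B).mean()), 0.0)
    denom = math.sqrt(float((A * A).mean()) * float((B * B).mean()))
    return 0.0 if denom <= 0.0 else math.sqrt(dcov2 / denom)


def ebic(y, X, active, gamma=1.0):
    """Assumed EBIC form for a linear model; also returns the residuals."""
    n, p = X.shape
    k = len(active)
    if k:
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        resid = y - X[:, active] @ coef
    else:
        resid = y
    rss = max(float(resid @ resid), 1e-12)  # guard against log(0)
    log_binom = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    value = n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binom
    return value, resid


def edc_ebic_select(X, y, gamma=1.0):
    """Greedy selection by distance correlation, stopped when EBIC increases."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)  # center so that no intercept is needed
    y = y - y.mean()
    n, p = X.shape
    active = []
    best_ebic, resid = ebic(y, X, active, gamma)
    while len(active) < min(n - 1, p):
        candidates = [j for j in range(p) if j not in active]
        # Score each remaining predictor against the current residual,
        # i.e. the response with the selected variables' contribution removed.
        scores = [distance_correlation(X[:, j], resid) for j in candidates]
        j_star = candidates[int(np.argmax(scores))]
        new_ebic, new_resid = ebic(y, X, active + [j_star], gamma)
        if new_ebic > best_ebic:  # stopping rule from the excerpt
            break
        active.append(j_star)
        best_ebic, resid = new_ebic, new_resid
    return active


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = 3 * X[:, 0] - 2 * X[:, 3] + 0.5 * rng.normal(size=100)
    print(edc_ebic_select(X, y))
```

Because the assumed EBIC here is computed from a linear least-squares fit, this sketch only rewards a selected variable when a linear term in it reduces the residual sum of squares; how the thesis handles non-linear contributions at this step is not stated in the excerpt.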

