November 13, 2017 - 1:10pm
Rm. C105, MBA Class of 1986 Building in the Stanford Graduate School of Business
We consider the problem of variable selection based on $n$ observations from a high-dimensional linear regression model. The unknown parameter of the model is assumed to belong to the class $S$ of all $s$-sparse vectors in $\mathbb{R}^p$ whose non-zero components are greater than $a > 0$. Variable selection in this context is an extensively studied problem, and various methods of recovering the sparsity pattern have been suggested. However, beyond the consistency of selection, little is known theoretically. For Gaussian design, which is of major importance in the context of compressed sensing, necessary and sufficient conditions for consistency are available for some configurations of $n, p, s, a$. These conditions are known to be achieved by the exhaustive search selector, which is not realizable in polynomial time and requires knowledge of $s$.
This talk will focus on the issue of optimality in variable selection based on the Hamming risk criterion, i.e., the expected number of coordinates at which the selected sparsity pattern differs from the true one. We first consider a general setting of the variable selection problem and derive an explicit expression for the minimax Hamming risk on $S$. Then, we specify it for the Gaussian sequence model and for high-dimensional linear regression with Gaussian design. In the latter model, we propose an adaptive algorithm, independent of $s$, $a$, and the noise level, that nearly attains the value of the minimax risk. This algorithm is the first method that is both realizable in polynomial time and consistent under almost the same (minimal) sufficient conditions as the exhaustive search selector. This talk is based on joint work with C. Butucea, M. Ndaoud, and N. Stepanova.
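To make the Hamming risk criterion concrete, here is a minimal simulation sketch. It uses a simple marginal-correlation threshold selector as a stand-in, not the adaptive algorithm discussed in the talk, and all parameter values ($n$, $p$, $s$, $a$, the threshold $a/2$) are illustrative choices rather than prescriptions from the work itself:

```python
import numpy as np

# Toy illustration of the Hamming loss for variable selection in the
# sparse linear model y = X beta + noise.  The selector below is a
# simple marginal-correlation threshold -- a stand-in, NOT the adaptive
# algorithm from the talk; all parameter values are arbitrary.
rng = np.random.default_rng(0)

n, p, s, a = 200, 50, 5, 2.0       # sample size, dimension, sparsity, signal level

beta = np.zeros(p)
beta[:s] = a                       # an s-sparse vector from the class S

X = rng.standard_normal((n, p))    # Gaussian design
y = X @ beta + rng.standard_normal(n)

# Select the coordinates whose marginal statistic |X_j' y| / n exceeds a/2.
eta_hat = (np.abs(X.T @ y) / n > a / 2).astype(int)
eta_true = (beta != 0).astype(int)

# Hamming loss: wrongly included plus wrongly excluded variables.
hamming = int(np.sum(eta_hat != eta_true))
print("Hamming loss:", hamming)
```

Averaging this loss over repeated draws of the design and noise approximates the Hamming risk of the selector; the talk concerns the minimax value of this risk over the class $S$.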
Institute for Research in the Social Sciences and Graduate School of Business