Optimization of ℓp-regularized Linear Models via Coordinate Descent

JACEK KLIMASZEWSKI,

MARCIN KORZEŃ

Abstrakt

In this paper we demonstrate, how `p-regularized univariate quadratic loss function can be effectively optimized (for 0 6 p 6 1) without approximation of penalty term and provide analytical solution for p = 1 2 . Next we adapt this approach for important multivariate cases like linear and logistic regressions, using Coordinate Descent algorithm. At the end we compare sample complexity of `1 with `p, 0 6 p < 1 regularized models for artificial and real datasets.

Słowa kluczowe: Classification, Coordinate Descent, Regression, Sparsity
References

[1] Tikhonov A.N., On the stability of inverse problems (in Russian). Doklady Akademii Nauk SSSR, 1943, 39 (5), pp. 195–198.


[2] Frank I.E., Friedman J.H., A Statistical View of Some Chemometrics Regression Tools. Technometrics, 1993, 35, pp. 109–148.


[3] Williams P.M., Bayesian Regularisation and Pruning using a Laplace Prior. Neural Computation, 1994, 7, pp. 117–143.
[4] Tibshirani R., Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society, Series B, 1996, 58, pp. 267–288.
[5] Fan J., Li R., Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 2001, 96, pp. 1348–1360.
[6] Mazumder R., Friedman J., Hastie T., SparseNet: Coordinate Descent With Nonconvex Penalties. Journal of the American Statistical Association, 2011, 106 (495), pp. 1125–1138.
[7] Nikolova M., Analysis of the Recovery of Edges in Images and Signals by Minimizing Nonconvex Regularized Least-Squares. Multiscale Modeling & Simulation, 2005, 4 (3), pp. 960–991.
[8] Bredies K., Lorenz D., Reiterer S., Minimization of Non-smooth, Non-convex Functionals by Iterative Thresholding. Journal of Optimization Theory and Applications, 2015, 165, pp. 78–112. [9] Moreau J.J., Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes Rendus de l’Acad´emie des Sciences (Paris), S´erie A, 1962, 255, pp. 2897–2899.
[10] Donoho D., Johnstone I., Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika, 1994, 81, pp. 425–455.
[11] Nickalls R.W.D., A New Approach to Solving the Cubic: Cardan’s Solution Revealed. The Mathematical Gazette, 1993, 77 (480), pp. 354–359.
[12] Kincaid D., Cheney W., Numerical Analysis: Mathematics of Scientific Computing. American Mathematical Society, 2002.
[13] Green P.J., Iteratively Reweighted Least Squares for Maximum Likelihood Estimation, and some Robust and Resistant Alternatives. Journal of the Royal Statistical Society. Series B (Methodological), 1984, 46 (2).
[14] Friedman J., Hastie T., Tibshirani R., Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 2010, 33 (1), pp. 1–22. [15] Golub T., Slonim D., Tamayo P., Huard C., Gaasenbeek M., Mesirov J., Coller H., Loh M., Downing J., Caligiuri M., Bloomfield C., Lander E., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999, 286 (5439), pp. 531–537.
[16] Dettling M., BagBoosting for Tumor Classification with Gene Expression Data. Bioinformatics, 2004, 20 (18), pp. 3583–3593.
[17] Zou H., Hastie T., Regularization and Variable Selection via the Elastic Net.Journal of the Royal Statistical Society, Series B, 2005, 67, pp. 301–320.

Czasopismo ukazuje się w sposób ciągły on-line.
Pierwotną formą czasopisma jest wersja elektroniczna.

Wersja papierowa czasopisma dostępna na www.wuj.pl