[1] Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223-311 (2018)
[2] Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, NY, USA (2014)
[3] Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge, MA, USA (2012)
[4] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, NY, USA (2009)
[5] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Math. Program. Comput. 5(2), 201-226 (2013)
[6] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, MA, USA (2016)
[7] Li, X.L.: Preconditioned stochastic gradient descent. IEEE T. Neur. Net. Lear. 29(5), 1454-1466 (2017)
[8] Zhang, S., Choromanska, A.E., LeCun, Y.: Deep learning with elastic averaging SGD. In: Advances in Neural Information Processing Systems, pp. 685-693 (2015)
[9] Jin, X.B., Zhang, X.Y., Huang, K., Geng, G.G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE T. Neur. Net. Lear. 30(5), 1360-1369 (2018)
[10] Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400-407 (1951)
[11] Le Roux, N., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, pp. 2663-2671 (2012)
[12] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1-2), 83-112 (2017)
[13] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315-323 (2013)
[14] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, pp. 2613-2621 (2017)
[15] Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057-2075 (2014)
[16] Konečný, J., Richtárik, P.: Semi-stochastic gradient descent methods. Front. Appl. Math. Stat. 3(9), 1-14 (2017)
[17] Konečný, J., Liu, J., Richtárik, P., Takáč, M.: Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J. Sel. Top. Signal Process. 10(2), 242-255 (2015)
[18] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization. J. Mach. Learn. Res. 21(110), 1-48 (2020)
[19] Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141-148 (1988)
[20] Dai, Y.H., Huang, Y., Liu, X.W.: A family of spectral gradient methods for optimization. Comput. Optim. Appl. 74(1), 43-65 (2019)
[21] Bai, J., Hager, W.W., Zhang, H.: An inexact accelerated stochastic ADMM for separable convex optimization. Comput. Optim. Appl. 81(1), 479-518 (2022)
[22] Fletcher, R.: On the Barzilai-Borwein method. In: Qi, L., Teo, K., Yang, X. (eds.) Optimization and Control with Applications, vol. 96, pp. 235-256. Springer, Boston, USA (2005)
[23] Tan, C., Ma, S., Dai, Y.H., Qian, Y.: Barzilai-Borwein step size for stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 685-693 (2016)
[24] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optim. Lett. 14, 2265-2283 (2020)
[25] Yu, T., Liu, X.W., Dai, Y.H., Sun, J.: Stochastic variance reduced gradient methods using a trust-region-like scheme. J. Sci. Comput. 87, 5 (2021)
[26] Yu, T., Liu, X.W., Dai, Y.H., Sun, J.: A minibatch proximal stochastic recursive gradient algorithm using a trust-region-like scheme and Barzilai-Borwein stepsizes. IEEE T. Neur. Net. Lear. 32(10) (2021)
[27] Park, Y., Dhar, S., Boyd, S., Shah, M.: Variable metric proximal gradient method with diagonal Barzilai-Borwein stepsize. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3597-3601 (2020)
[28] Yu, T., Liu, X.W., Dai, Y.H., Sun, J.: Variable metric proximal stochastic variance reduced gradient methods for nonconvex nonsmooth optimization. J. Ind. Manag. Optim. (2021). https://doi.org/10.3934/jimo.2021084
[29] Wang, X., Wang, S., Zhang, H.: Inexact proximal stochastic gradient method for convex composite optimization. Comput. Optim. Appl. 68(3), 579-618 (2017)
[30] Wang, X., Wang, X., Yuan, Y.X.: Stochastic proximal quasi-Newton methods for non-convex composite optimization. Optim. Methods Softw. 34(5), 922-948 (2019)
[31] Nesterov, Y.: Introductory Lectures on Convex Programming. Springer, Boston, MA, USA (1998)
[32] Beck, A.: First-Order Methods in Optimization. SIAM, Philadelphia, PA, USA (2017)
[33] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183-202 (2009)
[34] Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795-811 (2016)
[35] Gong, P., Ye, J.: Linear convergence of variance-reduced stochastic gradient without strong convexity. arXiv:1406.1102 (2014). Accessed 4 June 2014
[36] Zhang, H.: The restricted strong convexity revisited: Analysis of equivalence to error bound and quadratic growth. Optim. Lett. 11(4), 817-833 (2017)
[37] Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1-2), 365-397 (2012)
[38] Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In: Advances in Neural Information Processing Systems, pp. 1145-1153 (2016)