An Accelerated Stochastic Mirror Descent Method

  • 1 LSEC, ICMSEC, AMSS, Chinese Academy of Sciences, Beijing 100190, China;
    2 Department of Mathematics, University of Chinese Academy of Sciences, Beijing 100049, China

Received date: 2023-03-03
Revised date: 2023-04-05
Online published: 2024-08-15

Supported by

This work was partially supported by the National Natural Science Foundation of China (No. 1228201).

Abstract

Driven by large-scale optimization problems arising in machine learning, the development of stochastic optimization methods has grown rapidly. Numerous methods have been developed on the basis of the vanilla stochastic gradient descent method. For most of these algorithms, however, the convergence rate in the stochastic setting does not simply match that in the deterministic setting. The main goal of this paper is to better understand this gap between deterministic and stochastic optimization. Specifically, we are interested in Nesterov acceleration of gradient-based approaches. Our study focuses on accelerating the stochastic mirror descent method, which enjoys an implicit regularization property. Assuming that the objective is smooth and convex or strongly convex, our analysis prescribes method parameters that ensure fast convergence of the estimation error and satisfactory numerical performance.
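
To make the setting concrete, the following is a minimal Python sketch of a Nesterov-style accelerated stochastic mirror descent loop, here instantiated with the entropic mirror map on the probability simplex and a least-squares objective. It follows the generic linear-coupling template for acceleration; the function names (asmd, entropic_mirror_step), coupling weights, and step sizes are illustrative assumptions, not the parameter choices prescribed in the paper.

# Sketch only: a generic accelerated stochastic mirror descent loop,
# not the authors' exact algorithm or parameter schedule.
import numpy as np

def entropic_mirror_step(z, grad, step):
    # Exponentiated-gradient update: the mirror-descent step induced by the
    # negative-entropy mirror map on the simplex, computed in log space
    # for numerical stability.
    logits = np.log(np.maximum(z, 1e-300)) - step * grad
    w = np.exp(logits - logits.max())
    return w / w.sum()

def asmd(A, b, iters=2000, seed=0):
    # Accelerated SMD for f(x) = (1/2n) * ||Ax - b||^2 over the simplex,
    # coupling a short-step sequence y and a mirror sequence z Nesterov-style.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.max(np.sum(A * A, axis=1))  # crude per-sample smoothness estimate
    y = z = np.full(d, 1.0 / d)        # start at the center of the simplex
    for k in range(1, iters + 1):
        tau = 2.0 / (k + 1)                      # coupling weight
        x = tau * z + (1.0 - tau) * y            # interpolation step
        i = rng.integers(n)                      # sample one data point
        g = (A[i] @ x - b[i]) * A[i]             # stochastic gradient at x
        y = entropic_mirror_step(x, g, 1.0 / L)            # short step
        z = entropic_mirror_step(z, g, (k + 1) / (2 * L))  # long mirror step
    return y

# Usage: recover a simplex-constrained signal from noiseless measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
x_true = np.abs(rng.standard_normal(10))
x_true /= x_true.sum()
b = A @ x_true
x_hat = asmd(A, b)
print("relative residual:", np.linalg.norm(A @ x_hat - b) / np.linalg.norm(b))

Because the measurements in this toy example are noiseless, every per-sample gradient vanishes at the solution, so plain single-sample gradients suffice; in a noisy setting the gradient variance is what drives the kind of parameter prescription the paper's analysis provides.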

Cite this article

Bo-Ou Jiang, Ya-Xiang Yuan. An Accelerated Stochastic Mirror Descent Method[J]. Journal of the Operations Research Society of China, 2024, 12(3): 549-571. DOI: 10.1007/s40305-023-00492-2
