Special Issue: Machine Learning and Optimization Algorithm

A Framework of Convergence Analysis of Mini-batch Stochastic Projected Gradient Methods

  • 1 School of Sciences, Dalian Ocean University, Dalian 116023, China
  • 2 School of Mathematical Sciences, Dalian University of Technology, Dalian 116023, China

Received date: 2018-10-30

Revised date: 2019-10-11

Online published: 2023-05-24

Supported by

the National Natural Science Foundation of China (Nos. 11871135 and 11801054) and the Fundamental Research Funds for the Central Universities (No. DUT19K46)

Abstract

In this paper, we establish a unified framework for studying the almost sure global convergence and the expected convergence rates of a class of mini-batch stochastic (projected) gradient (SG) methods, including two popular types of SG: SG with diminishing stepsizes and SG with increasing batch sizes. We also show that the standard assumption of uniformly bounded variance, which is frequently used in the literature to investigate the convergence of SG, is actually not required when the gradient of the objective function is Lipschitz continuous. Finally, we show that our framework can also be used to analyze the convergence of a mini-batch stochastic extragradient method for stochastic variational inequalities.
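To make the two SG variants named in the abstract concrete, the following is a minimal Python sketch of the mini-batch stochastic projected gradient iteration x_{k+1} = Pi_X(x_k - alpha_k g_k), where g_k averages a mini-batch of sampled gradients at x_k. This is an illustration written for this summary, not the authors' implementation; the names minibatch_spg, grad_sample, and project_ball, and the toy problem, are assumptions chosen for the example. Passing a diminishing stepsize with a fixed batch size gives the first variant discussed in the paper; a constant stepsize with an increasing batch size gives the second.

import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto the ball {x : ||x||_2 <= radius}.
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def minibatch_spg(grad_sample, x0, project, n_iters=500,
                  stepsize=lambda k: 1.0 / (k + 1),
                  batch_size=lambda k: 16, seed=0):
    # Mini-batch stochastic projected gradient:
    #   x_{k+1} = Pi_X( x_k - alpha_k * g_k ),
    # where g_k averages batch_size(k) sampled gradients at x_k.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        g = grad_sample(x, rng, batch_size(k))  # mini-batch gradient estimate
        x = project(x - stepsize(k) * g)        # projected step
    return x

# Toy problem (hypothetical): minimize E[0.5 * ||x - (t + xi)||^2] over
# the unit ball, with t = (2, 0) and xi zero-mean noise; solution (1, 0).
t = np.array([2.0, 0.0])

def grad_sample(x, rng, m):
    # Average of m stochastic gradients x - (t + xi_i).
    xi = rng.normal(scale=0.5, size=(m, x.size)).mean(axis=0)
    return x - (t + xi)

# Variant 1: diminishing stepsize, fixed batch size.
x1 = minibatch_spg(grad_sample, np.zeros(2), project_ball,
                   stepsize=lambda k: 1.0 / (k + 1),
                   batch_size=lambda k: 16)

# Variant 2: constant stepsize, geometrically increasing batch size.
x2 = minibatch_spg(grad_sample, np.zeros(2), project_ball,
                   stepsize=lambda k: 0.1,
                   batch_size=lambda k: 4 * 2 ** (k // 50))

print(x1, x2)  # both iterates should approach (1, 0)

The schedules are passed as callables so that the two regimes analyzed in the paper differ only in the stepsize and batch_size arguments; everything else about the iteration is shared.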

Cite this article

Jian Gu, Xian-Tao Xiao. A Framework of Convergence Analysis of Mini-batch Stochastic Projected Gradient Methods[J]. Journal of the Operations Research Society of China, 2023, 11(2): 347-369. DOI: 10.1007/s40305-019-00276-7
