In this paper, we establish a unified framework to study the almost sure global convergence and the expected convergencerates of a class ofmini-batch stochastic(projected) gradient (SG) methods, including two popular types of SG: stepsize diminished SG and batch size increased SG. We also show that the standard variance uniformly bounded assumption, which is frequently used in the literature to investigate the convergence of SG, is actually not required when the gradient of the objective function is Lipschitz continuous. Finally, we show that our framework can also be used for analyzing the convergence of a mini-batch stochastic extragradient method for stochastic variational inequality.
[1] Shalev-Shwartz, S., Ben-David, S.:Understanding Machine Learning:From Theory to Algorithms. Cambridge University Press, New York (2014)
[2] Robbins, H., Monro, S.:A stochastic approximation method. Ann. Math. Stat. 22, 400-407(1951)
[3] Bottou, L., Curtis, F.E., Nocedal, J.:Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223-311(2018)
[4] Kushner, H.J., Yin, G.G.:Stochastic approximation and recursive algorithms and applications. Springer, New York (2003)
[5] Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.:Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574-1609(2009)
[6] Friedlander, M.P., Schmidt, M.:Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3), A1380-A1405(2012)
[7] Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.:Sample size selection in optimization methods for machine learning. Math. Program. 134(1, Ser. B), 127-155(2012)
[8] Bubeck, S.:Convex optimization:algorithms and complexity. Found. Trends Mach. Learn. 8(3-4), 231-357(2015)
[9] Ghadimi, S., Lan, G.:Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341-2368(2013)
[10] Nedić, A., Lee, S.:On stochastic subgradient mirror-descent algorithm with weighted averaging. SIAM J. Optim. 24(1), 84-107(2014)
[11] Ghadimi, S., Lan,G., Zhang,H.:Mini-batch stochastic approximationmethodsfor nonconvex stochastic composite optimization. Math. Program. 155(1-2, Ser. A), 267-305(2016)
[12] Iusem, A.N., Jofré, A., Oliveira, R.I., Thompson, P.:Extragradient method with variance reduction for stochastic variational inequalities. SIAM J. Optim. 27(2), 686-724(2017)
[13] Jofré, A., Thompson, P.:On variance reduction for stochastic smooth convex optimization with multiplicative noise (2017). arXiv:1705.02969
[14] Nguyen, L., Nguyen, P.H., van Dijk, M., Richtarik, P., Scheinberg, K., Takac, M.:SGD and hogwild! Convergence without the bounded gradients assumption. In:Proceedings of the 35th international conference on machine learning, pp. 3750-3758(2018)
[15] Johnson, R., Zhang, T.:Accelerating stochastic gradient descent using predictive variance reduction. In:Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 315-323(2013)
[16] Schmidt, M., Le Roux, N., Bach, F.:Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1-2, Ser. A), 83-112(2017)
[17] Defazio, A., Bach, F., Lacoste-Julien, S.:SAGA:a fast incremental gradient method with support for non-strongly convex composite objectives. In:Advances in Neural Information Processing Systems, pp. 1646-1654(2014)
[18] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.:SARAH:a novel method for machine learning problems using stochastic recursive gradient. In:Proceedings of the 34th international conference on machine learning, pp. 2613-2621(2017)
[19] Korpelevich, G.M.:The extragradient method for finding saddle points and other problems. Ekon. Mat. Metody 12, 747-756(1976)
[20] Facchinei, F., Pang, J.S.:Finite-dimensional variational inequalities and complementarity problems, vol. I. Springer Series in Operations Research. Springer, New York (2003)
[21] Robbins, H., Siegmund, D.:A convergence theorem for non negative almost supermartingales and some applications. In:Proceedings of a Symposium Held at the Center for Tomorrow, the Ohio State University, pp. 233-257(1971)
[22] Billingsley, P.:Probability and measure, 3rd edn. Wiley Series in Probability and Mathematical Statistics. Wiley, New York (1995)