[1] Shalev-Shwartz, S., Ben-David, S.:Understanding Machine Learning:From Theory to Algorithms. Cambridge University Press, New York (2014) [2] Robbins, H., Monro, S.:A stochastic approximation method. Ann. Math. Stat. 22, 400-407(1951) [3] Bottou, L., Curtis, F.E., Nocedal, J.:Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223-311(2018) [4] Kushner, H.J., Yin, G.G.:Stochastic approximation and recursive algorithms and applications. Springer, New York (2003) [5] Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.:Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574-1609(2009) [6] Friedlander, M.P., Schmidt, M.:Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3), A1380-A1405(2012) [7] Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.:Sample size selection in optimization methods for machine learning. Math. Program. 134(1, Ser. B), 127-155(2012) [8] Bubeck, S.:Convex optimization:algorithms and complexity. Found. Trends Mach. Learn. 8(3-4), 231-357(2015) [9] Ghadimi, S., Lan, G.:Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341-2368(2013) [10] Nedić, A., Lee, S.:On stochastic subgradient mirror-descent algorithm with weighted averaging. SIAM J. Optim. 24(1), 84-107(2014) [11] Ghadimi, S., Lan,G., Zhang,H.:Mini-batch stochastic approximationmethodsfor nonconvex stochastic composite optimization. Math. Program. 155(1-2, Ser. A), 267-305(2016) [12] Iusem, A.N., Jofré, A., Oliveira, R.I., Thompson, P.:Extragradient method with variance reduction for stochastic variational inequalities. SIAM J. Optim. 27(2), 686-724(2017) [13] Jofré, A., Thompson, P.:On variance reduction for stochastic smooth convex optimization with multiplicative noise (2017). arXiv:1705.02969 [14] Nguyen, L., Nguyen, P.H., van Dijk, M., Richtarik, P., Scheinberg, K., Takac, M.:SGD and hogwild! Convergence without the bounded gradients assumption. In:Proceedings of the 35th international conference on machine learning, pp. 3750-3758(2018) [15] Johnson, R., Zhang, T.:Accelerating stochastic gradient descent using predictive variance reduction. In:Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 315-323(2013) [16] Schmidt, M., Le Roux, N., Bach, F.:Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1-2, Ser. A), 83-112(2017) [17] Defazio, A., Bach, F., Lacoste-Julien, S.:SAGA:a fast incremental gradient method with support for non-strongly convex composite objectives. In:Advances in Neural Information Processing Systems, pp. 1646-1654(2014) [18] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.:SARAH:a novel method for machine learning problems using stochastic recursive gradient. In:Proceedings of the 34th international conference on machine learning, pp. 2613-2621(2017) [19] Korpelevich, G.M.:The extragradient method for finding saddle points and other problems. Ekon. Mat. Metody 12, 747-756(1976) [20] Facchinei, F., Pang, J.S.:Finite-dimensional variational inequalities and complementarity problems, vol. I. Springer Series in Operations Research. Springer, New York (2003) [21] Robbins, H., Siegmund, D.:A convergence theorem for non negative almost supermartingales and some applications. In:Proceedings of a Symposium Held at the Center for Tomorrow, the Ohio State University, pp. 233-257(1971) [22] Billingsley, P.:Probability and measure, 3rd edn. Wiley Series in Probability and Mathematical Statistics. Wiley, New York (1995) |