[1] Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223-311 (2018)
[2] Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, NY, USA (2014)
[3] Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge, MA, USA (2012)
[4] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, NY, USA (2009)
[5] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Math. Program. Comput. 5(2), 201-226 (2013)
[6] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, MA, USA (2016)
[7] Li, X.L.: Preconditioned stochastic gradient descent. IEEE T. Neur. Net. Lear. 29(5), 1454-1466 (2017)
[8] Zhang, S., Choromanska, A.E., LeCun, Y.: Deep learning with elastic averaging SGD. In: Advances in Neural Information Processing Systems, pp. 685-693 (2015)
[9] Jin, X.B., Zhang, X.Y., Huang, K., Geng, G.G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE T. Neur. Net. Lear. 30(5), 1360-1369 (2018)
[10] Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400-407 (1951)
[11] Le Roux, N., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, pp. 2663-2671 (2012)
[12] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1-2), 83-112 (2017)
[13] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315-323 (2013)
[14] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, pp. 2613-2621 (2017)
[15] Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057-2075 (2014)
[16] Konečný, J., Richtárik, P.: Semi-stochastic gradient descent methods. Front. Appl. Math. Stat. 3(9), 1-14 (2017)
[17] Konečný, J., Liu, J., Richtárik, P., Takáč, M.: Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J. Sel. Top. Signal Process. 10(2), 242-255 (2015)
[18] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization. J. Mach. Learn. Res. 21(110), 1-48 (2020)
[19] Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141-148 (1988)
[20] Dai, Y.H., Huang, Y., Liu, X.W.: A family of spectral gradient methods for optimization. Comput. Optim. Appl. 74(1), 43-65 (2019)
[21] Bai, J., Hager, W.W., Zhang, H.: An inexact accelerated stochastic ADMM for separable convex optimization. Comput. Optim. Appl. 81(1), 479-518 (2022)
[22] Fletcher, R.: On the Barzilai-Borwein method. In: Qi, L., Teo, K., Yang, X. (eds.) Optimization and Control with Applications, vol. 96, pp. 235-256. Springer, Boston, USA (2005)
[23] Tan, C., Ma, S., Dai, Y.H., Qian, Y.: Barzilai-Borwein step size for stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 685-693 (2016)
[24] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optim. Lett. 14, 2265-2283 (2020)
[25] Yu, T., Liu, X.W., Dai, Y.H., Sun, J.: Stochastic variance reduced gradient methods using a trust-region-like scheme. J. Sci. Comput. 87, 5 (2021)
[26] Yu, T., Liu, X.W., Dai, Y.H., Sun, J.: A minibatch proximal stochastic recursive gradient algorithm using a trust-region-like scheme and Barzilai-Borwein stepsizes. IEEE T. Neur. Net. Lear. 32(10) (2021)
[27] Park, Y., Dhar, S., Boyd, S., Shah, M.: Variable metric proximal gradient method with diagonal Barzilai-Borwein stepsize. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3597-3601 (2020)
[28] Yu, T., Liu, X.W., Dai, Y.H., Sun, J.: Variable metric proximal stochastic variance reduced gradient methods for nonconvex nonsmooth optimization. J. Ind. Manag. Optim. (2021). https://doi.org/10.3934/jimo.2021084
[29] Wang, X., Wang, S., Zhang, H.: Inexact proximal stochastic gradient method for convex composite optimization. Comput. Optim. Appl. 68(3), 579-618 (2017)
[30] Wang, X., Wang, X., Yuan, Y.X.: Stochastic proximal quasi-Newton methods for non-convex composite optimization. Optim. Methods Softw. 34(5), 922-948 (2019)
[31] Nesterov, Y.: Introductory Lectures on Convex Programming. Springer, Boston, MA, USA (1998)
[32] Beck, A.: First-Order Methods in Optimization. SIAM, Philadelphia, PA, USA (2017)
[33] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183-202 (2009)
[34] Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795-811 (2016)
[35] Gong, P., Ye, J.: Linear convergence of variance-reduced stochastic gradient without strong convexity. arXiv:1406.1102 (2014). Accessed 4 June 2014
[36] Zhang, H.: The restricted strong convexity revisited: Analysis of equivalence to error bound and quadratic growth. Optim. Lett. 11(4), 817-833 (2017)
[37] Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1-2), 365-397 (2012)
[38] Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In: Advances in Neural Information Processing Systems, pp. 1145-1153 (2016)