[1] Ghadimi, S., Guanghui Lan, G.H., Hongchao Zhang, H.C.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155, 267-305(2016) [2] Kulunchakov, A., Mairal, J.: A generic acceleration framework for stochastic composite optimization. Adv. Neural Inf. Process. Syst. 32, 12556-12567(2019) [3] Gasnikov, A.V., Nesterov, Y.: Universal method for stochastic composite optimization problems. Comput. Math. Math. Phys. 58, 48-64(2018) [4] Milzarek, A., Xiao, X.T.,Wen, Z.W., Ulbrich, M.: On the local convergence of a stochastic semismooth Newton method for nonsmooth nonconvex optimization. Sci. China Math. 65, 2151-2170(2015) [5] Wang, X.Y., Wang, X., Yuan, Y.X.: Stochastic proximal quasi-Newton methods for non-convex composite optimization. Optim. Methods Softw. 34, 922-948(2019) [6] Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 1168-1200(2005) [7] Daubechies, I., Defrise, M., Mol, C.D.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413-1457(2003) [8] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183-202(2009) [9] Woodworth, B.E., Srebro, N.: Tight complexity bounds for optimizing composite objectives. NIPS 29, 3639-3647(2016) [10] Allen-Zhu, Z.: Katyusha: the first direct acceleration of stochastic gradient methods. J. Mach. Learn. Res. 18, 1-51(2018) [11] Roux, N.L., Schmidt,M.W., Bach, F.R.: A stochastic gradientmethod with an exponential convergence rate for finite training sets. NIPS 25, 2663-2671(2012) [12] Defazio, A., Bach, F.R., Lacoste-Julien, S.: Saga: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv. Neural Inf. Process. Syst. 27, 615-629(2014) [13] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. 26, 315-323(2013) [14] Shalev-Shwartz, S., Tong Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. Math. Program. 155, 105-145(2013) [15] Mairal, J.: Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM J. Optim. 25, 829-855(2015) [16] Shamir, O.: A stochastic PCA and SVD algorithm with an exponential convergence rate. In: International Conference on Machine Learning, pp. 144-152(2015) [17] Zhou, K.W.: Direct acceleration of saga using sampled negativemomentum.In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1602-1610(2019) [18] Fang, C., Li, C.J., Lin, Z.C., Zhang, T.: Spider: near-optimal non-convex optimization via stochastic path integrated differential estimator. Adv. Neural Inf. Process. Syst. 31, 689-699(2018) [19] Paquette, C., Lin, H.Z., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst for gradient-based nonconvex optimization. In: International Conference on Artificial Intelligence and Statistics, pp. 613-622(2018) [20] Lei, L., Ju, C., Chen, J.B., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Adv. Neural Inf. Process. Syst. 30, 2345-2355(2017) [21] Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1-17(1964) [22] Nesterov, Y.: A method for solving the convex programming problem with convergence rate o(1/k2). Proc. USSR Acad. Sci. 269, 543-547(1983) [23] Nesterov, Y.: Introductory lectures on convex programming volume I: basic course. Lect. Notes 3(4), 5(1998) [24] Su, W.J., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1-43(2014) [25] Shi, B., Du, S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via high-resolution differential equations. Math. Program. 195, 79-148(2018) [26] Allen-Zhu, Z., Orecchia, L.: Linear coupling: an ultimate unification of gradient and mirror descent. Information Technology Convergence and Services (2014) https://doi.org/10.48550/arXiv.1407.1537 [27] Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167-175(2003) [28] Zhou, Z.Y., Mertikopoulos, P., Bambos, N., Boyd, S., Glynn, P.W.: Stochastic mirror descent in variationally coherent optimization problems. In: Advances in Neural Information Processing Systems, pp. 7043-7052(2017) [29] Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26, 57-95(2016) [30] Hu, B., Lessard, L.: Dissipativity theory for Nesterov’s accelerated method. In: International Conference on Machine Learning, pp. 1549-1557(2017) [31] Robbins, H.E.: A stochastic approximation method. Ann. Math. Stat. 22, 400-407(1951) [32] Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60, 223-311(2018) [33] Nguyen, L.M., Liu, J., Scheinberg, K., Takác, M.: Sarah: a novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613-2621(2017) [34] Schmidt, M.W., Roux, N.L., Bach, F.R.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162, 83-112(2013) [35] Allen-Zhu, Z.: Natasha: faster non-convex stochastic optimization via strongly non-convex parameter. In: International Conference on Machine Learning. PMLR (2017). arXiv:1702.00763 [36] Azizan, N., Hassibi, B.: Stochastic gradient/mirror descent: minimax optimality and implicit regularization (2018). arXiv:1806.00952 [37] Luo, Y.L., Huo, X.M., Mei, Y.J.: Implicit regularization properties of variance reduced stochastic mirror descent. In: IEEE International Symposium on Information Theory (ISIT), pp. 696-701(2022) [38] Ilandarideva, S., Bekri, Y., Juditsky, A.B., Perchet, V.: Stochastic mirror descent for large-scale sparse recovery (2022). arXiv:2210.12882 [39] Zhao, P.L., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1-9(2015) |