[1] White House: Big Data: Seizing Opportunities, Preserving Values (2014)
[2] Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273-297 (1995)
[3] Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267-288 (1996)
[4] Zhou, H., Li, L., Zhu, H.: Tensor regression with applications in neuroimaging data analysis. J. Am. Stat. Assoc. 108(502), 540-552 (2013)
[5] Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265-286 (2006)
[6] Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.-I.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley, London (2009)
[7] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183-202 (2009)
[8] Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574-1609 (2009)
[9] Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341-362 (2012)
[10] Dang, C.D., Lan, G.: Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM J. Optim. 25(2), 856-881 (2015)
[11] Xu, Y., Yin, W.: Block stochastic gradient iteration for convex and nonconvex optimization. SIAM J. Optim. 25(3), 1686-1716 (2015)
[12] Liu, J., Wright, S., Re, C., Bittorf, V., Sridhar, S.: An asynchronous parallel stochastic coordinate descent algorithm. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 469-477 (2014)
[13] Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851-A2879 (2016)
[14] Strikwerda, J.C.: A probabilistic analysis of asynchronous iteration. Linear Algebra Appl. 349(1-3), 125-154 (2002)
[15] Hannah, R., Yin, W.: On unbounded delays in asynchronous parallel fixed-point algorithms. arXiv preprint arXiv:1609.04746 (2016)
[16] Liu, J., Wright, S.J.: Asynchronous stochastic coordinate descent: parallelism and convergence properties. SIAM J. Optim. 25(1), 351-376 (2015)
[17] Cannelli, L., Facchinei, F., Kungurtsev, V., Scutari, G.: Asynchronous parallel algorithms for nonconvex big-data optimization: model and convergence. arXiv preprint arXiv:1607.04818 (2016)
[18] Davis, D.: The asynchronous PALM algorithm for nonsmooth nonconvex problems. arXiv preprint arXiv:1604.00526 (2016)
[19] Mokhtari, A., Koppel, A., Ribeiro, A.: A class of parallel doubly stochastic algorithms for large-scale learning. arXiv preprint arXiv:1606.04991 (2016)
[20] Recht, B., Re, C., Wright, S., Niu, F.: Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 693-701 (2011)
[21] Hildreth, C.: A quadratic programming procedure. Naval Res. Logist. Q. 4(1), 79-85 (1957)
[22] Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett. 26(3), 127-136 (2000)
[23] Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7-35 (1992)
[24] Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475-494 (2001)
[25] Hong, M., Wang, X., Razaviyayn, M., Luo, Z.Q.: Iteration complexity analysis of block coordinate descent methods. Math. Program. 163(1-2), 85-114 (2017)
[26] Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1-2), 387-423 (2009)
[27] Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1-2), 615-642 (2015)
[28] Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1-2), 1-38 (2014)
[29] Rosenfeld, J.L.: A case study in programming for parallel-processors. Commun. ACM 12(12), 645-655 (1969)
[30] Chazan, D., Miranker, W.: Chaotic relaxation. Linear Algebra Appl. 2(2), 199-222 (1969)
[31] Bertsekas, D.P.: Distributed asynchronous computation of fixed points. Math. Program. 27(1), 107-120 (1983)
[32] Tseng, P., Bertsekas, D.P., Tsitsiklis, J.N.: Partially asynchronous, parallel algorithms for network flow and other problems. SIAM J. Control Optim. 28(3), 678-710 (1990)
[33] Frommer, A., Szyld, D.B.: On asynchronous iterations. J. Comput. Appl. Math. 123(1), 201-216 (2000)
[34] Gut, A.: Probability: A Graduate Course. Springer, Berlin (2006)
[35] Lai, M.J., Yin, W.: Augmented ℓ1 and nuclear-norm models with a globally linearly convergent algorithm. SIAM J. Imaging Sci. 6(2), 1059-1091 (2013)
[36] Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Berlin (2009)
[37] Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111-126 (1994)
[38] Xu, Y., Yin, W.: A globally convergent algorithm for nonconvex optimization based on block coordinate update. J. Sci. Comput. 72(2), 700-734 (2017)