[1] Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55, 78-87 (2012)
[2] LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436-444 (2015)
[3] Bennett, K.P., Parrado-Hernández, E.: The interplay of optimization and machine learning research. J. Mach. Learn. Res. 7, 1265-1281 (2006)
[4] Yang, Y., Sun, J., Li, H., Xu, Z.: Deep ADMM-Net for compressive sensing MRI. In: Advances in Neural Information Processing Systems, pp. 10-18 (2016)
[5] Li, H., Yang, Y., Lin, Z.: Optimization algorithm inspired deep neural network structure design. In: Asian Conference on Machine Learning, pp. 614-629 (2018)
[6] Xie, X., Wu, J., Zhong, Z., Liu, G., Lin, Z.: Differentiable linearized ADMM. In: International Conference on Machine Learning, pp. 6902-6911 (2019)
[7] Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1-17 (1964)
[8] Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Dokl. 27, 372-376 (1983)
[9] Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8, 1341-1390 (1996)
[10] Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1, 67-82 (1997)
[11] Wang, J.: Geometric Structure of High-Dimensional Data and Dimensionality Reduction. Springer, Berlin (2012)
[12] Liu, G., Lin, Z., Yan, S., Sun, J., Ma, Y.: Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35, 171-184 (2013)
[13] Yin, M., Gao, J., Lin, Z.: Laplacian regularized low-rank representation and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 38, 504-517 (2016)
[14] Sun, R., Luo, Z.-Q.: Guaranteed matrix completion via nonconvex factorization. In: IEEE Symposium on Foundations of Computer Science, pp. 270-289 (2015)
[15] Ge, R., Jin, C., Zheng, Y.: No spurious local minima in nonconvex low rank problems: a unified geometric analysis. In: International Conference on Machine Learning, pp. 1233-1242 (2017)
[16] Li, H., Lin, Z.: Provable accelerated gradient method for nonconvex low rank optimization. Mach. Learn. 103, 1-30 (2019)
[17] Ge, R., Ma, T.: On the optimization landscape of tensor decompositions. In: Advances in Neural Information Processing Systems, pp. 3656-3666 (2017)
[18] Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently. In: International Conference on Machine Learning, pp. 1724-1732 (2017)
[19] Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379-387 (2015)
[20] Hong, M., Luo, Z., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26, 337-364 (2016)
[21] Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78, 29-63 (2019)
[22] Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438-457 (2010)
[23] Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60, 223-311 (2018)
[24] Fang, C., Cheng, F., Lin, Z.: Faster and non-ergodic O(1/K) stochastic alternating direction method of multipliers. In: Advances in Neural Information Processing Systems, pp. 4479-4488 (2017)
[25] Lan, G., Lee, S., Zhou, Y.: Communication-efficient algorithms for decentralized and stochastic optimization. Math. Program. 158, 1-48 (2018)
[26] Fang, C., Lin, Z.: Parallel asynchronous stochastic variance reduction for nonconvex optimization. In: AAAI Conference on Artificial Intelligence, pp. 794-800 (2017)
[27] Su, L., Vaidya, N.H.: Fault-tolerant multi-agent optimization: optimal iterative distributed algorithms. In: Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, pp. 425-434 (2016)
[28] Hutter, F., Caruana, R., Bardenet, R., Bilenko, M., Guyon, I., Kegl, B., Larochelle, H.: AutoML 2014 Workshop @ ICML (2014)
[29] Elsken, T., Metzen, J.H., Hutter, F.: Neural architecture search: a survey. J. Mach. Learn. Res. 20, 1-21 (2019)
[30] Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: International Conference on Learning Representations (2019). arXiv:1806.09055v2
[31] Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. In: Advances in Neural Information Processing Systems, pp. 612-620 (2011)
[32] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315-323 (2013)
[33] Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121-2159 (2011)
[34] Rosenfeld, N., Balkanski, E., Globerson, A., Singer, Y.: Learning to optimize combinatorial functions. In: International Conference on Machine Learning, pp. 4371-4380 (2018)
[35] Cassioli, A., Lorenzo, D.D., Locatelli, M., Schoen, F., Sciandrone, M.: Machine learning for global optimization. Comput. Optim. Appl. 51, 279-303 (2012)
[36] Li, K., Malik, J.: Learning to optimize. In: International Conference on Learning Representations (2017)
[37] Andrychowicz, M., Denil, M., Colmenarejo, S.G., Hoffman, M.W., Pfau, D., Schaul, T., de Freitas, N.: Learning to learn by gradient descent by gradient descent. In: Advances in Neural Information Processing Systems, pp. 3981-3989 (2016)
[38] Liu, R., Cheng, S., He, Y., Fan, X., Lin, Z., Luo, Z.: On the convergence of learning-based iterative methods for nonconvex inverse problems. IEEE Trans. Pattern Anal. Mach. Intell. (2018). arXiv:1808.05331
[39] Reed, S., de Freitas, N.: Neural programmer-interpreters. In: International Conference on Learning Representations (2016). arXiv:1511.06279
[40] Cai, J., Shin, R., Song, D.: Making neural programming architectures generalize via recursion. In: International Conference on Learning Representations (2017). arXiv:1704.06611
[41] Li, C., Tarlow, D., Gaunt, A.L., Brockschmidt, M., Kushman, N.: Neural program lattices. In: International Conference on Learning Representations (2017)