Special Issue: Mathematical Optimization: Past, Present and Future

How Can Machine Learning and Optimization Help Each Other Better?

Zhou-Chen Lin
  • Key Laboratory of Machine Perception, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China

Received date: 2019-01-23

Revised date: 2019-06-30

Online published: 2020-07-07

Supported by

Zhou-Chen Lin was supported by the National Natural Science Foundation of China (Nos. 61625301 and 61731018).

Abstract

Optimization is an indispensable part of machine learning, as machine learning needs to solve its mathematical models efficiently. Conversely, machine learning can also provide new momentum and new ideas for optimization. This paper investigates how to make the interaction between optimization and machine learning more effective.

Cite this article

Zhou-Chen Lin. How Can Machine Learning and Optimization Help Each Other Better?[J]. Journal of the Operations Research Society of China, 2020, 8(2): 341-351. DOI: 10.1007/s40305-019-00285-6
