Binary Random Projections with Controllable Sparsity Patterns

Expand
  • 1. The Chinese University of Hong Kong, Shenzhen 518172, Guangdong, China;
    2. Shenzhen Research Institute of Big Data, Shenzhen 518052, Guangdong, China;
    3. University of Minnesota, Minneapolis, MN 55455, USA

Received date: 2021-07-12

  Revised date: 2021-10-15

  Online published: 2022-09-06

Supported by

Wen-Ye Li's work was partially supported by Guangdong Fundamental Research Fund (No.2021A1515011825) and Shenzhen Fundamental Research Fund (No.KQJSCX20170728162302784).

Abstract

Random projection is often used to project higher-dimensional vectors onto a lowerdimensional space,while approximately preserving their pairwise distances.It has emerged as a powerful tool in various data processing tasks and has attracted considerable research interest.Partly motivated by the recent discoveries in neuroscience,in this paper we study the problem of random projection using binary matrices with controllable sparsity patterns.Specifically,we proposed two sparse binary projection models that work on general data vectors.Compared with the conventional random projection models with dense projection matrices,our proposed models enjoy significant computational advantages due to their sparsity structure,as well as improved accuracies in empirical evaluations.

Cite this article

Wen-Ye Li, Shu-Zhong Zhang . Binary Random Projections with Controllable Sparsity Patterns[J]. Journal of the Operations Research Society of China, 2022 , 10(3) : 507 -528 . DOI: 10.1007/s40305-021-00387-0

References

[1] Johnson, W., Lindenstrauss, J.:Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26(189-206), 1(1984)
[2] Vempala, S.:The Random Projection Method, vol. 65. American Mathematical Soc (2005)
[3] Kanerva, P., Kristoferson, J., Holst, A.:Random indexing of text samples for latent semantic analysis. In:Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 22(2000)
[4] Bingham, E., Mannila, H.:Random projection in dimensionality reduction:applications to image and text data. In:Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 245-250(2001)
[5] Manning, C., Raghavan, P., Schütze, H.:Introduction to Information Retrieval. Cambridge University Press (2008)
[6] Leskovec, J., Rajaraman,A., Ullman, J.:Mining of Massive Data Sets. Cambridge University Press (2020)
[7] Achlioptas, D.:Database-friendly random projections:Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66(4), 671-687(2003)
[8] Dasgupta, A., Kumar, R., Sarlós, T.:A sparse Johnson-Lindenstrauss transform. In:Proceedings of the 42nd ACM Symposium on Theory of Computing, pp. 341-350(2010)
[9] Kane, D., Nelson, J.:Sparser Johnson-Lindenstrauss transforms. J. ACM 61(1), 1-23(2014)
[10] Dasgupta, S., Stevens, C., Navlakha, S.:A neural algorithm for a fundamental computing problem. Science 358(6364), 793-796(2017)
[11] Lin, A., Bygrave, A., DeCalignon, A., Lee, T., Miesenböck, G.:Sparse, decorrelated odor coding in the mushroom body enhances learned odor discrimination. Nat. Neurosci. 17(4), 559(2014)
[12] Zheng, Z., Lauritzen, S., Perlman, E., Robinson, C., et al.:A complete electron microscopy volume of the brain of adult drosophila melanogaster. Cell 174(3), 730-743(2018)
[13] Allen-Zhu, Z., Gelashvili, R., Micali, S., Shavit, N.:Sparse sign-consistent Johnson-Lindenstrauss matrices:compression with neuroscience-based constraints. Proc. Natl. Acad. Sci. 111(47), 16872-16876(2014)
[14] Larsen, K., Nelson, J.:Optimality of the Johnson-Lindenstrauss lemma. In:2017 IEEE 58th Annual Symposium on Foundations of Computer Science, pp. 633-638. IEEE (2017)
[15] Li, P., Hastie, T., Church, K.:Very sparse random projections. In:Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 287-296(2006)
[16] Bourgain, J., Dirksen, S., Nelson, J.:Toward a unified theory of sparse dimensionality reduction in Euclidean space. Geom. Funct. Anal. 25(4), 1009-1088(2015)
[17] Ailon,N.,Chazelle,B.:ThefastJohnson-Lindenstrausstransformandapproximatenearestneighbors. SIAM J. Comput. 39(1), 302-322(2009)
[18] Jagadeesan, M.:Understanding sparse JL for feature hashing. In:Advances in Neural Information Processing Systems, pp. 15177-15187(2019)
[19] Olsen, S., Bhandawat, V., Wilson, R.:Divisive normalization in olfactory population codes. Neuron 66(2), 287-299(2010)
[20] Papadopoulou, M., Cassenaer, S., Nowotny, T., Laurent, G.:Normalization for sparse encoding of odors by a wide-field interneuron. Science 332(6030), 721-725(2011)
[21] Stevens, C.:What the fly's nose tells the fly's brain. Proc. Natl. Acad. Sci. 112(30), 9460-9465(2015)
[22] Li, W.:Modeling winner-take-all competition in sparse binary projections. In:Machine Learning and Knowledge Discovery in Databases, pp. 456-472. Springer, Cham (2021)
[23] Li, W., Mao, J., Zhang, Y., Cui, S.:Fast similarity search via optimal sparse lifting. In:Advances in Neural Information Processing Systems, pp. 176-184(2018)
[24] Ma, C., Gu, C., Li, W., Cui, S.:Large-scale image retrieval with sparse binary projections. In:Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1817-1820(2020)
[25] Bennett, G.:Probability inequalities for the sum of independent random variables. J. Am. Stat. Assoc. 57(297), 33-45(1962)
[26] Boucheron, S., Lugosi, G., Massart, P.:Concentration Inequalities:A Nonasymptotic Theory of Independence. Oxford University Press (2013)
[27] Pennington, J., Socher, R., Manning, C.:GloVe:global vectors for word representation. In:Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1532-1543(2014)
[28] Russakovsky, O., Deng, J., Su, H., et al.:Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211-252(2015)
[29] Lewis, D., Yang, Y., Rose, T., Li, F.:RCV1:a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361-397(2004)
[30] Lehmann, E., Romano, J.:Testing Statistical Hypotheses. Springer (2006)
[31] Andoni, A., Indyk, P.:Near-optimal hashing algorithms for near neighbor problem in high dimension. Commun. ACM 51(1), 117-122(2008)
[32] Rachkovskij, D.:Vector data transformation using random binary matrices. Cybern. Syst. Anal. 50(6), 960-968(2014)
Options
Outlines

/