Mixture of Metrics Optimization for Machine Learning Problems

Magdalena Wiercioch, Marek Śmieja


The selection of a data representation and a metric for a given data set is one of the most crucial problems in machine learning, since it affects the results of classification and clustering methods. In this paper we investigate how to combine various data representations and metrics into a single function that reflects the relationships between data set elements better than any single representation-metric pair. Our approach relies on optimizing a linear combination of selected distance measures using least squares approximation. Applied to the classification and clustering of chemical compounds, the method appears to increase the accuracy of these tasks.

Keywords: metric learning, classification, clustering, chemical compound activity, fingerprint
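
The idea of fitting a linear combination of distance measures by least squares can be illustrated with a minimal sketch. It is not the authors' exact formulation: the function names `fit_metric_weights` and `combine_metrics`, the use of a label-based target matrix (0 for pairs in the same class, 1 otherwise), and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fit_metric_weights(distance_matrices, target):
    """Least-squares weights w such that sum_k w[k] * D_k approximates target.

    distance_matrices: list of (n, n) symmetric pairwise distance matrices,
                       one per representation-metric pair (hypothetical input).
    target:            (n, n) matrix of desired distances (an assumption here).
    """
    n = target.shape[0]
    # Use every unordered pair (i < j) exactly once.
    upper = np.triu_indices(n, k=1)
    # One column per candidate distance measure, one row per pair.
    A = np.column_stack([D[upper] for D in distance_matrices])
    b = target[upper]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def combine_metrics(distance_matrices, w):
    """Weighted sum of the given distance matrices."""
    return sum(wk * D for wk, D in zip(w, distance_matrices))

# Toy usage: two features, each compared with an absolute-difference distance.
points = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.2], [0.9, 0.0], [0.5, 0.5]])
labels = np.array([0, 0, 1, 1, 0])
D1 = np.abs(points[:, None, 0] - points[None, :, 0])
D2 = np.abs(points[:, None, 1] - points[None, :, 1])
# Assumed target: 0 for same-class pairs, 1 for different-class pairs.
target = (labels[:, None] != labels[None, :]).astype(float)

weights = fit_metric_weights([D1, D2], target)
combined = combine_metrics([D1, D2], weights)
```

In this unconstrained form the fitted weights may be negative, in which case the combined function is no longer guaranteed to be a metric; a non-negativity constraint would be one way to address that.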