Extreme classification under limited space and time budget

Kalina Jasińska, Krzysztof Dembczyński, Nikos Karampatziakis

Abstract

We discuss a new framework for solving extreme classification (i.e., learning problems with an extremely large label space) in which we reduce the original problem to a structured prediction problem. This reduction yields learning algorithms that work under a strict time and space budget. We focus mainly on a recently introduced algorithm, referred to as LTLS, which is, to the best of our knowledge, the first truly logarithmic-time and logarithmic-space (in the number of labels) method for extreme classification. We compare this algorithm with two other approaches that also rely on a transformation to a structured prediction problem. The first algorithm encodes the original labels as binary sequences; the second follows the label tree approach. The comparison illustrates the trade-off between computational complexity (in time and space) and predictive performance.

Keywords: supervised learning, space and time complexity of learning algorithms, extreme classification, multi-class classification, learning reductions
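The logarithmic time and space idea summarized above can be sketched in a toy form. The sketch below is a minimal illustration, not the LTLS implementation itself: it assumes a simplified trellis in which the log2(K) edge choices are independent, so each of the K labels corresponds to a distinct binary code (a path), a label's score is the sum of its edge scores, and the highest-scoring label is recovered with O(log K) work instead of evaluating all K scores. All names here (`label_to_path`, `predict`, the toy weights `w`) are hypothetical.

```python
d = 3                      # trellis depth; K = 2**d labels
K = 2 ** d

def label_to_path(y):
    """Label index -> sequence of d edge choices (its binary code)."""
    return [(y >> i) & 1 for i in range(d)]

def score(path, edge_weights):
    """Score of a label = sum of the weights of the edges on its path."""
    return sum(edge_weights[step][bit] for step, bit in enumerate(path))

def predict(edge_weights):
    """Heaviest path: pick, at every step, the heavier of the two edges.
    In this simplified trellis the steps are independent, so Viterbi-style
    decoding reduces to d independent argmax operations, i.e. O(log K)
    work, versus O(K) for scoring every label explicitly."""
    bits = [max((0, 1), key=lambda b: edge_weights[step][b])
            for step in range(d)]
    return sum(b << i for i, b in enumerate(bits))

# Toy edge weights; in LTLS these would be linear functions of the input x.
w = [{0: 0.2, 1: 0.9}, {0: 1.5, 1: -0.3}, {0: 0.1, 1: 0.4}]
y_hat = predict(w)

# Brute force over all K labels agrees with the O(log K) decoder.
assert y_hat == max(range(K), key=lambda y: score(label_to_path(y), w))
```

In the actual LTLS model the trellis layers are connected and edge weights depend on the input, so decoding uses the full Viterbi dynamic program rather than independent per-step maxima; the sketch only conveys why a path-based encoding makes both the model size and the prediction cost logarithmic in K.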

The journal is published continuously online.
The electronic version is the journal's version of record.

The paper version of the journal is available at www.wuj.pl