Towards Learning Word Representation

Magdalena Wiercioch


Continuous vector representations, used as distributed representations of words, have gained a lot of attention in the field of Natural Language Processing (NLP). Although they are considered valuable for modelling both semantic and syntactic features, they can still be improved. For instance, an open issue is how to develop strategies for introducing knowledge about the morphology of words. This is a core point both for dense languages, where many rare words appear, and for texts that contain numerous metaphors or similes. In this paper, we extend a recent approach to representing word information. The underlying idea of our technique is to present a word as a bag of syllable and letter n-grams. More specifically, we provide a vector representation for each extracted syllable-based and letter-based n-gram and concatenate them. Moreover, in contrast to the previous method, we accept n-grams of varied length n. Experiments on tasks such as word similarity ranking and sentiment analysis show that our method is competitive with other state-of-the-art techniques and takes a step toward constructing more informative word representations.
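The idea of a bag of syllable and letter n-grams of varied length can be illustrated with a minimal sketch. It is not the implementation used in the paper: the vowel-group syllabifier, the toy dimension `DIM`, the pseudo-random `toy_vector` standing in for learned embeddings, and the choice to sum each bag before concatenating the two parts are all assumptions made for illustration.

```python
import random
import re
import zlib

DIM = 4  # toy embedding size (assumption, not from the paper)


def letter_ngrams(word, n_min=2, n_max=3):
    """All contiguous letter n-grams with n_min <= n <= n_max."""
    return [word[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(word) - n + 1)]


def naive_syllables(word):
    """Rough vowel-group heuristic (assumption); not a real syllabifier."""
    parts = re.findall(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$)?", word)
    return parts or [word]  # words without vowels stay whole


def syllable_ngrams(word, n_min=1, n_max=2):
    """Contiguous runs of n syllables, joined back into strings."""
    syl = naive_syllables(word)
    return ["".join(syl[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(syl) - n + 1)]


def toy_vector(ngram, prefix):
    """Deterministic pseudo-random stand-in for a learned n-gram embedding."""
    rng = random.Random(zlib.crc32((prefix + ngram).encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def pooled(ngrams, prefix):
    """Sum the vectors of all n-grams in one bag."""
    vec = [0.0] * DIM
    for g in ngrams:
        for k, x in enumerate(toy_vector(g, prefix)):
            vec[k] += x
    return vec


def word_vector(word):
    """Concatenate the letter-based and syllable-based parts."""
    return pooled(letter_ngrams(word), "L:") + pooled(syllable_ngrams(word), "S:")
```

Under these assumptions, `word_vector("token")` has length `2 * DIM`: the first half is built from the letter n-grams and the second half from the syllable n-grams. In a trained model, the lookup in `toy_vector` would be replaced by learned parameters.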

Keywords: representation learning, n-gram model, NLP

