Evaluating KGR10 Polish Word Embeddings in the Recognition of Temporal Expressions Using BiLSTM-CRF

Jan Kocoń,

Michał Gawor

Abstrakt

The article introduces a new set of Polish word embeddings, built using KGR10 corpus, which contains more than 4 billion words. These embeddings are evaluated in the problem of recognition of temporal expressions (timexes) for the Polish language. We described the process of KGR10 corpus creation and a new approach to the recognition problem using Bidirectional Long-Short Term Memory (BiLSTM) network with additional CRF layer, where specific embeddings are essential. We presented experiments and conclusions drawn from them.

Słowa kluczowe: word embeddings, temporal expressions, recognition, TimeML, CRF, LSTM, BiLSTM, KGR10, FastText

Czasopismo ukazuje się w sposób ciągły on-line.
Pierwotną formą czasopisma jest wersja elektroniczna.

Wersja papierowa czasopisma dostępna na www.wuj.pl