Indeksowane metadane
We review three mathematical developments linked with Hilberg’s conjecture – a hypothesis about the power-law growth of entropy of texts in natural language, which sets up a challenge for machine learning. First, considerations concerning maximal repetition indicate that universal codes such as the Lempel-Ziv code may fail to efficiently compress sources that satisfy Hilberg’s conjecture. Second, Hilberg’s conjecture implies the empirically observed power-law growth of vocabulary in texts. Third, Hilberg’s conjecture can be explained by a hypothesis that texts describe consistently an infinite random object.