A note on Levenshtein distance versus human analysis

Kamil Stachowski

Abstract

This paper argues that automatic phonetic comparison returns accurate results only if the languages in question have similar and comparably lenient phonologies. Where their phonologies are incompatible and/or restrictive, linguistic knowledge of both languages is necessary to obtain results that match human perception. Although the case is exemplified mainly by Levenshtein distance and Russian loanwords in Dolgan, the conclusion also applies to the approach as a whole.

Keywords: Levenshtein distance, loanword adaptation, Dolgan, Russian
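
For orientation, plain Levenshtein distance can be sketched as below. This is a minimal, unweighted version and not necessarily the variant used in the paper; the function name `levenshtein` and the toy strings are assumptions chosen for illustration, not data from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance between two sequences of symbols."""
    # previous[j] is the distance between the symbols of `a` consumed so far
    # (initially none) and the first j symbols of `b`.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning i symbols of `a` into "" takes i deletions
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        previous = current
    return previous[-1]


# Toy strings (hypothetical, not from the paper): every substitution costs 1,
# whether the two segments are phonetically close (p/b) or distant (p/s).
print(levenshtein("pat", "bat"))
print(levenshtein("pat", "sat"))
```

Both calls return the same distance, because the unweighted algorithm has no notion of phonetic similarity. Any such knowledge has to be supplied from outside, for example as substitution weights.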

The journal is published continuously online.
The original and only form of the journal is the electronic version.