Native vs. non-native English: data-driven lexical analysis

Ewa Witalisz, Justyna Leśniewska


This article presents a preliminary, data-driven study of a corpus of texts written by advanced Polish learners of English, analysed against a baseline corpus of native-speaker texts. The texts in both corpora were produced in similar circumstances (a classroom setting), under the same time and word limits, and in response to the same task. We conducted a comparative lexical analysis of the two corpora, using corpus methodology (word lists, cluster analysis, concordances, keyness) to identify the most significant differences. The most important conclusion is that advanced foreign language use may differ from native-speaker language use in ways that become visible only in larger samples of language; analysed individually, these differences would not be regarded as errors and would go unnoticed. The study offers some evidence that some of these differences may be attributable to cross-linguistic influence.

Keywords

advanced EFL use, corpus analysis of learner language, lexical features of L2 writing
