On Lexical Bundles in Polish Patient Information Leaflets: A Corpus-Driven Study

Łukasz Grabowski


So far, little attention has been paid to the corpus analysis of recurrent phraseologies in Polish texts, particularly texts representing specialist registers of language use. Corpus linguistic studies of lexical bundles (Biber et al. 1999) in texts originally written in Polish are likewise lacking. Conducted from a register perspective (Biber and Conrad 2009), this descriptive and exploratory study is intended as a first step towards a comprehensive corpus-driven description of the use and functions of the most frequent lexical bundles found in patient information leaflets (PILs), one of the most commonly used text types in the healthcare sector in Poland. The research material comprises 100 PILs originally written in Polish, collected from the websites of ten pharmaceutical companies operating on the Polish market and compiled into a purpose-designed corpus of circa 197,000 words. Based largely on the methodology proposed by Biber, Conrad and Cortes (2003, 2004), Biber (2006), and Goźdź-Roszkowski (2011), which enables the analysis of the use and discourse functions of lexical bundles, the present study is primarily meant to provide methodological guidelines for future research on lexical bundles in Polish texts. This is important because lexical bundles have so far been studied predominantly in texts originally written in English. The results of this preliminary research reveal salient links between the frequent occurrence of lexical bundles, on the one hand, and the situational and functional characteristics of the text variety under scrutiny, on the other.

Keywords

corpus linguistics, phraseology, register analysis, corpus-driven approach, lexical bundles, patient information leaflets




Adel Annelie, Erman Britt (2012). Recurrent word combinations in academic writing by native and non-native speakers of English: A lexical bundles approach. English for Specific Purposes 31, 81–92.

Biber Douglas (2006). University Language. A Corpus-Based Study of Spoken and Written Registers. Amsterdam: John Benjamins.

Biber Douglas (2009). A corpus-driven approach to formulaic language in English: multi-word patterns in speech and writing. International Journal of Corpus Linguistics 14, 275–311.

Biber Douglas, Conrad Susan (2009). Register, Genre and Style. Cambridge: Cambridge University Press.

Biber Douglas, Conrad Susan, Cortes Viviana (2003). Lexical bundles in speech and writing: An initial taxonomy. In Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech. Andrew Wilson, Paul Rayson, Tony McEnery (eds.), 71–92. Frankfurt am Main: Peter Lang.

Biber Douglas, Conrad Susan, Cortes Viviana (2004). “If you look at…”: Lexical bundles in university teaching and textbooks. Applied Linguistics 25, 371–405.

Biber Douglas, Johansson Stig, Leech Geoffrey, Conrad Susan, Finegan Edward (1999). The Longman Grammar of Spoken and Written English. London: Longman.

Burger Harald (ed.) (2007). Phraseologie: ein internationales Handbuch zeitgenössischer Forschung, Vol. 2. Berlin: Walter de Gruyter.

Cacchiani Silvia (2006). Dis/similarities between Patient Information Leaflets in Britain and Italy: Implications for the Translator. New Voices in Translation Studies 2, 28–43.

Chen Yu-Hua, Baker Paul (2010). Lexical bundles in L1 and L2 academic writing. Language Learning and Technology 14(2), 30–49.

Cheng Winnie, Leung Maggie (2012). Exploring phraseological variations by concgramming: The realization of complete patterns of variations. Linguistic Research 29(3), 617–638.

Cheng Winnie, Greaves Chris, Warren Martin (2006). From n-gram to skipgram to concgrams. International Journal of Corpus Linguistics 11, 411–433.

Chlebda Wojciech (2003). Elementy frazematyki: wprowadzenie do frazeologii nadawcy. Łask: Leksem.

Chlebda Wojciech (2009). Idiomatykon 4: gdzie jesteśmy, dokąd zmierzamy (i parę zdań o tym, skąd przychodzimy). In Podręczny idiomatykon polsko-rosyjski 4. Wojciech Chlebda (ed.), 9–38. Opole: Wydawnictwo Uniwersytetu Opolskiego.

Chlebda Wojciech (2010). Nieautomatyczne drogi dochodzenia do reproduktów wielowyrazowych. In Na tropach reproduktów: w poszukiwaniu wielowyrazowych jednostek języka. Wojciech Chlebda (ed.), 15–35. Opole: Wydawnictwo Uniwersytetu Opolskiego.

Clerehan Rosemary, Hirsch Di, Buchbinder Rachelle (2009). Medication information leaflets for patients: the further validation of an analytic linguistic framework. Communication & Medicine 6(2), 117–128.

Fletcher William (2007). KfNgram. Annapolis: USNA. Retrieved from: http://www.kwicfinder.com/kfNgram/kfNgramHelp.html.

Forchini Pierfranca, Murphy Amanda (2008). N-grams in comparable specialized corpora. Perspectives on phraseology, translation and pedagogy. International Journal of Corpus Linguistics 13(3), 351–367.

Fuster-Marquez Miguel (2014). Lexical bundles and phrase frames in the language of hotel websites. English Text Construction 7(1), 84–121.

Goźdź-Roszkowski Stanisław (2011). Patterns of Linguistic Variation in American Legal English. A Corpus-Based Study. Frankfurt: Peter Lang.

Grabowski Łukasz (2013). Register variation across English pharmaceutical texts: a corpus-driven study of keywords, lexical bundles and phrase frames in patient information leaflets and summaries of product characteristics. Procedia – Social and Behavioral Sciences 95C, 391–401.

Granger Sylviane, Meunier Fanny (2008). Introduction: The many faces of phraseology. In Phraseology: An Interdisciplinary Perspective. Sylviane Granger, Fanny Meunier (eds.), xix–xxx. Amsterdam: John Benjamins.

Gray Bethany, Biber Douglas (2013). Lexical frames in academic prose and conversation. International Journal of Corpus Linguistics 18(1), 109–135.

Greaves Chris (2009). ConcGram 1.0: A Phraseological Search Engine. Amsterdam: John Benjamins.

Holtz Monica (2011). Lexico-grammatical properties of abstracts and research articles. A corpus-based study of scientific discourse from multiple disciplines. Unpublished PhD dissertation. Technische Universitaet Darmstadt. [URL: http://tuprints.ulb.tu-darmstadt.de/2638/1/PhD-Thesis-Monica-Holtz.pdf, accessed: October 23, 2013].

Hyland Ken (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes 27, 4–21.

Jablonkai Reka (2010). English in the context of European integration: A corpus-driven analysis of lexical bundles in English EU documents. English for Specific Purposes 29, 253–267.

Kilgarriff Adam (2005). Language is never ever ever random. Corpus Linguistics and Linguistic Theory 1(2), 263–276.

Kopaczyk Joanna (2012). Long lexical bundles and standardisation in historical legal texts. Studia Anglica Posnaniensia 47(2–3), 3–25.

Montalt Resurrecció Vicent, González Davies Maria (2007). Medical Translation Step by Step. Translation Practices Explained. Manchester: St. Jerome.

Moon Rosamund (2007). Corpus linguistic aspects of phraseology. In Burger (ed.), 1045–1059.

Paiva Daniel (2000). Investigating style in a corpus of pharmaceutical leaflets: results of a factor analysis. In Proceedings of the Student Workshop of the 38th Annual Meeting of the ACL, Hong Kong, China, 1–8 Oct 2000, 52–59. [URL: http://www.itri.brighton.ac.uk/~Daniel.Paiva/acl2000student.finalversion.pdf; accessed: October 08, 2011].

Pęzik Piotr (2013). Paradygmat dystrybucyjny w badaniach frazeologicznych. Powtarzalność, reprodukcja i idiomatyzacja. In Metodologie Językoznawstwa. Ewolucja Języka, Ewolucja Teorii Językoznawczych. Piotr Stalmaszczyk (ed.), 143–160. Łódź: Wydawnictwo Uniwersytetu Łódzkiego.

Piotrowski Tadeusz, Grabowski Łukasz (2013). Interpretacja danych frekwencyjnych z korpusów językowych: opis pewnych problemów (na kilku przykładach z życia wziętych). In Na tropach korpusów. W poszukiwaniu optymalnych zbiorów tekstów. Wojciech Chlebda (ed.), 59–71. Opole: Wydawnictwo Uniwersytetu Opolskiego.

Przepiórkowski Adam, Bańko Mirosław, Górski Rafał, Lewandowska-Tomaszczyk Barbara (eds.) (2012). Narodowy Korpus Języka Polskiego. Warszawa: PWN.

Read John, Nation Paul (2004). Measurement of formulaic sequences. In Formulaic Sequences: Acquisition, Processing and Use. Norbert Schmitt (ed.), 23–35. Amsterdam: John Benjamins.

Salazar Danica (2011). Lexical Bundles in Scientific English: A Corpus-based Study of Native and Non-native Writing. Unpublished PhD dissertation. University of Barcelona. [URL: http://www.tdx.cat/bitstream/handle/10803/52083/DJLS_DISSERTATION.pdf; accessed: March 26, 2013].

Schmitt Norbert, Carter Ronald (2004). Formulaic sequences in action: An introduction. In Formulaic Sequences: Acquisition, Processing and Use. Norbert Schmitt (ed.), 1–22. Amsterdam: John Benjamins.

Scott Mike (2007). WordSmith Tools 4.0. Liverpool: Lexical Analysis Software.

Sinclair John (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Stubbs Michael, Barth Isabel (2003). Using recurrent phrases as text-type discriminators: a quantitative method and some findings. Functions of Language 10, 65–108.

Wray Alison (2002). Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.

Wray Alison (2009). Identifying formulaic language. Persistent challenges and new opportunities. In Formulaic Language. Vol. 1. Distribution and historical change. Roberta Corrigan, Edith A. Moravcsik, Hamid Ouali, Kathleen Wheatley (eds.), 27–51. Amsterdam: John Benjamins.

Wray Alison, Perkins Michael (2000). The functions of formulaic language: an integrated model. Language and Communication 20, 1–28.

