Lecturer(s)
|
-
Pořízka Petr, PhDr. Ph.D.
|
Course content
|
In addition to the necessary philological knowledge, corpus building involves several stages and technical areas, which the course covers in turn: (1) Format: character encoding (ASCII, ANSI, and Unicode) and data formats (structured, i.e. XML, vs. unstructured, i.e. plain text, ".txt"). (2) Annotation (metadata): external vs. internal; structural and content-based as well as linguistic. (3) Tools: data preparation and processing (integration into a corpus manager); corpus and data mining (query language, annotations). Freely available software tools will be used (freeware, GNU GPL, or other open-source projects).
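As a rough illustration of points (1) and (2), the sketch below stores the same short text once as plain text and once as XML with structural and linguistic annotation, both encoded as UTF-8. It is a minimal example, not the course's prescribed format: the element names `doc`, `s`, and `w`, the attributes `id` and `pos`, the part-of-speech labels, and the whitespace tokenization are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text containing one non-ASCII character ("ý").
plain_text = "malý korpus"

# Unstructured format: the text alone, encoded as UTF-8 bytes.
# "ý" takes two bytes in UTF-8, so the byte count exceeds the character count.
plain_bytes = plain_text.encode("utf-8")


def to_annotated_xml(text, doc_id, pos_tags):
    """Wrap whitespace-separated tokens in XML elements with annotation."""
    doc = ET.Element("doc", id=doc_id)   # structural annotation (document level)
    sentence = ET.SubElement(doc, "s")   # structural annotation (sentence level)
    for token, pos in zip(text.split(), pos_tags):
        word = ET.SubElement(sentence, "w", pos=pos)  # linguistic annotation
        word.text = token
    return doc


# Structured format: the same text with internal annotation, serialized to UTF-8.
doc = to_annotated_xml(plain_text, "sample-001", ["ADJ", "NOUN"])
xml_str = ET.tostring(doc, encoding="unicode")
xml_bytes = xml_str.encode("utf-8")

print(len(plain_text), len(plain_bytes))
print(xml_str)
```

The plain-text version keeps only the characters, while the XML version carries the annotation inside the data itself, which is what a corpus manager typically queries.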
|
Learning activities and teaching methods
|
Lecture, Dialogic Lecture (Discussion, Dialog, Brainstorming), Work with Text (with Book, Textbook), Demonstration
|
Learning outcomes
|
The course deals with building small corpora for linguistic and literary purposes according to the requirements and criteria defined by the corpus author.
On completing the course, students will be able to build and evaluate their own small corpora of language data for linguistic analysis and to interpret corpus data.
|
Prerequisites
|
unspecified
|
Assessment methods and criteria
|
Analysis of Activities (Technical Works), Seminar Work
(1) Regular class attendance and active participation (including completion of assigned tasks). (2) Completion of a class project; given the technical demands of the discipline, the project will build on the results and knowledge students acquire during the seminar.
|
Recommended literature
|
-
Sketch Engine User Guide.
-
Baker, P. - Hardie, A. - McEnery, T. (2006). A Glossary of Corpus Linguistics. Edinburgh.
-
Čermák - Klímová - Petkevič. Studie z korpusové lingvistiky. Praha 2000.
-
Čermák, F. - Blatná, R. Korpusová lingvistika: Stav a modelové přístupy. Praha 2006.
-
Kosek, J. (2000). XML pro každého: podrobný průvodce. Grada Publishing, Praha.
-
Machálek, T. (2018). KonText - rozhraní pro vyhledávání v korpusech. FF UK, Praha. Available at: <http://kontext.korpus.cz/>.
-
Pořízka, P. (2014). Tvorba korpusů a vytěžování jazykových dat (metody, modely, nástroje). Olomouc.
-
Wynne, M. (ed.). Developing Linguistic Corpora: A Guide to Good Practice.
|