Information package & Course catalogue

Palacký University Olomouc

Study programmes & Course catalogue

for academic year 2026/2027

Course: Computer Linguistics and Data Processing

Course title	Computer Linguistics and Data Processing
Course code	KOL/PCLNG
Organizational form of instruction	Seminar
Level of course	Master
Year of study	not specified
Semester	Winter and summer
Number of ECTS credits	4
Language of instruction	Czech
Status of course	unspecified
Form of instruction	Face-to-face
Work placements	This is not an internship
Recommended optional programme components	None

Lecturer(s)
Matlach Vladimír, Mgr. Ph.D.
Course content
Besides the necessary philological knowledge, building corpora involves several technical stages and areas, which the course covers in turn:
(1) Format: character encoding (ASCII, ANSI and Unicode) and data formats (structured, e.g. XML, vs. unstructured, i.e. plain text, ".txt").
(2) Annotation (= metadata): external vs. internal; structural and content-based vs. linguistic.
(3) Tools: data preparation and processing (integration into the corpus manager); corpus and data mining (query language, annotations). Freely available software tools will be used (freeware, GNU GPL or open-source projects).
(4) Options for automating data processing (segmentation: tokenization and verticalization; format conversion, etc.).
(5) Methodological standpoint: consistent differentiation between data and metadata.
(6) Options for and types of annotation (technical, structural, linguistic).
(7) Specifics of data, their gathering and processing (written vs. spoken form).
Practical exercises
(1) Building students' own corpora:
(a) data preparation: encoding, "cleaning" of text, conversion to .txt format (plain text)
(b) tokenization and verticalization of text (using software applications)
(c) linguistic annotation of text: lemmatisation, tag set creation
(d) data structuring: tagging of text in a simple XML format
(e) finalization of the corpus and storing it in the Bonito corpus manager
(2) Work with linguistic data in various corpus applications: GPL software (off-line) and web interfaces (on-line)
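The data preparation steps listed above (encoding conversion, cleaning, tokenization, verticalization with simple XML structure tags) can be sketched roughly as follows. This is only an illustrative sketch, not the course's own tooling: the function names, the choice of Windows-1250 as the source "ANSI" code page, the regular-expression tokenizer, and the naive sentence splitting are all assumptions made for the example. It produces the usual one-token-per-line vertical layout with structure tags on their own lines.

```python
import re
import unicodedata
from xml.sax.saxutils import escape, quoteattr

# Assumption: a token is a run of word characters (optionally joined by a
# hyphen or apostrophe) or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]", re.UNICODE)


def to_unicode(raw, encoding="cp1250"):
    """Decode legacy-encoded bytes and normalize the result to Unicode NFC.

    cp1250 (Windows-1250) is an assumed source encoding, chosen because it
    is a common "ANSI" code page for Czech text.
    """
    text = raw.decode(encoding)
    text = text.replace("\r\n", "\n")  # basic cleaning: unify line endings
    return unicodedata.normalize("NFC", text)


def tokenize(text):
    """Split plain text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def split_sentences(tokens):
    """Naive sentence segmentation: close a sentence after . ! or ?"""
    sentence = []
    for tok in tokens:
        sentence.append(tok)
        if tok in {".", "!", "?"}:
            yield sentence
            sentence = []
    if sentence:  # trailing tokens without final punctuation
        yield sentence


def verticalize(text, doc_id):
    """Return the text in vertical format: one token per line,
    wrapped in <doc> and <s> structure tags."""
    lines = [f"<doc id={quoteattr(doc_id)}>"]
    for sent in split_sentences(tokenize(text)):
        lines.append("<s>")
        # Escaping is an assumption of this sketch, so that a literal "<"
        # token cannot be mistaken for a structure tag.
        lines.extend(escape(tok) for tok in sent)
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines)


if __name__ == "__main__":
    raw = "Ahoj, světe! Jak se máš?".encode("cp1250")
    print(verticalize(to_unicode(raw), "d1"))
```

Linguistic annotation such as lemmas and tags would typically be added as further tab-separated columns on each token line. That step depends on the chosen tagger and tag set, so it is left out of the sketch.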
Learning activities and teaching methods
Lecture, Dialogic Lecture (Discussion, Dialog, Brainstorming), Work with Text (with Book, Textbook), Demonstration
Learning outcomes
On completing the course, students will be able to build and evaluate their own small corpora of language data for special purposes. The course deals with building small corpora for linguistic and literary research according to requirements and criteria defined by the corpus author.
Ability to build a small corpus of language data
Ability to interpret corpus data
Prerequisites
unspecified
Assessment methods and criteria
Analysis of Activities (Technical Works), Seminar Work
(1) Regular class attendance and active participation (including completion of assigned tasks)
(2) Completion of a class project
Recommended literature
Vitovský, A. (2006). Moderní slovník softwaru: výkladový anglicko-český a česko-anglický. Praha.
Bradley, N. (2000). XML - kompletní průvodce. Praha.
Čermák - Klímová - Petkevič (2000). Studie z korpusové lingvistiky. Praha.
Čermák, F. - Blatná, R. (2006). Korpusová lingvistika: Stav a modelové přístupy. Praha.
Kosek, J. (2000). XML pro každého, podrobný průvodce. Praha.
Kosek, J. - Kopřivová, M. Manuál korpusového manažeru Bonito. Available from http://www.korpus.cz/bonito/index.php.
Křen, M. Dotazovací jazyk korpusového manažeru Bonito. Available from http://www.korpus.cz/bonito/regular.php.

Study plans that include the course

Faculty	Study plan (Version)	Category of Branch/Specialization	Recommended year of study	Recommended semester

Palacký University Olomouc, date of update: 16.07.2026 23:53. Data created for academic year 2026/2027