Course: Natural language processing

Course title Natural language processing
Course code KOL/ZPJ
Organizational form of instruction Lecture + Seminar
Level of course Bachelor
Year of study 2
Semester Winter and summer
Number of ECTS credits 5
Language of instruction Czech
Status of course Compulsory-optional
Form of instruction Face-to-face
Work placements This is not an internship
Recommended optional programme components None
Lecturer(s)
  • Matlach Vladimír, Mgr. Ph.D.
Course content
1) Pre-processing of unstructured data.
2) Processing of large structured data (XML, JSON), from kB to TB.
3) NLP frameworks (spaCy, UDPipe, Flair, Spark and NLTK) and basic NLP tasks:
  - sentence processing,
  - analysis of actor relationships based on dependency rules,
  - sentiment detection,
  - named entity extraction.
4) Modeling and vectorization of text using Bag-of-Words:
  - advantages, disadvantages, classical treatments,
  - reduction using TF-IDF, SVD, PCA,
  - methods of implementation,
  - text similarity computation.
5) Semantics:
  - deriving latent semantics based on PCA, SVD and MDS decompositions,
  - Word2Vec, FastText and GloVe semantic embeddings and their applications,
  - use in text analysis.
6) Quantification of text features:
  - identification of thematic words and keywords,
  - identification of topics using LDA,
  - implementation of automatic creation of a translation dictionary for a given language using parallel corpora,
  - implementation of synonymy detection.
7) OCR:
  - using Tesseract, PyTesseract, EasyOCR and other tools,
  - OCR implementation including preprocessing and postprocessing with language models.
8) Speech-to-Text, Text-to-Speech:
  - currently available technologies and models (Whisper, Seamless and others),
  - implementation of simple tasks.
9) Large Language Models (LLM):
  - LLMs, Generative Pretrained Transformers (GPT),
  - zero-shot, few-shot, RLHF, fine-tuning of models, data ingestion,
  - BERT, LLAMA, Mistral, etc.,
  - custom chatbot implementation.
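As an illustration of topic 4 (Bag-of-Words vectorization and text similarity computation), the following is a minimal sketch using only the Python standard library. It is not course material: the function names, the whitespace tokenizer, the smoothed IDF formula and the sample sentences are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; a real pipeline would use a proper tokenizer.
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document (raw term count times smoothed IDF)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (one common variant, assumed here): ln((1 + N) / (1 + df)) + 1
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # documents sharing several words
print(cosine_similarity(vectors[0], vectors[2]))  # documents sharing no words
```

Documents that share no terms get a similarity of zero, which is one of the known limitations of Bag-of-Words that the embedding methods in topic 5 address.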

Learning activities and teaching methods
unspecified
Learning outcomes
In this course, students will acquire the skills and tools for natural language processing. They will learn how to process text in various forms, from preprocessing plain text to extracting it from formats such as XML and JSON, and how to use spaCy, UDPipe, Spark, NLTK and other tools for a range of real-world tasks. They will also learn key concepts and common methods used with language corpora, which form the basis for big data research. In addition, they will learn some key concepts in linguistics, especially morphology, syntax and semantics, that are useful in NLP. The emphasis is on the practical applicability of the knowledge gained.
1) Improve programming skills. 2) Acquire an understanding of typical tasks in practice and industry. 3) Become familiar with tasks for research in linguistics.
Prerequisites
1) Completion of at least the 2nd semester of programming in Python.

Assessment methods and criteria
unspecified
1) Completing tasks. 2) Active participation.
Recommended literature


Study plans that include the course
Faculty Study plan (Version) Category of Branch/Specialization Recommended year of study Recommended semester
Faculty: Faculty of Arts, Study plan (Version): Linguistics and Digital Humanities (2020), Category: Philological sciences, Recommended year of study: 2, Recommended semester: Winter