1) Preprocessing of unstructured data.
2) Processing of large structured data (XML, JSON), from kB to TB.
3) NLP frameworks (spaCy, UDPipe, Flair, Spark, and NLTK) and basic NLP tasks:
   - sentence processing,
   - analyzing actor relationships based on dependency rules,
   - sentiment determination,
   - extracting named entities.
4) Modeling and vectorization of text using Bag-of-Words:
   - advantages, disadvantages, classical approaches,
   - weighting and dimensionality reduction using TF-IDF, SVD, and PCA,
   - methods of implementation,
   - text similarity computation.
5) Semantics:
   - deriving latent semantics based on PCA, SVD, and MDS decompositions,
   - Word2Vec, FastText, and GloVe semantic embeddings and their applications,
   - use in text analysis.
6) Quantification of text features:
   - identification of thematic words and keywords,
   - identification of topics using LDA,
   - implementation of automatic creation of a translation dictionary for a given language using parallel corpora,
   - implementation of synonymy detection.
7) OCR:
   - using Tesseract, PyTesseract, EasyOCR, and other tools,
   - OCR implementation, including preprocessing and postprocessing with language models.
8) Speech-to-Text and Text-to-Speech:
   - currently available technologies and models (Whisper, Seamless, and others),
   - implementation of simple tasks.
9) Large Language Models (LLMs):
   - LLMs and Generative Pre-trained Transformers (GPT),
   - zero-shot and few-shot learning, RLHF, fine-tuning of models, data ingestion,
   - BERT, LLaMA, Mistral, etc.,
   - custom chatbot implementation.
|
In this course, students will acquire the skills and tools needed for natural language processing. They will learn how to process text in various forms: preprocessing plain text, extracting text from formats such as XML and JSON, and using spaCy, UDPipe, Spark, NLTK, and other tools for a range of real-world tasks. They will also learn key concepts and common methods for working with language corpora, which form the basis for big data research. In addition, they will learn some key concepts in linguistics that are useful in NLP, especially morphology, syntax, and semantics. The emphasis is on the practical applicability of the knowledge gained.
1) Improve programming skills.
2) Acquire an understanding of typical tasks in practice and industry.
3) Become familiar with tasks relevant to research in linguistics.
|