Course: Linguistic data-mining 1

Course title Linguistic data-mining 1
Course code KOL/91PM1
Organizational form of instruction Seminar
Level of course Doctoral
Year of study not specified
Semester Winter and summer
Number of ECTS credits 15
Language of instruction Czech
Status of course Compulsory-optional
Form of instruction Face-to-face
Work placements This is not an internship
Recommended optional programme components None
Lecturer(s)
  • Matlach Vladimír, Mgr. Ph.D.
  • Andres Jan, prof. RNDr. dr hab. DSc.
Course content
Multivariate analysis
  • Utilizing multiple quantified properties, pitfalls
  • Distances and similarities between objects
  • Visualizing and interpreting multivariate data, relationships between properties
  • Clustering methods, finding patterns and groups, describing and interpreting data
  • Application of methods in practice
Data acquisition issues
  • Corpora, online databases, open datasets
  • Data retrieval from web resources: API access, REST, JSON and XML formats
  • Web scraping
Text and multidimensional data
  • Application of quantitative linguistics to text description, edit distances, latent semantics
  • Classical methods of text modelling, their pitfalls and solutions
  • Applications of the previously explained multidimensional methods, from clustering to visualization
  • Application of methods in practice to authorship, language, similarity of works, use in sociology, anthropology, etc.
Graph theory and social networks
  • Graph theory and applications to social and other networks, social network analysis (SNA)
  • Ways of extracting relationships from text: letters, books, manuscripts, etc.
  • Social networks on the internet (discussion forums and others): data and relationship mining
  • Timeline and evolution of relationships
  • Gephi and Cytoscape tools
  • Applications in historiography, sociology, political science
Introduction to geoinformation systems
  • Analysis of data tied to geographic areas
  • Methods of data visualisation
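
As an illustration of the "edit distances" topic listed above, the following is a minimal sketch of the Levenshtein distance between two strings. It is written in Python purely for illustration (the course itself works in R), and the function name edit_distance and the example words are assumptions, not course material.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


# Hypothetical example: two inflected forms of the same Czech word.
print(edit_distance("jazyk", "jazyka"))
```

Pairwise distances of this kind can serve as input to the clustering and visualization methods listed under multivariate analysis.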

Learning activities and teaching methods
Lecture
Learning outcomes
The course builds on the knowledge from the first two courses and uses the R programming language to solve practical tasks, chiefly multidimensional data analysis. It covers how to compare objects described by more than one property, how to cluster them by similarity, and how to understand the relationships between individual properties and their influence on group formation. It also considers how to visualize such data meaningfully and how to interpret them, using methods ranging from the classical to the state of the art. This knowledge is then extended to graph theory, graph visualization, applications to social networks, and the extraction of such networks from various sources. The course provides deeper practical and theoretical skills.
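
The comparison and clustering of objects described by several properties can be sketched as follows. This is a minimal illustration in Python (the course itself uses R), assuming z-score standardization, Euclidean distance and single-linkage agglomerative clustering; the course may use other choices. The feature values and the function names standardize, euclidean and single_linkage are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-scores),
    so that no single property dominates the distance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage(points, k):
    """Repeatedly merge the two closest clusters until k remain.
    Cluster distance is the smallest distance between any two members.
    Returns clusters as sorted lists of row indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(euclidean(points[i], points[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical texts described by three properties:
# (mean word length, mean sentence length, type-token ratio)
texts = {
    "A": (4.1, 12.0, 0.52),
    "B": (4.2, 12.5, 0.50),
    "C": (5.6, 24.0, 0.71),
    "D": (5.5, 23.0, 0.69),
}
names = list(texts)
z = standardize(list(texts.values()))
groups = [[names[i] for i in c] for c in single_linkage(z, k=2)]
print(groups)
```

Standardizing first matters here because sentence length is measured on a much larger scale than the type-token ratio and would otherwise dominate the distances.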

Prerequisites
The course is open to PhD students only.

Assessment methods and criteria
Oral exam

Completion of the student's own project, consulted in advance
Recommended literature
  • Hajičová, Panevová, Sgall. (2003). Úvod do teoretické a počítačové lingvistiky. Praha.
  • Sells, P. (1985). Lectures on Contemporary Syntactic Theories. Stanford.
  • Stockwell, R. M. (1977). Foundations of Syntactic Theory. New Jersey.


Study plans that include the course
Faculty: Faculty of Arts; Study plan (Version): Linguistics and Digital Humanities (2020); Category of Branch/Specialization: Philological sciences; Recommended year of study: -; Recommended semester: -