Teacher | Face-to-face hours | Working hours | ECTS |
---|---|---|---|
Serena Villata (I3S) | 30 | 60 | 6 |
Description
This course provides tools and methodology for extracting information from textual data, using two complementary approaches:
- Statistical analysis, based on long-established methods and baseline calculations.
- Deep learning, which offers methods for classifying texts and identifying linguistic markers and patterns.
The statistics part will focus mainly on the frequency analysis of words and their distribution in a corpus, through methods such as the z-score, correspondence analysis, and the calculation of co-occurrences. The deep learning part will focus on two standard text classification architectures: recurrent networks and convolutional networks. The hidden layers of these networks (embedding, attention, TDS) will be studied in order to extract the linguistic information learned by these models and compare it with the information obtained through statistical analysis.
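As an illustration of the statistical side, the sketch below computes a z-score for a word in one part of a corpus and counts co-occurrences within a sliding window. The course description does not state which z-score formula is used, so the normal approximation to the binomial is assumed here. The toy corpus and the helper names `zscore` and `cooccurrences` are hypothetical.

```python
import math
from collections import Counter


def zscore(k, t, f, T):
    """z-score of a word seen k times in a part of t tokens, given f
    occurrences in the whole corpus of T tokens.

    Assumption: normal approximation to the binomial distribution.
    """
    p = f / T                          # relative frequency in the whole corpus
    expected = t * p                   # expected count in the part
    std = math.sqrt(t * p * (1 - p))   # binomial standard deviation
    return (k - expected) / std


def cooccurrences(tokens, window=2):
    """Count unordered pairs of distinct words that appear within
    `window` tokens of each other."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                counts[tuple(sorted((w, v)))] += 1
    return counts


# Hypothetical toy corpus split into two parts.
parts = {
    "a": "the cat sat on the mat the cat slept".split(),
    "b": "the dog ran in the park and the dog barked".split(),
}
all_tokens = [tok for toks in parts.values() for tok in toks]
total = Counter(all_tokens)
T = len(all_tokens)

# "cat" is over-represented in part "a" and absent from part "b".
z_cat_a = zscore(Counter(parts["a"])["cat"], len(parts["a"]), total["cat"], T)
z_cat_b = zscore(Counter(parts["b"])["cat"], len(parts["b"]), total["cat"], T)

pairs = cooccurrences(parts["a"], window=2)
print(z_cat_a, z_cat_b, pairs.most_common(3))
```

A positive z-score indicates that a word occurs more often in a part than its overall frequency would predict, and a negative one that it occurs less often. On a corpus this small the normal approximation is rough, so the values are only indicative.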
Learning outcomes
At the end of the course, students will be able to:
- Create a corpus and define metadata related to a given working hypothesis
- Choose and apply statistical methods appropriate to a given analytical need
- Program a deep learning network for text classification
- Extract linguistic information from the hidden layers of a deep learning network
Requirements
Good working knowledge of Python programming, plus basic statistics and machine learning.