Semester | Semester 1 |
---|---|
Type | Optional |
Nature | Choice |
Credits | 6 |
Total number of hours | 60 |
Number of hours requiring attendance | 30 |
Prerequisites
Skills to be acquired or developed:
At the end of the course, students will be able to:
- Create a corpus and define the metadata relevant to a given working hypothesis
- Choose and apply statistical methods suited to a given analytical need
- Program a deep learning network for text classification
- Extract linguistic information from the hidden layers of a deep learning network
Goals
- Statistical analysis based on classic, long-established methods and baseline calculations.
- Deep learning methods for classifying texts and identifying linguistic markers and patterns.
Content
- Course outline
Lesson 1: Textual data analysis
Introductory course on statistical analysis of textual data.
Tutorials: Use of the Hyperbase Web platform with illustrative examples and exercises.
Lesson 2: Text preprocessing
Overview of the different data formats and metadata encoding methods. Introduction to data labeling, tokenization and standard text preprocessing methods.
Tutorials: Accessing text from the web, preprocessing it, creating a corpus and building a database in Hyperbase Web.
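The standard preprocessing steps of this lesson can be sketched in plain Python. This is not Hyperbase Web's own tokenizer: the regular expression, the tiny stop-word list and the metadata fields (`author`, `year`) are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def tokenize(text: str) -> list[str]:
    """Keep runs of letters (Unicode-aware); punctuation and digits are dropped."""
    return re.findall(r"[^\W\d_]+", normalize(text))

def preprocess(text: str, remove_stop_words: bool = True) -> list[str]:
    tokens = tokenize(text)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

# A corpus as a list of records: the text plus hypothetical metadata fields.
corpus = [
    {"author": "A", "year": 1848,
     "text": "The corpus of the Republic: speeches, letters and essays."},
]
for doc in corpus:
    doc["tokens"] = preprocess(doc["text"])

print(corpus[0]["tokens"])
print(Counter(corpus[0]["tokens"]).most_common(3))
```

Keeping metadata next to the tokens makes it easy to split the corpus later into parts (by author, year, etc.), which is what the statistical lessons compare.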
Lesson 3: z-score and co-occurrence analysis
Course on the z-score applied to textual data analysis. Computation of word distributions and co-occurrence-based word vectors.
Tutorials: Practical studies based on the corpus of each student.
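Both calculations of this lesson can be sketched briefly. The z-score below assumes one common binomial approximation (observed frequency in a part compared with the frequency expected from the whole corpus); other formulations exist, and this is not necessarily the one Hyperbase Web implements. The co-occurrence vectors count neighbours within a symmetric window whose size is an arbitrary choice.

```python
import math
from collections import Counter, defaultdict

def z_score(k: int, t: int, f: int, T: int) -> float:
    """k: frequency of the word in the part; t: size of the part (tokens);
    f: frequency of the word in the whole corpus; T: size of the whole corpus.
    Binomial approximation (an assumption, see lead-in)."""
    p = f / T                          # relative frequency in the whole corpus
    expected = t * p                   # expected frequency in the part
    sd = math.sqrt(t * p * (1 - p))    # binomial standard deviation
    return (k - expected) / sd

def cooccurrence_vectors(tokens: list[str], window: int = 2) -> dict:
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

# A word used 20 times in a 200-token part, 50 times in a 1000-token corpus:
print(round(z_score(20, 200, 50, 1000), 2))  # positive: over-used in the part
print(cooccurrence_vectors(["a", "b", "c", "a", "b"], window=1)["a"])
```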
Lesson 4: Multivariate statistics and clustering
Course on correspondence analysis and hierarchical classification. Supervised and unsupervised approaches to text classification.
Tutorials: Practical studies based on the corpus of each student.
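Correspondence analysis can be sketched with NumPy as a singular value decomposition of the standardized residuals of a contingency table (texts × words). The table below is hypothetical, and this minimal version omits the supplementary elements and contribution tables a full tool would provide.

```python
import numpy as np

def correspondence_analysis(N):
    """Return principal row coordinates, principal column coordinates and
    the inertia carried by each axis for a contingency table N."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]    # D_r^{-1/2} U Sigma
    cols = (Vt.T * sv) / np.sqrt(c)[:, None] # D_c^{-1/2} V Sigma
    return rows, cols, sv ** 2

# Hypothetical table: rows = three texts, columns = four words.
N = np.array([[30, 5, 2, 10],
              [4, 25, 8, 6],
              [3, 6, 20, 9]])
rows, cols, inertia = correspondence_analysis(N)
print(inertia / inertia.sum())   # share of inertia per axis
print(rows[:, :2])               # texts on the first factorial plane
```

The total inertia equals the table's chi-square statistic divided by its grand total. For the hierarchical classification step, the row coordinates could then be passed to a clustering routine such as `scipy.cluster.hierarchy.linkage` (e.g. with `method="ward"`).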
Lesson 5: Deep learning for NLP
Introductory course on deep learning for NLP: challenges, limitations and expected added value.
Tutorials: Use of the Hyperbase Web platform with illustrative examples and exercises.
Lesson 6: Learning word embeddings
Course on word embeddings, from count vectors to Word2Vec. Study of the different types of word embeddings.
Tutorials: Word embedding implementation in Python (count vectors, TF-IDF, CBOW, SkipGram…). Comparison between statistical and deep learning approaches.
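The count-based representations and the training data for CBOW and SkipGram can be sketched in plain Python. The toy documents are hypothetical, and the TF-IDF weighting shown (raw count × log(N / df)) is one common variant among several. The neural training itself is not shown; a library such as gensim provides it through its `Word2Vec` class, whose `sg` parameter selects CBOW (`sg=0`) or skip-gram (`sg=1`).

```python
import math
from collections import Counter

# Hypothetical toy corpus of already-tokenized documents.
docs = [
    ["liberty", "equality", "fraternity"],
    ["liberty", "order", "order"],
    ["equality", "work", "order"],
]
vocab = sorted({w for d in docs for w in d})

def count_vector(doc):
    """Bag-of-words counts in vocabulary order."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]

def tf_idf_vector(doc):
    """Raw term count times log(N / document frequency)."""
    counts = Counter(doc)
    n_docs = len(docs)
    return [counts[w] * math.log(n_docs / sum(1 for d in docs if w in d))
            for w in vocab]

def skipgram_pairs(tokens, window=1):
    """(target, context) pairs: SkipGram predicts each context word from the target."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(max(0, i - window), min(len(tokens), i + window + 1))
            if j != i]

def cbow_examples(tokens, window=1):
    """(context words, target) examples: CBOW predicts the target from its context."""
    return [([tokens[j] for j in range(max(0, i - window), min(len(tokens), i + window + 1))
              if j != i], tokens[i])
            for i in range(len(tokens))]

print(vocab)
print(count_vector(docs[1]), [round(x, 3) for x in tf_idf_vector(docs[1])])
print(skipgram_pairs(docs[0]))
print(cbow_examples(docs[0]))
```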
Lesson 7: Convolutional neural network (CNN)
Course on convolutional models for text classification. Study of CNN hidden layers for linguistic feature extraction.
Tutorials: Programming CNN models for text classification. Analysis of the hidden convolutional layers to extract linguistic markers.
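The idea of reading linguistic markers off a convolutional layer can be sketched with a NumPy forward pass: each filter slides over the sentence, and the window that activates it most is the n-gram it responds to. This is a simpler inspection than the Text Deconvolution Saliency method listed in the references. The one-hot "embeddings" and the hand-set filter are assumptions that make the result predictable; in a trained model the filters are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_text(emb, filters, bias):
    """Valid 1D convolution + ReLU.
    emb: (seq_len, dim); filters: (n_filters, width, dim); bias: (n_filters,).
    Returns activations of shape (seq_len - width + 1, n_filters)."""
    n_filters, width, _ = filters.shape
    out = np.empty((emb.shape[0] - width + 1, n_filters))
    for t in range(out.shape[0]):
        window = emb[t:t + width]                        # (width, dim)
        out[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)

tokens = ["we", "defend", "the", "republic", "today"]
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
emb = np.eye(len(vocab))[[vocab[w] for w in tokens]]     # one-hot rows, (5, 5)

width = 2
filters = rng.normal(scale=0.1, size=(3, width, len(vocab)))
# Hand-set filter 0 to respond to the bigram "the republic" (illustration only).
filters[0] = 0.0
filters[0, 0, vocab["the"]] = 1.0
filters[0, 1, vocab["republic"]] = 1.0
bias = np.zeros(3)

acts = conv1d_text(emb, filters, bias)
pooled = acts.max(axis=0)     # max-over-time pooling: features for a classifier
best = acts.argmax(axis=0)    # position of the most activating window per filter
for f, pos in enumerate(best):
    print(f, tokens[pos:pos + width], round(float(pooled[f]), 3))
```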
Lesson 8: Recurrent neural network (RNN)
Course on recurrent models for text classification. Study of attention layers for linguistic feature extraction.
Tutorials: Programming RNN models for text classification. Analysis of the attention layer to extract linguistic markers.
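Attention weights are what make an RNN inspectable: each token receives a weight, and the weights sum to 1. The sketch below is a NumPy forward pass of a simple (Elman) RNN followed by additive attention pooling; this is one common formulation, assumed here, and the course's models may use LSTM/GRU cells or another scoring function. The random inputs stand in for word embeddings and the parameters are untrained.

```python
import numpy as np

def rnn_with_attention(x, W_xh, W_hh, b_h, W_a, v):
    """x: (seq_len, in_dim). Returns the context vector and one attention
    weight per token."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # recurrent update
        states.append(h)
    H = np.stack(states)                           # (seq_len, hidden)
    scores = np.tanh(H @ W_a) @ v                  # additive scoring, one per token
    weights = np.exp(scores - scores.max())        # stable softmax
    alpha = weights / weights.sum()
    context = alpha @ H                            # attention-weighted sum of states
    return context, alpha

rng = np.random.default_rng(42)
tokens = ["the", "republic", "is", "threatened"]
in_dim, hidden, attn = 8, 16, 8
x = rng.normal(size=(len(tokens), in_dim))         # stand-in for embeddings
params = dict(
    W_xh=rng.normal(scale=0.3, size=(hidden, in_dim)),
    W_hh=rng.normal(scale=0.3, size=(hidden, hidden)),
    b_h=np.zeros(hidden),
    W_a=rng.normal(scale=0.3, size=(hidden, attn)),
    v=rng.normal(size=attn),
)
context, alpha = rnn_with_attention(x, **params)
for tok, a in zip(tokens, alpha):
    print(f"{tok:12s} {a:.3f}")   # after training, high weights point to markers
```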
Lesson 9: Go further with deep learning for text analysis
Overview of different architectures and tasks in computational linguistics: hybrid networks (CNN+RNN), GANs, text generation and question answering.
Tutorials: Programming a hybrid network for text classification. Practical studies based on the corpus of each student.
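One way to combine the two architectures is to let a convolutional layer extract local n-gram features and an RNN read those features in order, with a softmax layer on its final state. The NumPy sketch below is a forward pass only, with random untrained parameters and dimensions chosen arbitrarily; it assumes the sentence is at least as long as the filter width.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, filters, bias):
    """x: (seq_len, dim); filters: (n_filters, width, dim) -> (seq_len - width + 1, n_filters)."""
    width = filters.shape[1]
    windows = np.stack([x[t:t + width] for t in range(len(x) - width + 1)])
    return np.maximum(np.einsum("twd,fwd->tf", windows, filters) + bias, 0.0)

def rnn_last_state(seq, W_xh, W_hh, b_h):
    h = np.zeros(W_hh.shape[0])
    for x_t in seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hybrid_forward(x, p):
    local = conv1d_relu(x, p["filters"], p["conv_b"])          # CNN: local n-gram features
    h = rnn_last_state(local, p["W_xh"], p["W_hh"], p["b_h"])  # RNN: their order in the text
    return softmax(p["W_out"] @ h + p["b_out"])                # class probabilities

seq_len, dim, n_filters, width, hidden, n_classes = 10, 8, 6, 3, 12, 3
params = {
    "filters": rng.normal(scale=0.3, size=(n_filters, width, dim)),
    "conv_b": np.zeros(n_filters),
    "W_xh": rng.normal(scale=0.3, size=(hidden, n_filters)),
    "W_hh": rng.normal(scale=0.3, size=(hidden, hidden)),
    "b_h": np.zeros(hidden),
    "W_out": rng.normal(scale=0.3, size=(n_classes, hidden)),
    "b_out": np.zeros(n_classes),
}
x = rng.normal(size=(seq_len, dim))   # stand-in for a sentence's word embeddings
print(hybrid_forward(x, params))
```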
Lesson 10: Final exam
Practical case study. Working from a given corpus, students answer a list of questions using the tools and methods learned during the course and/or available online in Hyperbase Web.
- Evaluation
Tutorials account for 25% of the overall grade, based on each student's participation and answers to the exercises. The final exam accounts for 75%.
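As a worked example of the weighting, with T the tutorial mark and E the exam mark, and assuming (for illustration only) marks out of 20:

```latex
G = 0.25\,T + 0.75\,E, \qquad
T = 16,\; E = 12 \;\Rightarrow\; G = 0.25 \times 16 + 0.75 \times 12 = 4 + 9 = 13
```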
- References
Python and NLP: https://towardsdatascience.com/introduction-to-natural-language-processing-for-text-df845750fb63
Skansi, S. (2018). Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence. Springer.
Vanni, L., Corneli, M., Longrée, D., Mayaffre, D., Precioso, F. (2020). "Key passages: from statistics to deep learning". In D. F. Iezzi, D. Mayaffre, M. Misuraca (Eds.), Text Analytics: Advances and Challenges. Springer.
Goyal, P., Pandey, S., Jain, K. (2018). Deep Learning for Natural Language Processing. Apress, Berkeley.
Vanni, L., Ducoffe, M., Mayaffre, D., Precioso, F., Longrée, D., et al. (2018). "Text Deconvolution Saliency (TDS): a deep toolbox for linguistic analysis". In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018). [hal-01804310]
Hyperbase Web: http://hyperbase.unice.fr