FIRST SEMESTER
REFRESHER
- Basic Probability
Christine MALOT (UniCA)
1 ECTS
- Basic Algebra for Data Analysis
Mathieu CARRIERE (INRIA)
1 ECTS
- Basic Algorithmics
Michel RIVEILL (UniCA, INRIA, I3S)
1 ECTS
- Methods and tools for technical and scientific writing
Aline MENIN (UniCA)
1 ECTS
STATISTICS
- Statistical Inference Theory
Vincent Vandewalle (INRIA, UniCA, 3IA Chair Holder)
6 ECTS - 60h course
MACHINE LEARNING
- Introduction to Machine Learning
Michel RIVEILL (UniCA, INRIA, I3S)
Diane LINGRAND (UniCA, I3S, CNRS)
3 ECTS - 30h course
This course is an introduction to machine learning. It aims to introduce the field and define its vocabulary. At the end of the course, students will be able to perform simple pre-processing on different types of data and solve supervised or unsupervised tasks using several models.
- Introduction to Deep Learning
Michel RIVEILL (UniCA, INRIA, I3S)
Diane LINGRAND (UniCA, I3S, CNRS)
3 ECTS - 30h course
In this course, students will build and train neural network architectures such as convolutional and recurrent neural networks and, most importantly, learn how to improve them with strategies such as Dropout, BatchNorm, and different initialization schemes. Theoretical concepts and their industrial applications will be implemented in Python and TensorFlow on object recognition and natural language processing problems.
- Ethical Aspects of Data
Frédéric PRECIOSO (I3S, INRIA, UniCA)
3 ECTS - 30h course
This course introduces the ethical aspects of artificial intelligence (AI), addressing the concerns raised by the increased use of AI to make decisions that have important consequences for people’s lives. The course focuses on fundamental concepts and methods of interpretability and transparency in Machine Learning (ML), with a particular emphasis on fairness.
PROGRAMMING
- R Programming for Data Science
3 ECTS - 30h course
At the end of the course, students will be able to explore a dataset, handle missing data, clean and standardize data, calculate basic statistics, and subset, replace, or otherwise process values. They will be able to create Markdown reports and develop their own applications in Shiny.
Finally, they will learn how to apply very basic ML algorithms (logistic regression, decision tree, random forest, SVM, PCA, clustering) in R, how to prepare a dataset for modeling (preprocessing, feature engineering, train/test split), and how to evaluate model performance (accuracy, ROC curves, ...).
- Python Programming for Data Science
Marco MILANESIO (UniCA)
3 ECTS - 30h course
In this course we provide an extensive overview of various aspects of data manipulation and analysis with the Python language. The first part of the course introduces the Python programming language, with particular emphasis on what can be achieved in terms of data analysis without any external framework, in order to provide the basics for more advanced programming techniques. We then focus on Python frameworks (notably numpy and pandas) to tackle larger datasets, covering data cleaning (outlier detection, duplicates, and so on), missing value management (interpolation, substitution, removal), and basic data analysis (statistical and quantitative).
- Distributed Big Data Systems
Luc HOGIE (I3S, CNRS, UniCA, INRIA)
3 ECTS - 30h course
This course introduces the concepts and techniques involved in the design and implementation of distributed systems, with an emphasis on the (distributed) processing of large datasets.
It is organized as mixed course/lab sessions, where theoretical aspects and challenges are illustrated by real-world implementations whenever possible. More precisely, across the sessions, each student will develop a particular component of a common distributed application. In doing so, students will face classic issues of distributed computing and propose adequate solutions.
WORKSHOP AND VULGARIZATION
- Workshop and vulgarization
Michel RIVEILL (UniCA, INRIA, I3S)
2 ECTS
The objective is to be able to synthesize the content of a scientific talk in English. Students attend the SophIA Summit.
SECOND SEMESTER
STATISTICAL LEARNING
- Introduction to Information Theory
Cédric RICHARD (Lagrange, UniCA, 3IA Chair Holder)
3 ECTS - 30h course
Information Theory is the study of the fundamental limits of information transmission (or coding) and storage (or compression).
This course offers a broad introduction to information theory and its real-world applications. A subset of the following is covered: entropy and information; theoretical limits of lossless data compression and practical algorithms; communication in the presence of noise; channel capacity; channel coding.
- Model Selection and Resampling Methods
Marco LORENZI (INRIA, 3IA Chair Holder)
3 ECTS - 30h course
• Critically assess the performance of the model on a specified task through cross validation and the evaluation of information criteria
• Identify and prevent the sources of assessment bias
• Create your own benchmark for a variety of modeling problems
• Identify modeling alternatives and evaluation strategies
• Visualize and present performances across models
• Understand the basis of theoretical approaches to model selection
- Optimization for Data Science
Rémy SUN (INRIA)
3 ECTS - 30h course
Stochastic gradient descent (Robbins-Monro, 1951) is the workhorse of many statistical and probabilistic procedures. In particular, it is widely used in machine learning for training models such as neural networks and support vector machines. This course is intended to provide a mathematical foundation for this algorithm and its variants, along with a numerical intuition of its behavior on practical examples.
It will be organized in three main blocks: a first one laying the foundations of optimization, a second one dedicated to automatic differentiation, and a third one dedicated to the stochastic gradient descent algorithm.
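As an illustration of the kind of algorithm this course studies, here is a minimal sketch of stochastic gradient descent on a one-dimensional least-squares problem. It is not taken from the course material: the function names (`make_data`, `sgd_linear_regression`), the synthetic data, and the step-size schedule are all assumptions chosen for illustration. The step size decreases over time, in the spirit of the Robbins-Monro conditions.

```python
import random


def make_data(n=200, w_true=2.0, b_true=-1.0, noise=0.1, seed=1):
    """Generate synthetic (x, y) pairs with y = w_true * x + b_true + noise.

    Hypothetical helper: the true parameters and noise level are arbitrary.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, w_true * x + b_true + rng.gauss(0.0, noise)))
    return data


def sgd_linear_regression(data, n_steps=20000, step0=0.5, seed=0):
    """Fit y ~ w * x + b by SGD on the squared error, one sample per step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for t in range(1, n_steps + 1):
        x, y = rng.choice(data)           # draw one sample uniformly at random
        residual = (w * x + b) - y
        grad_w = residual * x             # gradient of 0.5 * residual**2 w.r.t. w
        grad_b = residual                 # gradient of 0.5 * residual**2 w.r.t. b
        step = step0 / (1.0 + 0.01 * t)   # decreasing step size (assumed schedule)
        w -= step * grad_w
        b -= step * grad_b
    return w, b


data = make_data()
w_hat, b_hat = sgd_linear_regression(data)
print(f"estimated slope: {w_hat:.3f}, estimated intercept: {b_hat:.3f}")
```

Each update uses the gradient of the loss on a single randomly drawn sample rather than on the full dataset, which is what distinguishes the stochastic variant from plain gradient descent.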
MACHINE LEARNING
- More on Learning Algorithms
Michel RIVEILL (UniCA, INRIA, I3S)
Diane LINGRAND (UniCA, I3S, CNRS)
3 ECTS - 30h course
Machine learning algorithms are data analysis methods that search for patterns and characteristic structures in data sets. Typical tasks are data classification, automatic regression and unsupervised model fitting.
This course presents some of the main advanced methods in the field for structure discovery, classification and non-linear regression. This is an advanced course in machine learning, so students will gain extensive experience in this area.
- More on Deep Learning
Michel RIVEILL (UniCA, INRIA, I3S)
Diane LINGRAND (UniCA, I3S, CNRS)
3 ECTS - 30h course
The objective of the course is to deepen students' ability to build machine learning models in both TensorFlow and PyTorch, including building their own cells, loss functions, and metrics.
In particular, time series processing will be covered, as well as the main tasks related to natural language processing: sentiment analysis or text classification, part-of-speech tagging or named entity recognition, machine translation, text summarization or question answering, and text generation or image captioning.
- Web of Data
3 ECTS - 30h course
Web applications use and exchange data on the web, which has evolved into the so-called Web of Data. This course introduces the foundational principles of graph-based knowledge representation for the Web of Data and its implementation with the standard languages recommended by the W3C: RDF to represent knowledge graphs; RDFS, OWL and SKOS to represent their vocabularies; SPARQL to query RDF graphs and their vocabularies; and SHACL to represent constraints on RDF graphs.
PERSONAL WORK
- Case studies
Charles BOUVEYRON (UniCA, INRIA, 3IA Chair Holder)
3 ECTS
The goal of Case studies is to work on concrete problems of data analysis from companies / laboratories / communities. For this, the interested companies / laboratories will provide a description of the problem, an associated original dataset and contact details of the person from the company / laboratory who will be in charge of monitoring. Students will work on the project in groups for 8 weeks and send a report back to the company at the end of these 8 weeks. A final presentation will be made at the end of the project.
- Internship
Michel RIVEILL (UniCA, INRIA, I3S)
9 ECTS