FIRST SEMESTER

REFRESHER

Basic Probability
Christine MALOT (UniCA)
1 ECTS

Basic Algebra for Data Analysis
Mathieu CARRIERE (INRIA)
1 ECTS

Basic Algorithmics
Michel RIVEILL (UniCA, INRIA, I3S)
1 ECTS

Methods and tools for technical and scientific writing
Aline MENIN (UniCA)
1 ECTS
 

STATISTICS 

Statistical Inference Theory
Vincent VANDEWALLE (INRIA, UniCA, 3IA Chair Holder)
6 ECTS - 60h course

MACHINE LEARNING

Introduction to Machine Learning 
Michel RIVEILL (UniCA, INRIA, I3S)  
Diane LINGRAND (UniCA, I3S, CNRS)
3 ECTS - 30h course 

This course introduces the field of machine learning and defines its vocabulary. At the end of the course, students will be able to perform simple pre-processing on different types of data and solve supervised or unsupervised tasks using several models.

Introduction to Deep Learning
Michel RIVEILL (UniCA, INRIA, I3S)
Diane LINGRAND (UniCA, I3S, CNRS)
3 ECTS - 30h course

In this course, students will build and train neural network architectures such as convolutional neural networks or recurrent neural networks and, most importantly, learn how to improve them with techniques such as Dropout, BatchNorm, and different initialization strategies. Theoretical concepts and their industrial applications will be implemented in Python and TensorFlow on object recognition and natural language processing problems.

Ethical Aspects of Data
Frédéric PRECIOSO (I3S, INRIA, UniCA)
3 ECTS - 30h course

This course introduces the ethical aspects of artificial intelligence (AI), addressing the concerns raised by the increased use of AI to make decisions that have important consequences on people’s lives. The course covers fundamental concepts and methods of interpretability and transparency in Machine Learning (ML), with an emphasis on fairness.

PROGRAMMING

R Programming for Data Science
3 ECTS - 30h course

At the end of the course, students will be able to explore a dataset, handle missing data, clean and standardize data, calculate basic statistics, and subset, replace, or otherwise process data. They will be able to create Markdown reports and develop their own applications in Shiny.
Finally, they will learn how to apply basic ML algorithms (logistic regression, decision trees, random forests, SVM, PCA, clustering) in R, how to prepare a dataset for modeling (preprocessing, feature engineering, train/test split), and how to evaluate model performance (accuracy, ROC curves, etc.).

Python Programming for Data Science
Marco MILANESIO (UniCA)
3 ECTS - 30h course

In this course we will provide an extensive overview of various aspects of data manipulation and analysis with the Python language. The first part of the course introduces the Python programming language, with particular emphasis on what can be achieved in terms of data analysis without any external framework, in order to provide the basics for more advanced programming techniques. We will then focus on Python frameworks (notably NumPy and pandas) to tackle larger datasets, covering data cleaning (outlier detection, duplicates, and so on), missing value management (interpolation, substitution, removal), and basic data analysis (statistical and quantitative).

Distributed Big Data Systems
Luc HOGIE (I3S, CNRS, UniCA, INRIA)
3 ECTS - 30h course

This course introduces concepts and techniques involved in the design and implementation of distributed systems, with an emphasis on the (distributed) processing of large datasets.

It is organized as mixed course/lab sessions, where theoretical aspects and challenges are illustrated by real-world implementations whenever possible. More precisely, across the sessions, each student will develop a particular component of a common distributed application. In doing so, students will face classic issues of distributed computing and propose suitable solutions.

WORKSHOP AND VULGARIZATION

Workshop and vulgarization
Michel RIVEILL (UniCA, INRIA, I3S)
2 ECTS 

Students will be able to summarize the content of a scientific talk in English. Students attend the SophIA Summit.



SECOND SEMESTER

STATISTICAL LEARNING

Introduction to Information Theory
Cédric RICHARD (Lagrange, UniCA, 3IA Chair Holder)
3 ECTS - 30h course 

Information Theory is the study of the fundamental limits of information transmission (or coding) and storage (or compression).
This course offers a broad introduction to information theory and its real-world applications. A subset of the following is covered: entropy and information; theoretical limits of lossless data compression and practical algorithms; communication in the presence of noise; channel capacity; channel coding.
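As an illustration of the entropy concept listed above, here is a minimal Python sketch that estimates the Shannon entropy of a symbol sequence in bits per symbol. The function name `shannon_entropy` and the toy sequences are assumptions made for this example and are not taken from the course material.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Return the empirical Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    # H = -sum(p * log2(p)) over the observed symbol frequencies p
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two equally likely symbols: the sequence is maximally unpredictable.
fair = shannon_entropy("HTHTHTHT")

# One symbol dominates: the sequence is more predictable, so entropy is lower.
biased = shannon_entropy("HHHHHHHT")

print(f"fair:   {fair:.3f} bits/symbol")
print(f"biased: {biased:.3f} bits/symbol")
```

The entropy gives the lower bound, in bits per symbol, that a lossless code can achieve on average for a memoryless source with these frequencies, which is why the biased sequence is more compressible than the fair one.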

Model Selection and Resampling Methods 
Marco LORENZI (INRIA, 3IA Chair Holder)
3 ECTS - 30h course 


• Critically assess the performance of the model on a specified task through cross-validation and the evaluation of information criteria
• Identify and prevent the sources of assessment bias
• Create your own benchmark for a variety of modeling problems
• Identify modeling alternatives and evaluation strategies
• Visualize and present performances across models
• Understand the basis of theoretical approaches to model selection
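
The cross-validation mentioned in the first objective can be sketched in plain Python as follows. This is only an illustrative sketch: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mse`) and the toy data are hypothetical, and the "model" is a trivial mean predictor chosen to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Average the held-out score of `fit` over k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        # The model only ever sees the training fold, which avoids the
        # optimistic bias of scoring on data used for fitting.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Hypothetical baseline: always predict the training mean of y."""
    return sum(ys) / len(ys)


def mse(model, xs, ys):
    """Mean squared error of the constant prediction `model`."""
    return sum((y - model) ** 2 for y in ys) / len(ys)


xs = list(range(10))
ys = [2.0 * x for x in xs]
cv_error = cross_validate(xs, ys, k=5, fit=fit_mean, score=mse)
print(f"5-fold CV mean squared error of the baseline: {cv_error:.2f}")
```

Passing `fit` and `score` as arguments makes it possible to compare several candidate models on exactly the same folds. For data with a meaningful order, shuffling the indices before splitting would usually be needed; the sketch omits it for brevity.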

Optimization for Data Science
Rémy SUN (INRIA)
3 ECTS - 30h course 

Stochastic gradient descent (Robbins-Monro, 1951) is the workhorse of many statistical and probabilistic procedures. In particular, it is widely used in machine learning for training neural networks and support vector machines. This course is intended to provide a mathematical foundation for this algorithm and its variants, along with a numerical intuition of its behavior on practical examples.
It will be organized in three main blocks: the first lays the foundations of optimization, the second is dedicated to automatic differentiation, and the third is dedicated to the stochastic gradient descent algorithm.
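To give a numerical feel for the algorithm described above, the following minimal Python sketch runs stochastic gradient descent on a one-dimensional toy problem: estimating the mean of a set of values by minimizing the expected squared error. The function name `sgd_mean`, the step-size schedule 1/t, and the data are assumptions chosen for this illustration, not part of the course material.

```python
import random


def sgd_mean(samples, steps=5000, seed=0):
    """Estimate the mean of `samples` by SGD on f(theta) = E[0.5 * (theta - x)^2]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    theta = 0.0
    for t in range(1, steps + 1):
        x = rng.choice(samples)   # draw one random observation
        grad = theta - x          # stochastic gradient of 0.5 * (theta - x)^2
        step = 1.0 / t            # decreasing steps: sum diverges, sum of squares converges
        theta -= step * grad
    return theta


estimate = sgd_mean([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"SGD estimate of the mean: {estimate:.3f}")
```

With the step size 1/t, each update reduces to a running average of the observations drawn so far, so the estimate approaches the true mean (3.0 here) as the number of steps grows. Each individual gradient is noisy, but the decreasing steps average the noise out.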

MACHINE LEARNING

More on Learning Algorithms
Michel RIVEILL (UniCA, INRIA, I3S)
Diane LINGRAND (UniCA, I3S, CNRS)
3 ECTS - 30h course

Machine learning algorithms are data analysis methods that search for patterns and characteristic structures in data sets. Typical tasks are data classification, automatic regression and unsupervised model fitting. 
This course presents some of the main advanced methods in the field for structure discovery, classification and non-linear regression. This is an advanced course in machine learning, so students will gain extensive experience in this area.

More on Deep Learning
Michel RIVEILL (UniCA, INRIA, I3S)
Diane LINGRAND (UniCA, I3S, CNRS)
3 ECTS - 30h course

The objective of the course is to go deeper into building machine learning models in both TensorFlow and PyTorch, including writing your own cells, loss functions, and metrics.
In particular, time series processing will be covered, as well as the main tasks related to natural language processing: sentiment analysis and text classification, part-of-speech tagging and named entity recognition, machine translation, text summarization and question answering, and text generation and image captioning.

Web of Data
3 ECTS - 30h course

Web applications use and exchange data on the web, which has evolved into the so-called Web of Data. This course introduces the foundational principles of graph-based knowledge representation for the Web of Data and its implementation with the standard languages recommended by the W3C: RDF to represent knowledge graphs; RDFS, OWL and SKOS to represent their vocabularies; SPARQL to query RDF graphs and their vocabularies; and SHACL to represent constraints on RDF graphs.

PERSONAL WORK 

Case studies 
Charles BOUVEYRON (UniCA, INRIA, 3IA Chair Holder)
3 ECTS

The goal of Case studies is to work on concrete data analysis problems from companies, laboratories, or communities. The interested companies or laboratories will provide a description of the problem, an associated original dataset, and the contact details of the person who will be in charge of monitoring the project. Students will work on the project in groups for 8 weeks and send a report back to the company at the end of that period. A final presentation will be made at the end of the project.

Internship
Michel RIVEILL (UniCA, INRIA, I3S)
9 ECTS