First semester

Basic Probability

Basic Algebra for Data Analysis 

Basic Algorithmics

Methods and tools for technical and scientific writing






Statistical Inference Theory

6 ECTS - 60h course


Introduction to Information Theory

3 ECTS - 30h course 

Information Theory is the study of the fundamental limits of information transmission (or coding) and storage (or compression).
This course offers a broad introduction to information theory and its real-world applications. A subset of the following is covered: entropy and information; theoretical limits of lossless data compression and practical algorithms; communication in the presence of noise; channel capacity; channel coding.
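
As a small illustration of the first topic, the sketch below computes the empirical Shannon entropy of a message, which lower-bounds the average number of bits per symbol needed by a lossless symbol-by-symbol code. The function name `shannon_entropy` and the sample strings are illustrative assumptions, not course material.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Empirical Shannon entropy of `message`, in bits per symbol.

    Hypothetical helper for illustration: symbol probabilities are
    estimated from the symbol frequencies in the message itself.
    """
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(shannon_entropy("aabb"))  # two equally likely symbols: 1 bit per symbol
print(shannon_entropy("aaaa"))  # a single repeated symbol carries no information
```

A message drawn uniformly from four symbols would give 2 bits per symbol, the most that a four-letter alphabet allows.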

Model Selection and Resampling Methods 

3 ECTS - 30h course 

• Critically assess the performance of a model on a specified task through cross-validation and the evaluation of information criteria
• Identify and prevent the sources of assessment bias
• Create your own benchmark for a variety of modeling problems
• Identify modeling alternatives and evaluation strategies
• Visualize and present performances across models
• Understand the basis of theoretical approaches to model selection
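
The cross-validation objective above can be sketched as follows, using only the Python standard library. This is a minimal sketch under stated assumptions: the helpers `k_fold_indices` and `cross_validate`, the toy "predict the training mean" model, and the synthetic data are all hypothetical. The point is that the model is fitted on the training folds only, so the held-out score is not biased by data the model has already seen.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training folds only: the held-out fold stays unseen.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores


# Toy model (assumption): always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
scores = cross_validate(xs, ys, fit_mean, predict_mean, k=5)
print(len(scores), sum(scores) / len(scores))
```

Swapping in other `fit` and `predict` functions gives comparable per-fold scores across models, which is the basis for a simple benchmark.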

Optimization for Data Science

3 ECTS - 30h course 

Stochastic gradient descent (Robbins-Monro, 1951) is the workhorse of many statistical and probabilistic procedures. In particular, it is widely used in machine learning to train neural networks and support vector machines. This course provides a mathematical foundation for this algorithm and its variants, along with numerical intuition about its behavior on practical examples.
It is organized in three main blocks: the first lays the foundations of optimization, the second is dedicated to automatic differentiation, and the third to the stochastic gradient descent algorithm.
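
To give a flavor of the algorithm, here is a minimal sketch of stochastic gradient descent on a one-dimensional least-squares problem. The function name, the noiseless synthetic data, and the step-size schedule `gamma0 / (1 + decay * t)` are assumptions chosen for illustration; the schedule decays like 1/t, in line with the classical Robbins-Monro step-size conditions.

```python
import random


def sgd_linear_regression(data, n_steps=5000, gamma0=0.5, decay=0.01, seed=0):
    """Fit y = w * x + b by SGD on the loss 0.5 * (w * x + b - y) ** 2.

    Hypothetical helper: one randomly drawn sample per step, with a
    decreasing step size gamma_t = gamma0 / (1 + decay * t).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for t in range(n_steps):
        x, y = rng.choice(data)            # draw one sample at random
        gamma = gamma0 / (1.0 + decay * t)  # decreasing step size
        error = (w * x + b) - y             # residual on this sample
        w -= gamma * error * x              # gradient of the loss w.r.t. w
        b -= gamma * error                  # gradient of the loss w.r.t. b
    return w, b


# Synthetic, noiseless data (assumption): y = 2x - 1 with x in [-1, 1].
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, 2.0 * x - 1.0) for x in xs]

w, b = sgd_linear_regression(data)
print(w, b)  # should be close to the true parameters 2 and -1
```

Each update uses the gradient of the loss on a single sample, an unbiased but noisy estimate of the full gradient; the decreasing step size is what damps that noise over time.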


Introduction to Machine Learning 

3 ECTS - 30h course 

This course is an introduction to machine learning. It aims to introduce the field and define its vocabulary. At the end of the course, students will be able to perform simple pre-processing on different types of data and solve supervised or unsupervised tasks using several models.

Introduction to Deep Learning

3 ECTS - 30h course

In this course, students will build and train neural network architectures such as convolutional or recurrent neural networks and, most importantly, learn how to improve them with techniques such as dropout, batch normalization, and different initialization strategies. Theoretical concepts and their industrial applications will be implemented in Python with TensorFlow on object recognition and natural language processing problems.

Ethical Aspects of Data

3 ECTS - 30h course

This course introduces the ethical aspects of artificial intelligence (AI), addressing the concerns raised by the increasing use of AI to make decisions that have important consequences for people’s lives. It focuses on fundamental concepts and methods of interpretability and transparency in Machine Learning (ML), with particular attention to fairness.


More on Learning Algorithms

3 ECTS - 30h course

Machine learning algorithms are data analysis methods that search for patterns and characteristic structures in data sets. Typical tasks are data classification, regression and unsupervised model fitting.
This course presents some of the main advanced methods in the field for structure discovery, classification and non-linear regression. As an advanced machine learning course, it gives students extensive experience in this area.

More on Deep Learning

3 ECTS - 30h course

The objective of the course is to go deeper into building machine learning models in both TensorFlow and PyTorch, including writing your own cells, loss functions and metrics.
In particular, the course covers time series processing and the main natural language processing tasks: sentiment analysis and text classification, part-of-speech tagging and named entity recognition, machine translation, text summarization and question answering, and text generation and image captioning.

Web of Data

3 ECTS - 30h course

Web applications use and exchange data on the web, which has evolved into the so-called Web of Data. This course introduces the foundational principles of Graph-based Knowledge Representation for the Web of Data and its implementation with the standard languages recommended by the W3C: RDF to represent knowledge graphs; RDFS, OWL and SKOS to represent their vocabularies; SPARQL to query RDF graphs and their vocabularies; and SHACL to represent constraints on RDF graphs.


R Programming for Data Science

3 ECTS - 30h course

At the end of the course, students will be able to explore a dataset, handle missing data, clean and standardize data, calculate basic statistics, and subset, replace or otherwise process the data. They will be able to create Markdown reports and develop their own applications in Shiny.
Finally, they will learn how to apply very basic ML algorithms (logistic regression, decision tree, random forest, SVM, PCA, clustering) in R, how to prepare a dataset for modeling (preprocessing, feature engineering, train/test split) and how to evaluate model performance (accuracy, ROC curves, etc.).

Python Programming for Data Science

3 ECTS - 30h course

In this course we will provide an extensive overview of various aspects of data manipulation and analysis with the Python language. The first part of the course introduces the Python programming language, with particular emphasis on what can be achieved in terms of data analysis without any external framework, in order to provide the basics for more advanced programming techniques. We will then focus on Python frameworks (notably NumPy and pandas) to tackle larger datasets, covering data cleaning (outlier detection, duplicates, and so on), missing value management (interpolation, substitution, removal) and basic data analysis (statistical and quantitative).
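
As an example of what can be done without any external framework, the sketch below fills missing values in a series by linear interpolation. The function name `interpolate_missing` and the convention of marking missing values with `None` are assumptions made for this illustration.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours.

    Hypothetical helper: gaps at the start or end of the series have no
    neighbour on one side and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of known positions and fill the gap between them.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


series = [1.0, None, 3.0, None, None, 6.0, None]
print(interpolate_missing(series))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, None]
```

Substitution (for example by the mean) or removal of the missing entries would follow the same pattern with a different fill rule.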

Distributed Big Data Systems

3 ECTS - 30h course

This course introduces concepts and techniques involved in the design and implementation of distributed systems, with an emphasis on the (distributed) processing of large datasets.

It is organized as mixed course/lab sessions, where theoretical aspects and challenges are illustrated by real-world implementations whenever possible. More precisely, across the sessions, each student will develop a particular component of a common distributed application. In doing so, students will face classic issues of distributed computing and propose adequate solutions.


Case studies 





Workshop and popularization


Be able to synthesize the content of a scientific talk in English.