## Matrix computations: a quick overview and some applications

**Lecturers:** Gianna Del Corso (UniPI), Federico Poloni (UniPI)

**Period:** 16-25 January 2019, Sala Seminari Ovest/Est

We give an overview of some useful techniques of numerical linear algebra, with several applications to data analysis. The theoretical aspects will be complemented by a few examples drawn from applications. Theory: review of the singular value decomposition and other factorizations, Kronecker products, the fast Fourier transform, nonnegative matrix factorizations, nonnegative matrices and M-matrices. Applications: image deblurring and denoising, the vector space model, latent semantic indexing, clustering, PageRank and invariant measures of Markov chains, centrality measures.

**Scheduling:**

- January 16, 9-11 Sala Seminari Ovest
- January 17, 14-16 Sala Seminari Ovest
- January 18, 9-11 Sala Seminari Ovest
- January 21, 14-16 Sala Seminari Ovest
- January 22, 14-16 Sala Seminari Ovest
- January 23, 9-11 Sala Seminari Ovest
- January 24, 14-16 Sala Seminari Est
- January 25, 9-11 Sala Seminari Est

## Genomic data analysis with applications

**Lecturers:** Ugo Borello (Biologia, UniPI), Filippo Geraci (IIT CNR)

**Period:** January-February 2019, Sala Seminari Est

The goal of the course is to introduce the main framework for computational analysis of genomic data: from differential analysis to integration of omics data and biological networks.

We focus on data generated by Next Generation Sequencing (NGS) technologies and discuss the impact of these technologies on biology and bioinformatics.

NGS data analysis is the first step towards differential analysis: measuring a certain (often large) number of variables in two different populations of biological samples and identifying the subset of variables that differ significantly between the two populations. We will see several techniques and applications of differential analysis, from differential expression to the discovery of biomarkers. To this end, we will also review the basics of clustering, classification and feature selection.

Subsequently, we will introduce the problem of multi-omics data analysis, with its advantages and challenges. We will focus on the integration of data coming from different technologies and review several state-of-the-art methods.

In the last part of the course, we will introduce the network structure of biological data. We will show how some problems in graph theory are closely related to biological problems. For example, we will see how the problem of finding protein complexes can be easily mapped into the problem of finding communities in a graph.

**Syllabus**

Lesson 1: Course Introduction (U. Borello and F. Geraci), Basics of genomics and transcriptomics (U. Borello)

Lesson 2: Where are the data? (F. Geraci). Introduction to the most common databases and their structure (NCBI, UCSC, TCGA)

Lesson 3: From the raw data to the counts (F. Geraci)

Lesson 4: Differential expression (F. Geraci)

Lesson 5: How to build a sensible panel of biomarkers (F. Geraci)

Lesson 6: And now? (U. Borello). What can I do with a list of differentially expressed genes? How a biologist and a computer scientist can interact with these data

Lesson 7: Data integration in bioinformatics (F. Geraci)

Lesson 8: Networks in bioinformatics (F. Geraci). Introduction to interaction networks in bioinformatics; algorithmic strategies for network data analysis

**Scheduling:**

- January 29, 11-13
- January 30, 9-11
- January 31, 9-11
- February 1, 9-11
- February 5, 9-11
- February 6, 9-11
- February 7, 9-11
- February 8, 9-11

## Polyhedral Combinatorics

**Lecturers:** Laura Galli (UniPI), Adam Letchford (Lancaster University, UK)

**Period:** 12-21 March 2019, Sala Seminari Ovest

Many important optimization problems arising in practical applications are discrete or combinatorial. Indeed, such problems arise in fields as diverse as operational research, statistics, computer science, engineering and the physical sciences. Although most combinatorial problems are theoretically intractable (“NP-hard”), algorithms and software have improved dramatically during the past couple of decades, to the point where real-life instances with thousands of variables and constraints can now be solved to proven optimality or near-optimality. One of the key components of this success is the study of certain convex polyhedra, known as polyhedral combinatorics. This course gives an introduction to this fascinating topic, covering theory, algorithms, applications and software.

Topics covered:

- Recap on Combinatorial Optimisation
- Modelling Problems as Integer Programs
- Classical Solution Approaches
- Fundamentals of the Polyhedral Approach
- Some Simple Polyhedra
- Polyhedra Associated with NP-Hard Problems
- Algorithms and Implementation
- Applications

**Scheduling**

- March 12, 9:30-11:30
- March 13, 9:30-11:30
- March 14, 9:30-11:30
- March 15, 9:30-11:30
- March 18, 15-17
- March 19, 9:30-11:30
- March 20, 9:30-11:30
- March 21, 9:30-11:30

## Data Stream Processing from the Parallelism Perspective

**Lecturers:** Gabriele Mencagli (UniPI), Matteo Andreozzi (ARM)

**Period:** 1-11 April 2019

An ever-growing number of devices are capable of sensing the world, producing huge flows (data streams) of information about their users and the environment. Many applications need efficient processing techniques to extract insights and complex knowledge from this massive and transient data deluge. Furthermore, to keep processing in real time, systems must exploit parallelism and be able to adapt both the algorithms and the way the processing is performed to unpredictable, time-varying input rates and workloads.

The computing paradigm enabling processing in these scenarios is called Data Stream Processing, which, in its most general meaning, encompasses closely related paradigms such as Event Processing and Complex Event Processing. The course aims to give a solid background in this research field, with an introduction to the theory behind stream processing, an overview of the best-known Stream Processing Systems (e.g., Apache Storm, Apache Flink and Spark Streaming), and the most recent results achieved by the scientific community in terms of optimizations, cost models and patterns. In the final part of the course, the FastFlow and WindFlow parallel libraries will be described in detail, outlining research advances in ultra-low-latency streaming systems targeting multi-core environments and C++ programming.

**Syllabus**

Lesson 1 (Monday 01/04): Introduction to the Course

Lesson 2 (Tuesday 02/04): Data Stream Processing: Paradigms and Models

Lesson 3 (Wednesday 03/04): Stream Processing Frameworks: an Overview

Lesson 4 (Thursday 04/04): Stream Processing Optimizations

Lesson 5 (Monday 08/04): Special lecture by Matteo Andreozzi (ARM)

Lesson 6 (Tuesday 09/04): Parallel Patterns for Window-based Streaming Operators

Lesson 7 (Wednesday 10/04): FastFlow: Building Blocks for Advanced Streaming Runtimes in C++11

Lesson 8 (Thursday 11/04): WindFlow: a C++17 Parallel Pattern-based Library for Data Stream Processing

Each lecture will consist of two hours of classroom teaching. Lectures will tentatively start at 11:00, in one of the two seminar rooms of the department (to be decided).

**Exams**

Students can take the exam in one of two ways:

- by giving an oral presentation (seminar) summarizing a research paper in the field
- by writing a survey (max. 5 pages) on a topic given by the lecturer

## Theory and Practice of Learning From Data

**Lecturer:** Luca Oneto (UniPI)

**Period:** 3-5 June 2019, Sala Seminari Ovest

This course aims to provide an introductory and unifying view of information extraction and model building from data, as addressed by research fields such as Data Mining, Statistics, Computational Intelligence, Machine Learning and Pattern Recognition. The course will present an overview of the theoretical background of learning from data, including the most widely used algorithms in the field, as well as practical applications in industrial areas such as transportation and manufacturing.

**Assessment**

Students must give a short presentation (20 min) on how the methods presented in the course apply to their own research.

**Scheduling**

- June 3, 9:30-13:30, 14:30-16:30
- June 4, 9:30-13:30, 14:30-16:30
- June 5, 9:30-13:30, 14:30-16:30