A basic and concise introduction to Topological Data Analysis
Lecturer: Patrizio Frosini (UNIPI)
Contact: patrizio.frosini@unipi.it
Schedule:
JAN 8th 11am-1pm sala seminari est
JAN 10th 11am-1pm sala seminari est
JAN 13th 11am-1pm sala seminari est
JAN 15th 11am-1pm sala seminari est
JAN 17th 11am-1pm sala seminari est
JAN 20th 11am-1pm sala seminari est
JAN 22nd 11am-1pm sala seminari est
JAN 24th 11am-1pm sala seminari est
Topological Data Analysis (TDA) is a mathematical framework focused on studying and quantifying the “shape” of data. Its primary goal is to describe and measure the similarity in datasets by using distances, particularly when equivalences are defined through geometric transformations. Additionally, TDA is highly effective for reducing the dimensionality of data, making it easier to analyze and compare. It can also be utilized in geometric machine learning, and its approach can be applied to a wide range of data types, including time series, 2D and 3D objects, and point clouds. Throughout the course, fundamental concepts required for a basic understanding of TDA will be introduced, with a focus on practical and computational examples, rather than formal mathematical theory.
Topics:
- Equivalence and non-equivalence of data with respect to the action of a group of transformations.
- Simplicial complexes as a generalization of the concept of a graph and as a geometric representation of data described by point clouds in Euclidean spaces.
- Simplicial homology groups as a method for representing the “shape” of a simplicial complex derived from a point cloud.
- The need to adapt homology to the observer’s point of view and the presence of noise: an introduction to persistent homology and persistence diagrams.
- Stability of persistence diagrams in the presence of noise.
- Applications of persistent homology.
- From the shape of data to the shape of observers: the concept of a Group Equivariant Non-Expansive Operator (GENEO).
- The problem of approximating observers in the space of GENEOs.
- GENEOs as a geometric method for reducing the number of parameters in neural networks and increasing their interpretability.
- Application of GENEO theory to identify pockets in proteins and the implementation of GENEO networks for geometric Machine Learning.
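As an illustration of the persistent homology topic listed above, here is a minimal Python sketch, not taken from the course material, that computes the 0-dimensional persistence pairs of a point cloud. It assumes a Vietoris-Rips filtration in which an edge appears at a scale equal to the distance between its endpoints (some conventions use half that distance), and it tracks connected components with a union-find structure. The function name h0_persistence is hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs of 0-dimensional persistent homology.

    Assumption: Vietoris-Rips filtration where every point is born at
    scale 0 and an edge enters at the distance between its endpoints.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this scale.
            parent[ri] = rj
            pairs.append((0.0, dist))
    # The last surviving component never dies.
    pairs.append((0.0, math.inf))
    return pairs


# Two well-separated clusters of two points each.
cloud = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(h0_persistence(cloud))
```

In the resulting diagram, the two short-lived pairs reflect the small gaps inside each cluster, while the longer-lived pair reflects the gap between the two clusters; points far from the diagonal are the features usually read as signal rather than noise.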
Distributed Ledger Technology data: management and analysis
Lecturers: Damiano Di Francesco Maesa (UNIPI), Matteo Loporchio (UNIPI)
Contact: damiano.difrancesco@unipi.it, matteo.loporchio@phd.unipi.it
Schedule:
JAN 27th 10am-1pm sala seminari est
JAN 28th 10am-1pm sala seminari est
JAN 29th 10am-1pm sala seminari est
JAN 30th 10am-1pm sala seminari est
JAN 31st 10am-1pm sala seminari est
JAN 31st 2pm-4pm sala seminari est
The goal of this course is to present how data is managed (represented, secured, and retrieved) in Distributed Ledger Technology (DLT) based systems, and how that data can be analysed to study the ecosystems they support. The course will start with an introduction to the concepts behind DLT, including its main implementations, novel properties, and innovative applications. We will then present the two most famous blockchain protocols, Bitcoin and Ethereum, outlining how they manage their internal transaction data. This same data will be the focus of the following lectures, which show how to represent and analyse it through graphs. The course will close by presenting how authenticated data structures can be leveraged to enhance such data management.
The final exam will be either a project or a seminar, depending on each student's preference.
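To make the notion of an authenticated data structure mentioned in the description concrete, the following Python sketch builds a Merkle tree over a list of byte strings and verifies a membership proof against the root hash. It is a simplified illustration under stated assumptions (single SHA-256, last node duplicated on odd levels) and does not reproduce the exact format used by Bitcoin, Ethereum, or the course material. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper (assumption: a single SHA-256 pass)."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from the hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: on odd levels the last node is paired with itself.
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd level: the node was paired with itself
        sibling_is_right = index % 2 == 0
        proof.append((level[sibling], sibling_is_right))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof, then compare."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


txs = [b"tx0", b"tx1", b"tx2"]
levels = build_levels(txs)
root = levels[-1][0]
print(verify(b"tx1", prove(levels, 1), root))
```

A verifier holding only the root can check membership with a number of hashes logarithmic in the number of leaves, which is the property that makes such structures attractive for ledger data.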
Program analysis
Lecturers: Roberto Bruni (UNIPI), Roberta Gori (UNIPI)
Period: February 2025
This course offers a focused exploration of formal methods in software development, with some emphasis on the shift of perspective that followed Peter O’Hearn’s influential paper on incorrectness logic. Instead of exploiting over-approximations to prove program correctness, as classical formal methods do, incorrectness reasoning exploits under-approximations to expose true bugs.
The overall goal of incorrectness methods is to develop principled techniques to assist programmers with timely feedback about the presence of true errors, with few or zero false alarms.
The course will overview different approaches, like program logics, pointer analysis, and abstract interpretation, for both over- and under-approximation, as well as their combination.
Computational Modeling for Systems Biology
Lecturer: Paolo Milazzo (UNIPI)
Period: Spring 2025
The course will deal with several aspects of the in-silico analysis of dynamical properties of biological systems. We will focus, in particular, on mechanistic modeling approaches aiming at creating executable representations of the biological mechanisms and processes underlying cell functioning. After providing a few notions of biochemistry and cell biology, we will examine modeling methods for gene regulatory networks, with particular emphasis on Boolean network models and rule-based approaches. Next, we will present approaches suitable for the analysis of metabolic and cell-signaling processes, ranging from differential equations, to stochastic modeling and simulation methods, to hybrid approaches. Finally, we will briefly survey emerging methods in computational structural biology, such as methods for protein structure prediction and molecular dynamics simulation, and we will discuss how these techniques could be integrated with the previous ones in order to evaluate the impact of protein mutations on cell functioning.
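As a small illustration of the Boolean network models mentioned above, the Python sketch below simulates a hypothetical three-gene regulatory ring (gene C represses A, A activates B, B activates C) under synchronous updates and extracts the attractor reached from a given initial state. The network, the update scheme, and the function names are assumptions chosen for illustration, not material from the course.

```python
def step(state):
    """Synchronous update of a hypothetical 3-gene network.

    Assumed rules: A' = NOT C, B' = A, C' = B.
    """
    a, b, c = state
    return (not c, a, b)


def attractor(state):
    """Iterate until a state repeats, then return the cycle that was entered."""
    seen = {}        # state -> position in the trajectory
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    # Everything from the first occurrence of the repeated state is the cycle.
    return trajectory[seen[state]:]


for start in [(False, False, False), (True, False, True)]:
    cycle = attractor(start)
    print(start, "-> attractor of length", len(cycle))
```

Because the state space is finite and the update is deterministic, every trajectory eventually enters a cycle; in this toy network the eight states split into two such cycles.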
Pathways to Green ICT
Lecturers: Antonio Brogi (UNIPI), Stefano Forti (UNIPI)
Period: Spring 2025
The course aims at introducing students to the fundamentals of Green ICT, providing them with a toolbox to consider sustainability aspects in their research. The course will introduce:
- The concepts of sustainability and the types of environmental impact of the lifecycle of ICT systems (power consumption, carbon emissions, e-waste)
- Methodologies to assess the environmental impact of ICT systems (from production to operation and maintenance to disposal)
- Methodologies to decrease the environmental impact of ICT systems (orthogonality of QoS and environmental goals, hardware selection and PUE reduction, energy-aware programming, green software engineering, energy-aware system deployment)
- Use cases and open research challenges
Programming Tools and Techniques in the Pervasive Parallelism Era
Lecturers: Marco Danelutto (UNIPI), Patrizio Dazzi (UNIPI)
Period: Spring 2025
The course covers techniques and tools (already existing or in the process of moving into the mainstream) suitable to support the implementation of efficient parallel/distributed applications targeting small-scale parallel systems as well as larger-scale parallel and distributed systems, possibly equipped with different kinds of accelerators. The course follows a methodological approach to provide a homogeneous overview of classical tools and techniques as well as of new tools and techniques specifically developed for new, emerging architectures and application domains. Perspectives in the direction of reconfigurable coprocessors and domain-specific architectures will also be covered.
Analysis techniques for transfer learning in Neural Tangent Kernel Regime
Lecturers: Pietro Cassarà (CNR-ISTI), Dario Trevisan (UNIPI)
Period: April-May 2025
Knowledge transfer learning consists of training a simpler model to mimic the output of a more complex one, even using heterogeneous information. This approach is investigated because some results show that transfer learning speeds up the training process and improves the generalization of a new learning model that uses the soft labels generated by the complex model. This feature makes this kind of technique suitable for semi-supervised and unsupervised learning, as well as for distributed learning applications. Although transfer learning is widely used in application fields such as networking and decision support systems, no satisfactory theoretical explanation has yet been found, and there is a lack of design techniques for this type of learning model. The course focuses on the mathematical tools that can be exploited for the theoretical analysis of transfer learning, starting with the results based on spectral analysis.
3D Geometry Representation and Processing for Deep Learning
Lecturers: Paolo Cignoni, Massimiliano Corsini, Daniela Giorgi, Luigi Malomo (CNR-ISTI)
Period: May-June 2025
Computer Graphics and Geometry Processing are the main disciplines dealing with 3D data such as meshes and point clouds. In turn, Artificial Intelligence and Deep Learning are fundamental paradigms to manage visual data. Nevertheless, applying traditional learning paradigms on 3D data requires rethinking architectural building blocks designed for 2D images, such as convolution and pooling operators, as well as attention layers.
In this course, we will introduce different representations for 3D data, and basic geometry processing techniques that intervene in deep learning pipelines (sampling, remeshing, conversion, …). Then, we will introduce methods able to learn tasks on 3D data. We will describe different architectures to process complex geometric domains, and the novel mechanisms introduced in the literature to preserve by design their intrinsic properties. Examples include graph learning techniques, augmented with geometric and topological information; attention modules to process unordered point sets and mesh data; transformer-like architectures for unstructured data.
In the second part of the course, we will discuss different applications where the interplay between Computer Graphics/Geometry Processing and Deep Learning is producing exciting results, including Computational Fabrication, Architectural Geometry, and Environmental Monitoring.
Challenges in Modern Information Retrieval
Lecturers: Franco Maria Nardini, Cosimo Rulli, Salvatore Trani (CNR-ISTI), Rossano Venturini (UNIPI)
Period: June 2025
This PhD course focuses on Information Retrieval and discusses the state-of-the-art and the challenges in the two main areas of Web search: i) indexing and ii) query processing. The course introduces each area by discussing the state of the art in the field and by presenting the open research questions. The course emphasizes query processing, a research line where machine learning is important to advance the state of the art. After introducing the different query processing techniques, the course introduces supervised techniques explicitly focused on targeting the ranking problem and discusses several time and space efficiency/effectiveness trade-offs in query processing. The course will also provide an in-depth analysis of query processing techniques employing transformer-based large language models. Four hands-on sessions will cover indexing and query processing of public Web collections.
Course Contents
- Modern Information Retrieval (4 hours)
o The Web: history, peculiarities, and the importance of search
o Data structures for indexing Web documents
o Modern techniques for efficient text retrieval
o Data compression for integers, sequences of integers, and vectors
o Challenges in indexing the Web
o Hands-On: Indexing and basic query processing on a public Web collection
- Machine learning in modern query processors (4 hours)
o Machine learning approaches for IR: Learning to Rank
o Efficiency/effectiveness trade-offs and cascading architectures
o Hands-On: Learning to rank for efficient Web search
- Neural Information Retrieval I (4 hours)
o Neural information retrieval
o The role of transformers in modern Web Search
o Interaction-based methods vs. representation-based methods
o Efficient query processing with interaction-based methods
o Hands-On: Deep neural networks for efficient Web search
- Neural Information Retrieval II (4 hours)
o Transformer-based large language models as text encoders
o Sparse, dense, and multi-vector representations
o Data structures for efficient k-NN search and retrieval over learned representations
o Quantization techniques
o Hands-On: Encoding and retrieving over learned sparse representations
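As an illustration of the integer-compression topic in the first module above, the following Python sketch compresses a sorted posting list by storing gaps between consecutive document identifiers and encoding each gap with a variable-byte code. It assumes one common variable-byte variant (7 payload bits per byte, least significant group first, high bit set on the final byte); other variants exist, and the function names are hypothetical rather than taken from the course hands-on sessions.

```python
from itertools import accumulate


def vbyte_encode(numbers):
    """Encode non-negative integers with a variable-byte code."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # 7 low bits, high bit clear: more bytes follow
            n >>= 7
        out.append(n | 0x80)      # high bit set: last byte of this integer
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def compress_postings(doc_ids):
    """Store a sorted posting list as variable-byte encoded gaps."""
    gaps = [b - a for a, b in zip([0] + doc_ids, doc_ids)]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild document identifiers by prefix-summing the decoded gaps."""
    return list(accumulate(vbyte_decode(data)))


postings = [3, 7, 300, 100000]
blob = compress_postings(postings)
print(len(blob), decompress_postings(blob) == postings)
```

Gap encoding pays off because dense posting lists yield small gaps, and small integers need fewer bytes under a variable-length code than under a fixed-width representation.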