A basic and concise introduction to Topological Data Analysis
Lecturer: Patrizio Frosini (UNIPI)
Contact: patrizio.frosini@unipi.it
Schedule: January 7th, 9th, 12th, 14th, 16th, 19th, 21st and 23rd, 2026, from 11 am to 1 pm, in “sala seminari ovest”
Topological Data Analysis (TDA) is a mathematical framework for studying and quantifying the “shape” of data. Its primary goal is to describe datasets and measure their similarity by means of distances, particularly when equivalence is defined through the action of geometric transformations. Additionally, TDA is highly effective at reducing the dimensionality of data, making it easier to analyze and compare. It can also be used in geometric machine learning, and its approach applies to a wide range of data types, including time series, 2D and 3D objects, and point clouds. Throughout the course, the fundamental concepts required for a basic understanding of TDA will be introduced, with a focus on practical and computational examples rather than formal mathematical theory.
Topics:
- Equivalence and non-equivalence of data with respect to the action of a group of transformations.
- Simplicial complexes as a generalization of the concept of a graph and as a geometric representation of data described by point clouds in Euclidean spaces.
- Simplicial homology groups as a method for representing the “shape” of a simplicial complex derived from a point cloud.
- The need to adapt homology to the observer’s point of view and to the presence of noise: an introduction to persistent homology and persistence diagrams (a small computational sketch follows this list).
- Stability of persistence diagrams in the presence of noise.
- Applications of persistent homology.
- From the shape of data to the shape of observers: the concept of a Group Equivariant Non-Expansive Operator (GENEO).
- The problem of approximating observers in the space of GENEOs.
- GENEOs as a geometric method for reducing the number of parameters in neural networks and increasing their interpretability.
- Application of GENEO theory to identify pockets in proteins and the implementation of GENEO networks for geometric Machine Learning.
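To make the persistent-homology step above concrete, here is a minimal Python sketch. It assumes the open-source gudhi library (one of several TDA packages; its use here is an illustration, not part of the course material) and a made-up noisy-circle point cloud: it builds a Vietoris-Rips filtration and prints the long-lived features of the persistence diagram.

    # Minimal persistence-diagram sketch (assumes: pip install gudhi numpy).
    # The noisy circle is a made-up example point cloud.
    import numpy as np
    import gudhi

    rng = np.random.default_rng(0)
    angles = rng.uniform(0, 2 * np.pi, 100)
    points = np.c_[np.cos(angles), np.sin(angles)] + rng.normal(0, 0.05, (100, 2))

    # Vietoris-Rips filtration: simplices appear as the distance threshold grows.
    rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
    st = rips.create_simplex_tree(max_dimension=2)

    # Each diagram point (birth, death) records when a homological feature
    # (dim 0: connected component, dim 1: loop) appears and disappears.
    for dim, (birth, death) in st.persistence():
        if death - birth > 0.5:  # keep only long-lived ("signal") features
            print(f"H{dim}: born {birth:.2f}, dies {death:.2f}")

For the circle above, one expects a single essential connected component and one prominent loop; the short-lived diagram points are exactly the noise that the stability results listed above are concerned with.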
Distributed Ledger Technology data: management and analysis
Lecturers: Damiano Di Francesco Maesa (UNIPI), Matteo Loporchio (UNIPI)
Contact: damiano.difrancesco@unipi.it, matteo.loporchio@di.unipi.it
Schedule: Week of January 26th, 2026
The goal of this course is to present how data is managed (represented, secured, and retrieved) in Distributed Ledger Technology (DLT) based systems, and how that data can be analysed to study the ecosystems these systems support. The course will start with an introduction to the concepts behind DLT, including its main implementations, novel properties, and innovative applications. We will then present the two most famous blockchain protocols, Bitcoin and Ethereum, outlining how they manage their internal transaction data. This same data will be the focus of the following lectures, which showcase how to represent and analyse it through graphs. The course will close by presenting how authenticated data structures can be leveraged to enhance such data management.
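As a taste of the authenticated data structures mentioned above, the following minimal Python sketch (illustrative only; the transaction payloads are made up) computes a Bitcoin-style Merkle root, i.e. a single hash that commits to a whole list of transactions:

    # Illustrative Merkle-root computation (Bitcoin-style double SHA-256).
    import hashlib

    def sha256d(data: bytes) -> bytes:
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()

    def merkle_root(leaves: list[bytes]) -> bytes:
        level = [sha256d(leaf) for leaf in leaves]
        while len(level) > 1:
            if len(level) % 2 == 1:   # Bitcoin duplicates an odd last node
                level.append(level[-1])
            level = [sha256d(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

    print(merkle_root([b"tx-a", b"tx-b", b"tx-c"]).hex())

Because changing any leaf changes the root, a light client that stores only block headers can check that a transaction belongs to a block using a logarithmic-size Merkle proof; this is the kind of property the closing lectures build on.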
The course’s final exam will be either a project or a seminar, depending on each student’s preference.
Pathways to Green ICT
Lecturers: Antonio Brogi (UNIPI), Stefano Forti (UNIPI)
Contact: antonio.brogi@unipi.it, stefano.forti@unipi.it
Schedule: week of February 2nd, 2026 (first part) and week of March 2nd, 2026 (second part)
The course aims at introducing students to the fundamentals of Green ICT, providing them with a toolbox to consider sustainability aspects in their research. The course will introduce:
- The concepts of sustainability and the types of environmental impact of the lifecycle of ICT systems (power consumption, carbon emissions, e-waste)
- Methodologies to assess the environmental impact of ICT systems (from production to operation and maintenance to disposal)
- Methodologies to reduce the environmental impact of ICT systems (orthogonality of QoS and environmental goals, hardware selection and PUE reduction, energy-aware programming, green software engineering, energy-aware system deployment); a worked PUE example follows this list
- Use cases and open research challenges
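As a worked example for the PUE metric mentioned in the third bullet, Power Usage Effectiveness is the ratio between the total energy entering a facility and the energy actually consumed by the IT equipment; the figures in this small Python sketch are invented for illustration:

    # PUE = total facility energy / IT equipment energy (figures are invented).
    it_energy_kwh = 800_000       # yearly energy used by servers, storage, network
    total_energy_kwh = 1_200_000  # yearly facility total: IT + cooling, lighting, losses

    pue = total_energy_kwh / it_energy_kwh
    print(f"PUE = {pue:.2f}")     # 1.50: each IT kWh costs an extra 0.5 kWh of overhead

Driving PUE toward 1.0 (e.g., with more efficient cooling) cuts total energy without touching the IT workload itself, which is why it appears alongside software-level techniques in the list above.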
Responsible AI Engineering
Lecturer: Humberto Torres Marques-Neto (PUC Minas, Brazil)
Contact: humberto@pucminas.br
Schedule: March 18th, 19th, 20th, 23rd, 24th, 25th, 26th and 27th, 2026, from 11 am to 1 pm, in “sala seminari est”
The emergence of AI software engineering marks a transformation in traditional software development practices. Unlike conventional approaches, centered on well-defined requirements and deterministic algorithms, AI development embraces iterative processes such as model training, validation, and deployment. These stages are inherently probabilistic and data-driven, emphasizing adaptation and continuous learning from historical datasets and user interactions. Moreover, the widespread adoption of AI technologies in recent years, together with the integration of AI-assisted tools and Large Language Models (LLMs), has introduced new paradigms in how software systems are designed, developed, and maintained. Effectively incorporating AI techniques and tools within the software lifecycle requires engineers to embrace agile principles, prioritize data quality and governance, and account for ethical dimensions in order to promote fairness, reliability, privacy, transparency, sustainability, accountability, and explainability. This ongoing evolution from software engineering to Responsible AI engineering represents a pivotal step toward leveraging AI’s full potential while fostering responsible and sustainable innovation.
According to the Software Engineering Institute at Carnegie Mellon University (SEI@CMU), AI Engineering combines systems engineering, software engineering, computer science, and human-centered design to create AI systems that are scalable, robust, and secure, that run in complex contexts, and whose maintenance and budget remain predictable. Moreover, in addition to building high-quality AI-based software systems with high productivity, software teams should be aware of and follow best practices for the responsible use of AI in different domains.
This course will present concepts, discuss case studies, and debate research opportunities in Responsible AI Engineering. Students will be invited to analyze and discuss software processes and their ethical implications, requirements engineering for AI-based software using LLMs, the definition of software architectures for AI systems, software quality, deployment, and the use of AI to code and test AI systems. For this 20-hour course, students are expected to apply the concepts discussed and to present a final seminar using data from large-scale systems, such as online social network data and open government data, within the broad context of an AI system.
Specification and analysis of communication protocols
Lecturer: Emilio Tuosto (GSSI)
Contact: emilio.tuosto@gssi.it
Tentative schedule: April 2026
The course provides an overview of the application of formal methods to the specification and analysis of communication protocols in distributed systems. It introduces mathematical models and languages for describing process interactions, with particular emphasis on formal reasoning and the verification of key properties in relevant application domains, such as the analysis of smart contracts and of robotic systems.
Big Data Analytics: Ecosystem Data as a Case Study
Lecturer: Gianpaolo Coro (CNR-ISTI)
Contact: gianpaolo.coro@cnr.it
Schedule: May 5th, 6th, 7th, and 8th, 2026, from 9 am to 1 pm, in “sala seminari est”
In this course, practical methodologies for geospatial data analysis and modelling will be presented.
The course will cover specific classes of problems in ecosystem science and their corresponding solutions, adopting state-of-the-art computer science technologies and methodologies. The techniques covered will include:
1) Unsupervised approaches to discover and predict patterns of ecosystem change: Principal Component Analysis and Maximum Entropy for feature selection; K-means, X-means, and DBSCAN cluster analysis (a minimal clustering sketch follows this list); anomaly detection through Variational Autoencoders; Singular Spectrum Analysis for time series forecasting;
2) Supervised approaches for biodiversity distribution prediction and invasive species monitoring: Feed-Forward Artificial Neural Networks, Support Vector Machines, AquaMaps, Maximum Entropy;
3) Data mining techniques to detect hotspots of ecosystem-change risk.
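To anticipate the hands-on flavor of the exercises, here is the minimal clustering sketch referenced in item 1. It assumes scikit-learn and NumPy are installed and uses synthetic 2-D points (longitude/latitude pairs invented for illustration) in place of real species observations:

    # Minimal clustering sketch (assumes: pip install scikit-learn numpy).
    import numpy as np
    from sklearn.cluster import KMeans, DBSCAN

    rng = np.random.default_rng(42)
    # Two made-up observation "hotspots" plus uniform background noise.
    hotspot_a = rng.normal(loc=(10.0, 43.5), scale=0.1, size=(50, 2))
    hotspot_b = rng.normal(loc=(12.5, 41.9), scale=0.1, size=(50, 2))
    background = rng.uniform(low=(9.0, 41.0), high=(13.0, 44.0), size=(20, 2))
    points = np.vstack([hotspot_a, hotspot_b, background])

    # K-means needs the number of clusters up front ...
    km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

    # ... while DBSCAN infers it from density and marks outliers with label -1.
    db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(points)

    print("K-means clusters:", np.unique(km_labels))
    print("DBSCAN clusters (-1 = noise):", np.unique(db_labels))

The contrast between the two methods is the point: K-means partitions every point into one of the requested clusters, while density-based DBSCAN recovers the two hotspots and explicitly labels the sparse background as noise, which is often the desired behaviour for hotspot detection.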
These techniques will be applied to Earth observations, environmental data, and species observation records within Big Data methodologies. We will show how they can be used practically to safeguard global food availability and economic welfare. For example, predicting the impact of climate change on biodiversity contributes to avoiding economic and biodiversity collapse due to sudden ecosystem change. Likewise, monitoring the effect of overfishing and illegal fishing on stocks and biodiversity helps prevent ecosystem and economic collapse.
The described techniques address real-world cases of the United Nations (FAO, UNESCO, UNEP, and others) related to food and ecosystem safety. We will demonstrate new research directions within this context. The techniques are also sufficiently versatile to be applied to Big Data in other fields. We will utilize geospatial data that present typical Big Data features, such as continuously increasing volume, significant heterogeneity and complexity, and unreliable content. The methodologies will also be showcased within the Open Science paradigm, which emphasizes repeatability, reproducibility, and cross-domain reuse of all experimental stages.
The course will be interactive, with practical exercises. Attendees will use online (the D4Science platform, https://services.d4science.org) and offline (QGIS, Java-based MaxEnt) software to parametrize the models and run the experiments. A final exercise will be proposed as the course exam.
Course Contents in brief:
- Geospatial data
- Feature selection techniques
- Distance and density-based cluster analysis for pattern recognition
- Artificial Neural Networks for ecosystem modelling
- Data mining techniques for geospatial data analysis
- Open Science approaches
Programming Tools and Techniques in the Pervasive Parallelism Era
Lecturers: Marco Danelutto (UNIPI), Patrizio Dazzi (UNIPI)
Contact: marco.danelutto@unipi.it, patrizio.dazzi@unipi.it
Tentative schedule: week of May 11th and week of May 18th, 2026
The course covers techniques and tools (both already established and in the process of entering the mainstream) that support the implementation of efficient parallel/distributed applications, targeting small-scale parallel systems as well as larger-scale parallel and distributed systems, possibly equipped with different kinds of accelerators. The course follows a methodological approach to provide a homogeneous overview of classical tools and techniques, as well as of new ones specifically developed for emerging architectures and application domains. Perspectives on reconfigurable coprocessors and domain-specific architectures will also be covered.