Ph.D. courses a.y. 2023/2024

Data compression and compressed data structures

Lecturer: Giovanni Manzini (UNIPI)

Schedule: November 22nd, 24th, 27th, and 29th, and December 4th, 6th, 13th, and 15th, 2023, from 9am to 11am, in “sala seminari est”

This course will describe some recent data compression techniques and how they have been used to design data structures that use space asymptotically close to the information theoretic lower bound. Starting with the classical problem of text indexing, we will show how these techniques have been generalized to the succinct representation of trees, regular languages and graphs.

To attend the course, please send an email to giovanni.manzini@unipi.it

Probabilistic Reasoning in Machine Learning

Lecturers: Daniele Castellana (UNIFI), Federico Errica (NEC Laboratories Europe GmbH, Germany)

Schedule: February 19th, 2024, from 9am to 11am; February 20th from 2pm to 5pm; February 21st from 9am to 12pm; February 22nd from 9am to 12pm; February 23rd from 9am to 11am; February 27th from 2pm to 5pm. All classes will be held in room “sala seminari est”, except on February 23rd, when the class will be held in room “D3”.

The objective of the course is to introduce the key concepts of probabilistic reasoning, illustrating their application in the machine learning domain.

Topics:

  • Probabilistic Modeling Toolbox
  • Latent Variable Models (LVMs)
  • Sampling for LVMs
  • Variational Inference for LVMs
  • Infinite LVMs
  • LVMs for structured data

No background in machine learning is required.

To attend the course, please contact daniele.castellana@unifi.it and/or Federico.Errica@neclab.eu

Pathways to Green ICT

Lecturers: Antonio Brogi (UNIPI), Stefano Forti (UNIPI)

Period: March 2024

As the world experiences environmental problems due to increasing carbon emissions, national and EU initiatives address our society's pressing need to achieve sustainability goals. According to recent estimates, ICT systems produce 2% to 6% of global carbon emissions throughout their lifecycle. On the other hand, ICT can contribute to reducing carbon emissions in other sectors (e.g. agriculture, building management, transportation) through digitalisation.

The course aims at introducing students to the fundamentals of Green ICT, providing them with a toolbox to consider sustainability aspects in their research. The lectures will introduce:

  • The concepts of sustainability and the types of environmental impact of the lifecycle of ICT systems (power consumption, carbon emissions, e-waste)
  • Methodologies to assess the environmental impact of ICT systems (from production to operation and maintenance to disposal)
  • Methodologies to decrease the environmental impact of ICT systems (orthogonality of QoS and environmental goals, hardware selection and PUE reduction, energy-aware programming, green software engineering, energy-aware system deployment)
  • Use cases and open research challenges

Graph transformation: foundations and tools

Lecturer: Arend Rensink (University of Twente, The Netherlands)

Period: April 8th, 10th, 12th, 15th, 17th, 19th, 22nd, and 24th, 2024, from 9am to 11am, in “sala seminari est”

Graph transformation is a widely applicable modelling paradigm, based on an understanding of a given domain in terms of graphs. Graphs can capture many concepts in an intuitive way, and have been used (within computer science) for modelling syntax, data structures, networks, or entire software architectures, but also (outside computer science) for biological structures, building architecture, social networks, infrastructures and much more. The transformation of these graphs, captured in a rule-based manner, then serves to describe the dynamics of such systems. This gives rise to a state space (in which the states are graphs and the transitions are rule applications), enabling the analysis of the potential behaviour of the modelled system, for instance by reachability analysis, invariant checking or model checking.
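
As a rough illustration of the state-space idea, the following Python sketch explores the graphs reachable from a start graph by repeatedly applying a rule. It is not GROOVE and does not use its API. The edge-set encoding, the rule `move_token`, and the ring example are assumptions made purely for illustration, and graphs are compared by their concrete node names rather than up to isomorphism.

```python
from collections import deque

# Assumption: a graph is a frozenset of labelled edges (source, label, target).

def move_token(graph):
    """Hypothetical rule: move the token along one 'next' edge."""
    successors = []
    for (src, label, tgt) in graph:
        if label != "at":
            continue
        for (src2, label2, tgt2) in graph:
            # Match a 'next' edge leaving the node where the token sits.
            if label2 == "next" and src2 == tgt:
                # Rewrite: delete the old 'at' edge, add the new one.
                new_graph = (graph - {(src, label, tgt)}) | {(src, "at", tgt2)}
                successors.append(frozenset(new_graph))
    return successors

def explore(start, rules):
    """Breadth-first exploration: states are graphs, transitions are rule applications."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for rule in rules:
            for successor in rule(current):
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return seen

# A ring of three nodes with a token initially placed on node "a".
ring = frozenset({
    ("a", "next", "b"),
    ("b", "next", "c"),
    ("c", "next", "a"),
    ("tok", "at", "a"),
})

states = explore(ring, [move_token])

# Reachability question: can the token ever be at node "c"?
token_reaches_c = any(("tok", "at", "c") in g for g in states)
```

Here the explored state space contains one graph per token position, and the reachability check is a simple search over those states. An invariant check would instead test a property on every graph in `states`.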

In this course, students learn the foundations of graph transformation and how to apply it in the way described above. The foundations include the formalisation of graphs, the connection to first-order logic, and the embedding in category theory. This is supported by a tool called GROOVE (which stands for “GRaphs for Object-Oriented VErification”) in which students apply everything they have learned.

Provisional outline of classes:

  1. Theory: Graphs, morphisms and typing – Practice: Getting acquainted with GROOVE
  2. Theory: Graph gluing and rules – Practice: Solving a simple puzzle
  3. Theory: Categorical embedding
  4. Theory: Graph conditions – Practice: Usage scenarios for graph transformation
  5. Theory: Nested graph conditions – Practice: Programming language semantics
  6. Theory: Attribute algebras – Practice: Data manipulation in GROOVE
  7. Theory: Model checking – Practice: Model checking in GROOVE
  8. Practice: Parameters, control and regular expressions

Assessment will be a choice of: reading and presenting a research paper, or developing a graph transformation model in a given domain (in GROOVE).

Prerequisites: a basic understanding of mathematical modelling, including first-order logic. Prior knowledge of category theory is not required.

Collective Machine Intelligence: Beyond an Agent-Centric View of AI

Lecturers: Antonio Carta (UNIPI), Vincenzo Lomonaco (UNIPI)

Period: April 2024

This course aims to explore the emerging field of collective machine intelligence, which studies how multiple artificial agents can interact, cooperate, and learn from each other in complex and dynamic environments. The course will cover the theoretical foundations and practical applications of collective machine intelligence, such as game theory, multi-agent decision making, continual learning, federated learning, swarm intelligence, and complex systems. The course will also showcase some of the current and future challenges and opportunities of collective machine intelligence in various domains, such as social networks, smart cities, robotics, and healthcare. By the end of the course, the students will be able to understand the key concepts and methods of collective machine intelligence, and apply them to design and implement intelligent systems that can leverage the collective wisdom and capabilities of multiple agents.

Big Data Analytics: Marine Data as a Case Study

Lecturer: Gianpaolo Coro (CNR-ISTI)

Schedule: May 6-9, 2024, from 9am to 1pm

This course will present practical methodologies for marine data analysis and modelling. The course will cover specific classes of problems in marine science and their corresponding solutions, adopting state-of-the-art computer science technologies and methodologies. The explained techniques will include:

  • Unsupervised approaches to discover patterns of habitat change and predict fishing patterns: Principal Component Analysis and Maximum Entropy for feature selection; K-means, X-means, DBSCAN, and Local Outlier Factor cluster analysis; Singular Spectrum Analysis for time series forecasting;
  • Supervised approaches for species distribution prediction and invasive species monitoring: Feed-Forward Artificial Neural Networks, Support Vector Machines, AquaMaps, Maximum Entropy;
  • Data mining techniques to detect hotspots of illegal fishing activity from vessel trajectory data.

These methods will be applied to marine data such as vessel transmitted data, species observation records, and catch and vessel time series that fall into the Big Data category. These data are crucial to safeguard food availability and economic welfare, which are fundamental to human life. For example, predicting the impact of climate change on species habitat distribution contributes to avoiding economic and biodiversity collapse due to sudden ecosystem change. Likewise, monitoring the effect of overfishing and illegal fishing on stocks and biodiversity helps prevent ecosystem and economic collapse.

The explained techniques currently address real use cases of the United Nations (FAO, UNESCO, UNEP, and others) for marine food and ecosystem safety and illustrate the new lines of research in this context. They are also general enough to be applied to Big Data from other domains. The analyzed data have the general characteristics of Big Data, such as constantly increasing volume, vast heterogeneity and complexity, and unreliable content. For this reason, the methodologies will be illustrated in the context of the Open Science paradigm, characterized by the repeatability, reproducibility, and cross-domain reuse of all experimental phases.

The course will be interactive and made up of practical exercises. Attendees will use online (the D4Science platform, https://services.d4science.org) and offline (QGIS, Java-based MaxEnt) software to parametrize the models and run the experiments. A final exercise will be proposed as the course exam.

Course schedule:

Day 1 (4h) – Introduction to marine geospatial data and Open Science methodologies

Day 2 (4h) – Data selection techniques and pattern detection

Day 3 (4h) – Supervised modelling of species distributions and invasions

Day 4 (4h) – Data mining techniques for marine geospatial data analysis

Final Exam: A final exercise summarizing key methodologies will be proposed, where students will be asked to estimate the distribution of the giant squid (Architeuthis dux) in the Atlantic Ocean.

Programming Tools and Techniques in the Pervasive Parallelism Era

Lecturers: Marco Danelutto (UNIPI), Patrizio Dazzi (UNIPI)

Period: May 13-24, 2024

The course covers techniques and tools (already existing or in the process of moving into the mainstream) that support the implementation of efficient parallel/distributed applications targeting small-scale parallel systems as well as larger-scale parallel and distributed systems, possibly equipped with different kinds of accelerators. The course follows a methodological approach to provide a homogeneous overview of classical tools and techniques as well as of new tools and techniques specifically developed for new, emerging architectures and application domains. Perspectives in the direction of reconfigurable coprocessors and domain-specific architectures will also be covered.