We experimentally demonstrate a framework and use cases for full lifecycle management of AI-Agent-assisted digital twin optical networks. We achieve 100% accuracy in API calling by the AI-Agent, a 7x speed-up in alarm-log analysis, and an 83% reduction in hardware computation resources through LoRA fine-tuning.
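The paper's exact LoRA configuration is not given here; the following is a minimal sketch of LoRA fine-tuning with Hugging Face PEFT, where the base model (gpt2) and the adapter hyperparameters are illustrative assumptions rather than the paper's setup.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face PEFT; base model and
# hyperparameters are illustrative assumptions, not the paper's configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in for the AI-Agent's LLM
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],                        # GPT-2 attention projection layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()   # only the low-rank adapters are trainable,
                                     # which is what reduces the hardware footprint
```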
ECOC
On High-Power Optical Amplification in Hollow Core Fibers for Energy Efficiency and Network Throughput Maximization
Giovanni Simone Sticca, Memedhe Ibrahimi, Nicola Di Cicco, and 2 more authors
In 50th European Conference on Optical Communications (ECOC), 2024
We investigate how to optimally set the EDFA output power in Hollow Core Fiber (HCF) networks. We show that, using high-power amplification, HCF allows a 2.4x increase in throughput and a 52% decrease in transponders, along with a 41% reduction in EDFA power consumption per Tbps.
ECOC
Open Implementation of a Large Language Model Pipeline for Automated Configuration of Software-Defined Optical Networks
Nicola Di Cicco, Memedhe Ibrahimi, Sebastian Troia, and 2 more authors
In 50th European Conference on Optical Communications (ECOC), 2024
We leverage LLMs to develop a natural-language interface to a software-defined optical network testbed. Results show over 80% accuracy in translating human intent to the appropriate network configurations. Our code is public.
We investigate classifying hardware failures in microwave networks via Machine Learning (ML). Although ML-based approaches excel at this task, they usually provide only hard (point) failure predictions, without guarantees on their reliability, i.e., on the probability of correct classification. Generally, accumulating data for longer time horizons increases the model’s predictive accuracy. Therefore, in real-world applications, a trade-off arises between two contrasting objectives: i) ensuring high reliability for each classified observation, and ii) collecting the minimal amount of data to provide a reliable prediction. To address this problem, we formulate hardware failure-cause identification as an As-Soon-As-Possible (ASAP) selective classification problem where data streams are sequentially provided to an ML classifier, which outputs a prediction as soon as the probability of correct classification exceeds a user-specified threshold. To this end, we leverage Inductive and Cross Venn-Abers Predictors to transform heuristic probability estimates from any ML model into rigorous predictive probabilities. Numerical results on a real-world dataset show that our ASAP framework reduces the time-to-predict by 8x compared to the state of the art, while ensuring a selective classification accuracy greater than 95%. We make the dataset utilized in this study publicly available, aiming to facilitate future investigations in failure management for microwave networks.
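As a rough illustration of the mechanics, here is a hypothetical sketch (binary simplification) of an Inductive Venn-Abers interval together with the ASAP stopping rule; the paper handles multiple failure classes and also uses Cross Venn-Abers Predictors, which are omitted here.

```python
# Hypothetical sketch: Inductive Venn-Abers calibration for one test score plus the
# ASAP stopping rule (binary simplification of the paper's multiclass setting).
import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Lower/upper calibrated probabilities (p0, p1) for the positive class."""
    bounds = []
    for hypothetical_label in (0, 1):
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(np.append(cal_scores, test_score),
                np.append(cal_labels, hypothetical_label))
        bounds.append(float(iso.predict([test_score])[0]))
    return bounds[0], bounds[1]

def asap_predict(score_stream, cal_scores, cal_labels, threshold=0.95):
    """Emit a prediction as soon as the calibrated probability exceeds the threshold."""
    decision = None
    for t, score in enumerate(score_stream):
        p0, p1 = venn_abers_interval(cal_scores, cal_labels, score)
        p = p1 / (1.0 - p0 + p1)            # standard Venn-Abers merge into one probability
        decision = (t, int(p >= 0.5), p)    # time-to-predict, class, probability
        if max(p, 1.0 - p) >= threshold:    # reliable enough: stop accumulating data
            break
    return decision
```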
We consider the problem of classifying hardware failures in microwave networks given a collection of alarms using Machine Learning (ML). While ML models have been shown to work extremely well on similar tasks, an ML model is, at most, as good as its training data. In microwave networks, building a good-quality dataset is significantly harder than training a good classifier: annotating data is a costly and time-consuming procedure. We, therefore, shift the perspective from a Model-Centric approach, i.e., how to train the best ML model from a given dataset, to a Data-Centric approach, i.e., how to make the best use of the data at our disposal. To this end, we explore two orthogonal Data-Centric approaches for hardware failure identification in microwave networks. At training time, we leverage synthetic data generation with Conditional Variational Autoencoders to cope with extreme data imbalance and ensure fair performance across all failure classes. At inference time, we leverage Batch Uncertainty-based Active Learning to guide the data annotation procedure of multiple concurrent domain-expert labelers and achieve the best possible classification performance with the smallest possible training dataset. Illustrative experimental results on a real-world dataset show that our Data-Centric approaches allow for training top-performing models with 4.5x less annotated data, while improving the classifier’s F1-Score by 2.5% under extreme data scarcity. Finally, for the first time to the best of our knowledge, we make our dataset (curated by microwave industry experts) publicly available, aiming to foster research in data-driven failure management.
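For the inference-time part, a hypothetical sketch of batch uncertainty-based selection via predictive entropy is shown below; the paper's actual acquisition function and batching strategy may differ.

```python
# Hypothetical sketch of batch uncertainty-based active learning: rank unlabeled samples
# by predictive entropy and send the most uncertain ones to the domain-expert labelers.
import numpy as np

def select_batch_for_labeling(model, X_unlabeled, batch_size=32):
    """Pick the unlabeled alarm windows the current classifier is most uncertain about."""
    proba = model.predict_proba(X_unlabeled)                  # (n_samples, n_classes)
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)  # predictive entropy per sample
    return np.argsort(entropy)[-batch_size:]                  # indices to annotate next
```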
2023
ACM SIGCOMM
Poster: Continual Network Learning
Nicola Di Cicco, Amir Al Sadi, Chiara Grasselli, and 3 more authors
In Proceedings of the ACM SIGCOMM 2023 Conference, New York, NY, USA, 2023
We make a case for in-network Continual Learning as a solution for seamless adaptation to evolving network conditions without forgetting past experiences. We propose implementing Active Learning-based selective data filtering in the data plane, allowing for data-efficient continual updates. We explore relevant challenges and propose future research directions.
ECOC
Throughput Maximization in (C+L+S) Networks with Incremental Deployment of HFAs and 3Rs
Giovanni Simone Sticca, Memedhe Ibrahimi, Nicola Di Cicco, and 2 more authors
In 49th European Conference on Optical Communications (ECOC), 2023
We optimize HFA and 3R deployment to avoid lightpath degradation and maximize throughput in (C+L+S) networks. We show that our proposed strategies can achieve up to around 64% fewer HFAs and 20% higher throughput compared to baseline solutions.
Deep Reinforcement Learning (DRL) is being investigated as a competitive alternative to traditional techniques for solving network optimization problems. A promising research direction lies in enhancing traditional optimization algorithms by offloading low-level decisions to a DRL agent. In this study, we consider how to effectively employ DRL to improve the performance of Local Search algorithms, i.e., algorithms that, starting from a candidate solution, explore the solution space by iteratively applying local changes (i.e., moves), yielding the best solution found in the process. We propose a Local Search algorithm based on lightweight Deep Reinforcement Learning (DeepLS) that, given a neighborhood, queries a DRL agent to choose a move, with the goal of achieving the best objective value in the long term. Our DRL agent, based on permutation-equivariant neural networks, comprises fewer than a hundred parameters, requires only up to ten minutes of training, and can evaluate problem instances of arbitrary size, generalizing to networks and traffic distributions unseen during training. We evaluate DeepLS on two illustrative NP-Hard network routing problems, namely OSPF Weight Setting and Routing and Wavelength Assignment, training on a single small network only and evaluating on instances 2x-10x larger than those seen in training. Experimental results show that DeepLS outperforms existing DRL-based approaches from the literature and attains competitive results with state-of-the-art metaheuristics, with computing times up to 8x smaller than the strongest algorithmic baselines.
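To give a feel for what a tiny permutation-equivariant move scorer can look like, here is a hypothetical PyTorch sketch in the spirit of DeepLS; the feature dimensions and architecture are placeholders, not the paper's network.

```python
# Hypothetical sketch of a tiny permutation-equivariant move scorer: each candidate move is
# scored from its own features plus a symmetric (mean) summary of the whole neighborhood.
import torch
import torch.nn as nn

class EquivariantMoveScorer(nn.Module):
    def __init__(self, n_features=4, hidden=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, moves):                       # moves: (n_moves, n_features)
        context = moves.mean(dim=0, keepdim=True)   # permutation-invariant neighborhood summary
        context = context.expand_as(moves)
        scores = self.net(torch.cat([moves, context], dim=-1)).squeeze(-1)
        return torch.softmax(scores, dim=0)         # probability of picking each move
```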
IEEE NetSoft
DRL-FORCH: A Scalable Deep Reinforcement Learning-based Fog Computing Orchestrator
Nicola Di Cicco, Gaetano Francesco Pittalà, Gianluca Davoli, and 4 more authors
In 2023 IEEE 9th International Conference on Network Softwarization (NetSoft), 2023
We consider the problem of designing and training a neural network-based orchestrator for fog computing service deployment. Our goal is to train an orchestrator able to optimize diversified and competing QoS requirements, such as blocking probability and service delay, while potentially supporting thousands of fog nodes. To cope with said challenges, we implement our neural orchestrator as a Deep Set (DS) network operating on sets of fog nodes, and we leverage Deep Reinforcement Learning (DRL) with invalid action masking to find an optimal trade-off between competing objectives. Illustrative numerical results show that our Deep Set-based policy generalizes well to problem sizes (i.e., in terms of numbers of fog nodes) up to two orders of magnitude larger than the ones seen during the training phase, outperforming both greedy heuristics and traditional Multi-Layer Perceptron (MLP)-based DRL. In addition, inference times of our DS-based policy are up to an order of magnitude faster than an MLP, allowing for excellent scalability and near real-time online decision-making.
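A hypothetical sketch of the invalid action masking step over per-node scores is shown below; the per-node scoring network and feature sizes are placeholders, not the paper's Deep Set architecture.

```python
# Hypothetical sketch of invalid action masking: infeasible fog nodes (e.g., without enough
# capacity) receive -inf logits and therefore zero probability under the policy.
import torch
import torch.nn as nn

score_net = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))  # per-node scorer

def masked_node_policy(node_features, valid_mask):
    """node_features: (n_nodes, 5) float; valid_mask: (n_nodes,) bool."""
    logits = score_net(node_features).squeeze(-1)
    logits = logits.masked_fill(~valid_mask, float("-inf"))
    return torch.softmax(logits, dim=0)             # distribution over feasible placements
```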
IEEE ICC
Uncertainty-Aware QoT Forecasting in Optical Networks with Bayesian Recurrent Neural Networks
Nicola Di Cicco, Jacopo Talpini, Memedhe Ibrahimi, and 2 more authors
In 2023 IEEE International Conference on Communications (ICC): Optical Networks and Systems Symposium (IEEE ICC’23 - ONS Symposium), 2023
We consider the problem of forecasting the Quality-of-Transmission (QoT) of deployed lightpaths in a Wavelength Division Multiplexing (WDM) optical network. QoT forecasting plays a key role in network management and planning, as it allows network operators to proactively plan maintenance or detect anomalies in a lightpath. To this end, we leverage Bayesian Recurrent Neural Networks for learning uncertainty-aware probabilistic QoT forecasts, i.e., for modelling a probability distribution of the QoT over a time horizon. We evaluate our proposed approach on the open-source Microsoft Wide Area Network (WAN) optical backbone dataset. Our illustrative numerical results show that our approach not only outperforms state-of-the-art models from the literature, but also produces prediction intervals with near-optimal empirical coverage. As such, we demonstrate that uncertainty-aware probabilistic modelling enables the application of QoT forecasting in risk-sensitive application scenarios.
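For intuition, here is a hypothetical sketch of approximate Bayesian forecasting via Monte Carlo dropout on an LSTM; the paper's exact Bayesian treatment of the recurrent network may differ, and the architecture below is a placeholder.

```python
# Hypothetical sketch: MC-dropout LSTM as an approximate Bayesian QoT forecaster,
# producing a predictive interval from repeated stochastic forward passes.
import torch
import torch.nn as nn

class QoTForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.drop = nn.Dropout(0.2)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                          # x: (batch, time, 1) past QoT samples
        out, _ = self.lstm(x)
        return self.head(self.drop(out[:, -1]))    # next-step QoT estimate

@torch.no_grad()
def predictive_interval(model, x, n_samples=100, alpha=0.1):
    model.train()                                  # keep dropout active at inference time
    draws = torch.stack([model(x) for _ in range(n_samples)])
    lo = torch.quantile(draws, alpha / 2, dim=0)
    hi = torch.quantile(draws, 1 - alpha / 2, dim=0)
    return lo, hi                                  # (1 - alpha) predictive interval
```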
EuCAP
Machine Learning-Based Line-Of-Sight Prediction in Urban Manhattan-Like Environments
Nicola Di Cicco, Simone Del Prete, Silvi Kodra, and 5 more authors
In 2023 17th European Conference on Antennas and Propagation (EuCAP), 2023
This paper considers the problem of predicting whether or not a transmitter and a receiver are in a Line-of-Sight (LOS) condition. While this problem can be easily solved using a digital urban database and applying ray tracing, we consider the scenario in which only a few high-level features describing the propagation environment and the radio link are available. LOS prediction is modelled as a binary classification Machine Learning problem, and a baseline classifier based on Gradient Boosting Decision Trees (GBDT) is proposed. A synthetic ray-tracing dataset of Manhattan-like topologies is generated for training and testing a GBDT classifier, and its generalization capabilities to both locations and environments unseen at training time are assessed. Results show that the GBDT model achieves good classification performance and provides accurate LOS probability modelling. An analysis of feature importance suggests that the model learned simple decision rules that align with common sense.
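A hypothetical sketch of such a GBDT baseline is shown below; the synthetic data is only a stand-in for the ray-tracing features (e.g., link distance, antenna heights, building density), and the hyperparameters are illustrative.

```python
# Hypothetical sketch of a GBDT LOS classifier on placeholder data (stand-in for the
# ray-tracing dataset of Manhattan-like topologies).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)  # placeholder dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=300, max_depth=3, learning_rate=0.05)
clf.fit(X_train, y_train)
p_los = clf.predict_proba(X_test)[:, 1]   # estimated probability of LOS for each link
```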
2022
BalkanCom
Calibrated Probabilistic QoT Regression for Unestablished Lightpaths in Optical Networks
Nicola Di Cicco, Mëmëdhe Ibrahimi, Cristina Rottondi, and 1 more author
In 2022 International Balkan Conference on Communications and Networking (BalkanCom), 2022
Quality-of-Transmission (QoT) regression of unestablished lightpaths is a fundamental problem in Machine Learning applied to optical networks. Even though this problem is well investigated in the current literature, many state-of-the-art approaches either predict point-estimates of the QoT or make simplifying assumptions about the QoT distribution. Because of this, during lightpath deployment, an operator might make either overly aggressive or overly conservative decisions due to biased predictions. In this paper, we leverage state-of-the-art Gradient Boosting Decision Tree (GBDT) models and recent advances in uncertainty calibration to perform QoT probabilistic regression for unestablished lightpaths. Calibration of a regression model allows for an accurate modeling of the QoT Cumulative Distribution Function (CDF) without any prior assumption on the QoT distribution. In our illustrative experimental results, we show that our calibrated GBDT model’s predictions provide accurate confidence interval estimates, even when only a few samples per lightpath configuration are available at training time.
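As a rough illustration of distribution-free QoT regression with GBDTs, here is a hypothetical sketch using quantile GBDTs on placeholder data; the paper additionally applies post-hoc uncertainty calibration, which is omitted here.

```python
# Hypothetical sketch: approximating the QoT CDF with a set of quantile GBDT regressors
# trained on placeholder data (stand-in for per-lightpath features and measured QoT).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=8, noise=5.0, random_state=0)

quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
          for q in quantiles}
qot_quantiles = {q: m.predict(X[:5]) for q, m in models.items()}  # per-lightpath QoT quantiles
```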
Deep Reinforcement Learning (DRL) is emerging as a promising tool for solving optimization problems in optical networks. Though studies employing DRL for solving static optimization problems in optical networks are appearing, assessing the strengths and weaknesses of DRL with respect to state-of-the-art solution methods is still an open research question. In this work, we focus on Routing and Wavelength Assignment (RWA), a well-studied problem for which fast and scalable algorithms leading to better optimality gaps are always sought. We develop two different DRL-based methods to assess the impact of different design choices on DRL performance. In addition, we propose a Multi-Start approach that can improve the average DRL performance, and we engineer a shaped reward that allows efficient learning in networks with high link capacities. With Multi-Start, DRL achieves results competitive with a state-of-the-art Genetic Algorithm, with significant savings in computational time. Moreover, we assess the generalization capabilities of DRL to traffic matrices unseen during training, in terms of total connection requests and traffic distribution, showing that DRL can generalize to small-to-moderate deviations from the training traffic matrices. Finally, we assess DRL scalability with respect to topology size and link capacity.
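The Multi-Start idea can be sketched as the simple loop below; here 'rollout_fn' is a placeholder for one stochastic episode of the trained DRL agent on an RWA instance, returning a solution and its objective value, and is not the paper's implementation.

```python
# Hypothetical sketch of Multi-Start: run several stochastic rollouts of the trained policy
# on the same instance and keep the best solution found.
def multi_start(rollout_fn, instance, n_starts=20):
    best_solution, best_value = None, float("-inf")
    for _ in range(n_starts):
        solution, value = rollout_fn(instance)   # stochastic action sampling per rollout
        if value > best_value:
            best_solution, best_value = solution, value
    return best_solution, best_value
```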
ONDM
A Reinforcement Learning-based Dynamic Bandwidth Allocation for XGS-PON Networks
Abdullah Quran, Sebastian Troia, Omran Ayoub, and 2 more authors
In 2022 International Conference on Optical Network Design and Modeling (ONDM), 2022
Time-division-multiplexing passive optical networks (TDM-PONs), with their massive deployment worldwide, are considered a fundamental technology for supporting not only traditional Internet broadband services, but also emerging 5G latency-sensitive services, such as Ultra-Reliable and Low Latency Communications (URLLC). Traditional dynamic bandwidth allocation (DBA) mechanisms, currently used to allocate network resources in TDM-PONs, are not suited to these new services, as they use a polling mechanism that can result in a high queuing delay and ultimately violate URLLC latency requirements. In this work, we propose a new predictive DBA mechanism for 10-Gigabit-capable Symmetric PON (XGS-PON) that reduces latency to fulfill the requirements of emerging latency-sensitive services. Our solution employs reinforcement learning (RL) to predict the ingress buffer occupancy of ONUs in the next DBA cycle. Results show that the proposed RL method outperforms traditional DBA approaches in terms of upstream delay while maintaining a similar frame loss ratio.
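For intuition only, the predictive grant-sizing idea can be sketched as below; 'predict_arrivals' stands in for the paper's RL-based buffer-occupancy predictor and is not its actual implementation.

```python
# Hypothetical sketch of predictive grant sizing in a DBA cycle: the grant covers the ONU's
# reported queue plus the traffic predicted to arrive before the next cycle.
def compute_grant(report_bytes, history, predict_arrivals, max_grant_bytes):
    predicted_bytes = predict_arrivals(history)                   # predicted ingress arrivals
    return min(report_bytes + predicted_bytes, max_grant_bytes)   # upstream allocation for this ONU
```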
Artificial Intelligence (AI) has demonstrated superhuman capabilities in solving a significant number of tasks, leading to widespread industrial adoption. For in-field network-management applications, however, AI-based solutions have often raised skepticism among practitioners as their internal reasoning is not exposed and their decisions cannot be easily explained, preventing humans from trusting and even understanding them. To address this shortcoming, a new area in AI, called Explainable AI (XAI), is attracting the attention of both academic and industrial researchers. XAI is concerned with explaining and interpreting the internal reasoning and the outcome of AI-based models to achieve more trustworthy and practical deployment. In this work, we investigate the application of XAI for network management, focusing on the problem of automated failure-cause identification in microwave networks. We first introduce the concept of XAI, highlighting its advantages in the context of network management, and we discuss in detail the concept behind Shapley Additive Explanations (SHAP), the XAI framework considered in our analysis. Then, we propose a framework for XAI-assisted, ML-based automated failure-cause identification in microwave networks, spanning the model’s development and deployment phases. For the development phase, we show how to exploit SHAP for feature selection, how to leverage SHAP to inspect misclassified instances during the model’s development process, and how to describe the model’s global behavior based on SHAP’s global explanations. For the deployment phase, we propose a framework based on prediction uncertainty to detect possibly wrong predictions that will be inspected through XAI.
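A hypothetical sketch of SHAP-based feature ranking for feature selection is given below; the classifier and the synthetic data are stand-ins for the paper's model and alarm features, and the ranking rule (mean absolute SHAP value) is one common choice, not necessarily the paper's.

```python
# Hypothetical sketch: rank features by mean |SHAP| attribution and keep the top ones.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=12, n_informative=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)   # stand-in failure classifier

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)                         # per-sample feature attributions
importance = np.abs(shap_values).mean(axis=0)                  # mean |SHAP| per feature
top_features = np.argsort(importance)[::-1][:5]                # candidate feature subset
```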
Resource optimization in 5G Radio Access Networks (5G-RAN) has to face the dynamics over time in networks with increasing numbers of nodes and virtual network functions. In this context, multiple objectives need to be jointly optimized, and key application requirements such as latency must be enforced. In addition, virtual network functions realizing baseband processing are subject to failures of the cloud infrastructure, requiring an additional level of reliability. Overall, this is a complex problem to solve, requiring fast algorithms to cope with dynamic networks while avoiding resource overprovisioning. This paper considers the problem of optimal virtual function placement in 5G-RAN with reliability against a single DU Hotel failure and proposes a solution that takes service dynamics into account. Firstly, the joint optimization of the total number of DU Hotels, of the RU–DU latency and of the backup DU sharing in a static traffic scenario is considered, and the DUOpt algorithm, based on Lexicographic Optimization, is proposed to solve this multi-objective problem efficiently. DUOpt splits the multi-objective problem into smaller Integer Linear Programming (ILP) subproblems that are sequentially solved, adopting for each one the most effective methodology to reduce the total execution time. The proposed DUOpt algorithm is extensively benchmarked to show its effectiveness in the optimization of medium-to-large-size networks: in particular, it is shown to greatly outperform an aggregate multi-objective approach, being able to compute optimal or close-to-optimal solutions for networks of several tens of nodes in computing times of a few seconds. Then, the problem is extended to a dynamic traffic scenario in which optimization is performed over time. In this context, in addition to the aforementioned objectives, the total number of network function migrations induced by multiple reoptimizations must be kept to the minimum. To solve this problem efficiently, the DUMig algorithm, which extends and improves DUOpt, is proposed. Reoptimization over a time horizon of one day in an illustrative dynamic traffic scenario is performed to evaluate the proposed DUMig algorithm against DUOpt, the latter being oblivious to the traffic dynamics. DUMig shows remarkable savings in the total number of migrations (above 86.1% for primary virtual functions and 83% for backup virtual functions) compared to DUOpt, while preserving near-optimal resource assignment.
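The lexicographic pattern behind DUOpt can be sketched as below with PuLP; this shows only the general solve-then-lock-in loop under placeholder variables, constraints, and objectives, not DUOpt's actual ILP formulation.

```python
# Hypothetical sketch of lexicographic optimization with PuLP: optimize the highest-priority
# objective, lock its optimal value in as a constraint, then move on to the next objective.
import pulp

def lexicographic_solve(variables, constraints, objectives):
    """'objectives' is a list of PuLP expressions ordered by priority (all minimized)."""
    prob = pulp.LpProblem("lexicographic", pulp.LpMinimize)
    for constraint in constraints:
        prob += constraint
    for objective in objectives:
        prob.setObjective(objective)
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        prob += objective <= pulp.value(prob.objective)   # lock in before the next objective
    return {v.name: v.value() for v in variables}
```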
2021
DRCN
Scalable Multi-objective Optimization of Reliable Latency-constrained Optical Transport Networks
Nicola Di Cicco, Valentina Cacchiani, and Carla Raffaelli
In 2021 17th International Conference on the Design of Reliable Communication Networks (DRCN), 2021
In the evolving scenario of 5G end-to-end networks, optical transport networks provide the connectivity between the mobile edge and the mobile core network. According to the functional decoupling of the base station into the Remote Radio Unit (RRU) and the Baseband Unit (BBU), the latter can be virtualized into a cloud computing platform to access the mobile core. As a consequence, BBU virtual network functions related to different RRUs can be centralized and replicated in a subset of the nodes of the transport network with the aim of an optimized, reliable design. In this paper, a scalable methodology based on lexicographic optimization is proposed to solve a multi-objective optimization problem that achieves, among other goals, the minimization of the number of active nodes in the transport network while supporting reliability and meeting latency constraints. The proposed solution method is compared to an aggregate optimization approach, showing that the former is capable of proving the optimality of the most relevant components of the multi-objective function (minimization of the number of active nodes and of the number of hops) for instances of medium size, and finds better solutions for instances with a larger number of nodes, namely several tens. The computing times to find an optimal solution for the most relevant objectives are much shorter than those required to solve the aggregate model, even for networks of several tens of nodes.