Cloud computing infrastructures face an increasing demand for advanced cloud services with heterogeneous performance requirements, including real-time, data-intensive, and highly dynamic workloads. The classical way to deal with dynamic workloads is to scale computing and network resources horizontally. However, horizontal scaling becomes far more effective when coupled with mechanisms that ensure efficient and predictable execution of software components in distributed, shared, multi-tenant infrastructures. Such mechanisms may span the many layers or planes of a cloud infrastructure: from temporal isolation at the OS kernel and/or hypervisor level, to intelligent, QoS-aware VM placement and migration, to QoS-aware management of network and application flows that prevents unstable networking performance by avoiding cross-talk and resource saturation.
The RETIS started contributing applied research in this area in the context of the IRMOS European Project, collaborating with other prestigious academic and industrial institutions in Europe, such as the High-Performance Computing Center of Stuttgart, the National Technical University of Athens, the historical Bell Labs and Telefonica I+D. At the heart of the IRMOS achievements is the Intelligent Service-Oriented Infrastructure [CCK10] (ISONI, illustrated in the picture below), an integrated software platform capable of deploying a multitude of complex Virtual Service Networks (VSNs) with end-to-end performance guarantees. A VSN is essentially a graph-like processing service with an arbitrary topology made of multiple Service Components (SCs) interconnected by Virtual Links (VLs), where all elements of the topology can be annotated with precise QoS requirements.
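The VSN structure described above can be pictured as an annotated graph. The following is a minimal sketch, not the ISONI data model: the class names, the QoS fields (`cpu_share`, `max_latency_ms`, `bandwidth_mbps`) and the example components are all hypothetical, chosen only to show SCs and VLs carrying QoS annotations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceComponent:
    name: str
    cpu_share: float       # hypothetical QoS annotation: fraction of one CPU
    max_latency_ms: float  # hypothetical QoS annotation: processing latency bound


@dataclass(frozen=True)
class VirtualLink:
    src: str
    dst: str
    bandwidth_mbps: float  # hypothetical QoS annotation
    max_latency_ms: float  # hypothetical QoS annotation: link latency bound


@dataclass
class VirtualServiceNetwork:
    components: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_component(self, sc):
        self.components[sc.name] = sc

    def add_link(self, vl):
        # A link may only connect components that are already part of the VSN.
        if vl.src not in self.components or vl.dst not in self.components:
            raise ValueError(f"unknown endpoint in link {vl.src}->{vl.dst}")
        self.links.append(vl)

    def path_latency_ms(self, path):
        """Sum the latency annotations of the SCs and VLs along a path."""
        total = sum(self.components[name].max_latency_ms for name in path)
        for src, dst in zip(path, path[1:]):
            link = next(l for l in self.links if l.src == src and l.dst == dst)
            total += link.max_latency_ms
        return total


vsn = VirtualServiceNetwork()
vsn.add_component(ServiceComponent("decoder", cpu_share=0.25, max_latency_ms=10.0))
vsn.add_component(ServiceComponent("renderer", cpu_share=0.5, max_latency_ms=20.0))
vsn.add_link(VirtualLink("decoder", "renderer", bandwidth_mbps=100.0, max_latency_ms=5.0))
print(vsn.path_latency_ms(["decoder", "renderer"]))
```

Summing per-element bounds along a path is one simple way to reason about an end-to-end requirement; a real platform would also need to account for how those bounds are enforced on each host and link.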

In the IRMOS project, the RETIS provided the IRMOS real-time scheduler [CCF09], which guarantees CPU time to, and performance isolation among, VMs co-hosted on the same system when using the KVM hosted hypervisor.

The RETIS contributed to the Juniper project with kernel mechanisms that improve the isolation and timing predictability of a Real-Time Java platform. The resulting framework supports a range of high-performance Intelligent Information Management application domains that require real-time streaming data processing or real-time access to stored data.

More recently, this topic has been further investigated in the context of a long-standing industrial collaboration with Ericsson in Stockholm (Sweden), where real-time scheduling strategies, and specifically the SCHED_DEADLINE real-time scheduler in the mainline Linux kernel (realized by the RETIS in collaboration with Evidence [LSA16], and based on EDF and the Constant Bandwidth Server), have been successfully applied [CAM19] to achieve predictable performance and controllable latency in real-time packet processing for Virtualized Network Functions (VNFs), as needed in the Virtualized Radio Access Network (vRAN) domain. The approach has been extended by adopting HCBS, a hierarchical extension of SCHED_DEADLINE able to guarantee multi-core reservations and to compose with the POSIX fixed-priority scheduler in the Linux kernel, so as to provide reservations to complex, multi-threaded software components. Specifically, as sketched in the figure below, HCBS reservations have been used to provide scheduling guarantees to entire LXC containers hosting packet-processing services (in the context of vRAN). These containers are deployed through a modified OpenStack cloud orchestration engine [CAM21] used as NFV Virtual Infrastructure Manager (VIM), with Tacker as the OpenStack component implementing NFV Management and Orchestration (MANO) descriptors.

This resulted in an NFV infrastructure [CAM21] able to deploy containers with precise real-time processing and QoS-aware networking capabilities, so as to provide end-to-end performance guarantees to, and temporal isolation among, the deployed services, essentially implementing an effective NFV infrastructure slicing mechanism.
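The CPU reservations discussed above are typically expressed as a runtime granted every period. As a rough illustration of why such reservations can be guaranteed, the sketch below implements a simplified utilisation-based admission test. It is not the kernel's or the orchestrator's code: the function names, the implicit-deadline assumption (deadline equal to period) and the 95% default cap are assumptions of this sketch.

```python
from fractions import Fraction


def utilisation(runtime_ns, period_ns):
    """Fraction of one CPU reserved by a (runtime, period) pair."""
    if not 0 < runtime_ns <= period_ns:
        raise ValueError("need 0 < runtime <= period")
    return Fraction(runtime_ns, period_ns)


def admit(reservations, new, n_cpus, cap=Fraction(95, 100)):
    """Simplified utilisation-based admission test.

    Accept `new` only if the total reserved bandwidth stays within
    cap * n_cpus. The 95% default cap is an assumption of this sketch.
    """
    total = sum((utilisation(*r) for r in reservations), Fraction(0))
    return total + utilisation(*new) <= cap * n_cpus


# Two hypothetical containers already reserve 40% and 30% of a single CPU.
existing = [(4_000_000, 10_000_000), (3_000_000, 10_000_000)]
print(admit(existing, (2_000_000, 10_000_000), n_cpus=1))  # would total 90%
print(admit(existing, (3_000_000, 10_000_000), n_cpus=1))  # would total 100%
```

Exact fractions are used instead of floats so that a request sitting exactly on the cap is not rejected by rounding error. Rejecting requests that would exceed the cap is what lets the already admitted reservations keep their guarantees.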

Another important effort in this area, on which the RETIS is actively working, is the provision of differentiated per-client performance levels when accessing NoSQL data stores in cloud computing infrastructures shared among many clients with possibly heterogeneous timing requirements. For example, interactive applications or services might need to fetch data and complete queries within timing constraints compatible with a responsive end-to-end interaction with remote users. Other services or applications might have more relaxed timing requirements, or might simply correspond to classes of users with a different priority (e.g., users with gold vs bronze access levels). This problem has long been studied in the traditional domain of real-time databases, with reference to single-instance, highly reliable relational database management systems (RDBMSs). The cloud computing context, instead, requires rethinking and re-engineering solutions for the NoSQL data stores used in cloud infrastructures, which are massively distributed key-value stores with a flexible and versatile schema for data management and access.

For example, the RETIS has recently proposed RT-MongoDB [ACP21], a modification to the well-known MongoDB open-source NoSQL data store, with a patch introducing per-client prioritized service into the system. As visible in the figure below, experiments with synthetic workloads showed that high-priority clients exhibit much lower and more stable response times for their requests, compared to normal-priority clients.
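The general idea of per-client prioritized service can be illustrated with a priority-ordered request queue. This is only a conceptual sketch and does not reflect how RT-MongoDB is actually implemented: the `PriorityDispatcher` class, the priority levels and the client names are hypothetical.

```python
import heapq
import itertools


class PriorityDispatcher:
    """Serve queued requests by client priority, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, client, priority, request):
        # Lower number = higher priority (a convention chosen for this sketch).
        heapq.heappush(self._heap, (priority, next(self._seq), client, request))

    def next_request(self):
        _, _, client, request = heapq.heappop(self._heap)
        return client, request

    def __len__(self):
        return len(self._heap)


HIGH, NORMAL = 0, 1
d = PriorityDispatcher()
d.submit("batch-job", NORMAL, "find A")
d.submit("web-ui", HIGH, "find B")
d.submit("batch-job", NORMAL, "find C")
order = [d.next_request() for _ in range(len(d))]
print(order)
```

Here the high-priority request is served first even though it arrived second, while the two normal-priority requests keep their arrival order. Queue ordering alone only affects waiting time; it does not by itself isolate a high-priority request from work that is already executing.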

References

[LSA16] J. Lelli, C. Scordino, L. Abeni, D. Faggioli. “Deadline scheduling in the Linux kernel,” Softw. Pract. Exper., 46: 821–839, 2016. doi: 10.1002/spe.2335

[CCK10] T. Cucinotta, F. Checconi, G. Kousiouris, D. Kyriazis, T. Varvarigou, A. Mazzetti, Z. Zlatev, J. Papay, M. Boniface, S. Berger, D. Lamp, T. Voith, M. Stein, “Virtualised e-Learning with Real-Time Guarantees on the IRMOS Platform,” in Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications (SOCA 2010), Perth, Australia, December 2010

[CCF09] F. Checconi, T. Cucinotta, D. Faggioli, G. Lipari, “Hierarchical Multiprocessor CPU Reservations for the Linux Kernel,” in Proceedings of the 5th International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2009), Dublin, Ireland, June 2009

[CAM19] T. Cucinotta, L. Abeni, M. Marinoni, A. Balsini, C. Vitucci. “Reducing Temporal Interference in Private Clouds through Real-Time Containers,” in Proceedings of the 2019 IEEE International Conference on Edge Computing (IEEE EDGE 2019), July 8-13, 2019, Milan, Italy

[CAM21] T. Cucinotta, L. Abeni, M. Marinoni, R. Mancini, C. Vitucci. “Strong Temporal Isolation among Containers in OpenStack for NFV Services,” to appear in IEEE Transactions on Cloud Computing, in print, 2021

[ACP21] R. Andreoli, T. Cucinotta, D. Pedreschi. “RT-MongoDB: a NoSQL database with differentiated performance,” in Proceedings of the 11th International Conference on Cloud Computing and Services Science (CLOSER 2021), April 28-30, 2021, Prague, Czech Republic (on-line event due to Covid-19)