
UBITECH kicks off the DataCloud Research and Innovation Action on Big Data Pipeline Lifecycle Management across the Computing Continuum

UBITECH participates in the virtual kick-off meeting, hosted by SINTEF (January 19-21, 2021), of the DataCloud Research and Innovation Action, which officially started on January 1st, 2021. The project is funded by the European Commission under the Horizon 2020 Programme (Grant Agreement No. 101016835) and runs from January 2021 to December 2023. The DataCloud project provides a novel paradigm covering the complete lifecycle of managing Big Data pipelines through discovery, design, simulation, provisioning, deployment, and adaptation across the Computing Continuum. Big Data pipelines in DataCloud interconnect the end-to-end industrial operations of collecting, pre-processing, and filtering data, transforming and delivering insights, training simulation models, and applying them in the cloud to achieve a business goal. DataCloud delivers a toolbox of new languages, methods, infrastructures, and prototypes for discovering, simulating, deploying, and adapting Big Data pipelines on heterogeneous and untrusted resources. DataCloud separates the design from the run-time aspects of Big Data pipeline deployment, empowering domain experts to take an active part in defining these pipelines.

Within DataCloud, UBITECH will implement the Flexible and Automated Big Data Pipeline Deployment tool (DEP-PIPE). DEP-PIPE provides the means to orchestrate the entire pipeline deployment process. It responds adaptively to significant changes in the pool of available resources during pipeline execution and identifies provisioned resources that perform poorly for a given task in the pipeline. DEP-PIPE replaces low-performing resources, e.g., VMs or containers that no longer meet SLO requirements, or reconfigures existing ones (e.g., by increasing the number of CPUs allocated to a VM running a message queue broker) to reduce the negative effects of infrastructure drift in the Computing Continuum. Finally, DEP-PIPE continuously monitors the performance of the provisioned resources with a Prometheus-based monitoring system, which the event detection engine uses during the deployment and orchestration processes. DEP-PIPE builds upon the MAESTRO orchestrator, developed by UBITECH. In DataCloud, MAESTRO will be extended to support the orchestration of Big Data pipelines, utilize the trusted resources available in the Cloud/Fog/Edge continuum, and also include new security and monitoring aspects to deal with the specificities of Big Data pipelines.
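The replace-or-reconfigure behaviour described above can be illustrated with a minimal Python sketch. It is not DEP-PIPE or MAESTRO code: the names ResourceSample, Action, decide, and plan, the latency-based SLO, and the 0.9 CPU saturation threshold are all assumptions made for illustration, and the samples are plain values standing in for metrics that a monitoring system would supply.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class Action(Enum):
    KEEP = "keep"
    RECONFIGURE = "reconfigure"
    REPLACE = "replace"


@dataclass
class ResourceSample:
    """One observation of a provisioned resource (hypothetical schema)."""
    name: str               # illustrative identifier of a VM or container
    latency_ms: float       # observed latency of the pipeline step it runs
    cpu_utilisation: float  # fraction of allocated CPU in use, 0.0 to 1.0
    cpus: int               # CPUs currently allocated
    max_cpus: int           # upper bound the host could allocate


def decide(sample: ResourceSample,
           slo_latency_ms: float,
           cpu_saturation: float = 0.9) -> Action:
    """Pick an action for one resource under an assumed latency SLO."""
    # The resource meets its SLO: nothing to do.
    if sample.latency_ms <= slo_latency_ms:
        return Action.KEEP
    # SLO violated and the resource is CPU-bound with headroom left:
    # adding CPUs is assumed to be cheaper than replacing the resource.
    if (sample.cpu_utilisation >= cpu_saturation
            and sample.cpus < sample.max_cpus):
        return Action.RECONFIGURE
    # SLO violated for another reason, or no headroom: replace it.
    return Action.REPLACE


def plan(samples: Iterable[ResourceSample],
         slo_latency_ms: float) -> Dict[str, Action]:
    """Map every resource name to the action chosen for it."""
    return {s.name: decide(s, slo_latency_ms) for s in samples}


if __name__ == "__main__":
    observed = [
        ResourceSample("broker-vm", 250.0, 0.95, 2, 8),
        ResourceSample("filter-container", 120.0, 0.40, 1, 4),
        ResourceSample("edge-node", 400.0, 0.30, 4, 4),
    ]
    for name, action in plan(observed, slo_latency_ms=200.0).items():
        print(name, action.value)
```

In this sketch a resource is reconfigured only when the SLO violation coincides with CPU saturation and spare capacity exists; any other violation leads to replacement. A real deployment tool would likely weigh more signals and costs than this single rule.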