The short paper entitled “Enabling the Big Data Pipelines on the Computing Continuum”, co-authored by UBITECH, has been accepted by the 15th International Conference on Research Challenges in Information Science (RCIS 2021), held online on 11-14 May 2021. Mr. Giannis Ledakis, Head of the Computing Systems, Software and Services research group of UBITECH, and his co-authors present the overall vision of the DataCloud Research and Innovation Action: the creation of a novel paradigm for Big Data pipeline processing over heterogeneous resources encompassing the Computing Continuum, covering the complete lifecycle of managing Big Data pipelines.
Big Data pipelines are composite pipelines for processing data with non-trivial properties and characteristics, commonly referred to as the Vs of Big Data (e.g., volume, velocity, variety, veracity, and value). Big Data pipelines in DataCloud interconnect the end-to-end industrial operations of collecting, pre-processing, and filtering data; transforming and delivering insights; training simulation models; and applying them in the cloud to achieve a business goal. DataCloud identifies and covers six lifecycle phases for the management of Big Data pipelines: (a) discovering Big Data pipelines from various data sources; (b) defining pipelines featuring an abstraction level suitable for pure data processing; (c) simulating pipelines to evaluate performance; (d) securely provisioning (trusted and untrusted) resources; (e) deploying pipelines across the provisioned resources; and (f) adapting computational resources at run time. Accordingly, DataCloud will deliver a toolbox of new languages, methods, infrastructures, and prototypes for discovering, simulating, deploying, and adapting Big Data pipelines on heterogeneous and untrusted resources.
For more information, please visit the DataCloud project website (https://datacloudproject.eu/) and the RCIS 2021 webpage (https://www.rcis-conf.com/rcis2021/ResearchProjectsAC.php).