
Scientific paper on privacy-preserving data federation for trainable, queryable and actionable data at DBML 2023

UBITECH publishes the paper titled “Privacy-preserving Data Federation for Trainable, Queryable and Actionable Data” in the International Workshop on Databases and Machine Learning (DBML 2023). DBML 2023 is held in conjunction with the 39th IEEE International Conference on Data Engineering, which addresses research issues in designing, building, managing, and evaluating advanced data-intensive systems and applications. The conference is a leading forum for researchers, practitioners, developers, and users to explore cutting-edge ideas and to exchange techniques, tools, and experiences.

With the increased adoption of Machine Learning (ML) across applications and disciplines, a synergy has emerged between the database (DB) systems and ML communities. Steps involved in an ML pipeline, such as data preparation and cleaning, feature engineering, and management of the ML lifecycle, can benefit from research conducted by the data management community. For example, managing the ML lifecycle requires mechanisms for modeling, storing, and querying ML artifacts. Moreover, in many use cases pipelines require a mixture of relational and linear algebra operators, raising the question of whether a seamless integration between the two algebras is possible.

Privacy preservation over federated data has also gained momentum as securing users’ sensitive information becomes a priority. Combining and analyzing sensitive information from multiple data sources offers considerable potential for knowledge discovery. However, several questions must be answered first: which data must be protected; what privacy preservation means; what the constraints on federated computing are; and which secure mechanisms can train, query, and explore data without accuracy loss. Our paper, authored by Stavroula Iatropoulou, Theodora Anastasiou, Sophia Karagiorgou, Petros Petrou, Dimitrios Alexandrou, and Thanassis Bouras, introduces the Protected Federated Query Engine, which applies Fully Homomorphic Encryption and query processing over decentralized data sources of diverse schemas and granularities to efficiently collect, align, aggregate, and serve Artificial Intelligence Operations (AIOps) and Data Operations (DataOps) without sacrificing accuracy and efficiency.

The contributions of the paper are as follows:

  • The presentation of a reproducible, governance- and provenance-rich microservices prototype for protected data federation in support of trainable, queryable and actionable data operations that preserve privacy across the complete data path;
  • An adaptive multi-tenant engine capable of concurrently running hundreds of memory-, I/O-, and CPU-intensive queries on top of encrypted data, and of scaling to multiple worker nodes while preserving the privacy of sensitive information and efficiently utilizing computing resources;
  • A mechanism for Fully Homomorphic Encryption (FHE) which allows cloud servers, edge, and fog nodes to perform DataOps and AIOps over encrypted data, while only authorized clients (i.e., users and applications) are able to see the decrypted data; and
  • A Protected Federated Query Engine for decentralized query confidentiality which ensures that cross-domain knowledge cannot be generated and propagated between different nodes, resources, and administrative domains.
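
To give a feel for what "performing operations over encrypted data" means, here is a minimal sketch in Python. It is not the paper's implementation and does not use Fully Homomorphic Encryption: as a simpler stand-in it uses the Paillier scheme, which is only additively homomorphic (FHE additionally supports multiplication of ciphertexts). The key size, function names, and sample values are assumptions chosen for illustration, and the parameters are far too small to be secure.

```python
import math
import secrets

# Toy key material (assumption): two known Mersenne primes.
# Real deployments use primes of 1024 bits or more.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                      # public key
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)           # private: valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N using only the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g^m mod N^2 simplifies to 1 + m*N.
    return (1 + m * N) % N_SQ * pow(r, N, N_SQ) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext; requires the private values LAM and MU."""
    return (pow(c, LAM, N_SQ) - 1) // N * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Add two plaintexts by multiplying their ciphertexts (no key needed)."""
    return c1 * c2 % N_SQ


# Three hypothetical data sources encrypt their local counts.
ciphertexts = [encrypt(v) for v in (120, 75, 305)]

# An untrusted aggregator combines them without seeing any plaintext.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(encrypted_total, c)

# Only the authorized client, holding the private key, reads the result.
print(decrypt(encrypted_total))
```

In this sketch the aggregator plays the role of a cloud, edge, or fog node: it computes a federated sum while handling only ciphertexts, and only the key holder can decrypt the total.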