Grant / January 2020

Privacy-preserving and Decentralized Federated Learning

Abstract

Machine learning technology is developing rapidly and continually reshaping our daily lives. However, a major limiting factor that hinders many machine learning tasks is the need for large and diverse training data. Crowdsourcing has proven effective for collecting data labels through a centralized server. The emergence of blockchain technology makes a decentralized platform possible, offering better reliability and discoverability. While blockchain provides an ideal platform for crowdsourcing, all data become publicly available once put onto today's blockchain platforms such as Ethereum. This could discourage users from contributing their data, which may contain highly sensitive information, e.g., medical records. In this proposal, we aim to design a blockchain-based data sharing and training platform that allows participants to contribute data and train models in a fully decentralized and privacy-preserving way. Compared with solutions that naively run training algorithms on the blockchain, our proposal has the following advantages. (1) Efficiency: we borrow ideas from federated machine learning. Instead of contributing raw data, each participant locally trains a model and contributes only the model parameters to the blockchain; blockchain nodes simply aggregate the contributed models to build a global model. (2) Privacy: we adapt a secure aggregation protocol to hide each contributed model from other participants. Moreover, differential privacy is applied to prevent the global model from revealing any particular client's information. Finally, we use system log anomaly detection as a case study to demonstrate the wide applicability of the proposed platform.
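The aggregation idea described above can be illustrated with a minimal sketch. This is not the proposed protocol itself but an assumed toy version: each pair of clients derives a shared random mask from a common seed (a real secure aggregation protocol would use a key agreement such as Diffie-Hellman and handle dropouts); one client adds the mask and the other subtracts it, so the masks cancel when the aggregator sums the contributions, and Gaussian noise can then be added to the average for differential privacy. All function names and the seed derivation here are illustrative assumptions.

```python
import numpy as np

def mask_update(update, client_id, all_ids):
    """Pairwise-mask one client's model update (secure aggregation sketch).

    Each pair of clients derives the same mask from a shared seed; the
    lower-id client adds it and the higher-id client subtracts it, so the
    masks cancel in the sum and the aggregator never sees an individual
    update in the clear. (Toy seed derivation; a real protocol would use
    a cryptographic key agreement.)
    """
    masked = update.astype(float).copy()
    for peer in all_ids:
        if peer == client_id:
            continue
        seed = (min(client_id, peer) * 10007 + max(client_id, peer)) % (2**32)
        mask = np.random.default_rng(seed).normal(size=update.shape)
        masked += mask if client_id < peer else -mask
    return masked

def aggregate(masked_updates, noise_std=0.0, rng=None):
    """Average the masked updates; add Gaussian noise for differential privacy."""
    rng = rng or np.random.default_rng(42)
    avg = np.mean(masked_updates, axis=0)
    return avg + rng.normal(scale=noise_std, size=avg.shape)

# Usage: three clients contribute locally trained model parameters.
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
ids = [0, 1, 2]
masked = [mask_update(u, i, ids) for u, i in zip(updates, ids)]
global_model = aggregate(masked, noise_std=0.0)  # masks cancel: [3.0, 4.0]
```

With `noise_std=0.0` the pairwise masks cancel exactly and the aggregator recovers the plain average of the contributed parameters; setting `noise_std > 0` trades some accuracy of the global model for a differential privacy guarantee.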

Research Findings and Presentations