The Long & Short Of Federated Machine Learning


This article gives a quick insight into what federated machine learning is and why it is popular today.

A cloud refers to a server, public or private, accessible over a network, which is usually the internet. It generally has high processing power and storage, and is suitable for big computations. A cloud can be used to train AI models, but only when the data is available to it. However, since the cloud is generally a remote system, capturing data directly on it is difficult and often not feasible. Capturing the data on local devices and transmitting it to the cloud does not always give real-time results either. This is where the concept of federated learning comes in.

Federated learning enables machine learning while the data stays on the device. It offers a flexible architecture that supports secure collection of sensitive data and model training. Data privacy is now widely treated as an important responsibility. Yet, to introduce automation into fields like healthcare, biometrics, and so on, real-time sensitive data is the core requirement. The important question, therefore, is: how do we train a model without collecting sensitive data from users, storing it, and using it for training? Relevance of data is an important aspect of optimal model training, and data is best when it comes directly from its source. But permission for data collection becomes an issue.

With federated machine learning, model training is centralised while the data feed remains decentralised. The model is trained on the source device, whose configuration is first checked to see whether it is capable of training the model. Data sources are selected based on how well they can provide the data. Once the model trains on a device, the device sends the training results (not the data) to the server; training results from all the participating edge devices reach the server in the same fashion. Each device has a training threshold to avoid over-learning from, or exposing, any unique data point, a guarantee referred to as 'differential privacy'. Simply put, the model 'trains enough to remain unknown': with differential privacy in the picture, what the model memorises cannot be traced back to a particular user or device.
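The round-based flow described above, where devices train locally and the server only averages their results, can be sketched as a minimal federated averaging loop. This is an illustrative toy (linear regression on simulated clients, with made-up names like `local_update` and `fed_avg`), not a production protocol:

```python
# A minimal sketch of federated averaging: each simulated "client"
# trains locally, and only its weights (never its data) reach the server.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient descent steps
    for linear regression on data that never leaves the device."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Server-side aggregation: weight each client's result by its
    local dataset size; the server never sees the raw data."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])     # the relationship hidden in client data
global_w = np.zeros(2)             # the shared model held by the server

for _ in range(20):                # communication rounds
    updates, sizes = [], []
    for _ in range(3):             # three simulated edge devices
        X = rng.normal(size=(40, 2))
        y = X @ true_w
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    global_w = fed_avg(updates, sizes)

print(np.round(global_w, 2))       # converges towards [ 2. -1.]
```

Note how `fed_avg` receives only trained weights, mirroring the article's point that results, not data, are transmitted.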

The model files are removed from the device once the complete model training is performed, to ensure no violation of privacy takes place. Server-device communication is pipelined with secure aggregation, which lets the server combine the encrypted results and decrypt only the aggregates.

A secure aggregation protocol masks the training results, scrambling them in such a way that the masks add up to zero when the server sums the contributions. Once the training is successful and the results are sent to the server, testing is carried out on other devices that were selected as data sources but not used for training. In lay terms, each device acts as a data sample: some devices are used for training and some for testing.
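The cancel-to-zero masking idea can be illustrated with a toy sketch. Here, every pair of clients shares a random mask that one adds and the other subtracts (the function name `pairwise_masks` is made up for this example); real protocols derive the shared masks cryptographically and handle dropouts:

```python
# A toy sketch of secure aggregation: pairwise random masks cancel
# when the server sums all masked updates, so the server learns only
# the aggregate, never an individual client's result.
import random

def pairwise_masks(num_clients, dim, seed=42):
    """For every pair (i, j) with i < j, draw a shared random mask;
    client i adds it, client j subtracts it, so all masks sum to zero."""
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.uniform(-100, 100) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # plaintext client results
masks = pairwise_masks(3, 2)
masked = [[u + m for u, m in zip(upd, msk)]
          for upd, msk in zip(updates, masks)]    # what the server receives

# Server sums the masked updates; the masks cancel to zero.
aggregate = [round(sum(col), 6) for col in zip(*masked)]
print(aggregate)  # [9.0, 12.0], the true sum of the updates
```

Each `masked` vector on its own looks like random noise to the server, yet the column sums recover exactly the aggregate of the plaintext updates.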

So, where is federated machine learning being used? Facial recognition applications, self-driving cars, concepts that involve reinforcement learning, and medical diagnosis are just a few of the fields it is being used in extensively today.

Fig. 1: Training model

Keeping data secure

As evident from the above discussion, federated learning models depend on a centralised server (generally, a cloud) and a decentralised data feed (generally, edge and fog devices). Since the data processed is sensitive in nature, its security must be ensured.
There are three main security goals: confidentiality, integrity, and availability.

Confidentiality implies that all the associated data, from the data feeds to the model files, is stored securely and no unauthorised party can access it. Next, the data that is transmitted should not be tampered with or distorted in the process; this is what is meant by ensuring its integrity. Availability implies that the data and the model files are accessible when needed.

Data can generally be secured by encrypting it and safely storing the encryption keys. Note that data must be encrypted mainly in transit, when it is transmitted from the edge or fog device to the cloud. Remember that encrypted data may not be secure if the keys are not kept safely, so it is important to store them in key vaults and similar structures.

Here, it is advisable to use a public key cryptosystem, like RSA, where the server can announce its public key and the edge devices can encrypt the outgoing data using it. The private key, which is safely stored on the server, can then be used to decrypt the data when it reaches the server.
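This public-key flow can be demonstrated with a textbook-scale RSA sketch. The primes below are deliberately tiny and for illustration only; real deployments must use 2048-bit (or larger) keys generated by a vetted cryptographic library, never hand-rolled RSA:

```python
# Textbook RSA with toy primes: edge devices encrypt with the server's
# announced public key; only the server's private key can decrypt.

def modinv(a, m):
    """Modular inverse, via Python's three-argument pow (Python 3.8+)."""
    return pow(a, -1, m)

# Key generation on the server (toy primes for illustration only).
p, q = 61, 53
n = p * q                   # modulus, announced as part of the public key
phi = (p - 1) * (q - 1)     # Euler's totient of n
e = 17                      # public exponent, announced by the server
d = modinv(e, phi)          # private exponent, kept only on the server

def encrypt(m, e, n):
    """Edge device: encrypt a message with the server's public key (e, n)."""
    return pow(m, e, n)

def decrypt(c, d, n):
    """Server: decrypt a ciphertext with the private key (d, n)."""
    return pow(c, d, n)

message = 65                        # a training result encoded as an integer
ciphertext = encrypt(message, e, n)
print(decrypt(ciphertext, d, n))    # 65: only the server recovers it
```

The edge device needs only the announced pair `(e, n)` to encrypt; without `d`, an eavesdropper on the transmission sees only the ciphertext.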

With the emergence of Web 3.0 and the decentralisation of data, federated learning is a promising way to train AI models without hampering the user's privacy. The data is processed locally on the user's device and only the results are shared, and that too in a secure manner. This technology can be implemented in fields where privacy is of utmost importance, such as healthcare, banking and fintech, military applications, processing of biometric information, and so on.


This article was first published in the August 2022 issue of Open Source For You magazine.

Aditya Mitra likes working deep into various levels of the network. His areas of interest are IoT, networking, and cybersecurity.

Gautam Galada is a deep learning researcher and likes working with emerging technologies in the field of AI.
