
Federated Models

Introduction

The BOSS federated modeling capability operates on objects known as pipelines. Pipelines let users combine and parameterize the components of their machine learning model. For example, a model may include a DataIO component (which loads data into the model) that feeds into a Logistic Regression component. To make the BOSS user's life easier, we have abstracted away some of the base layers of pipeline construction. We build the components needed to load a federated VDS into a pipeline (Reader & DataIO), apply federated feature engineering components, and then hand the partially constructed pipeline to the user's Python model script as an argument. When the user selects a federated VDS in the GUI while starting a training job, we automatically load it and pass it into their modeling components. As a result, writing a BOSS federated model and uploading it to our system involves a few simple steps:

  • Ensure the VDS has a feature titled id (e.g., student.id).
  • Set the model initiator's role & party ID on the pipeline. The initiator must be set to your local federate.
  • Set the active roles (guest/host/arbiter) participating in the job on the pipeline.
  • Instantiate, parameterize, and add the remaining modeling components to the pipeline (such as a Logistic Regression component and an Evaluation component).
  • Specify federated as the model framework/library (as opposed to simple or advanced).
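The first step can be checked before uploading. The sketch below is a hypothetical helper, not part of BOSS; it assumes feature names follow the dotted "table.feature" form suggested by the student.id example and accepts either a bare id or any name whose last dotted segment is id.

```python
# Hypothetical pre-flight check (not part of BOSS): does the VDS expose a
# feature titled "id"? Assumes dotted "<table>.<feature>" names, as in the
# student.id example; BOSS may store feature names differently.


def has_id_feature(feature_names):
    """Return True if any feature is named "id" or ends in ".id"."""
    # rsplit keeps only the part after the last dot: "student.id" -> "id".
    return any(name.rsplit(".", 1)[-1] == "id" for name in feature_names)


print(has_id_feature(["student.id", "student.gpa"]))    # True
print(has_id_feature(["student.name", "student.gpa"]))  # False
```

Splitting on the last dot avoids false positives such as student.valid, which merely ends in the letters "id".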

The user's model script then returns the pipeline object along with information about the role/party configuration. The completed pipeline is submitted to the local federate's configured federation server, which completes the development and launch of a federated model training job from within the BOSS platform.

Common errors when developing federated models include:

  • improper role/party configuration of the pipeline
  • improper parameterization of inputs/outputs between components
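Role/party mistakes can often be caught before submission. The following is a minimal sketch of a hypothetical check on the role mapping that the main function returns; the role names come from this page, but the specific rules (list-valued entries, integer party IDs, the initiator listed under its role and equal to the local federate's party ID) are illustrative assumptions, not checks BOSS is known to perform.

```python
# Hypothetical pre-submission check for the role -> party-ID-list mapping.
# The rules below are illustrative assumptions, not BOSS's own validation.

VALID_ROLES = {"guest", "host", "arbiter"}


def check_role_mapping(mapping, initiator_role, initiator_party_id, local_party_id):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for role, parties in mapping.items():
        if role not in VALID_ROLES:
            problems.append(f"unknown role {role!r}")
        elif not isinstance(parties, list) or not parties:
            problems.append(f"{role}: expected a non-empty list of party IDs")
        elif not all(isinstance(p, int) for p in parties):
            problems.append(f"{role}: party IDs must be integers")
    # The initiator must appear under its own role in the mapping.
    listed = mapping.get(initiator_role)
    if not isinstance(listed, list) or initiator_party_id not in listed:
        problems.append(
            f"initiator {initiator_role}:{initiator_party_id} is not in the mapping"
        )
    # The initiator must be the local federate.
    if initiator_party_id != local_party_id:
        problems.append(
            f"initiator {initiator_party_id} is not the local federate {local_party_id}"
        )
    return problems


good = {"guest": [9999], "host": [10000], "arbiter": [9999]}
print(check_role_mapping(good, "guest", 9999, local_party_id=9999))  # []

# A bare int instead of a list, plus a misspelled role name.
bad = {"guest": 9999, "hosts": [10000]}
for problem in check_role_mapping(bad, "guest", 9999, local_party_id=9999):
    print(problem)
```

Returning a list of messages rather than raising on the first error reports every configuration problem in one pass.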

BOSS federated modeling currently supports horizontal datasets, as well as vertical datasets in which the labels reside on the guest machine. Whether a model is horizontal or vertical is determined at model upload time. Both have been developed and tested to the standard of the provided example pipelines. Examples illustrating how to use the federated modeling capability can be found in the BOSS Model Shop.
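To make the horizontal/vertical distinction concrete, the toy sketch below classifies two parties' tables using hypothetical data. It only illustrates the concepts (horizontal: same columns, different samples; vertical: overlapping samples, different columns, label on the guest). It is not how BOSS decides the type, and the "label" column name is an assumption.

```python
# Toy illustration with hypothetical data; BOSS determines the partitioning
# type at model upload time, not with this function.


def partition_type(guest_cols, guest_ids, host_cols, host_ids, label="label"):
    """Classify two parties' tables as 'horizontal', 'vertical', or 'unclear'.

    *_cols are feature/label column names (excluding the id column) and
    *_ids are the sample IDs each party holds.
    """
    shared_ids = set(guest_ids) & set(host_ids)
    # Horizontal: same columns, different samples.
    if set(guest_cols) == set(host_cols) and not shared_ids:
        return "horizontal"
    # Vertical: overlapping samples, different columns, label only on the guest.
    if shared_ids and set(guest_cols).isdisjoint(host_cols) and label in guest_cols:
        return "vertical"
    return "unclear"


# Two schools with the same schema but different students.
print(partition_type(["gpa", "label"], [1, 2], ["gpa", "label"], [3, 4]))  # horizontal
# Same students; the guest holds grades and the label, the host other features.
print(partition_type(["gpa", "label"], [1, 2, 3], ["credits", "major"], [2, 3, 4]))  # vertical
```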

Main Function

The main function is intended to contain the code for building a federated pipeline model. The BMF passes it three arguments: the args selected in the GUI; a partially completed Pipeline object containing a Reader component, a DataIO component pre-configured with the user's selected VDS, and components for any supported/selected federated feature engineering operations; and a handle to the final_component, used to connect the user's modeling components to the pipeline. In turn, the main function must add components to the partially completed pipe, using final_component.output.data as the input to the newly added modeling components, and then return two objects:

  • The completed pipe object (uncompiled; no need to call pipe.compile())
  • A dictionary mapping each role to a list of party IDs, e.g., {"guest": [9999], "host": [10000], "arbiter": [9999]}
def main(args, pipe, final_component):
    """Function used by BMF for training and analyzing pipeline models.

        Args:
            args (list): List of parameters selected within the No Code client upon train job initialization.
            pipe (pipeline): A partially completed pipeline.
            final_component (component): A handle to the last component added to the partially completed Pipeline.

        Returns:
            Pipeline object (for training),
            Role / party id mapping dictionary object (for properly monitoring and handling the Pipeline)
    """

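To see this contract in isolation, the sketch below runs a main function of the same shape offline against hypothetical stand-in classes. StubPipeline and StubComponent only mimic the method and attribute names used on this page (set_initiator, set_roles, add_component, output.data); they are not the real pipeline library, and data is passed directly instead of being wrapped in Data(...).

```python
# Offline sketch of the main() contract using hypothetical stand-ins.
# These stubs only mimic names used on this page; they are NOT the real API.
from types import SimpleNamespace


class StubComponent:
    def __init__(self, name):
        self.name = name
        # Real components expose output.data; here it is just a label string.
        self.output = SimpleNamespace(data=f"{name}.output.data")


class StubPipeline:
    def __init__(self):
        self.initiator = None
        self.roles = {}
        self.components = []  # (component name, input data) pairs, in order

    def set_initiator(self, role, party_id):
        self.initiator = (role, party_id)

    def set_roles(self, **roles):
        self.roles = roles

    def add_component(self, component, data=None):
        self.components.append((component.name, data))
        return self


def main(args, pipe, final_component):
    guest, host, arbiter = 9999, 10000, 9999
    pipe.set_initiator(role="guest", party_id=guest)
    pipe.set_roles(guest=guest, host=host, arbiter=arbiter)
    # Chain a new component onto the output of the last pre-built component.
    model = StubComponent("model_0")
    pipe.add_component(model, data=final_component.output.data)
    # Return the uncompiled pipe plus the role -> party-ID-list mapping.
    return pipe, {"guest": [guest], "host": [host], "arbiter": [arbiter]}


# Stand-in for the partially completed pipe the BMF would supply.
pipe = StubPipeline()
dataio = StubComponent("dataio_0")
pipe.add_component(dataio)
pipe, roles = main([], pipe, final_component=dataio)
print(pipe.components)  # [('dataio_0', None), ('model_0', 'dataio_0.output.data')]
print(roles)            # {'guest': [9999], 'host': [10000], 'arbiter': [9999]}
```

The printed component list shows the key point: the first user-added component consumes the pre-built pipeline's final output rather than reading the VDS itself.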
No Code Federated Model Visibility

After a federated model training job starts, the BOSS platform actively retrieves metrics and status information from the local federate and surfaces them in the No Code client.

If the job fails, any available error messages are surfaced to the user through the Training object's status block, just as they are for non-federated models (e.g., PyTorch, TensorFlow, Sklearn). Because federated models train on separate servers that run alongside the BOSS backend, and on separate federates, a job can also fail inside those servers or federates in a way that requires manual inspection of the deployment.

If the job succeeds, the metric data and plots generated by the model training are displayed in the No Code client. The available metrics and plots depend on the type of model trained and the data accessible; examples include loss curves, accuracy curves, ROC curves, mean squared error, and precision & recall, among others. Metrics for portions of your federated model trained in other federates are not available here, but users may log into those federates via JedAI to see the data available there.

Example Federated Model

from pipeline.component.evaluation import Evaluation
from pipeline.component.homo_lr import HomoLR
from pipeline.interface.data import Data


def main(args, pipe, final_component):
    # Party IDs for each role. The initiator (the guest, 9999, here)
    # must be your local federate.
    guest = 9999
    host = 10000
    arbiter = 9999

    # set job initiator
    pipe.set_initiator(role='guest', party_id=guest)

    # set participants information
    pipe.set_roles(guest=guest, host=host, arbiter=arbiter)

    # Horizontal (homogeneous) logistic regression parameters
    param = {
        "penalty": "L2",
        "optimizer": "sgd",
        # "tol": 1e-05,
        "alpha": 0.01,
        "max_iter": 5,  # was 30
        "early_stop": "diff",
        "batch_size": -1,  # -1 uses all data in a single batch
        "learning_rate": 0.15,
        "decay": 1,
        "decay_sqrt": True,
        "init_param": {
            "init_method": "zeros"
        },
        "encrypt_param": {
            "method": None
        },
        "cv_param": {
            "n_splits": 4,
            "shuffle": True,
            "random_seed": 33,
            "need_cv": False
        }
    }

    # Add the logistic regression component, fed by the last pre-built component
    horizontal_lr_0 = HomoLR(name='horizontal_lr_0', **param)
    pipe.add_component(horizontal_lr_0, data=Data(train_data=final_component.output.data))

    # Add the evaluation component, fed by the logistic regression output
    evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")
    pipe.add_component(evaluation_0, data=Data(data=horizontal_lr_0.output.data))

    # Return the uncompiled pipeline and the role -> party ID list mapping
    return pipe, {"guest": [guest], "host": [host], "arbiter": [arbiter]}
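The example writes the party IDs twice: once in set_roles() and once in the returned dictionary. If you prefer a single source of truth, a small helper can derive both. The function below is hypothetical, not part of BOSS or the pipeline library, and assumes one party per role as in the example.

```python
# Hypothetical helper (not part of BOSS or the pipeline library): derive both
# the keyword arguments for pipe.set_roles() and the role -> party-ID-list
# mapping returned by main() from one set of IDs, so the two cannot drift.


def role_config(guest, host, arbiter):
    """Return (set_roles kwargs, returned role mapping) for one party per role."""
    roles = {"guest": guest, "host": host, "arbiter": arbiter}
    mapping = {role: [party_id] for role, party_id in roles.items()}
    return roles, mapping


roles, mapping = role_config(guest=9999, host=10000, arbiter=9999)
print(roles)    # {'guest': 9999, 'host': 10000, 'arbiter': 9999}
print(mapping)  # {'guest': [9999], 'host': [10000], 'arbiter': [9999]}
```

Inside main, this would be used as pipe.set_roles(**roles) followed by return pipe, mapping.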