Federated Models
Introduction
The BOSS federated modeling capability operates on objects known as pipelines. Pipelines allow users to combine and parameterize the components of their machine learning model. For example, a model may include a DataIO component (loading data into the model) that feeds into a Logistic Regression component. To make the BOSS user’s life easier, we have abstracted away some of the base layers of constructing a pipeline. We build the components needed to load a federated VDS into a pipeline (Readers & DataIO) and apply federated feature engineering components. We then hand the partially constructed pipeline to the user’s model Python script as an argument. Once the user selects a federated VDS from the GUI when starting a training job, we automatically load it and pass it into their modeling components. Essentially, this means that writing a BOSS federated model and uploading it to our system involves a few simple steps:
- Ensure the VDS has a feature titled id, e.g. student.id
- Set the model initiator's role & party ID on the pipeline. The initiator must be set to your local federate.
- Set the active roles (guest/host/arbiter) participating in the job on the pipeline.
- Instantiate, parameterize, and add the remaining desired modeling components to the pipeline (such as a Logistic Regression & Evaluation component).
- Specify federated as the model framework / library (as opposed to simple or advanced)
The user’s script then returns the pipeline object and information about the role/party configuration. The completed pipeline is then submitted to the local federate’s configured federation server, thus completing the development and launching of a federated model training job from within the BOSS platform.
Common and known errors when developing federated models include:
- improper role/party configuration in the pipeline by the user
- improper parameterization of input/output between components
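Because role/party misconfiguration is listed among the common errors, a small check before returning from main can catch obvious mistakes early. The following is a minimal sketch only: validate_role_map and the rules it enforces (known role names, non-empty lists of integer party IDs, initiator listed under its own role) are illustrative assumptions and are not part of the BOSS API.

```python
ALLOWED_ROLES = {"guest", "host", "arbiter"}


def validate_role_map(role_map, initiator_role, initiator_party_id):
    """Return a list of problems found in a role -> party ID list mapping.

    Hypothetical helper for illustration; the checks are assumptions,
    not rules enforced by BOSS.
    """
    problems = []
    for role, party_ids in role_map.items():
        if role not in ALLOWED_ROLES:
            problems.append(f"unknown role: {role!r}")
        if not isinstance(party_ids, list) or not party_ids:
            problems.append(f"{role!r} must map to a non-empty list of party IDs")
        elif not all(isinstance(p, int) and not isinstance(p, bool) for p in party_ids):
            problems.append(f"{role!r} contains a non-integer party ID")

    # The initiator should appear among the parties registered for its role.
    initiator_parties = role_map.get(initiator_role, [])
    if not isinstance(initiator_parties, list) or initiator_party_id not in initiator_parties:
        problems.append("initiator is not listed under its own role")
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the federation server will accept the configuration.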
BOSS federated modeling currently supports horizontal datasets. BOSS federated modeling also supports vertical datasets consisting of labels on the guest machine. Note that Horizontal vs. Vertical is determined at model upload time. Each has been developed and tested to the standards of provided example pipelines. Examples illustrating how to use the federated modeling capability can be found within The BOSS Model Shop.
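To make the horizontal/vertical distinction concrete, here is a toy illustration using plain Python dictionaries. The records, feature names, and helper functions are invented for this sketch and do not reflect how BOSS stores a VDS. It only shows the general idea: horizontal partitions share the same features but hold different samples, while vertical partitions hold different features for overlapping sample IDs, with labels on the guest. That overlap is presumably why an id feature is required.

```python
# Toy data keyed by sample id; every value here is made up for illustration.
horizontal_guest = {1: {"age": 34, "income": 52, "label": 0},
                    2: {"age": 51, "income": 87, "label": 1}}
horizontal_host = {3: {"age": 29, "income": 41, "label": 0}}

vertical_guest = {1: {"age": 34, "label": 0}, 2: {"age": 51, "label": 1}}
vertical_host = {1: {"income": 52}, 2: {"income": 87}, 3: {"income": 41}}


def feature_names(partition):
    """Set of feature names used across all records of one partition."""
    return set().union(*(record.keys() for record in partition.values()))


def is_horizontal(a, b):
    """True when both sides have the same features and no shared sample ids."""
    return feature_names(a) == feature_names(b) and not (a.keys() & b.keys())


def shared_ids(a, b):
    """Sample ids present on both sides; vertical setups align records on these."""
    return sorted(a.keys() & b.keys())
```

Here is_horizontal(horizontal_guest, horizontal_host) holds, while the vertical pair instead overlaps on the sample ids returned by shared_ids.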
Main Function
The main function is intended to contain code for building federated pipeline models. The BMF sends args from the GUI, a partially completed Pipeline object with a Reader component, a DataIO component pre-configured with the user’s selected VDS, and components for any supported / selected federated feature engineering operations. It also provides a handle to the “final_component” for integration with the user’s modeling components inside of the main function. In turn, the main function must add components to the partially completed pipe, pulling the final_component.output.data as input to the new components being added, then return two objects:
- The completed pipe object (uncompiled; no need to call pipe.compile())
- A dictionary of role to party id list mappings, e.g. {"guest": [9999], "host": [10000], "arbiter": [9999]}
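Before uploading, it can help to check locally that a main function honors the two-object contract listed above. The sketch below uses stand-in objects: FakePipeline, check_main_contract, and example_main are all hypothetical names invented here and are not part of the BMF. The stand-in only records calls, so it verifies the shape of what main returns and says nothing about whether the real pipeline will train.

```python
from types import SimpleNamespace


class FakePipeline:
    """Hypothetical stand-in that only records calls; not the real Pipeline class."""

    def __init__(self):
        self.components = []
        self.initiator = None
        self.roles = None

    def set_initiator(self, role, party_id):
        self.initiator = (role, party_id)
        return self

    def set_roles(self, **roles):
        self.roles = roles
        return self

    def add_component(self, component, data=None):
        self.components.append((component, data))
        return self


def check_main_contract(main, args=None):
    """Call main with stand-ins and verify the shape of what it returns."""
    pipe = FakePipeline()
    # Mimic final_component.output.data with a simple placeholder object.
    final_component = SimpleNamespace(output=SimpleNamespace(data="upstream-data"))
    result = main(args or [], pipe, final_component)
    assert isinstance(result, tuple) and len(result) == 2, "main must return two objects"
    returned_pipe, role_map = result
    assert returned_pipe is pipe, "main should return the pipe it was given"
    assert isinstance(role_map, dict), "second return value must be a dict"
    assert all(isinstance(ids, list) for ids in role_map.values()), "party IDs must be lists"
    assert pipe.initiator is not None and pipe.roles is not None, "initiator and roles must be set"
    return role_map


def example_main(args, pipe, final_component):
    """Toy main that adds a placeholder component instead of a real model."""
    pipe.set_initiator(role="guest", party_id=9999)
    pipe.set_roles(guest=9999, host=10000, arbiter=9999)
    pipe.add_component("placeholder_model", data=final_component.output.data)
    return pipe, {"guest": [9999], "host": [10000], "arbiter": [9999]}
```

Calling check_main_contract(example_main) returns the role mapping if every check passes and raises an AssertionError naming the first violated expectation otherwise.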
def main(args, pipe, final_component):
    """Function used by BMF for training and analyzing pipeline models.

    Args:
        args (list): List of parameters selected within the No Code client upon train job initialization.
        pipe (pipeline): A partially completed pipeline.
        final_component (component): A handle to the last component added to the partially completed Pipeline.

    Returns:
        Pipeline object (for training),
        Role / party id mapping dictionary object (for properly monitoring and handling the Pipeline)
    """
No Code Federated Model Visibility
After a federated model training job starts, the BOSS platform actively retrieves metrics and status information from the local federate and surfaces them to the No Code client.
If the job fails, any available error messages will be surfaced to the user through the Training object status block, just as with non-federated models (e.g., PyTorch, TensorFlow, Sklearn). Note that because federated models train on separate servers running alongside the BOSS backend and in separate federates, a failure can occur within those servers/federates that requires manual inspection of the deployment.
If the job succeeds, users will see metric data and plots generated by the model training displayed within the No Code client. The available metrics and plots depend on the type of model trained and the data accessible. Examples include loss curves, accuracy curves, ROC curves, mean squared error, precision & recall, and several other metrics. Metrics for portions of your federated model trained in other federates are not available here, but users may log into the other federates via JedAI to see the data available there.
Example Federated Model
from pipeline.component.evaluation import Evaluation
from pipeline.component.homo_lr import HomoLR
from pipeline.interface.data import Data


def main(args, pipe, final_component):
    try:
        guest = 9999
        host = 10000
        arbiter = 9999

        # set job initiator
        pipe.set_initiator(role='guest', party_id=guest)
        # set participants information
        pipe.set_roles(guest=guest, host=host, arbiter=arbiter)

        param = {
            "penalty": "L2",
            "optimizer": "sgd",
            # "tol": 1e-05,
            "alpha": 0.01,
            "max_iter": 5,  # was 30
            "early_stop": "diff",
            "batch_size": -1,
            "learning_rate": 0.15,
            "decay": 1,
            "decay_sqrt": True,
            "init_param": {
                "init_method": "zeros"
            },
            "encrypt_param": {
                "method": None
            },
            "cv_param": {
                "n_splits": 4,
                "shuffle": True,
                "random_seed": 33,
                "need_cv": False
            }
        }

        # Add the logistic regression component
        horizontal_lr_0 = HomoLR(name='horizontal_lr_0', **param)
        pipe.add_component(horizontal_lr_0, data=Data(train_data=final_component.output.data))

        # Add the evaluation component
        evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")
        pipe.add_component(evaluation_0, data=Data(data=horizontal_lr_0.output.data))
    except Exception as err:
        raise
    return pipe, {"guest": [guest], "host": [host], "arbiter": [arbiter]}