Add status to session, and API method to check if a session has finished
Andreas Hellander committed Jan 31, 2024
2 parents 4b77120 + 06c21ae commit 3c2e846
Showing 31 changed files with 538 additions and 224 deletions.
53 changes: 33 additions & 20 deletions README.rst
@@ -10,32 +10,35 @@
.. image:: https://readthedocs.org/projects/fedn/badge/?version=latest&style=flat
   :target: https://fedn.readthedocs.io

FEDn is a modular and model agnostic framework for hierarchical
federated machine learning which scales from pseudo-distributed
development to real-world production networks in distributed,
heterogeneous environments. For more details see https://arxiv.org/abs/2103.00148.
FEDn is a modular and model-agnostic framework for
federated machine learning. FEDn is designed to scale from pseudo-distributed
development on your laptop to real-world production setups in geographically distributed environments.

Core Features
=============

- **Scalable and resilient.** FEDn is highly scalable and resilient via a tiered
  architecture where multiple aggregation servers (combiners) form a network that divides the work of coordinating clients and aggregating models.
  Recent benchmarks show high performance both for thousands of clients in a cross-device
  setting and for large model updates (1GB) in a cross-silo setting.
  FEDn can recover from failure in all critical components.

Benchmarks show high performance both for thousands of clients in a cross-device
setting and for large model updates in a cross-silo setting.
FEDn has the ability to recover from failure in all critical components.

- **Security**. A key feature is that
  clients do not have to expose any ingress ports.

- **Track events and training progress in real-time**. FEDn tracks events for clients and aggregation servers, logging to MongoDB. This
  helps developers monitor training progress in real time and troubleshoot the distributed computation.
  Tracking and model validation data can easily be retrieved using the API, enabling development of custom dashboards and visualizations.

- **Flexible handling of asynchronous clients**. FEDn supports flexible experimentation
  with clients joining and dropping out during training sessions. Aggregators can be extended to experiment
  with different strategies for handling so-called stragglers.

- **ML-framework agnostic**. Model updates are treated as black-box
  computations. This means that it is possible to support any
  ML model type or framework. Support for Keras and PyTorch is
  available out-of-the-box.

- **Track events and training progress**. FEDn logs events in the federation and tracks both training and validation progress in real time. Data is logged as JSON to MongoDB and a user can easily make custom dashboards and visualizations.

- **UI.** A Flask UI lets users see client model validations in real time, as well as track client training time distributions and key performance metrics for clients and combiners.

Getting started
===============

@@ -55,23 +58,33 @@ Clone this repository, locate into it and start a pseudo-distributed FEDn networ
docker-compose up
Navigate to http://localhost:8090. You should see the FEDn UI, asking you to upload a compute package. The compute package is a tarball of a project. The project in turn implements the entrypoints used by clients to compute model updates and to validate a model.
This starts the required backend services (MongoDB and Minio), the API Server, and one Combiner. You can verify the deployment using these URLs:

- API Server: localhost:8092
- Minio: localhost:9000
- Mongo Express: localhost:8081
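
To check the same endpoints from a script instead of a browser, a minimal Python sketch like the one below can be used. It only tests whether something accepts TCP connections on each port listed above; it does not verify that the expected service is the one answering. The hosts and ports are copied from the list above and are assumptions if your deployment uses other values.

```python
import socket

# Endpoints copied from the list above; adjust if your deployment differs.
SERVICES = {
    "API Server": ("localhost", 8092),
    "Minio": ("localhost", 9000),
    "Mongo Express": ("localhost", 8081),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if is_listening(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {status}")
```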

Next, we will prepare the client. A key concept in FEDn is the compute package:
a code bundle that contains entrypoints for training and (optionally) validating a model update on the client.
The following steps use the compute package defined in the example project 'examples/mnist-pytorch'.

Locate into 'examples/mnist-pytorch'.
Change into 'examples/mnist-pytorch' and familiarize yourself with the project structure. The entrypoints
are defined in 'client/entrypoint'. The dependencies needed in the client environment are specified in
'requirements.txt'. For convenience, we have provided utility scripts to set up a virtual environment.
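
Since a compute package is essentially a compressed tarball of the project, the packaging step can be sketched with Python's standard tarfile module. This is a hypothetical equivalent for illustration, not a line-by-line reproduction of what bin/build.sh does. It assumes the entrypoints live under 'client/' as in 'examples/mnist-pytorch', and the function name build_package is our own.

```python
import tarfile
from pathlib import Path


def build_package(project_dir: str, out_file: str = "package.tgz") -> Path:
    """Bundle <project_dir>/client into a gzip-compressed tarball (hypothetical helper).

    Assumption: the entrypoints live under 'client/', as in examples/mnist-pytorch.
    """
    client_dir = Path(project_dir) / "client"
    if not client_dir.is_dir():
        raise FileNotFoundError(f"no client/ directory in {project_dir}")
    out_path = Path(out_file)
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps archive paths relative, so the package unpacks to client/...
        tar.add(client_dir, arcname="client")
    return out_path
```

For example, build_package('examples/mnist-pytorch') run from the repository root writes package.tgz to the current working directory.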

Start by initializing a virtual environment with all of the required dependencies for this project.

.. code-block::

   bin/init_venv.sh

Now create the compute package and a seed model:
Next create the compute package and a seed model:

.. code-block::

   bin/build.sh

Upload the generated files 'package.tar.gz' and 'seed.npz' in the FEDn UI.
Upload the generated files 'package.tgz' and 'seed.npz' using the API:

The next step is to configure and attach clients. For this we download data and make data partitions:

@@ -82,7 +95,7 @@ Download the data:
bin/get_data
Split the data into 2 parts for the clients:
Split the data into 2 partitions:

.. code-block::
45 changes: 45 additions & 0 deletions docs/aggregators.rst
@@ -0,0 +1,45 @@
.. _agg-label:

Aggregators
===========

Aggregators combine the model updates received by a combiner into a combiner-level global model.
During a training session, each combiner instantiates an Aggregator and uses it to process the incoming model updates from clients.

.. image:: img/aggregators.png
   :alt: Aggregator overview
   :width: 100%
   :align: center

The above figure illustrates the overall flow. When a client completes a model update, the model parameters are streamed to the combiner, and a model update message is sent. The parameters are written to disk at a configurable storage location on the combiner, which avoids exhausting the combiner's RAM, and the model update message is passed to a callback function, ``on_model_update``. The callback validates the model update and, if validation succeeds, puts the update message on an aggregation queue. As multiple clients send updates, the aggregation queue builds up, and when certain criteria are met, another method, ``combine_models``, starts processing the queue, aggregating models according to the specifics of the scheme (FedAvg, FedAdam, etc.).
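
The flow above can be sketched in a few lines of plain Python. This is a toy model, not FEDn's implementation: the class ToyCombiner, its method signatures, the JSON file format, and the num_examples metadata check are all hypothetical, and model parameters are plain lists of floats. It shows the two ideas the paragraph describes: parameters go to disk rather than RAM, and only a small message goes on the aggregation queue.

```python
import json
import queue
import uuid
from pathlib import Path


class ToyCombiner:
    """Hypothetical stand-in for a combiner's update path (not FEDn's API)."""

    def __init__(self, storage_dir: str):
        # Configurable storage location for model parameters.
        self.storage_dir = Path(storage_dir)
        self.update_queue: queue.Queue = queue.Queue()

    def receive(self, parameters: list, metadata: dict) -> None:
        """Called when a client has finished streaming a model update."""
        model_id = str(uuid.uuid4())
        # Write the parameters to disk so they do not accumulate in RAM.
        path = self.storage_dir / f"{model_id}.json"
        path.write_text(json.dumps(parameters))
        # Only the small update message is passed on to the callback.
        self.on_model_update({"model_id": model_id, "metadata": metadata})

    def on_model_update(self, message: dict) -> None:
        """Validate the message and, if it passes, put it on the aggregation queue."""
        if "num_examples" in message["metadata"]:
            self.update_queue.put(message)
        # Rejected updates are not queued (their file stays on disk in this toy).

    def load_parameters(self, model_id: str) -> list:
        """Read parameters back from disk when the queue is processed."""
        return json.loads((self.storage_dir / f"{model_id}.json").read_text())
```

Keeping only a reference (model_id) on the queue lets the queue grow with many clients without holding every parameter vector in memory at once.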

The user can configure several parameters that guide the general behavior of the aggregation flow:

- Round timeout: the maximum time the combiner waits before processing the update queue.
- Buffer size: the maximum allowed length of the queue before it is processed.
- Whether to retain or delete model update files after they have been processed (the default is to delete them).
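
One way to picture how the round timeout and buffer size interact is a loop that drains the queue until it has collected enough updates or time runs out. The function and parameter names below are hypothetical and do not correspond to FEDn's configuration keys; this is a sketch of the policy only.

```python
import queue
import time


def collect_updates(update_queue: queue.Queue, buffer_size: int, round_timeout: float) -> list:
    """Drain updates until buffer_size are collected or round_timeout seconds pass."""
    collected = []
    deadline = time.monotonic() + round_timeout
    while len(collected) < buffer_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout: aggregate whatever has arrived so far
        try:
            # Block for at most the time left in the round.
            collected.append(update_queue.get(timeout=remaining))
        except queue.Empty:
            break
    return collected
```

Using a monotonic clock keeps the deadline correct even if the system wall clock is adjusted during a round.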



A developer can extend FEDn with their own Aggregator(s) by implementing the interface specified in
:py:class:`fedn.network.combiner.aggregators.aggregatorbase.AggregatorBase`. The developer implements the following two methods:

- ``on_model_update`` (optional)
- ``combine_models``

on_model_update
----------------

The ``on_model_update`` callback has access to the complete model update, including the metadata passed on by the clients (as specified in the training entrypoint; see compute package). The base class implements a default callback that checks that all metadata assumed by the aggregation algorithms FedAvg and FedAdam is present. However, the callback could also be used to implement custom preprocessing and additional checks, including strategies to filter out updates that are suspected to be corrupted or malicious.
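
As an illustration of the kind of check a custom ``on_model_update`` could add, the sketch below rejects updates whose metadata lack a positive sample count, or whose parameters contain non-finite values or have an unusually large norm. The function name, the ``num_examples`` key, the flat list of parameters, and the norm threshold are assumptions for illustration only; they are not FEDn's actual message format.

```python
import math


def is_acceptable(parameters: list, metadata: dict, max_norm: float = 1e3) -> bool:
    """Return True if an update passes basic sanity checks (hypothetical rules)."""
    # FedAvg-style weighting needs a positive sample count.
    n = metadata.get("num_examples")
    if not isinstance(n, int) or n <= 0:
        return False
    # Reject NaN/inf, which would poison the aggregate.
    if not all(math.isfinite(p) for p in parameters):
        return False
    # Crude outlier filter: a very large L2 norm may signal a corrupted update.
    return math.sqrt(sum(p * p for p in parameters)) <= max_norm
```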

combine_models
--------------

This method is responsible for processing the model update queue and, in doing so, producing an aggregated model. This is the main extension point where the numerical details of the aggregation scheme are implemented. The best way to understand how to implement this method is to study the already implemented algorithms:

- :py:mod:`fedn.network.combiner.aggregators.fedavg`
- :py:mod:`fedn.network.combiner.aggregators.fedopt`
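
The numerical core of FedAvg is a mean of the queued parameter vectors, weighted by each client's number of training examples. A minimal sketch on plain Python lists follows. It assumes each queued update has already been reduced to a ``(parameters, num_examples)`` pair; that layout is our assumption and is not the format used by ``fedn.network.combiner.aggregators.fedavg``.

```python
def fedavg(updates: list) -> list:
    """Weighted average of (parameters, num_examples) pairs.

    Each client's parameters count in proportion to its sample size.
    """
    if not updates:
        raise ValueError("no model updates to aggregate")
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    aggregated = [0.0] * dim
    for params, n in updates:
        if len(params) != dim:
            raise ValueError("parameter vectors have different lengths")
        for i, p in enumerate(params):
            aggregated[i] += p * n / total
    return aggregated


print(fedavg([([1.0, 2.0], 10), ([3.0, 4.0], 30)]))  # [2.5, 3.5]
```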

To add an aggregator plugin ``myaggregator``, the developer implements the interface and places a file called ``myaggregator.py`` in the folder ``fedn.network.combiner.aggregators``.
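
Dropping a module into a package folder makes it importable by name. The sketch below shows the generic ``importlib`` pattern that such a plugin mechanism can rely on. The loader function, the throwaway package name, and the convention that each plugin module defines a class called ``Aggregator`` are assumptions for illustration, not FEDn's loader. To stay self-contained, it writes a demo plugin package to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_aggregator(package: str, name: str):
    """Import <package>.<name> and instantiate its Aggregator class.

    Assumption: every plugin module defines a class named Aggregator.
    """
    module = importlib.import_module(f"{package}.{name}")
    return module.Aggregator()


def make_demo_package(root: Path, package: str = "demo_aggregators") -> None:
    """Write a throwaway package containing a plugin file 'myaggregator.py'."""
    pkg = root / package
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "myaggregator.py").write_text(
        "class Aggregator:\n"
        "    name = 'myaggregator'\n"
        "\n"
        "    def combine_models(self, updates):\n"
        "        return updates[-1]  # placeholder, not a real aggregation scheme\n"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make the freshly written files visible
        aggregator = load_aggregator("demo_aggregators", "myaggregator")
        print(aggregator.name)  # myaggregator
        sys.path.remove(tmp)
```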


9 changes: 1 addition & 8 deletions docs/architecture.rst
@@ -51,13 +51,6 @@ Notes on aggregating algorithms
FEDn is designed to allow customization of the FedML algorithm, following a specified pattern, or programming model.
Model aggregation happens on two levels in the network. First, each Combiner can be configured with a custom orchestration and aggregation implementation, that reduces model updates from Clients into a single, *combiner level* model.
Then, a configurable aggregation protocol on the *Controller* level is responsible for combining the combiner-level models into a global model. By varying the aggregation schemes on the two levels in the system,
many different possible outcomes can be achieved. Good starting configurations are provided out-of-the-box to help the user get started. See API reference for more details.

Hierarchical Federated Averaging
................................

The currently implemented default scheme uses a local SGD strategy on the Combiner level aggregation and a simple average of models on the reducer level.
This results in a highly horizontally scalable FedAvg scheme. The strategy works well with most artificial neural network (ANN) models,
and can in general be applied to models where it is possible and makes sense to form mean values of model parameters (for example SVMs).
many different possible outcomes can be achieved. Good starting configurations are provided out-of-the-box to help the user get started. See :ref:`agg-label` and API reference for more details.
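
To make the two aggregation levels concrete, the sketch below reduces client updates per combiner with a sample-weighted mean, then forms the global model as a simple average of the combiner-level models, mirroring the default scheme described above. The client parameters and sample counts are made-up numbers, and the helper names are our own.

```python
def weighted_average(models: list, weights: list) -> list:
    """Element-wise weighted mean of equally sized parameter vectors."""
    total = sum(weights)
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(len(models[0]))]


def simple_average(models: list) -> list:
    """Unweighted mean: every model counts equally."""
    return weighted_average(models, [1.0] * len(models))


# Level 1: each combiner reduces its clients' updates (weights = sample counts).
combiner_a = weighted_average([[1.0, 2.0], [3.0, 4.0]], [10, 30])  # [2.5, 3.5]
combiner_b = weighted_average([[5.0, 6.0]], [20])                   # [5.0, 6.0]

# Level 2: the controller averages the combiner-level models.
global_model = simple_average([combiner_a, combiner_b])
print(global_model)  # [3.75, 4.75]
```

Because the top-level average gives each combiner equal weight, the result generally differs from a flat, sample-weighted FedAvg over all clients (here the flat average of the first parameter would be 200/60, about 3.33) unless the combiners hold equally many examples.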


8 changes: 8 additions & 0 deletions docs/fedn.network.api.rst
@@ -32,3 +32,11 @@ fedn.network.api.network module
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.api.tests module
-----------------------------

.. automodule:: fedn.network.api.tests
   :members:
   :undoc-members:
   :show-inheritance:
8 changes: 8 additions & 0 deletions docs/fedn.network.clients.rst
@@ -40,3 +40,11 @@ fedn.network.clients.state module
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.clients.test\_client module
----------------------------------------

.. automodule:: fedn.network.clients.test_client
   :members:
   :undoc-members:
   :show-inheritance:
16 changes: 16 additions & 0 deletions docs/fedn.network.combiner.aggregators.rst
@@ -9,6 +9,14 @@ fedn.network.combiner.aggregators package
Submodules
----------

fedn.network.combiner.aggregators.aggregator module
---------------------------------------------------

.. automodule:: fedn.network.combiner.aggregators.aggregator
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.combiner.aggregators.aggregatorbase module
-------------------------------------------------------

@@ -24,3 +32,11 @@ fedn.network.combiner.aggregators.fedavg module
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.combiner.aggregators.fedopt module
-----------------------------------------------

.. automodule:: fedn.network.combiner.aggregators.fedopt
   :members:
   :undoc-members:
   :show-inheritance:
30 changes: 19 additions & 11 deletions docs/fedn.network.combiner.rst
@@ -17,6 +17,22 @@ Subpackages
Submodules
----------

fedn.network.combiner.combiner module
-------------------------------------

.. automodule:: fedn.network.combiner.combiner
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.combiner.combiner\_tests module
--------------------------------------------

.. automodule:: fedn.network.combiner.combiner_tests
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.combiner.connect module
------------------------------------

@@ -41,18 +57,10 @@ fedn.network.combiner.modelservice module
   :undoc-members:
   :show-inheritance:

fedn.network.combiner.round module
----------------------------------

.. automodule:: fedn.network.combiner.round
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.combiner.server module
-----------------------------------
fedn.network.combiner.roundhandler module
-----------------------------------------

.. automodule:: fedn.network.combiner.server
.. automodule:: fedn.network.combiner.roundhandler
   :members:
   :undoc-members:
   :show-inheritance:
8 changes: 8 additions & 0 deletions docs/fedn.network.controller.rst
@@ -16,3 +16,11 @@ fedn.network.controller.control module
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.controller.controlbase module
------------------------------------------

.. automodule:: fedn.network.controller.controlbase
   :members:
   :undoc-members:
   :show-inheritance:
11 changes: 1 addition & 10 deletions docs/fedn.network.rst
@@ -16,9 +16,8 @@ Subpackages
   fedn.network.clients
   fedn.network.combiner
   fedn.network.controller
   fedn.network.dashboard
   fedn.network.loadbalancer
   fedn.network.statestore
   fedn.network.storage

Submodules
----------
@@ -31,14 +30,6 @@ fedn.network.config module
   :undoc-members:
   :show-inheritance:

fedn.network.reducer module
---------------------------

.. automodule:: fedn.network.reducer
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.state module
-------------------------

34 changes: 34 additions & 0 deletions docs/fedn.network.storage.models.rst
@@ -0,0 +1,34 @@
fedn.network.storage.models package
===================================

.. automodule:: fedn.network.storage.models
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

fedn.network.storage.models.memorymodelstorage module
-----------------------------------------------------

.. automodule:: fedn.network.storage.models.memorymodelstorage
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.storage.models.modelstorage module
-----------------------------------------------

.. automodule:: fedn.network.storage.models.modelstorage
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.storage.models.tempmodelstorage module
---------------------------------------------------

.. automodule:: fedn.network.storage.models.tempmodelstorage
   :members:
   :undoc-members:
   :show-inheritance:
17 changes: 17 additions & 0 deletions docs/fedn.network.storage.rst
@@ -0,0 +1,17 @@
fedn.network.storage package
============================

.. automodule:: fedn.network.storage
   :members:
   :undoc-members:
   :show-inheritance:

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   fedn.network.storage.models
   fedn.network.storage.s3
   fedn.network.storage.statestore
34 changes: 34 additions & 0 deletions docs/fedn.network.storage.s3.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
fedn.network.storage.s3 package
===============================

.. automodule:: fedn.network.storage.s3
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

fedn.network.storage.s3.base module
-----------------------------------

.. automodule:: fedn.network.storage.s3.base
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.storage.s3.miniorepository module
----------------------------------------------

.. automodule:: fedn.network.storage.s3.miniorepository
   :members:
   :undoc-members:
   :show-inheritance:

fedn.network.storage.s3.repository module
-----------------------------------------

.. automodule:: fedn.network.storage.s3.repository
   :members:
   :undoc-members:
   :show-inheritance: