From 0329d12321c05b7c5c5623b30313d7aff54e8f40 Mon Sep 17 00:00:00 2001 From: nvkevlu <55759229+nvkevlu@users.noreply.github.com> Date: Sat, 1 Feb 2025 23:07:40 -0800 Subject: [PATCH] Apply automated document enhancement modifications (#3165) Applies the more straightforward automated document enhancement modifications. ### Description Applies the more straightforward automated document enhancement modifications that are the same as what was merged in the 2.5 branch. ### Types of changes - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Co-authored-by: Ziyue Xu Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com> --- .github/ISSUE_TEMPLATE/question.md | 4 +-- CODE_OF_CONDUCT.md | 4 +-- docs/example_applications_algorithms.rst | 2 +- docs/examples/hello_scatter_and_gather.rst | 2 +- docs/fl_introduction.rst | 2 +- docs/flare_overview.rst | 8 ++--- docs/index.rst | 7 ++-- .../cross_site_model_evaluation.rst | 4 +-- .../scatter_and_gather_workflow.rst | 8 ++--- .../3rd_party_integration.rst | 2 +- .../execution_api_type/client_api.rst | 18 +++++------ docs/programming_guide/filters.rst | 4 +-- docs/programming_guide/fl_model.rst | 6 ++-- .../programming_guide/provisioning_system.rst | 2 +- .../resource_manager_and_consumer.rst | 2 +- .../programming_guide/system_architecture.rst | 2 +- docs/real_world_fl/flare_api.rst | 2 +- docs/real_world_fl/kubernetes.rst | 14 ++++---- docs/real_world_fl/migrating_to_flare_api.rst | 6 ++-- docs/real_world_fl/notes_on_large_models.rst | 6 ++-- docs/real_world_fl/workspace.rst | 10 +++--- docs/release_notes/flare_210.rst | 4 +-- docs/release_notes/flare_220.rst | 4 +-- docs/release_notes/flare_240.rst | 2 +- docs/user_guide/confidential_computing.rst | 20 ++++++------ .../communication_configuration.rst | 4 +-- .../configurations/job_configuration.rst | 6 ++-- docs/user_guide/dashboard_api.rst | 32 +++++++++---------- docs/user_guide/dashboard_ui.rst | 2 +- .../reliable_xgboost_design.rst | 16 +++++----- .../reliable_xgboost_timeout.rst | 8 ++--- .../flower_job_structure.rst | 4 +-- .../nvflare_cli/dashboard_command.rst | 2 +- docs/user_guide/nvflare_cli/poc_command.rst | 2 +- .../nvflare_cli/preflight_check.rst | 2 +- docs/user_guide/nvflare_security.rst | 2 +- .../authorization_policy_previewer.rst | 2 +- .../security/communication_security.rst | 2 +- .../security/data_privacy_protection.rst | 6 ++-- .../security/site_policy_management.rst | 4 +-- .../security/terminologies_and_roles.rst | 2 +- .../security/unsafe_component_detection.rst | 6 ++-- docs/whats_new.rst | 2 +- examples/advanced/README.md | 2 +- .../advanced/federated-statistics/README.md | 2 +- .../hierarchical_stats.ipynb | 8 ++--- .../notebooks/graph_construct.ipynb | 2 +- .../notebooks/prepare_data.ipynb | 2 +- .../advanced/finance-end-to-end/xgboost.ipynb | 8 ++--- examples/advanced/finance/README.md | 4 +-- .../fl_hub/jobs/numpy-cross-val/README.md | 2 +- examples/advanced/gnn/gnn_examples.ipynb | 2 +- examples/advanced/job_api/README.md | 10 +++--- examples/advanced/job_api/pt/README.md | 2 +- examples/advanced/job_api/tf/README.md | 11 +++---- examples/advanced/kaplan-meier-he/README.md | 2 +- .../advanced/prostate/prostate_2D/README.md | 6 ++-- 
.../sklearn-svm/sklearn_svm_cancer.ipynb | 2 +- .../cifar10_split_learning.ipynb | 2 +- .../xgboost/histogram-based/README.md | 10 +++--- examples/advanced/xgboost_secure/README.md | 5 ++- .../pt/nvflare_pt_getting_started.ipynb | 2 +- examples/getting_started/sklearn/README.md | 2 +- examples/getting_started/tf/README.md | 13 ++++---- .../hello-fedavg-numpy_flare_api.ipynb | 4 +-- .../hello-numpy-cross-val/README.md | 9 +++--- .../hello-numpy-sag/hello_numpy_sag.ipynb | 4 +-- examples/hello-world/hello_world.ipynb | 4 +-- examples/hello-world/ml-to-fl/README.md | 2 +- examples/hello-world/ml-to-fl/tf/README.md | 10 +++--- .../step-by-step/cifar10/README.md | 4 +-- .../step-by-step/cifar10/code/readme.md | 2 +- .../step-by-step/cifar10/cse/cse.ipynb | 2 +- .../step-by-step/cifar10/cyclic/cyclic.ipynb | 4 +-- .../cifar10/cyclic_ccwf/cyclic_ccwf.ipynb | 2 +- .../step-by-step/cifar10/data/readme.md | 8 ++--- .../step-by-step/cifar10/sag/sag.ipynb | 4 +-- .../cifar10/sag_executor/sag_executor.ipynb | 6 ++-- .../step-by-step/cifar10/sag_he/sag_he.ipynb | 4 +-- .../cifar10/sag_mlflow/sag_mlflow.ipynb | 2 +- .../cifar10/stats/image_stats.ipynb | 16 +++++----- .../step-by-step/cifar10/swarm/swarm.ipynb | 4 +-- examples/tutorials/flare_simulator.ipynb | 2 +- examples/tutorials/job_cli.ipynb | 10 +++--- examples/tutorials/setup_poc.ipynb | 2 +- nvflare/app_common/psi/README.md | 4 +-- tests/README.md | 6 ++-- tests/integration_test/README.md | 8 ++--- 88 files changed, 233 insertions(+), 238 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/question.md b/.github/ISSUE_TEMPLATE/question.md index 21ae9fafb2..a3b1cf2173 100644 --- a/.github/ISSUE_TEMPLATE/question.md +++ b/.github/ISSUE_TEMPLATE/question.md @@ -1,7 +1,7 @@ --- -name: Question (please use the Discussion tab) +name: Question (please use the Discussions tab) about: https://github.com/NVIDIA/NVFlare/discussions -title: 'Please use NVFlare Discussion tab for questions' +title: 'Please use the NVFlare Discussions tab for questions' labels: '' assignees: '' --- diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index a93f45f523..167c167744 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -10,7 +10,7 @@ COMMUNITY | DEVELOPERS | PROJECT LEADS ## Our Pledge -In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. +In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. ## Our Standards @@ -34,7 +34,7 @@ Examples of unacceptable behavior by participants include: Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
-Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. +Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned with this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. ## Scope diff --git a/docs/example_applications_algorithms.rst b/docs/example_applications_algorithms.rst index 52e0b65e2b..0257cf3316 100644 --- a/docs/example_applications_algorithms.rst +++ b/docs/example_applications_algorithms.rst @@ -8,7 +8,7 @@ NVIDIA FLARE has several tutorials and examples to help you get started with fed 1. Hello World Examples ======================= -Can be run from the :github_nvflare_link:`hello_world notebook `. +These examples can be run from the :github_nvflare_link:`hello_world notebook `. .. toctree:: :maxdepth: 1 diff --git a/docs/examples/hello_scatter_and_gather.rst b/docs/examples/hello_scatter_and_gather.rst index baea6b0287..a7444ffa48 100644 --- a/docs/examples/hello_scatter_and_gather.rst +++ b/docs/examples/hello_scatter_and_gather.rst @@ -27,7 +27,7 @@ Due to the simplified weights, you will be able to clearly see and understand the results of the FL aggregation and the model persistor process. The setup of this exercise consists of one **server** and two **clients**. -The server side model starting with weights ``[[1, 2, 3], [4, 5, 6], [7, 8, 9]]``. +The server-side model starts with weights ``[[1, 2, 3], [4, 5, 6], [7, 8, 9]]``. The following steps compose one cycle of weight updates, called a **round**: diff --git a/docs/fl_introduction.rst b/docs/fl_introduction.rst index 04cb9a9cd5..b99fc6834a 100644 --- a/docs/fl_introduction.rst +++ b/docs/fl_introduction.rst @@ -26,7 +26,7 @@ FL Terms and Definitions .. note:: - Here we describe the centralized version of FL, where the FL server has the role of the aggregrator node. However in a decentralized version such as + Here we describe the centralized version of FL, where the FL server has the role of the aggregator node. However in a decentralized version such as swarm learning, FL clients can serve as the aggregator node instead. - Types of FL diff --git a/docs/flare_overview.rst b/docs/flare_overview.rst index bd7fe70c40..4f6a665adc 100644 --- a/docs/flare_overview.rst +++ b/docs/flare_overview.rst @@ -16,8 +16,8 @@ Federated Computing At its core, FLARE serves as a federated computing framework, with applications such as Federated Learning and Federated Analytics built upon this foundation. Notably, it is agnostic to datasets, workloads, and domains. In contrast to centralized data lake solutions that necessitate copying data to a central location, FLARE brings computing capabilities directly to distributed datasets. -This approach ensures that data remains within the compute node, with only pre-approved, selected results shared among collaborators. +This approach ensures that data remains within the compute node, with only pre-approved, selected results being shared among collaborators.
+Moreover, FLARE is system-agnostic, offering easy integration with various data processing frameworks through the implementation of the FLARE client. This client facilitates deployment in sub-processes, Docker containers, Kubernetes pods, HPC, or specialized systems. Built for productivity @@ -106,7 +106,7 @@ High-level System Architecture As detailed above, FLARE incorporates components that empower researchers and developers to construct and deploy end-to-end federated learning applications. The high-level architecture, depicted in the diagram below, encompasses the foundational layer of the FLARE communication, messaging streaming layers, and tools dedicated to privacy preservation and secure platform management. -Atop this foundation lie the building blocks for federated learning applications, featuring a suite of federation workflows and learning algorithms. +Atop this foundation are the building blocks for federated learning applications, featuring a suite of federation workflows and learning algorithms. Adjacent to this central stack are tools facilitating experimentation and simulation with the FL Simulator and POC CLI, complemented by a set of tools designed for the deployment and management of production workflows. .. image:: resources/flare_overview.png @@ -123,7 +123,7 @@ Design Principles **Less is more** We strive to solve unique challenges by doing less while enabling others to do more. -We can't solve whole worlds' problems, but by building an open platform we can enable others to solve world's problems. +We can't solve the whole world's problems, but by building an open platform, we can enable others to solve them. This design principle means we intentionally limit the scope of the implementation, only building the necessary components. For a given implementation, we follow specifications in a way that allows others to easily customize and extend. diff --git a/docs/index.rst b/docs/index.rst index 16e1fb8788..b14ea35f57 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -59,10 +59,9 @@ Additional examples can be found at the :ref:`Examples Applications `. -When you are ready to for a secure, distributed deployment, the :ref:`Real World Federated Learning ` section covers the tools and process -required to deploy and operate a secure, real-world FLARE project. +When you are ready for a secure, distributed deployment, the :ref:`Real World Federated Learning ` section covers the tools and processes required to deploy and operate a secure, real-world FLARE project. FLARE for Developers ==================== -When you're ready to build your own application, the :ref:`Programming Guide `, :ref:`Programming Best Practices `, :ref:`FAQ`, and :ref:`API Reference ` -give an in depth look at the FLARE platform and APIs. +When you're ready to build your own application, the :ref:`Programming Guide `, :ref:`Programming Best Practices `, :ref:`FAQ `, and :ref:`API Reference ` +provide an in-depth look at the FLARE platform and APIs. diff --git a/docs/programming_guide/controllers/cross_site_model_evaluation.rst b/docs/programming_guide/controllers/cross_site_model_evaluation.rst index 75936806d5..2407021b40 100644 --- a/docs/programming_guide/controllers/cross_site_model_evaluation.rst +++ b/docs/programming_guide/controllers/cross_site_model_evaluation.rst @@ -18,8 +18,8 @@ example that implements the :class:`cross site model evaluation workflow` is configured to run cross-site + workflows, cross-site validation is no longer in the NVFlare framework but is instead handled by the workflow.
+ The :github_nvflare_link:`cifar10 example ` is configured to run cross-site model evaluation and ``config_fed_server.json`` is configured with :class:`ValidationJsonGenerator` to write the results to a JSON file on the server. diff --git a/docs/programming_guide/controllers/scatter_and_gather_workflow.rst b/docs/programming_guide/controllers/scatter_and_gather_workflow.rst index 44c9d232a8..cf61020e3f 100644 --- a/docs/programming_guide/controllers/scatter_and_gather_workflow.rst +++ b/docs/programming_guide/controllers/scatter_and_gather_workflow.rst @@ -2,8 +2,8 @@ Scatter and Gather Workflow --------------------------- -The Federated scatter and gather workflow is an included reference implementation of the default workflow of previous versions -of NVIDIA FLARE with a Server aggregating results from Clients that have produced Shareable results from their Trainer. +The federated scatter and gather workflow is an included reference implementation of the default workflow in previous versions +of NVIDIA FLARE, with a server aggregating results from clients that have produced shareable results from their trainer. At the core, the control_flow of :class:`nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather` is a for loop: @@ -26,7 +26,6 @@ Learnable For example, in the deep learning scenario, it can be the model weights. In the AutoML case, it can be the network architecture. -A :class:`LearnablePersistor` defines how to load -and save a ``Learnable``. +A :class:`LearnablePersistor` defines how to load and save a ``Learnable``. ``Learnable`` is a subset of the model file (which can contain other data like LR schedule) which is to be learned, like the model weights. @@ -34,7 +34,7 @@ Aggregator ^^^^^^^^^^ -:class:`Aggregators` define the aggregation algorithm to aggregate the ``Shareable``. +:class:`Aggregator` defines the aggregation algorithm to aggregate the ``Shareable``. For example, a simple aggregator would be just average all the ``Shareable`` of the same round. Below is the signature for an aggregator. diff --git a/docs/programming_guide/execution_api_type/3rd_party_integration.rst b/docs/programming_guide/execution_api_type/3rd_party_integration.rst index 093490897f..d93afd76b5 100644 --- a/docs/programming_guide/execution_api_type/3rd_party_integration.rst +++ b/docs/programming_guide/execution_api_type/3rd_party_integration.rst @@ -31,7 +31,7 @@ Requirements Depending on where the trainer is running, the connection may or may not need to be in secure mode (TLS). - We will need to modify the "project.yml" for NVFlare provision system and generate new package folders for each participating sites -- The trainer must be a Python program that can integrate with the NVFLARE library. +- The trainer must be a Python program that can integrate with the NVFlare library. - The trainer must be able to connect to the server, as well as the address that is dynamically opened by the FL client. diff --git a/docs/programming_guide/execution_api_type/client_api.rst b/docs/programming_guide/execution_api_type/client_api.rst index fb002f8934..2e5847a59a 100644 --- a/docs/programming_guide/execution_api_type/client_api.rst +++ b/docs/programming_guide/execution_api_type/client_api.rst @@ -248,16 +248,16 @@ that use Client API to write the Selection of Job Templates ========================== -To help user quickly setup job configurations, we create many job templates.
You can pick one job template that close to your use cases -and adapt to your needs by modify the needed variables. +To help users quickly set up job configurations, we have created numerous job templates. You can select a job template that closely matches +your use case and adapt it to your needs by modifying the necessary variables. -use command ``nvflare job list_templates`` you can find all job templates nvflare provided. +Using the command ``nvflare job list_templates``, you can find all the job templates provided by NVFlare. .. image:: ../../resources/list_templates_results.png :height: 300px -looking at the ``Execution API Type``, you will find ``client_api``. That's indicates the specified job template will use -Client API configuration. You can further nail down the selection by choice of machine learning framework: pytorch or sklearn or xgboost, +Looking at the ``Execution API Type``, you will find ``client_api``. This indicates that the specified job template will use the Client API +configuration. You can further narrow down the selection by choice of machine learning framework: pytorch or sklearn or xgboost, in-process or not, type of models ( GNN, NeMo LLM), workflow patterns ( Swarm learning or standard fedavg with scatter and gather (sag)) etc. @@ -271,11 +271,11 @@ For example: .. code-block:: python class CustomClass: - def __init__(self, x, y): - self.x = 1 - self.y = 2 + def __init__(self, x, y): + self.x = x + self.y = y -If you are using classes derived from ``Enum`` or dataclass, they will be handled by the default decomposers. +If your code uses classes derived from ``Enum`` or dataclasses, they will be handled by the default decomposers. For other custom classes, you will need to write a dedicated custom decomposer and ensure it is registered using fobs.register on both the server side and client side, as well as in train.py. diff --git a/docs/programming_guide/filters.rst b/docs/programming_guide/filters.rst index 5fd8f807ed..fbbf984835 100644 --- a/docs/programming_guide/filters.rst +++ b/docs/programming_guide/filters.rst @@ -48,9 +48,9 @@ For an example application using SVTPrivacy, see :github_nvflare_link:`Different DXO - Data Exchange Object =========================== -The message object passed between the server and clients are of Shareable class. Shareable is a general structure for all kinds of communication (task interaction, aux messages, fed events, etc.) that in addition to the message payload, also carries contextual information (such as peer FL context). NVFLARE's DXO object is a general-purpose structure that is meant to be used to carry message payload in a self-descriptive manner. As an analogy, think of Shareable as an HTTP message, whereas a DXO as a JPEG image that is carried by the HTTP message. +The message object passed between the server and clients is of the Shareable class. Shareable is a general structure for all kinds of communication (task interaction, aux messages, fed events, etc.) that in addition to the message payload, also carries contextual information (such as peer FL context). NVFLARE's DXO object is a general-purpose structure that is meant to be used to carry message payload in a self-descriptive manner. As an analogy, think of Shareable as an HTTP message, whereas a DXO as a JPEG image that is carried by the HTTP message. -An DXO object has the following properties: +A DXO object has the following properties: - Data Kind - the kind of data the DXO object carries (e.g. WEIGHTS, WEIGHT_DIFF, COLLECTION of DXOs, etc.)
- Meta - meta properties that describe the data (e.g. whether processed/encrypted and processing algorithm). This is a dict. diff --git a/docs/programming_guide/fl_model.rst b/docs/programming_guide/fl_model.rst index 702af3a4de..aeb735d772 100644 --- a/docs/programming_guide/fl_model.rst +++ b/docs/programming_guide/fl_model.rst @@ -9,9 +9,9 @@ that captures the common attributes needed for exchanging learning results. This is particularly useful when NVFlare system needs to exchange learning information with external training scripts/systems. -The external training script/system only need to extract the required -information from received FLModel, run local training, and put the results +The external training script or system only needs to extract the required +information from the received FLModel, run local training, and put the results in a new FLModel to be sent back. -For a detailed explanation of each attributes, please refer to the API doc: +For a detailed explanation of each attribute, please refer to the API doc: :mod:`FLModel` diff --git a/docs/programming_guide/provisioning_system.rst b/docs/programming_guide/provisioning_system.rst index c49ef091ec..d313c2dfa8 100644 --- a/docs/programming_guide/provisioning_system.rst +++ b/docs/programming_guide/provisioning_system.rst @@ -105,7 +105,7 @@ the Project instance: Participant ----------- Each participant is one entity that communicates with other participants inside the NVIDIA FLARE system during runtime. -Each participant has the following attributes: type, name, org and props. The attribute ``props`` is a dictionary and +Each participant has the following attributes: type, name, org, and props. The attribute ``props`` is a dictionary and stores additional information: .. code-block:: python diff --git a/docs/programming_guide/resource_manager_and_consumer.rst b/docs/programming_guide/resource_manager_and_consumer.rst index d2e6597f69..c6f5adcc08 100644 --- a/docs/programming_guide/resource_manager_and_consumer.rst +++ b/docs/programming_guide/resource_manager_and_consumer.rst @@ -166,7 +166,7 @@ You can easily write your own resource manager and consumer following the API sp @abstractmethod def report_resources(self, fl_ctx) -> dict: """Reports resources.""" - Pass + pass A more friendly interface (AutoCleanResourceManager) is provided as well: diff --git a/docs/programming_guide/system_architecture.rst b/docs/programming_guide/system_architecture.rst index b4c6ae9084..12bfd37766 100644 --- a/docs/programming_guide/system_architecture.rst +++ b/docs/programming_guide/system_architecture.rst @@ -35,7 +35,7 @@ See the example :ref:`project_yml` for how these components are configured in St Overseer -------- The Overseer is a system component that determines the hot FL server at any time for high availability. -The name of the Overseer must be unique and in the format of fully qualified domain names. During +The name of the Overseer must be unique and in the format of a fully qualified domain name. During provisioning time, if the name is specified incorrectly, either being duplicate or containing incompatible characters, the provision command will fail with an error message. It is possible to use a unique hostname rather than FQDN, with the IP mapped to the hostname by having it added to ``/etc/hosts``. 
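As a concrete illustration of the hostname mapping just described, the entry added to ``/etc/hosts`` on each participant machine might look like the sketch below; the IP address and hostname are illustrative placeholders, not values taken from this change:

```shell
# Map the Overseer's unique hostname to its IP (placeholder values; substitute your own)
echo "203.0.113.10  overseer.example.com" | sudo tee -a /etc/hosts
```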
diff --git a/docs/real_world_fl/flare_api.rst b/docs/real_world_fl/flare_api.rst index 8165724061..eb7dfe0785 100644 --- a/docs/real_world_fl/flare_api.rst +++ b/docs/real_world_fl/flare_api.rst @@ -48,7 +48,7 @@ the session in a finally clause: try: print(sess.get_system_info()) - job_id = sess.submit_job("/workspace/locataion_of_jobs/job1") + job_id = sess.submit_job("/workspace/location_of_jobs/job1") print(job_id + " was submitted") # monitor_job() waits until the job is done, see the section about it below for details sess.monitor_job(job_id) diff --git a/docs/real_world_fl/kubernetes.rst b/docs/real_world_fl/kubernetes.rst index 8c52bddc8f..10593af116 100644 --- a/docs/real_world_fl/kubernetes.rst +++ b/docs/real_world_fl/kubernetes.rst @@ -45,7 +45,7 @@ Provision With NVIDIA FLARE installed in your local machine, you can create one set of startup kits easily with ``nvflare provision``. If there is a project.yml file in your current working directory, ``nvflare provision`` will create a workspace directory. If that project.yml file does not exist, ``nvflare provision`` will create a sample project.yml for you. For simplicity, we suggest you remove/rename any existing project.yml and workspace directory. Then provision the -set of startup kits from scratch. When selecting the sampel project.yml during provisioning time, select non-HA one as most clusters support HA easily. +set of startup kits from scratch. When selecting the sample project.yml during provisioning, select a non-HA one, as most clusters support HA easily. After provisioning, you will have a workspace/example_project/prod_00 folder, which includes server, site-1, site-2 and admin@nvidia.com folders. If you would like to use other names instead of ``site-1``, ``site-2``, etc, you can remove the workspace folder and modify the project.yml file. After that, @@ -54,8 +54,8 @@ you can run ``nvflare provision`` command to get the new set of startup kits. Persistent Volume ================= -EKS provides several ways to create persistent volumes. Before you can use create the volume, -you will need to create one OIDC provider, add one service account and attach a pollicy to two roles, the node instance group and that service account. +EKS provides several ways to create persistent volumes. Before you can create the volume, +you need to create an OIDC provider, add a service account, and attach a policy to two roles: the node instance group and the service account. .. code-block:: shell @@ -128,13 +128,13 @@ can run ``kubectl apply -f volume.yaml`` to make the volume available. storage: 5Gi storageClassName: gp2 -After that, your EKS persistent volme should be waiting for the first claim. +After that, your EKS persistent volume should be waiting for the first claim. Start Helper Pod ================ -Now you will need to copy your startup kits to your EKS cluster. Those startup kits will copied into the volume you just created. +Now you will need to copy your startup kits to your EKS cluster. Those startup kits will be copied into the volume you just created. In order to access the volume, we deploy a helper pod which mounts that persistent volume and use kubectl cp to copy files from your local machine to the cluster. @@ -190,8 +190,8 @@ And the same for site-1, site-2, admin@nvidia.com. This will make the entire startup kits available at the nvflare-pv-claim of the cluster so that NVIDIA FLARE system can mount that nvflare-pv-claim and access the startup kits. 
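To make the copy step above concrete, moving one provisioned startup kit into the helper pod could look like the following sketch; the pod name and destination path are assumptions for illustration, not values from this change:

```shell
# Copy the server startup kit into the helper pod that mounts nvflare-pv-claim
kubectl cp workspace/example_project/prod_00/server nvflare-helper:/workspace/
# Repeat for site-1, site-2, and admin@nvidia.com
```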
-After copying those folders to nvflare-pv-claim, you can shutdown the helper pod. The nvflare-pv-claim and its contents will stay and is -available to server/client/admin pods. +After copying those folders to nvflare-pv-claim, you can shut down the helper pod. The nvflare-pv-claim and its contents will remain available to +server, client, and admin pods. Start Server Pod ================ diff --git a/docs/real_world_fl/migrating_to_flare_api.rst b/docs/real_world_fl/migrating_to_flare_api.rst index 97ece87210..bb86a8929e 100644 --- a/docs/real_world_fl/migrating_to_flare_api.rst +++ b/docs/real_world_fl/migrating_to_flare_api.rst @@ -18,7 +18,7 @@ Initialization of the FLAdminAPI was cumbersome due to all the necessary argumen :class:`FLAdminAPIRunner` was used for initializing the FLAdminAPI with the username of the admin user and the path to the admin startup kit directory. -Initialization the FLAdminAPI: +Initializing the FLAdminAPI: .. code-block:: python @@ -66,12 +66,12 @@ General Notes on Migrating to FLARE API Return Structure ^^^^^^^^^^^^^^^^ -The return structure for FLAdminAPI commands were ``FLAdminAPIResponse`` objects that contained the status, details, and raw response from the server. +The return structure for FLAdminAPI commands was an ``FLAdminAPIResponse`` object that contained the status, details, and raw response from the server. This required parsing the response to get the status or other information to then use or output. The FLARE API no longer returns an object with a status and a dictionary of details, but the response depends on the command and is greatly simplified. See the details of what each command returns below or in the docstrings at: :mod:`FLARE API`. -FLARE API now Raises Exceptions +FLARE API Now Raises Exceptions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Instead of having a status with an error that needs to be parsed in FLAdminAPI, FLARE API will now raise an exception if there is an error or something unexpected happens, and the handling of these exceptions will be the responsibility of the code using the FLARE API. This means that in general, diff --git a/docs/real_world_fl/notes_on_large_models.rst b/docs/real_world_fl/notes_on_large_models.rst index 5394d81b42..82332690a0 100644 --- a/docs/real_world_fl/notes_on_large_models.rst +++ b/docs/real_world_fl/notes_on_large_models.rst @@ -17,7 +17,7 @@ The Azure VM size of the NVIDIA FLARE server was M32-8ms, which has 875GB memory Job of 128GB Models ******************* -We slightly modified the hello-numpy example to generate a model, which was a dictionary of 64 keys. Each key held a 2GB numpy array. The local training task was to add a small number to +We slightly modified the hello-numpy example to generate a model, which was a dictionary of 64 keys. Each key contained a 2GB NumPy array. The local training task was to add a small number to those numpy arrays. The aggregator on the server side was not changed. This job required at least two clients and ran 3 rounds to finish. @@ -31,13 +31,13 @@ On the ap-south-1 client, it took about 11000 seconds from the client to the ser - communication_timeout to 6000 -The `streaming_read_timeout` is used to check when a chunck of data is received but is not read out by the upper layer. The `streaming_ack_wait` is how long the sender should wait for acknowledgement returned by the receiver for one chunck. +The `streaming_read_timeout` is used to check when a chunk of data is received but not read by the upper layer.
The `streaming_ack_wait` is how long the sender should wait for acknowledgement returned by the receiver for one chunk. The `communication_timeout` is used on three consecutive stages for a single request and response. When sending a large request (submit_update), the sender starts a timer with timeout = `communication_timeout`. When this timer expires, the sender checks if any progress is made during this period. If yes, the sender resets the timer with the same timeout value and waits again. If not, this request and response returns with timeout. After sending completes, the sender cancels the previous timer and starts a `remote processing` timer with timeout = `communication_timeout`. This is to wait for the first returned byte from the receiver. On -large models, the server requires much longer time to prepare the task when the clients send `get_task` requests. After receiving the first returned byte, the sender cancel the `remote processing` timer and starts +large models, the server requires much longer time to prepare the task when the clients send `get_task` requests. After receiving the first returned byte, the sender cancels the `remote processing` timer and starts a new timer. It checks the receiving progress just like sending. diff --git a/docs/real_world_fl/workspace.rst b/docs/real_world_fl/workspace.rst index 90fefdaa86..ba4a5bd7a9 100644 --- a/docs/real_world_fl/workspace.rst +++ b/docs/real_world_fl/workspace.rst @@ -49,9 +49,9 @@ Server In each ``job_id`` folder, there is the ``app_server`` folder that contains the :ref:`application` that is running on the server for this ``job_id``. -The ``log.txt`` inside each ``job_id`` folder are the loggings of this job. +The ``log.txt`` file inside each ``job_id`` folder contains the log entries for that job. -While the ``log.txt`` under server folder is the log for the server control process. +In contrast, the ``log.txt`` file under the server folder logs the server control process. The ``startup`` folder contains the config and the scripts to start the FL server program. @@ -70,8 +70,8 @@ You can issue the ``download_job [JOB_ID]`` in the admin client to download the The downloaded workspace will be in ``[DOWNLOAD_DIR]/[JOB_ID]/workspace/``. .. note:: - - If you issue ``download_job`` before the job is finished, the workspace folder will be empty. + + Issuing ``download_job`` before the job finishes will result in an empty workspace folder. .. _client_workspace: @@ -114,7 +114,7 @@ Client In each ``job_id`` folder, there is the ``app_clientname`` folder that contains the :ref:`application` that is running on the client for this ``job_id``. -The ``log.txt`` inside each ``job_id`` folder are the loggings of this job. +The ``log.txt`` file inside each ``job_id`` folder contains the log entries for that job. While the ``log.txt`` under client folder is the log for the client control process. diff --git a/docs/release_notes/flare_210.rst b/docs/release_notes/flare_210.rst index 485c6d90b6..ffeabac117 100644 --- a/docs/release_notes/flare_210.rst +++ b/docs/release_notes/flare_210.rst @@ -2,7 +2,7 @@ What's New in FLARE v2.1 ========================= FLARE 2.1 was the original release in June 2022 introducing HA and Multi-Job Execution: - - :ref:`High Availability (HA) ` supports multiple FL Servers and automatically cuts - over to another server when the currently active server becomes unavailable. 
+ - :ref:`High Availability (HA) ` supports multiple FL servers and automatically fails over + to another server when the currently active server becomes unavailable. - :ref:`Multi-Job Execution ` supports resource-based multi-job execution by allowing for concurrent runs provided resources required by the jobs are satisfied. diff --git a/docs/release_notes/flare_220.rst b/docs/release_notes/flare_220.rst index de685629b4..2f20c1c08b 100644 --- a/docs/release_notes/flare_220.rst +++ b/docs/release_notes/flare_220.rst @@ -80,14 +80,14 @@ Federated XGBoost """"""""""""""""" XGBoost is a popular machine learning method used by applied data scientists in a wide variety of applications. In FLARE v2.2, -we introcuce federated XGBoost integration, with a controller and executor that run distributed XGBoost training among a group +we introduce federated XGBoost integration, with a controller and executor that run distributed XGBoost training among a group of clients. See the :github_nvflare_link:`hello-xgboost example ` to get started. Federated Statistics """""""""""""""""""" Before implementing a federated training application, a data scientist often performs a process of data exploration, analysis, and feature engineering. One method of data exploration is to explore the statistical distribution of a dataset. -With FLARE v2.2, we indroduce federated statistics operators - a server controller and client executor. With these +With FLARE v2.2, we introduce federated statistics operators - a server controller and client executor. With these pre-defined operators, users define the statistics to be calculated locally on each client dataset, and the workflow controller generates an output json file that contains global as well as individual site statistics. This data can be visualized to allow site-to-site and feature-to-feature comparison of metrics and histograms across the set of clients. diff --git a/docs/release_notes/flare_240.rst b/docs/release_notes/flare_240.rst index 4741b386f0..b10cf5fe2b 100644 --- a/docs/release_notes/flare_240.rst +++ b/docs/release_notes/flare_240.rst @@ -49,7 +49,7 @@ For more in-depth information on the Client API, refer to the :ref:`client_api` The 3rd-Party Integration Pattern --------------------------------- -In certain scenarios, users face challenges when attempting to moving the training logic to the FLARE client side due to pre-existing ML/DL training system infrastructure. +In certain scenarios, users face challenges when attempting to move the training logic to the FLARE client side due to pre-existing ML/DL training system infrastructure. In the 2.4.0 release, we introduce the Third-Party Integration Pattern, which allows the FLARE system and a third-party external training system to seamlessly exchange model parameters without requiring a tightly integrated system. See the :ref:`3rd_party_integration` documentation for more details. diff --git a/docs/user_guide/confidential_computing.rst b/docs/user_guide/confidential_computing.rst index f00f21a7ce..621f7dd5f5 100644 --- a/docs/user_guide/confidential_computing.rst +++ b/docs/user_guide/confidential_computing.rst @@ -4,15 +4,15 @@ Confidential Computing: Attestation Service Integration ####################################################### -Data used in NVFlare are encrypted during transmission between participants, which covers the communication between the NVFlare server, clients and admin. 
This security measure ensures +Data used in NVFlare is encrypted during transmission between participants, which covers the communication between the NVFlare server, clients, and admin. This security measure ensures data in transit is well protected. Users can also utilize existing infrastructure, such as storage encryption, to protect data at rest. With confidential computing, NVFlare can protect data in use and thus completes securing the entire lifecycle of data. -Confidential computing in NVFlare is designed to explicitly establish the trust between participants. Each participant must first capture the evidences related to the hardware (such as GPU), the software (GPU driver and VBIOS) and other components in its own platform. The evidences will +Confidential computing in NVFlare is designed to explicitly establish the trust between participants. Each participant must first capture the evidence related to the hardware (such as a GPU), the software (GPU driver and VBIOS), and other components in its own platform. The evidence will be validated and signed to ensure its validity and authenticity. The owner of signed evidence, called confidential computing token (CC token), can demonstrate the information about its computing environment to other -participants by providing the CC token. Upon receiving CC token, the participant (the relying party) can verify the claims inside the CC token against it own security policy on whether the CC token owner is -using required hardware/software/components for security. If the relying party finds the CC token does not meet its security policy, the relying party can inform the system that it chooses not to join the job deployment -and will not exchange models with others. Only participants who trust and is trusted by one another will work together to run the NVFlare job. +participants by providing the CC token. Upon receiving the CC token, the participant (the relying party) can verify the claims inside the CC token against its own security policy to determine whether the CC token owner is +using the required hardware, software, and components for security. If the relying party finds the CC token does not meet its security policy, the relying party can inform the system that it chooses not to join the job deployment +and will not exchange models with others. Only participants who trust and are trusted by one another will work together to run the NVFlare job. ********************** @@ -20,7 +20,7 @@ Configuring CC Manager ********************** In order to enable confidential computing in NVFlare, users need to include the CC manager, as a component, inside the resources.json file of startup kit local folder. The entire NVFlare system must -configure with CC manager for either all participants or no participants. +be configured with the CC manager for either all participants or no participants. The CC manager component depends on `NVIDIA Attestation SDK `_. Users have to install it as a prerequisite. This SDK also depends on other software stacks, such as GPU verifier, driver and others. @@ -38,12 +38,12 @@ The following is the sample configuration of CC manager. }, -The ``id`` is used internally to NVFlare so that other components can get its instance. The ``path`` is the complele python module hierarchy. +The ``id`` is used internally by NVFlare so that other components can get its instance. The ``path`` is the complete Python module hierarchy. The ``args`` contains only the verifiers, a list of possible verifiers. 
Each verifier is a dictionary and its keys are "devices", "env", "url", "appraisal_policy_file" and "result_policy_file." -The value of devices is either "gpu" or "cpu" for current Attestation SDK. The values of env is either "local" and "test" for current Attestation SDK. +The value of devices is either "gpu" or "cpu" for the current Attestation SDK. The value of env is either "local" or "test" for the current Attestation SDK. Currently, valid combination is gpu and local or cpu and test. The value of url must be an empty string. The appraisal_policy_file and result_policy_file must point to an existing file. The former is currently ignored by Attestation SDK. The latter currently supports the following content only @@ -92,10 +92,10 @@ Runtime behavior When one participant, either server or client, starts, the CC manager reacts to EventType.SYSTEM_BOOTSTRAP and retrieves its own CC token via Attestation SDK after the Attestation SDK successfully communicates with the software stacks and hardware. This CC token will be stored locally in CC manager. -When the client registers itself to the server, it also includes its CC token in the regsitration data. Server will collect the client's CC token if it successfully registers. The server CC manager keeps +When the client registers itself with the server, it also includes its CC token in the registration data. The server will collect the client's CC token if it successfully registers. The server CC manager keeps all client's CC tokens as well as its own token. -After a submitted job is schedule to be deployed, the server verifies CC tokens of clients that are included in the deployment map basd on its result policy. If server finds +After a submitted job is scheduled to be deployed, the server verifies the CC tokens of clients that are included in the deployment map based on its result policy. If the server finds all tokens from clients in the deployment map are verified successfully, those tokens will be sent to clients in deployment map for client side verification. The client can determine whether it wants to join this job or not based on the result of verifying others' CC tokens against its own result policy. If one client decides not to join the job, server will not deploy that job to that client. diff --git a/docs/user_guide/configurations/communication_configuration.rst b/docs/user_guide/configurations/communication_configuration.rst index dbe1ad2a4b..52d6d8c79e 100644 --- a/docs/user_guide/configurations/communication_configuration.rst +++ b/docs/user_guide/configurations/communication_configuration.rst @@ -10,7 +10,7 @@ All cells form a communication network called CellNet and each cell has a unique Any cell can communicate with any other cells via their FQCNs, regardless how the messages are routed. FLARE is a multi-job system in that multiple jobs can be executed at the same time. -When a FLARE system is started, the CellNet only comprises of the server and one client cell for each site. +When a FLARE system is started, the CellNet consists of the server and one client cell for each site. All client cells are connected to the server cell. This topology is the backbone of the communication system and cells are called Parent Cells. When a job is deployed, the job is done by new cells dedicated to the execution of the job, one cell at each site (server and clients).
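To picture the backbone topology described above, the cell names (FQCNs) in a two-site system might look like the following sketch; the site names and job ID are made-up placeholders:

```
server                                        # server parent cell
site-1, site-2                                # client parent cells, connected to the server cell
server.job123, site-1.job123, site-2.job123   # dedicated cells created for one deployed job
```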
@@ -19,7 +19,7 @@ These cells are called Job Cells which are started when the job is deployed, and This communication system provides many powerful features (multiple choices of communication drivers, large message streaming, ad-hoc direct connections, etc.). However, for these features to work well, they need to be configured properly. -This document describes all aspects that can be configured and how to do configure them properly. +This document describes all aspects that can be configured and how to configure them properly. The following aspects of the communication system can be configured: diff --git a/docs/user_guide/configurations/job_configuration.rst b/docs/user_guide/configurations/job_configuration.rst index d72e342c21..ccd38f7c76 100644 --- a/docs/user_guide/configurations/job_configuration.rst +++ b/docs/user_guide/configurations/job_configuration.rst @@ -13,7 +13,7 @@ When a job is deployed, dedicated job-specific processes are created throughout Specifically, a dedicated server process is created to perform server-side logic; and dedicated client processes (one process for each site) are created to perform client-side logic. This design allows multiple jobs to be running in their isolated space at the same time. The success or failure of a job won't interfere with the execution of other jobs. -The task-based interactions between a FL client and the FL server is done with the ClientRunner on the client side and the ServerRunner on the server side. +The task-based interactions between an FL client and the FL server are done with the ClientRunner on the client side and the ServerRunner on the server side. When the job is deployed, the order of the job process creation is not guaranteed - the server-side job process may be started before or after any client-side job process. To ensure that the ClientRunner does not start to fetch tasks from the ServerRunner, the two runners need to be synchronized first. @@ -27,7 +27,7 @@ runner_sync_timeout This variable is for the client-side configuration (config_fed_client.json). This runner_sync_timeout specifies the timeout value for the "runner sync" request. -If a response is not received from the Server within this specified value, then another "runner sync" request will be sent. +If a response is not received from the server within this specified value, then another "runner sync" request will be sent. The default value is 2.0 seconds. @@ -41,7 +41,7 @@ If a response is still not received after this many tries, the client's job proc The default value is 30. -The default settings of these two variables mean that if the ClientRunner and the ServerRunner are not synched within one minute, the client will terminate. +The default settings of these two variables mean that if the ClientRunner and the ServerRunner are not synchronized within one minute, the client will terminate. If one minute is not enough, you can extend these two variables to meet your requirement. Task Check diff --git a/docs/user_guide/dashboard_api.rst b/docs/user_guide/dashboard_api.rst index 100882e815..0084524ea5 100644 --- a/docs/user_guide/dashboard_api.rst +++ b/docs/user_guide/dashboard_api.rst @@ -4,32 +4,32 @@ Dashboard in NVIDIA FLARE ######################### As mentioned in :ref:`provisioning`, the NVIDIA FLARE system requires a set of startup kits -which include the private keys and certificates (signed by the root CA) in order to communicate to one another. 
-The new :ref:`nvflare_dashboard_ui` in NVIDIA FLARE provides a simple way to collect information of clients and users from different organizations, +which include the private keys and certificates (signed by the root CA) in order to communicate with one another. +The new :ref:`nvflare_dashboard_ui` in NVIDIA FLARE provides a simple way to collect information about clients and users from different organizations, as well as to generate those startup kits for users to download. -Most of the details about provisioning can be found in :ref:`provisioning`. In this section, we focus on the user interaction with Dashboard and its backend API. +Most of the details about provisioning can be found in :ref:`provisioning`. In this section, we focus on the user interaction with Dashboard and its backend APIs. .. include:: nvflare_cli/dashboard_command.rst -********************************** -NVIDIA FLARE Dashboard backend API -********************************** +*********************************** +NVIDIA FLARE Dashboard backend APIs +*********************************** Architecture ============ -The Dashboard backend API follows the Restful concept. It defines four resources, Project, Organizations, Client and User. There is one and only one Project. +The Dashboard backend APIs follow the RESTful concept. They define four resources: Project, Organizations, Client, and User. There is one and only one Project. The Project includes information about server(s) and overseer (if in HA mode). Clients are defined for NVIDIA FLARE clients and Users for NVIDIA FLARE admin console. -Organizations is a GET only operation, which returns a list of current registered organizations. +Organizations is a GET-only operation that returns a list of currently registered organizations. Details ======= API --- -The following is the complete definition of the backend API, written in OpenAPI 3.0 syntax. Developers can implement the same API in different programming language or -develop different UI while calling the same API for branding purpose. +The following is the complete definition of the backend APIs, written in OpenAPI 3.0 syntax. Developers can implement the same APIs in a different programming language or +develop a different UI while calling the same APIs for branding purposes. .. literalinclude:: ../../nvflare/dashboard/dashboard.yaml :language: yaml @@ -37,23 +37,23 @@ Authentication and Authorization -------------------------------- -Most of the backend API requires users to login to obtain JWT for authorization purpose. The JWT includes claims of user's organization and his/her role. The JWT itself always -has the user's email address (user id for login). +Most of the backend APIs require users to log in to obtain a JWT for authorization purposes. The JWT includes claims about the user's organization and their role. The JWT itself always +includes the user's email address (the user ID for login). -As shown in the above section, only ``GET /project``, ``GET /users`` and ``GET /organizations`` can be called without login credential. +As shown in the previous section, only ``GET /project``, ``GET /users``, and ``GET /organizations`` can be called without login credentials. -The project_admin role can operate on any resources. +The project_admin role can operate on any resource.
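As an illustration, the three endpoints that need no credentials could be exercised directly; this sketch assumes a placeholder host and an ``/api/v1`` path prefix, so check the OpenAPI definition above for the exact paths:

```shell
# Unauthenticated GETs (host and path prefix are assumptions)
curl -k https://dashboard.example.com/api/v1/project
curl -k https://dashboard.example.com/api/v1/users
curl -k https://dashboard.example.com/api/v1/organizations
```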
Freezing project ---------------- -Because the project itself contains information requires by clients and users, changing project information after clients and users are created will +Because the project itself contains information required by clients and users, changing project information after clients and users are created will cause incorrect dependencies. It is required for the project_admin to freeze the project after all project related information is set and finalized so -that the Dashboard web can allow users to signup. Once the project is frozen, there is no way, from the Dashboard web, to unfreeze the project. +that the Dashboard web can allow users to sign up. Once the project is frozen, there is no way, from the Dashboard web, to unfreeze the project. Database schema --------------- -The following is the schema of the underlying database used by the backend API. +The following is the schema of the underlying database used by the backend APIs. .. image:: ../resources/dashboard_schema.png :height: 800px diff --git a/docs/user_guide/dashboard_ui.rst b/docs/user_guide/dashboard_ui.rst index 0525b4e624..13703c3c12 100644 --- a/docs/user_guide/dashboard_ui.rst +++ b/docs/user_guide/dashboard_ui.rst @@ -163,7 +163,7 @@ The ``Project Admin`` is the administrator for the site and is responsible for i then approving the users and client sites while making edits if necessary. After deploying the FLARE Dashboard website package, the Project Admin should log in from the Home Page with the bootstrapped credentials -provided in the deployment proecess. At this point, the Project Home Page only has a placeholder title since none of the project values have +provided in the deployment process. At this point, the Project Home Page only has a placeholder title since none of the project values have been set yet. .. note:: diff --git a/docs/user_guide/federated_xgboost/reliable_xgboost_design.rst b/docs/user_guide/federated_xgboost/reliable_xgboost_design.rst index 99747dcd11..e257d8f19c 100644 --- a/docs/user_guide/federated_xgboost/reliable_xgboost_design.rst +++ b/docs/user_guide/federated_xgboost/reliable_xgboost_design.rst @@ -22,7 +22,7 @@ There are a few potential problems with this approach: - For each job, the XGBoost Server must open a port for clients to connect to. This adds burden to request IT for the additional port in the real-world situation. Even if a fixed port is allowed to open, and we reuse that port, - multiple XGBoost jobs can not be run at the same time, + multiple XGBoost jobs cannot be run simultaneously, since each XGBoost job requires a different port number. @@ -30,7 +30,7 @@ Flare as XGBoost Communicator ***************************** -FLARE provides a highly flexible, scalable and reliable communication mechanism. +FLARE provides a highly flexible, scalable, and reliable communication mechanism. We enhance the reliability of federated XGBoost by using FLARE as the communicator of XGBoost, as shown here: @@ -54,12 +54,12 @@ Similarly, there is a local GRPC Client (LGC) on the FL Server that interacts with the XGBoost Server. The message path between the XGBoost Client and the XGBoost Server is as follows: - 1. The XGBoost client generates a gRPC message and sends it to the LGS in FLARE Client - 2. FLARE Client forwards the message to the FLARE Server. This is a reliable FLARE message. - 3. FLARE Server uses the LGC to send the message to the XGBoost Server. - 4.
XGBoost Server sends the response back to the LGC in FLARE Server. - 5. FLARE Server sends the response back to the FLARE Client. - 6. FLARE Client sends the response back to the XGBoost Client via the LGS. + 1. The XGBoost client generates a gRPC message and sends it to the LGS in the FLARE client. + 2. The FLARE client forwards the message to the FLARE server. This is a reliable FLARE message. + 3. The FLARE server uses the LGC to send the message to the XGBoost server. + 4. The XGBoost server sends the response back to the LGC in the FLARE server. + 5. The FLARE server sends the response back to the FLARE client. + 6. The FLARE client sends the response back to the XGBoost client via the LGS. Please note that the XGBoost Client (c++) component could be running as a separate process or within the same process of FLARE Client. diff --git a/docs/user_guide/federated_xgboost/reliable_xgboost_timeout.rst b/docs/user_guide/federated_xgboost/reliable_xgboost_timeout.rst index efecba5647..7ffd66ae21 100644 --- a/docs/user_guide/federated_xgboost/reliable_xgboost_timeout.rst +++ b/docs/user_guide/federated_xgboost/reliable_xgboost_timeout.rst @@ -29,7 +29,7 @@ ReliableMessage Timeout There are two timeout values to control the behavior of ReliableMessage (RM). -Per-message Timeout +Per-Message Timeout ------------------- Essentially RM tries to resend the message until delivered successfully. @@ -42,8 +42,8 @@ The per-message timeout should be set to 5 seconds. .. note:: - Note that the initial XGBoost message might take more than 100 seconds - depends on the dataset size. + Note that the initial XGBoost message may take more than 100 seconds, + depending on the dataset size. Transaction Timeout ------------------- @@ -93,4 +93,4 @@ client op timeout. In general, follow this rule: -Per-message Timeout < Transaction Timeout < XGBoost Client Operation Timeout +Per-Message Timeout < Transaction Timeout < XGBoost Client Operation Timeout diff --git a/docs/user_guide/flower_integration/flower_job_structure.rst b/docs/user_guide/flower_integration/flower_job_structure.rst index bf10a7ddde..6d56c71886 100644 --- a/docs/user_guide/flower_integration/flower_job_structure.rst +++ b/docs/user_guide/flower_integration/flower_job_structure.rst @@ -74,7 +74,7 @@ Here is an example of ``pyproject.toml``, taken from :github_nvflare_link:`this Project Name ------------ The project name should match the name of the project folder, though not a requirement. In this example, it is ``flwr_pt``. -Serverapp Specification +Server App Specification This value is specified following this format: @@ -92,7 +92,7 @@ where: app = ServerApp(server_fn=server_fn) -Clientapp Specification +Client App Specification ------------------------ This value is specified following this format: diff --git a/docs/user_guide/nvflare_cli/dashboard_command.rst b/docs/user_guide/nvflare_cli/dashboard_command.rst index be0e9aac64..054829ae87 100644 --- a/docs/user_guide/nvflare_cli/dashboard_command.rst +++ b/docs/user_guide/nvflare_cli/dashboard_command.rst @@ -76,7 +76,7 @@ from scratch and you can provide a project admin email address and get a new pas The Dashboard will also check the cert folder inside current the working directory (or directory specified by the --folder option) to load web.crt and web.key. If those files exist, Dashboard will load them and run as an HTTPS server. If Dashboard does not find both of them, it runs as HTTP server. 
In both cases, the service listens to port 443, unless the ``--port`` option is used to specify a different port. Dashboard will run on ``0.0.0.0``, so by default it should be accessible on the same machine from -``localhost:443``. To make it available to users outside the network, port forwarding and other configurations may be needed to securely direct traffic to the maching running Dashboard. +``localhost:443``. To make it available to users outside the network, port forwarding and other configurations may be needed to securely direct traffic to the machine running Dashboard. .. note:: diff --git a/docs/user_guide/nvflare_cli/poc_command.rst b/docs/user_guide/nvflare_cli/poc_command.rst index 9f5d45e021..63f86c5833 100644 --- a/docs/user_guide/nvflare_cli/poc_command.rst +++ b/docs/user_guide/nvflare_cli/poc_command.rst @@ -12,7 +12,7 @@ Different processes represent the server, clients, and the admin console, making Syntax and Usage ================= -The POC command has been reorgaznied in version 2.4 to have the subcommands ``prepare``, ``prepare-jobs-dir``, ``start``, ``stop``, and ``clean``. +The POC command has been reorganized in version 2.4 to have the subcommands ``prepare``, ``prepare-jobs-dir``, ``start``, ``stop``, and ``clean``. .. code-block:: none diff --git a/docs/user_guide/nvflare_cli/preflight_check.rst b/docs/user_guide/nvflare_cli/preflight_check.rst index ea2dda4db5..ec7d5b3c1f 100644 --- a/docs/user_guide/nvflare_cli/preflight_check.rst +++ b/docs/user_guide/nvflare_cli/preflight_check.rst @@ -138,7 +138,7 @@ The problems that may be reported: Check overseer running, Can't connect to overseer,"1) Please check if overseer is up or certificates are correct 2) Please check if overseer hostname in project.yml is available - 3) if running in local machine, check if overseer defined in project.yml is defined in /etc/hosts." + 3) if running on a local machine, check if the overseer defined in project.yml is defined in /etc/hosts." Check primary service provider available,Can't get primary service provider ({psp}) from overseer,Please check if server is up. Check SP's socket server available,Can't connect to primary service provider's ({sp_end_point}) socketserver,Please check if server is up. Check SP's GRPC server available,Can't connect to primary service provider's ({sp_end_point}) grpc server,Please check if server is up. diff --git a/docs/user_guide/nvflare_security.rst b/docs/user_guide/nvflare_security.rst index d4a8c0e442..9377e82a8c 100644 --- a/docs/user_guide/nvflare_security.rst +++ b/docs/user_guide/nvflare_security.rst @@ -38,7 +38,7 @@ All other security concerns must be handled by the site's IT security infrastruc - Physical security - Firewall policies - - Data management policies: storage, retention, cleaning, distributions, access, etc. + - Data management policies: storage, retention, cleaning, distribution, access, etc. Security Trust Boundary and Balance of Risk and Usability --------------------------------------------------------- diff --git a/docs/user_guide/security/authorization_policy_previewer.rst b/docs/user_guide/security/authorization_policy_previewer.rst index fcac5315fe..3649234f8f 100644 --- a/docs/user_guide/security/authorization_policy_previewer.rst +++ b/docs/user_guide/security/authorization_policy_previewer.rst @@ -9,7 +9,7 @@ Since authorization policy is vital for system security, and many people can now to validate the policies before deploying them to production. 
The Authorization Policy Previewer is a tool for validating authorization policy definitions. The tool provides an interactive -user interface and commands for the user to validate different aspects of policy definitions: +user interface and commands for users to validate different aspects of policy definitions: - Show defined roles and rights - Show the content of the policy definition diff --git a/docs/user_guide/security/communication_security.rst b/docs/user_guide/security/communication_security.rst index 3dd5d71609..f5dbfe78d1 100644 --- a/docs/user_guide/security/communication_security.rst +++ b/docs/user_guide/security/communication_security.rst @@ -16,4 +16,4 @@ Specifically, we suggest against the use of port 443, the typical port number fo not exactly implement HTTPS to the letter, and the firewall of some sites may decide to block it. The IT infrastructure of FL Client sites must allow the FL application to connect to the address (domain and port) -opened by the FL server. +opened by the FL Server. diff --git a/docs/user_guide/security/data_privacy_protection.rst b/docs/user_guide/security/data_privacy_protection.rst index e1593faacd..95a6b2d752 100644 --- a/docs/user_guide/security/data_privacy_protection.rst +++ b/docs/user_guide/security/data_privacy_protection.rst @@ -6,10 +6,10 @@ Federated learning activities are performed with task-based interactions between issues tasks to the clients, and clients process tasks and return results back to the server. NVFLARE comes with a general-purpose data :ref:`filtering mechanism ` for processing task data and results: - - On the Server: before task data is sent to the client, the configured "task_data_filters" defined in the job are executed; - - On the Client: when the task data is received by the client and before giving it to the executor for processing, NVFLARE framework applies configured "task_data_filters" defined in the job; + - On the Server: before task data is sent to the client, the configured "task_data_filters" defined in the job are executed. + - On the Client: when the task data is received by the client and before giving it to the executor for processing, NVFLARE framework applies configured "task_data_filters" defined in the job. - On the Client: after the execution of the task by the executor and before sending the produced result back to the server, NVFLARE framework applies configured "task_result_filters" to the result before sending to the Server. - - On the Server: after receiving the task result from the client, the NVFLARE framework applies configured "task_result_filters" before giving it to the Controller for processing. + - On the Server: after receiving the task result from the client, the NVFLARE framework applies the configured "task_result_filters" before giving it to the controller for processing. This mechanism has been used for the purpose of data privacy protection on the client side. For example, differential privacy filters can be applied to model weights before sending to the server for aggregation. 
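To make the filter chain described above concrete, here is a minimal sketch of a client-side result filter in the spirit of this mechanism. The ``Filter`` base class and the ``process`` signature follow FLARE's filter interface; the payload key and the clipping rule are illustrative assumptions, not part of the documentation above.

.. code-block:: python

    from nvflare.apis.filter import Filter
    from nvflare.apis.fl_context import FLContext
    from nvflare.apis.shareable import Shareable


    class ClipResultFilter(Filter):
        """Illustrative "task_result_filter": bounds values in the task result
        before it leaves the client, as a crude privacy guard."""

        def __init__(self, bound: float = 1.0):
            super().__init__()
            self.bound = bound

        def process(self, shareable: Shareable, fl_ctx: FLContext) -> Shareable:
            weights = shareable.get("weights")  # hypothetical payload key
            if weights is not None:
                shareable["weights"] = [max(-self.bound, min(self.bound, w)) for w in weights]
            return shareable

A filter like this would be listed under "task_result_filters" in the job configuration so that it runs at the third interception point listed above.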
diff --git a/docs/user_guide/security/site_policy_management.rst b/docs/user_guide/security/site_policy_management.rst index d7297dfd19..3ce0aac549 100644 --- a/docs/user_guide/security/site_policy_management.rst +++ b/docs/user_guide/security/site_policy_management.rst @@ -8,7 +8,7 @@ It is possible for each site to define its own policies in the following areas: - Resource Management: the configuration of system resources that are solely the decisions of local IT; - Authorization Policy: local authorization policy that determines what a user can or cannot do on the local site; - Privacy Policy: local policy that specifies what types of studies are allowed and how to add privacy protection to the learning results produced by the FL client on the local site. - - Logging Configuration: each site can now define its own logging configuration for system generated log messages. + - Logging Configuration: each site can define its own logging configuration for system-generated log messages. Workspace Structure @@ -56,7 +56,7 @@ Here is the complete workspace structure, with the addition of the "local" folde config custom -Content highlighted in yellow is generated by the Provision process - the ZIP package generated by the Provision now contains two +The content highlighted in yellow is generated by the Provision process. The ZIP package generated by the Provision now contains two folders: startup and local. The "startup" folder contains security credentials needed for communication to the FL Server, as well as general system configuration information. The "local" folder contains default and/or samples for local policies. If the Org Admin wants to define his/her own policies, he/she can do so by creating separate files to override the default. These files are unhighlighted diff --git a/docs/user_guide/security/terminologies_and_roles.rst b/docs/user_guide/security/terminologies_and_roles.rst index c32a8ecfd4..ce330e0ad8 100644 --- a/docs/user_guide/security/terminologies_and_roles.rst +++ b/docs/user_guide/security/terminologies_and_roles.rst @@ -33,7 +33,7 @@ on its local data. Overseer ---------- An application responsible for overseeing overall system health and enabling seamless failover of FL servers. This -component is only needed for High Available. +component is only needed for high availability. User ----- diff --git a/docs/user_guide/security/unsafe_component_detection.rst b/docs/user_guide/security/unsafe_component_detection.rst index 6494c5af35..1539781dd8 100644 --- a/docs/user_guide/security/unsafe_component_detection.rst +++ b/docs/user_guide/security/unsafe_component_detection.rst @@ -5,7 +5,7 @@ Unsafe Component Detection ************************** NVFLARE is based on a componentized architecture in that FL jobs are performed by components that are configured in configuration files. These components are created at the beginning of job execution. To address the issue of components potentially being unsafe -and leaking sensitive information, NVFLARE uses an event based solutionm. +and leaking sensitive information, NVFLARE uses an event-based solution. NVFLARE has a very powerful and flexible event mechanism that allows custom code to be plugged into defined moments of system workflow (e.g. start/end of the job, before/after a task is executed, etc.).
At such moments, NVFLARE fires events and invokes @@ -42,8 +42,8 @@ The important points are: - The class must extend FLComponent - It defines the handle_event method, following the exact signature - - It checks the event_type to be ``EventType.BEFORE_BUILD_COMPONENT``. - - It checks the component being built based on the information provided in the fl_ctx. There are many properties in fl_ctx. The most important ones are the ``COMPONENT_CONFIG`` that is a dict of the component's configuration data. The fl_ctx also has ``WORKSPACE_OBJECT`` that allows you to access any file in the job's workspace. + - It checks if the event_type is ``EventType.BEFORE_BUILD_COMPONENT``. + - It checks the component being built based on the information provided in the fl_ctx. There are many properties in fl_ctx. The most important ones are the ``COMPONENT_CONFIG`` that is a dict of the component's configuration data. The fl_ctx also has ``WORKSPACE_OBJECT`` which allows access to any file in the job's workspace. - If any issue is detected with the component to be built, you raise the ``UnsafeComponentError`` exception with a meaningful text. The following properties in the fl_ctx could be helpful too: diff --git a/docs/whats_new.rst b/docs/whats_new.rst index 845d108182..e5cccf9f6c 100644 --- a/docs/whats_new.rst +++ b/docs/whats_new.rst @@ -18,4 +18,4 @@ Previous Releases of FLARE release_notes/flare_220 release_notes/flare_210 -Also refer to the the NVFlare GitHub `releases <https://github.com/NVIDIA/NVFlare/releases>`_ to see minor release notes for RC versions. +Also refer to the NVFlare GitHub `releases <https://github.com/NVIDIA/NVFlare/releases>`_ to see minor release notes for RC versions. diff --git a/examples/advanced/README.md b/examples/advanced/README.md index ebe97ab801..96c0acd45a 100644 --- a/examples/advanced/README.md +++ b/examples/advanced/README.md @@ -18,7 +18,7 @@ Please also install "./requirements.txt" in each example folder. and [homomorphic encryption](https://developer.nvidia.com/blog/federated-learning-with-homomorphic-encryption/). * [Federated XGBoost](./xgboost/README.md) * Includes examples of [histogram-based](./xgboost/histogram-based/README.md) algorithm, [tree-based](./xgboost/tree-based/README.md). - Tree-based algorithms also includes [bagging](./xgboost/tree-based/jobs/bagging_base) and [cyclic](./xgboost/tree-based/jobs/cyclic_base) approaches. + Tree-based algorithms also include [bagging](./xgboost/tree-based/jobs/bagging_base) and [cyclic](./xgboost/tree-based/jobs/cyclic_base) approaches. ## Traditional ML examples * [Federated Linear Model with Scikit-learn](./sklearn-linear/README.md) diff --git a/examples/advanced/federated-statistics/README.md b/examples/advanced/federated-statistics/README.md index 8af2844f3b..32b09124ce 100644 --- a/examples/advanced/federated-statistics/README.md +++ b/examples/advanced/federated-statistics/README.md @@ -1,7 +1,7 @@ # Federated Statistics Overview ## Objective -NVIDIA FLARE will provide built-in federated statistics operators (controller and executors) that +NVIDIA FLARE will provide built-in federated statistics operators (controllers and executors) that can generate global statistics based on local client side statistics.
At each client site, we could have one or more datasets (such as "train" and "test" datasets); each dataset may have many diff --git a/examples/advanced/federated-statistics/hierarchical_stats/hierarchical_stats.ipynb b/examples/advanced/federated-statistics/hierarchical_stats/hierarchical_stats.ipynb index c2abee5d0e..e7065c7e97 100644 --- a/examples/advanced/federated-statistics/hierarchical_stats/hierarchical_stats.ipynb +++ b/examples/advanced/federated-statistics/hierarchical_stats/hierarchical_stats.ipynb @@ -49,7 +49,7 @@ "\n", "## Prepare data\n", "\n", - "In this example, we are using synthetic anonymous students scores datasets generated for student belonging to 7 different universities.\n", + "In this example, we are using synthetic anonymous student score datasets generated for students belonging to 7 different universities.\n", "\n", "Run the script `prepare_data.sh` that generates 7 different datasets each having random number of entries between 1000 to 2000. Each entry in the datasets has three columns - `Pass`, `Fail` and `Percentage`. `Pass`/`Fail` represents whether the particular student passed or failed the exam and `Percentage` represents the overall percentage marks scored by the student.\n", "\n" @@ -120,7 +120,7 @@ "source": [ "**Run Job using Simulator CLI**\n", "\n", - "From a **terminal** one can also the following equivalent CLI\n", + "From a **terminal**, one can also use the following equivalent CLI command:\n", "\n", "```\n", "cd NVFlare/examples/advanced/federated-statistics\n", @@ -167,7 +167,7 @@ }, "source": [ "## Visualization\n", - "We can visualize the results easly via the visualization notebook. Before we do that, we need to copy the data to the notebook directory \n" + "We can easily visualize the results via the visualization notebook. Before we do that, we need to copy the data to the notebook directory. \n" ] }, { @@ -198,7 +198,7 @@ }, "source": [ "## We are done !\n", - "Congratulations, you just completed the federated hierarchical stats calulation with data represented by data frame!\n" + "Congratulations, you have just completed the federated hierarchical statistics calculation with data represented by a DataFrame!\n" ] } ], diff --git a/examples/advanced/finance-end-to-end/notebooks/graph_construct.ipynb b/examples/advanced/finance-end-to-end/notebooks/graph_construct.ipynb index cd02ae469b..83d1f47cfa 100644 --- a/examples/advanced/finance-end-to-end/notebooks/graph_construct.ipynb +++ b/examples/advanced/finance-end-to-end/notebooks/graph_construct.ipynb @@ -12,7 +12,7 @@ "Each node represents a transaction, and the edges represent the relationships between transactions. Since each site consists of the same Sender_BIC, to define the graph edge, we use the following rules:\n", "\n", "1. The two transactions are with the same Receiver_BIC.\n", - "2. The two transactions time difference are smaller than 6000.\n", + "2. The time difference between the two transactions is smaller than 6000.\n", "\n", "Note that in real applications, such rules should be designed according to the characteristics of the candidate data." 
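As a rough sketch of the two edge rules above (not the notebook's actual code): group transactions by `Receiver_BIC` so rule 1 holds, then link pairs whose time difference is below the threshold for rule 2. The `Time` column name and the pandas-based pairing are assumptions for illustration.

```python
import pandas as pd


def build_edges(df: pd.DataFrame, threshold: float = 6000.0) -> list:
    """Link two transactions (nodes) when they share Receiver_BIC (rule 1)
    and their time difference is smaller than the threshold (rule 2)."""
    edges = []
    # Only transactions with the same Receiver_BIC can be linked, so group first.
    for _, group in df.groupby("Receiver_BIC"):
        rows = list(zip(group.index, group["Time"]))
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                if abs(rows[i][1] - rows[j][1]) < threshold:
                    edges.append((rows[i][0], rows[j][0]))
    return edges
```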
] diff --git a/examples/advanced/finance-end-to-end/notebooks/prepare_data.ipynb b/examples/advanced/finance-end-to-end/notebooks/prepare_data.ipynb index 76d9264ae2..a499ad7c94 100644 --- a/examples/advanced/finance-end-to-end/notebooks/prepare_data.ipynb +++ b/examples/advanced/finance-end-to-end/notebooks/prepare_data.ipynb @@ -15,7 +15,7 @@ "source": [ "## Prepare Data\n", "First download data from [kaggle credit card fraud dataset](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud) and save it to the below `data_path`\n", - "### Based on the riginal data, add randome synthentic data to make full dataset\n", + "### Based on the original data, add random synthetic data to make a full dataset\n", "* expand time in seconds x 200 times to cover 26 months\n", "* double the data record size\n", "* add other categorical features, sender_bic, receiever_bic, beneficiary_bic, orginator_bic, currency, country\n", diff --git a/examples/advanced/finance-end-to-end/xgboost.ipynb b/examples/advanced/finance-end-to-end/xgboost.ipynb index b1e15e6537..26dac6d0b9 100644 --- a/examples/advanced/finance-end-to-end/xgboost.ipynb +++ b/examples/advanced/finance-end-to-end/xgboost.ipynb @@ -7,7 +7,7 @@ "source": [ "# End-to-end credit card fraud detection with Federated XGBoost\n", "\n", - "This notebooks shows the how do we convert and existing tabular credit data, enrich and pre-process data using one-site (like centralized dataset) and then convert this centralized process into a federated ETL steps, easily. Then construct a federated XGBoost, the only thing user need to define is the XGboost data loader. \n", + "This notebook shows how to convert an existing tabular credit dataset, enrich and pre-process the data using a single site (like a centralized dataset), and then convert this centralized process into federated ETL steps easily. Then, construct a federated XGBoost; the only thing the user needs to define is the XGBoost data loader. \n", "\n", "## Install requirements\n" ] @@ -50,7 +50,7 @@ "1. **Feature Enrichment**: This process involves adding new features based on the existing data. For example, we can calculate the average transaction amount for each currency and add this as a new feature. \n", "2. **Feature Encoding**: This process involves encoding the current features and transforming them to embedding space via machine learning models. This model can be either pre-trained, or trained with the candidate dataset.\n", "\n", - "Considering the fact that the only two numerical features in the dataset are \"Amount\" and \"Time\", we will perform feature enrichment first. Optionally, we can also perform feature encoding. In this example, we use graph neural network (GNN): we will train the GNN model in a federated unsupervised fashion, and then use the model to encode the features for all sites. " + "Considering the fact that the only two numerical features in the dataset are \"Amount\" and \"Time\", we will perform feature enrichment first. Optionally, we can also perform feature encoding. In this example, we use a graph neural network (GNN); we will train the GNN model in a federated, unsupervised fashion and then use the model to encode the features for all sites. " ] }, { @@ -142,7 +142,7 @@ "source": [ "## Step 3: Federated XGBoost \n", "\n", - "Now that we have the data ready, either enriched and normalized features, or GNN feature embeddings, we can fit them with XGBoost. NVIDIA FLARE has already has written XGBoost Controller and Executor for the job. 
All we need to provide is the data loader to fit into the XGBoost.\n", + "Now that we have the data ready, either enriched and normalized features, or GNN feature embeddings, we can fit them with XGBoost. NVIDIA FLARE has already written XGBoost Controller and Executor for the job. All we need to provide is the data loader to fit into the XGBoost.\n", "\n", "To specify the controller and executor, we need to define a Job. You can find the job construction in\n", "\n", @@ -436,7 +436,7 @@ "\n", "With job running well in simulator, we are ready to run in a POC mode, so we can simulate the deployment in localhost or simply deploy to production. \n", "\n", - "All we need is the job definition. we can use job.export_job() method to generate the job configuration and export to given directory. For example, in xgb_job.py, we have the following\n", + "All we need is the job definition; we can use the job.export_job() method to generate the job configuration and export it to a given directory. For example, in xgb_job.py, we have the following\n", "\n", "```\n", " if work_dir:\n", diff --git a/examples/advanced/finance/README.md b/examples/advanced/finance/README.md index 8122417b55..2041b0ed8f 100644 --- a/examples/advanced/finance/README.md +++ b/examples/advanced/finance/README.md @@ -12,7 +12,7 @@ In these examples, we illustrate the use of NVFlare to carry out the following f - tree-based collaboration with cyclic federation - tree-based collaboration with bagging federation -For more details, please refer to the READMEs for +For more details, please refer to the README files for [vertical](https://github.com/NVIDIA/NVFlare/blob/main/examples/advanced/vertical_xgboost/README.md), [histogram-based](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/xgboost/histogram-based/README.md), and [tree-based](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/xgboost/tree-based/README.md) @@ -48,7 +48,7 @@ bash run_training.sh This will cover baseline centralized training, horizontal FL with histogram-based, tree-based cyclic, and tree-based bagging collaborations, as well as vertical FL. -Then we test the resulting models on the test dataset with +Then, we test the resulting models on the test dataset using ``` bash run_testing.sh ``` diff --git a/examples/advanced/fl_hub/jobs/numpy-cross-val/README.md b/examples/advanced/fl_hub/jobs/numpy-cross-val/README.md index 526bb75ca2..b373efa518 100644 --- a/examples/advanced/fl_hub/jobs/numpy-cross-val/README.md +++ b/examples/advanced/fl_hub/jobs/numpy-cross-val/README.md @@ -1,6 +1,6 @@ # Hello Numpy Scatter and Gather "[Scatter and Gather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html)" is the standard workflow to implement Federated Averaging ([FedAvg](https://arxiv.org/abs/1602.05629)). -This workflow follows the hub and spoke model for communicating the global model to each client for local training (i.e., "scattering") and aggregates the result to perform the global model update (i.e., "gathering"). +This workflow follows the hub-and-spoke model for communicating the global model to each client for local training (i.e., "scattering") and aggregates the results to perform the global model update (i.e., "gathering"). > **_NOTE:_** This example uses a Numpy-based trainer and will generate its data within the code. 
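Picking up the job-export step quoted above from xgboost.ipynb: exporting is a single call on the job object. A minimal sketch, assuming a `FedJob`-style object as used by the Job API (the import path and job name are illustrative and may vary by release):

```python
from nvflare.job_config.api import FedJob  # import path may differ by release

job = FedJob(name="xgb_job")
# ... add the controller and executors here, as done in xgb_job.py ...

work_dir = "/tmp/nvflare/jobs/xgb_job"  # any writable directory
job.export_job(work_dir)  # writes the job configuration under work_dir
```

The exported folder can then be submitted with the admin console's `submit_job` command or run in POC mode.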
diff --git a/examples/advanced/gnn/gnn_examples.ipynb b/examples/advanced/gnn/gnn_examples.ipynb index 54ab3ede98..212ff65f5a 100644 --- a/examples/advanced/gnn/gnn_examples.ipynb +++ b/examples/advanced/gnn/gnn_examples.ipynb @@ -197,7 +197,7 @@ "id": "5bad4f55-d582-4f37-a523-927dc015e564", "metadata": {}, "source": [ - "We shall see `sag_gnn` from the above command. We then create jobs using this template, and set local epochs to 10 with 7 rounds of FL to match local experiments' 70 epoch default training." + "We shall see `sag_gnn` from the above command. We then create jobs using this template and set the local epochs to 10 with 7 rounds of FL to match the default 70-epoch training in local experiments." ] }, { diff --git a/examples/advanced/job_api/README.md b/examples/advanced/job_api/README.md index ef4b70fdc0..8d1ddab5c4 100644 --- a/examples/advanced/job_api/README.md +++ b/examples/advanced/job_api/README.md @@ -1,13 +1,13 @@ # Additional Examples for NVIDIA FLARE Job API -you probably already have looked at [getting started](../../getting_started) examples, +You have probably already looked at [getting started](../../getting_started) examples, and [hello-world](../../hello-world) examples. Here are additional examples for advanced algorithms ### Basic Concepts -At the heart of NVFlare lies the concept of collaboration through "tasks." An FL controller assigns tasks +At the heart of NVFlare lies the concept of collaboration through tasks. An FL controller assigns tasks (e.g., training on local data) to one or more FL clients, processes returned results (e.g., model weight updates), and may assign additional tasks based on these results and other factors (e.g., a pre-configured number of training rounds). -The clients run executors which can listen for tasks and perform the necessary computations locally, such as model training. +The clients run executors that can listen for tasks and perform the necessary computations locally, such as model training. This task-based interaction repeats until the experiment’s objectives are met. We can also add data filters (for example, for [homomorphic encryption](https://www.usenix.org/conference/atc20/presentation/zhang-chengliang) @@ -18,8 +18,8 @@ or results received or produced by the server or clients. ### Examples We have several examples to illustrate job APIs -Each example folder includes basic job configurations for running different FL algorithms. -such as [FedOpt](https://arxiv.org/abs/2003.00295), or [SCAFFOLD](https://arxiv.org/abs/1910.06378). +Each example folder includes basic job configurations for running different FL algorithms, +such as [FedOpt](https://arxiv.org/abs/2003.00295) and [SCAFFOLD](https://arxiv.org/abs/1910.06378). ### 1. [PyTorch Examples](./pt/README.md) ### 2. [Tensorflow Examples](./tf/README.md) diff --git a/examples/advanced/job_api/pt/README.md b/examples/advanced/job_api/pt/README.md index 18830da6a7..16f3f19f41 100644 --- a/examples/advanced/job_api/pt/README.md +++ b/examples/advanced/job_api/pt/README.md @@ -41,7 +41,7 @@ Implementation of [cyclic weight transfer](https://arxiv.org/abs/1709.05929) usi ```commandline python cyclic_cc_script_runner_cifar10.py ``` -### 5. [Federated averaging using model learning](./fedavg_model_learner_xsite_val_cifar10.py)) +### 5.
[Federated averaging using model learning](./fedavg_model_learner_xsite_val_cifar10.py) Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [model learner class](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/model_learner.html), followed by [cross site validation](https://nvflare.readthedocs.io/en/main/programming_guide/controllers/cross_site_model_evaluation.html) for federated model evaluation. diff --git a/examples/advanced/job_api/tf/README.md b/examples/advanced/job_api/tf/README.md index cdfbc6d3c7..25668a2134 100644 --- a/examples/advanced/job_api/tf/README.md +++ b/examples/advanced/job_api/tf/README.md @@ -64,12 +64,11 @@ script. > `export TF_FORCE_GPU_ALLOW_GROWTH=true && export > TF_GPU_ALLOCATOR=cuda_malloc_asyncp` -We use Dirichelet sampling (implementation from FedMA (https://github.com/IBM/FedMA)) on -CIFAR10 data labels to simulate data heterogeneity among data splits for different client -sites, controlled by an alpha value, ranging from 0 (not including 0) -to 1. A high alpha value indicates less data heterogeneity, i.e., an -alpha value equal to 1.0 would result in homogeneous data distribution -among different splits. +We use Dirichlet sampling (implementation from FedMA (https://github.com/IBM/FedMA)) on +CIFAR-10 data labels to simulate data heterogeneity among data splits for different client +sites, controlled by an alpha value ranging from 0 (exclusive) to 1. A high alpha value +indicates less data heterogeneity, i.e., an alpha value equal to 1.0 would result in homogeneous +data distribution among different splits. ### 2.1 Centralized training diff --git a/examples/advanced/kaplan-meier-he/README.md b/examples/advanced/kaplan-meier-he/README.md index a662942efc..1d76d2ee4d 100644 --- a/examples/advanced/kaplan-meier-he/README.md +++ b/examples/advanced/kaplan-meier-he/README.md @@ -62,7 +62,7 @@ To run the baseline script, simply execute: ```commandline python utils/baseline_kaplan_meier.py ``` -By default, this will generate a KM curve image `km_curve_baseline.png` under `/tmp` directory. The resutling KM curve is shown below: +By default, this will generate a KM curve image `km_curve_baseline.png` under `/tmp` directory. The resulting KM curve is shown below: ![KM survival baseline](figs/km_curve_baseline.png) Here, we show the survival curve for both daily (without binning) and weekly binning. The two curves aligns well with each other, while the weekly-binned curve has lower resolution. diff --git a/examples/advanced/prostate/prostate_2D/README.md b/examples/advanced/prostate/prostate_2D/README.md index 1028bfa55b..dc4e8b35aa 100644 --- a/examples/advanced/prostate/prostate_2D/README.md +++ b/examples/advanced/prostate/prostate_2D/README.md @@ -8,7 +8,7 @@ The [U-Net](https://arxiv.org/abs/1505.04597) model is trained to segment the wh ## Run automated experiments We use the NVFlare simulator to run FL training automatically, the 6 clients are named `client_I2CVB, client_MSD, client_NCI_ISBI_3T, client_NCI_ISBI_Dx, client_Promise12, client_PROSTATEx` ### Prepare local configs First, we copy the custom code to job folders, and add the image directory root to `config_train.json` files for generating the absolute path to dataset and datalist.
In the current folder structure, it will be `${PWD}/..`, which can be any arbitrary path where the data is located. ``` for job in prostate_central prostate_fedavg prostate_fedprox prostate_ditto do @@ -36,7 +36,7 @@ For federated training, we use Note that since the current experiments are performed on a light 2D dataset, we used [`CacheDataset`](https://docs.monai.io/en/stable/data.html#cachedataset) and set cache rate to 1.0 to accelerate the training process. Please adjust the cache rate if memory resource is limited on your system. ### Experiment list -In this example, we perform the following examples: +In this example, we perform the following experiments: 1. Centralized training, using the combination of training and validation data from all clients 2. Standard [FedAvg](https://arxiv.org/abs/1602.05629) 3. [FedProx](https://arxiv.org/abs/1812.06127), which adds a regularizer to the loss used in `SupervisedProstateLearner` (`fedproxloss_mu`) @@ -62,7 +62,7 @@ python3 ./result_stat/plot_tensorboard_events.py The TensorBoard curves (smoothed with weight 0.8) for validation Dice for the 150 epochs (150 rounds, 1 local epochs per round) during training are shown below: ![All training curve](./figs/all_training.png) -### Testing score +### Testing Scores The testing score is computed based on the best global model for Central/FedAvg/FedProx, and the six best personalized models for Ditto. We provide a script for performing validation on testing data split. diff --git a/examples/advanced/sklearn-svm/sklearn_svm_cancer.ipynb b/examples/advanced/sklearn-svm/sklearn_svm_cancer.ipynb index 2f95028e42..b9be4a87c0 100644 --- a/examples/advanced/sklearn-svm/sklearn_svm_cancer.ipynb +++ b/examples/advanced/sklearn-svm/sklearn_svm_cancer.ipynb @@ -86,7 +86,7 @@ "metadata": {}, "source": [ "## 2. Data preparation \n", - "This example uses the the breast cancer dataset available from Scikit-learn's dataset API. " + "This example uses the breast cancer dataset available from Scikit-learn's dataset API." ] }, { diff --git a/examples/advanced/vertical_federated_learning/cifar10-splitnn/cifar10_split_learning.ipynb b/examples/advanced/vertical_federated_learning/cifar10-splitnn/cifar10_split_learning.ipynb index 82f0101c3c..d8ee938e24 100644 --- a/examples/advanced/vertical_federated_learning/cifar10-splitnn/cifar10_split_learning.ipynb +++ b/examples/advanced/vertical_federated_learning/cifar10-splitnn/cifar10_split_learning.ipynb @@ -134,7 +134,7 @@ "source": [ "The result will be saved on each client's working directory in `intersection.txt`.\n", "\n", - "We can check the correctness of the result by comparing to the generate ground truth overlap, saved in `overlap.npy`." + "We can check the correctness of the result by comparing it to the generated ground truth overlap, saved in `overlap.npy`." ] }, { diff --git a/examples/advanced/xgboost/histogram-based/README.md b/examples/advanced/xgboost/histogram-based/README.md index ba92e787ef..8c89f95eff 100644 --- a/examples/advanced/xgboost/histogram-based/README.md +++ b/examples/advanced/xgboost/histogram-based/README.md @@ -34,7 +34,7 @@ To run in a federated setting, follow [Real-World FL](https://nvflare.readthedoc start the overseer, FL servers and FL clients. You need to download the HIGGS data on each client site. -You will also need to install the xgboost on each client site and server site. +You will also need to install XGBoost on each client site and server site. 
You can still generate the data splits and job configs using the scripts provided. @@ -43,12 +43,12 @@ You might also need to modify the `data_path` in the `data_site-XXX.json` inside the `/tmp/nvflare/xgboost_higgs_dataset` folder, since each site might save the HIGGS dataset in different places. -Then you can use admin client to submit the job via `submit_job` command. +Then, you can use the admin client to submit the job via the `submit_job` command. ## Customization -The provided XGBoost executor can be customized using Boost parameters -provided in `xgb_params` argument. +The provided XGBoost executor can be customized using boost parameters +provided in the `xgb_params` argument. If the parameter change alone is not sufficient and code changes are required, a custom executor can be implemented to make calls to xgboost library directly. @@ -56,7 +56,7 @@ a custom executor can be implemented to make calls to xgboost library directly. The custom executor can inherit the base class `FedXGBHistogramExecutor` and overwrite the `xgb_train()` method. -To use other dataset, can inherit the base class `XGBDataLoader` and +To use a different dataset, you can inherit the base class `XGBDataLoader` and implement the `load_data()` method. ## Loose integration diff --git a/examples/advanced/xgboost_secure/README.md b/examples/advanced/xgboost_secure/README.md index e20a51730c..bd2268024f 100644 --- a/examples/advanced/xgboost_secure/README.md +++ b/examples/advanced/xgboost_secure/README.md @@ -7,7 +7,7 @@ Several mechanisms have been proposed for training an XGBoost model in a federat In this example, we further extend the existing horizontal and vertical federated learning approaches to support secure federated learning using homomorphic encryption. Depending on the characteristics of the data to be encrypted, we can choose between [CKKS](https://github.com/OpenMined/TenSEAL) and [Paillier](https://github.com/intel/pailliercryptolib_python). -In the following, we illustrate both *horizontal* and *vertical* federated XGBoost, *without* and *with* homomorphic encryption. Please refer to our [documentation]() for more details on the pipeline design and the encryption logic. +In the following, we illustrate both *horizontal* and *vertical* federated XGBoost, *without* and *with* homomorphic encryption. Please refer to our [documentation](https://nvflare.readthedocs.io/en/main/user_guide/federated_xgboost/secure_xgboost_user_guide.html) for more details on the pipeline design and the encryption logic. ## Installation To be able to run all the examples, please install the requirements first. @@ -69,8 +69,7 @@ This will cover baseline centralized training, federated xgboost run in the same (server and clients are running in different processes) with and without secure feature. 
## Generates the FLARE Job -We can use our job template and `nvflare job` command to generates different jobs for -different scenarios: +We can use our job template and the `nvflare job` command to generate different jobs for different scenarios: ``` # config the job template directory diff --git a/examples/getting_started/pt/nvflare_pt_getting_started.ipynb b/examples/getting_started/pt/nvflare_pt_getting_started.ipynb index 79e1b99259..966388c67f 100644 --- a/examples/getting_started/pt/nvflare_pt_getting_started.ipynb +++ b/examples/getting_started/pt/nvflare_pt_getting_started.ipynb @@ -391,7 +391,7 @@ "id": "9ccbe893", "metadata": {}, "source": [ - "If using Google Colab and the output is not showing correctly, export the job and run it with the simulator command instead:" + "If you are using Google Colab and the output is not showing correctly, export the job and run it with the simulator command instead:" ] }, { diff --git a/examples/getting_started/sklearn/README.md b/examples/getting_started/sklearn/README.md index 741dde91f4..0512e6de17 100644 --- a/examples/getting_started/sklearn/README.md +++ b/examples/getting_started/sklearn/README.md @@ -5,7 +5,7 @@ We provide examples to quickly get you started using NVFlare's Job API. All examples in this folder are based on using [scikit-learn](https://scikit-learn.org/), a popular library for general machine learning with Python. ## Setup environment -First, install nvflare and dependencies: +First, install NVFlare and its dependencies: ```commandline pip install -r requirements.txt ``` diff --git a/examples/getting_started/tf/README.md b/examples/getting_started/tf/README.md index 7ea0208c2e..5478e4136d 100644 --- a/examples/getting_started/tf/README.md +++ b/examples/getting_started/tf/README.md @@ -55,16 +55,15 @@ script. > [!WARNING] > If you are using GPU, make sure to set the following > environment variables before running a training job, to prevent -> `Tensoflow` from allocating full GPU memory all at once: +> `TensorFlow` from allocating full GPU memory all at once: > `export TF_FORCE_GPU_ALLOW_GROWTH=true && export > TF_GPU_ALLOCATOR=cuda_malloc_asyncp` -We use Dirichelet sampling (implementation from FedMA (https://github.com/IBM/FedMA)) on -CIFAR10 data labels to simulate data heterogeneity among data splits for different client -sites, controlled by an alpha value, ranging from 0 (not including 0) -to 1. A high alpha value indicates less data heterogeneity, i.e., an -alpha value equal to 1.0 would result in homogeneous data distribution -among different splits. +We apply Dirichlet sampling (as implemented in FedMA: https://github.com/IBM/FedMA) to +CIFAR10 data labels to simulate data heterogeneity among client sites, controlled by an +alpha value between 0 (exclusive) and 1. A high alpha value indicates less data +heterogeneity, i.e., an alpha value equal to 1.0 would result in homogeneous data +distribution among different splits. 
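Since both TensorFlow READMEs rely on this sampling scheme, here is a minimal generic sketch of Dirichlet label splitting (an illustration of the idea only, not the FedMA implementation referenced above):

```python
import numpy as np


def dirichlet_split(labels: np.ndarray, num_sites: int, alpha: float, seed: int = 0):
    """Partition sample indices across sites with Dirichlet-distributed
    per-class proportions. Smaller alpha -> more heterogeneous splits;
    alpha near 1.0 -> close to a homogeneous distribution."""
    rng = np.random.default_rng(seed)
    site_indices = [[] for _ in range(num_sites)]
    for cls in np.unique(labels):
        # Shuffle this class's samples, then carve them up per the Dirichlet draw.
        cls_idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet([alpha] * num_sites)
        cut_points = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for site, part in zip(site_indices, np.split(cls_idx, cut_points)):
            site.extend(part.tolist())
    return site_indices
```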
### 2.1 Centralized training diff --git a/examples/hello-world/hello-fedavg-numpy/hello-fedavg-numpy_flare_api.ipynb b/examples/hello-world/hello-fedavg-numpy/hello-fedavg-numpy_flare_api.ipynb index e8b09ed8eb..ca06322b4e 100644 --- a/examples/hello-world/hello-fedavg-numpy/hello-fedavg-numpy_flare_api.ipynb +++ b/examples/hello-world/hello-fedavg-numpy/hello-fedavg-numpy_flare_api.ipynb @@ -35,7 +35,7 @@ "\n", "In the rest of this example, we assume that 'nvflare provision' has been run in a workspace (set to '/workspace' below, but you can change this to the location you run provision from) to set up a project named `hello-example` with a server and two clients. Feel free to use an existing provisioned NVFLARE project if you have that available, or to try things out, you could set up and start a system in [POC mode](https://nvflare.readthedocs.io/en/main/getting_started.html#setting-up-the-application-environment-in-poc-mode).\n", "\n", - "Use the 'start.sh' scripts to start the server and clients in seperate terminals to start the system." + "Use the 'start.sh' scripts to start the server and clients in separate terminals to start the system." ] }, { @@ -124,7 +124,7 @@ "\n", "You should be able to see the output in the terminals where you are running your FL Server and Clients when you submitted the job. You can also use `monitor_job()` to follow along and give you updates on the progress until the job is done.\n", "\n", - "By default, `monitor_job()` only has one required arguement, the `job_id` of the job you are waiting for, and the default behavior is to wait until the job is complete before returning a Return Code of `JOB_FINISHED`.\n", + "By default, `monitor_job()` only has one required argument, the `job_id` of the job you are waiting for, and the default behavior is to wait until the job is complete before returning a Return Code of `JOB_FINISHED`.\n", "\n", "In order to follow along and see a more meaningful result, the following cell contains the `basic_cb_with_print` callback that keeps track of the number of times the callback is run and prints the `job_meta` the first three times and the final time before `monitor_job()` completes with every other call just printing a dot to save output space. This callback improves the output and is just an example of what can be done with additional arguments and the `job_meta` information of the job that is being monitored." ] diff --git a/examples/hello-world/hello-numpy-cross-val/README.md b/examples/hello-world/hello-numpy-cross-val/README.md index bba69d07db..92e35620f5 100644 --- a/examples/hello-world/hello-numpy-cross-val/README.md +++ b/examples/hello-world/hello-numpy-cross-val/README.md @@ -9,8 +9,7 @@ Follow the [Installation](../../getting_started/README.md) instructions. # Run training and cross site validation right after training -This example uses a Numpy-based trainer to simulate the training -steps. +This example uses a NumPy-based trainer to simulate the training steps. We first perform FedAvg training and then conduct cross-site validation. @@ -33,7 +32,7 @@ $ ls /tmp/nvflare/jobs/workdir/ server/ site-1/ site-2/ startup/ ``` -The cross site validation results: +The cross-site validation results: ```bash $ cat /tmp/nvflare/jobs/workdir/server/simulate_job/cross_site_val/cross_val_results.json @@ -57,7 +56,7 @@ python3 generate_pretrain_models.py ## 2. 
Prepare the job and run the experiment using simulator -Note that our pretrained models is generated under: +Note that our pre-trained models are generated under: ```python SERVER_MODEL_DIR = "/tmp/nvflare/server_pretrain_models" @@ -81,7 +80,7 @@ $ ls /tmp/nvflare/jobs/workdir/ server/ site-1/ site-2/ startup/ ``` -The cross site validation results: +The cross-site validation results: ```bash $ cat /tmp/nvflare/jobs/workdir/server/simulate_job/cross_site_val/cross_val_results.json diff --git a/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb b/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb index 6a2db8b6bc..7215cb2a0f 100644 --- a/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb +++ b/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb @@ -35,7 +35,7 @@ "\n", "In the rest of this example, we assume that 'nvflare provision' has been run in a workspace (set to '/workspace' below, but you can change this to the location you run provision from) to set up a project named `hello-example` with a server and two clients. Feel free to use an existing provisioned NVFLARE project if you have that available, or to try things out, you could set up and start a system in [POC mode](https://nvflare.readthedocs.io/en/main/getting_started.html#setting-up-the-application-environment-in-poc-mode).\n", "\n", - "Use the 'start.sh' scripts to start the server and clients in seperate terminals to start the system." + "Use the 'start.sh' scripts to start the server and clients in separate terminals to start the system." ] }, { @@ -124,7 +124,7 @@ "\n", "You should be able to see the output in the terminals where you are running your FL Server and Clients when you submitted the job. You can also use `monitor_job()` to follow along and give you updates on the progress until the job is done.\n", "\n", - "By default, `monitor_job()` only has one required arguement, the `job_id` of the job you are waiting for, and the default behavior is to wait until the job is complete before returning a Return Code of `JOB_FINISHED`.\n", + "By default, `monitor_job()` requires only one argument, the `job_id` of the job you are waiting for, and it waits until the job is complete before returning a Return Code of `JOB_FINISHED`.\n", "\n", "In order to follow along and see a more meaningful result, the following cell contains the `basic_cb_with_print` callback that keeps track of the number of times the callback is run and prints the `job_meta` the first three times and the final time before `monitor_job()` completes with every other call just printing a dot to save output space. This callback improves the output and is just an example of what can be done with additional arguments and the `job_meta` information of the job that is being monitored." ] diff --git a/examples/hello-world/hello_world.ipynb b/examples/hello-world/hello_world.ipynb index efcdac1fbd..f76aabc879 100644 --- a/examples/hello-world/hello_world.ipynb +++ b/examples/hello-world/hello_world.ipynb @@ -42,7 +42,7 @@ "## Prerequisites\n", "Before you can run the examples here, the following preparation work must be done:\n", "\n", - "1. Install a virturalenv following the instructions in [README.md](https://github.com/NVIDIA/NVFlare/tree/main/examples)\n", + "1. Install a virtual environment by following the instructions in [README.md](https://github.com/NVIDIA/NVFlare/tree/main/examples)\n", "2. Install Jupyter Lab and install a new kernel for the virtualenv called `nvflare_example`\n", "3. 
Install NVFlare following this [notebook](../nvflare_setup.ipynb)\n", "4. Start NVFlare in POC mode following this [notebook](../tutorials/setup_poc.ipynb). All the examples in this notebook require 2 clients to run." @@ -65,7 +65,7 @@ "\n", "Then making sure you are in the ``nvflare_example`` venv, you can run the ``./hw_pre_start.sh`` script to install NVFlare, provision, and start the FL system in POC mode.\n", "\n", - "If the you getting errors, **do not repeatedly run ./hw_pre_start.sh**. First, you need to try to shut down NVFLARE system, using:\n", + "If you encounter errors, **do not repeatedly run ./hw_pre_start.sh**. Instead, try shutting down the NVFLARE system using:\n", "\n", "```\n", "./hw_post_cleanup.sh \n", diff --git a/examples/hello-world/ml-to-fl/README.md b/examples/hello-world/ml-to-fl/README.md index 1b7a9b9745..f6434d0a06 100644 --- a/examples/hello-world/ml-to-fl/README.md +++ b/examples/hello-world/ml-to-fl/README.md @@ -46,4 +46,4 @@ These implementations can be easily configured using the JobAPI's ScriptRunner. By default, the ```InProcessClientAPIExecutor``` is used, however setting `launch_external_process=True` uses the ```ClientAPILauncherExecutor``` with pre-configured CellPipes for communication and metrics streaming. -Note: Avoid install TensorFlow and PyTorch on the same virtual environment due to library conflicts. +Note: Avoid installing TensorFlow and PyTorch in the same virtual environment due to library conflicts. diff --git a/examples/hello-world/ml-to-fl/tf/README.md b/examples/hello-world/ml-to-fl/tf/README.md index b4918b710d..19c49b4f10 100644 --- a/examples/hello-world/ml-to-fl/tf/README.md +++ b/examples/hello-world/ml-to-fl/tf/README.md @@ -2,13 +2,13 @@ We will demonstrate how to transform an existing DL code into an FL application step-by-step: -1. [How to modify an existing training script using DL2FL Client API](#transform-cifar10-tensorflow-training-code-to-fl-with-nvflare-client-api) +1. [How to modify an existing training script using the DL2FL Client API](#transform-cifar10-tensorflow-training-code-to-fl-with-nvflare-client-api) 2. [How to modify an existing multi GPU training script using DL2FL Client API](#transform-cifar10-tensorflow-multi-gpu-training-code-to-fl-with-nvflare-client-api) ## Software Requirements -Please install the requirements first, it is suggested to install inside a virtual environment: +Please install the requirements first. It is suggested to install them inside a virtual environment. ```bash pip install -r requirements.txt @@ -38,12 +38,12 @@ TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async ``` If you possess more GPUs than clients, a good strategy is to run one client on each GPU. -This can be achieved using the `-gpu` argument during simulation, e.g., `nvflare simulator -n 2 -gpu 0,1 [job]`. +This can be achieved by using the `-gpu` argument during simulation, e.g., `nvflare simulator -n 2 -gpu 0,1 [job]`. ## Transform CIFAR10 TensorFlow training code to FL with NVFLARE Client API -Given a TensorFlow CIFAR10 example: [./src/cifar10_tf_original.py](./src/cifar10_tf_original.py). +Given a TensorFlow CIFAR-10 example: [./src/cifar10_tf_original.py](./src/cifar10_tf_original.py). You can run it using @@ -51,7 +51,7 @@ You can run it using python3 ./src/cifar10_tf_original.py ``` -To transform the existing code to FL training code, we made the following changes: +To transform the existing code into FL training code, we made the following changes: 1. 
Import NVFlare Client API: ```import nvflare.client as flare``` 2. Initialize NVFlare Client API: ```flare.init()``` diff --git a/examples/hello-world/step-by-step/cifar10/README.md b/examples/hello-world/step-by-step/cifar10/README.md index a73a8aa869..6bc307d756 100644 --- a/examples/hello-world/step-by-step/cifar10/README.md +++ b/examples/hello-world/step-by-step/cifar10/README.md @@ -1,7 +1,7 @@ # Training an image classifier with CIFAR10 dataset -We will use the original [Training a Classifer](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) +We will use the original [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) example from pytorch as the code base. The CIFAR10 dataset has the following classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. @@ -16,7 +16,7 @@ In the following examples, we will show various Federated Learning workflows and * [sag_deploy_map](sag_deploy_map) - FedAvg with site-specific configs. * [sag_executor](sag_executor) - FedAvg with Executor API * [sag_mlflow](sag_mlflow) - FedAvg with MLflow experiment tracking logs. -* [sag_he](sag_he) - FedAvg with homomorphic encyption using POC -he mode. +* [sag_he](sag_he) - FedAvg with homomorphic encryption using POC -he mode. * [cse](cse) - Cross-site evaluation with server-side controller. * [cyclic](cyclic) - Cyclic Weight Transfer (cyclic) workflow with server-side controller. * [cyclic_ccwf](cyclic_ccwf) - Client-controlled cyclic workflow with client-side controller. diff --git a/examples/hello-world/step-by-step/cifar10/code/readme.md b/examples/hello-world/step-by-step/cifar10/code/readme.md index fdc8b90544..5f5e4a0aaf 100644 --- a/examples/hello-world/step-by-step/cifar10/code/readme.md +++ b/examples/hello-world/step-by-step/cifar10/code/readme.md @@ -47,7 +47,7 @@ nvflare job create -j /tmp/nvflare/cifar10_sag -w sag_pt -s fl/train.py * Run in POC mode or Production -Before you can the POC or production mode, you must make sure the server or clients are already started. +Before you can use POC or production mode, you must ensure that the server or clients are already started. You can refer the POC setup tutorial to see how to setup the POC, and documentation to refer to the production setup. diff --git a/examples/hello-world/step-by-step/cifar10/cse/cse.ipynb b/examples/hello-world/step-by-step/cifar10/cse/cse.ipynb index 3c4866fcf0..3fd8fdaeb6 100644 --- a/examples/hello-world/step-by-step/cifar10/cse/cse.ipynb +++ b/examples/hello-world/step-by-step/cifar10/cse/cse.ipynb @@ -24,7 +24,7 @@ "## Converting DL training code to FL training code with Multi-Task Support\n", "\n", "\n", - "We will be using the [Client API FL code](../code/fl/train.py) trainer converted from the original [Training a Classifer](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) example.\n", + "We use the [Client API FL code](../code/fl/train.py) trainer, which is converted from the original [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) example.\n", "\n", "Key changes when writing a FL code to support multiple tasks:\n", "- When using the default `launch_once` parameter of `SubprocessLauncher`, we encapsulate our code in `while flare.is_running():` loop so we can call `flare.receive()` and perform various tasks. 
This is useful when launching everytime would be inefficient, such as when having to perform data setup every time.\n", diff --git a/examples/hello-world/step-by-step/cifar10/cyclic/cyclic.ipynb b/examples/hello-world/step-by-step/cifar10/cyclic/cyclic.ipynb index 069880dee6..cb3ffb5834 100644 --- a/examples/hello-world/step-by-step/cifar10/cyclic/cyclic.ipynb +++ b/examples/hello-world/step-by-step/cifar10/cyclic/cyclic.ipynb @@ -12,9 +12,9 @@ "## Cyclic Workflow\n", "\n", "\n", - "Cyclic Weight Transfer (CWF) uses the server-controlled `CyclicController` to pass the model weights from one site to the next in a cyclic fashion. \n", + "Cyclic Weight Transfer (CWT) uses the server-controlled `CyclicController` to pass the model weights from one site to the next in a cyclic fashion. \n", "\n", - "In the Cyclic workflow, sites train one at a time, while sending the model to the next site. The order of the sites can be specifed as fixed, random, or random (without same in a row). A round is finished once all sites in the defined order have completed training once, and the final result is returned to the server. This differs from Scatter-and-Gather, wherein all sites train simultaneously and aggregrate their results together at the end of a round.\n", + "In the Cyclic workflow, sites train one at a time, while sending the model to the next site. The order of the sites can be specified as fixed, random, or random (without same in a row). A round is finished once all sites in the defined order have completed training once, and the final result is returned to the server. This differs from Scatter-and-Gather, wherein all sites train simultaneously and aggregate their results together at the end of a round.\n", "\n", "## Converting DL training code to FL training code\n", "\n", diff --git a/examples/hello-world/step-by-step/cifar10/cyclic_ccwf/cyclic_ccwf.ipynb b/examples/hello-world/step-by-step/cifar10/cyclic_ccwf/cyclic_ccwf.ipynb index d1a55cce8f..f6905b4906 100644 --- a/examples/hello-world/step-by-step/cifar10/cyclic_ccwf/cyclic_ccwf.ipynb +++ b/examples/hello-world/step-by-step/cifar10/cyclic_ccwf/cyclic_ccwf.ipynb @@ -22,7 +22,7 @@ "\n", "## Converting DL training code to FL training code\n", "\n", - "We will be using the [Client API FL code](../code/fl/train.py) trainer converted from the original [Training a Classifer](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) example.\n", + "We will be using the [Client API FL code](../code/fl/train.py) trainer, which is converted from the original [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) example.\n", "\n", "See [Converting to FL code using Client API](../sag/sag.ipynb#code) for more details." ] diff --git a/examples/hello-world/step-by-step/cifar10/data/readme.md b/examples/hello-world/step-by-step/cifar10/data/readme.md index b04a0cc9ea..19b96c429b 100644 --- a/examples/hello-world/step-by-step/cifar10/data/readme.md +++ b/examples/hello-world/step-by-step/cifar10/data/readme.md @@ -1,10 +1,10 @@ # Problem and Data -For this tutorial, we will use the CIFAR10 dataset. +For this tutorial, we will use the CIFAR-10 dataset. It has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. -The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size. +The images in CIFAR-10 are of size 3x32x32, i.e., 3-channel color images with a resolution of 32x32 pixels. 
-The pytorch tutorial will training an image classifier. The examples shows the following steps in order: +The PyTorch tutorial will train an image classifier. The example shows the following steps in order: * Load and normalize the CIFAR10 training and test datasets using torchvision * Define a Convolutional Neural Network @@ -24,7 +24,7 @@ we usually need two steps to prepare data. To avoid each job having to download and split the data, we add a step to prepare the data for all the cifar10 jobs. -The CIFAR10 data will be downloaded to the common location, so rest of the job won't download it. +The CIFAR-10 data will be downloaded to a common location, so it will not need to be repeatedly downloaded. to download data ``` diff --git a/examples/hello-world/step-by-step/cifar10/sag/sag.ipynb b/examples/hello-world/step-by-step/cifar10/sag/sag.ipynb index df141967ce..9fb6e0ce81 100644 --- a/examples/hello-world/step-by-step/cifar10/sag/sag.ipynb +++ b/examples/hello-world/step-by-step/cifar10/sag/sag.ipynb @@ -21,7 +21,7 @@ "\n", "## Scatter and Gather (SAG)\n", "\n", - "FLARE's Scatter and Gather workflow is similar to the Message Passing Interface (MPI)'s MPI Broadcast + MPI Gather. [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) is a standardized and portable message-passing standard designed to function on parallel computing architectures. MPI consists of some [collective communication routines](https://mpitutorial.com/tutorials/mpi-broadcast-and-collective-communication/), such as MPI Broadcast, MPI Scatter, and MPI Gather.\n", + "NVFLARE's Scatter and Gather workflow is similar to the Message Passing Interface (MPI)'s MPI Broadcast and MPI Gather. [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) is a standardized and portable message-passing standard designed to function on parallel computing architectures. MPI consists of some [collective communication routines](https://mpitutorial.com/tutorials/mpi-broadcast-and-collective-communication/), such as MPI Broadcast, MPI Scatter, and MPI Gather.\n", "\n", "\"scatter\"\"gather\"\n", "\n", @@ -35,7 +35,7 @@ "\n", "\"FedAvg\" \"Scatter\n", "\n", - "The aggregation of FedAvg is done on the server side, its weighted on the number of training steps on each client\n", + "The aggregation of FedAvg is done on the server side; it is weighted by the number of training steps on each client.\n", " \n", "## Convert training code to federated learning training code\n", "\n", diff --git a/examples/hello-world/step-by-step/cifar10/sag_executor/sag_executor.ipynb b/examples/hello-world/step-by-step/cifar10/sag_executor/sag_executor.ipynb index 50ab525ade..4dc842a5f8 100644 --- a/examples/hello-world/step-by-step/cifar10/sag_executor/sag_executor.ipynb +++ b/examples/hello-world/step-by-step/cifar10/sag_executor/sag_executor.ipynb @@ -7,7 +7,7 @@ "source": [ "# FedAvg using Executor\n", "\n", - "In this example, we will demonstrate the FegAvg algorithm using the CIFAR10 dataset using an Executor. \n", + "In this example, we will demonstrate the FedAvg algorithm using the CIFAR10 dataset using an Executor. 
\n", "\n", "While the previous example [FedAvg with SAG workflow](../sag/sag.ipynb#title) utilized the Client API, here we will demonstrate how to convert the original training code into a Executor trainer, showcase its capabilities, and recommend the best use cases.\n", "\n", @@ -29,13 +29,13 @@ "\n", "### When to use Executors\n", "\n", - "The Executor is best used when implementing tasks and logic that do not fit the standard learning methods of higher level APIs such as the ModelLearner or Client API. In this example, in addition to the `train`, `validate`, and `submit_model` tasks, we also introduce the `get_weights` task. This pretrain task allows us to perform the `InitializeGlobalWeights` workflow, which would otherwise not be supported.\n", + "The Executor is best used when implementing tasks and logic that do not fit the standard learning methods of higher-level APIs, such as the ModelLearner or Client API. In this example, in addition to the `train`, `validate`, and `submit_model` tasks, we also introduce the `get_weights` task. This pretrain task allows us to perform the `InitializeGlobalWeights` workflow, which would otherwise not be supported.\n", "\n", "## Converting DL training code to FL Executor training code\n", "We will use the original [Training a Classifer](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) example\n", "in PyTorch as our base [DL code](../code/dl/train.py).\n", "\n", - "In order to transform the existing PyTorch classifier training code into Federated Classifer training code, we must restructure our code to implement tasks to execute, as well as handle the data exchange formats. The converted code can be found at [FL Executor code](../code/fl/executor.py).\n", + "In order to transform the existing PyTorch classifier training code into Federated Classifier training code, we must restructure our code to implement tasks to execute, as well as handle the data exchange formats. 
The converted code can be found at [FL Executor code](../code/fl/executor.py).\n",
 "\n",
 "Key changes:\n",
 "- Encapsulate the original DL train and validate code inside `local_train()` and `local_validate()` and the dataset and PyTorch training utilities in `initialize()`\n",
diff --git a/examples/hello-world/step-by-step/cifar10/sag_he/sag_he.ipynb b/examples/hello-world/step-by-step/cifar10/sag_he/sag_he.ipynb
index c80f7d37af..7fd5aaec4e 100644
--- a/examples/hello-world/step-by-step/cifar10/sag_he/sag_he.ipynb
+++ b/examples/hello-world/step-by-step/cifar10/sag_he/sag_he.ipynb
@@ -8,7 +8,7 @@
 "# SAG workflow with Homomorphic Encryption\n",
 "\n",
 "In this example, we will demonstrate how to use homomorphic encryption (HE)\n",
- "by building on the previous [FedAvg with SAG workflow](../sag/sag.ipynb#title) example using the CIFAR10 datset.\n",
+ "by building on the previous [FedAvg with SAG workflow](../sag/sag.ipynb#title) example using the CIFAR10 dataset.\n",
 "\n",
 "## Homomorphic Encryption\n",
 "\n",
@@ -130,7 +130,7 @@
 "id": "8ba25168",
 "metadata": {},
 "source": [
- "To support HE, we need the provision process to generate and write the TenSEAL homomorphic encryption contexts for the server and client.\n",
+ "To support HE, we need the provisioning process to generate and write the TenSEAL homomorphic encryption contexts for the server and client.\n",
 "Currently the simulator does not support HE, however we can use the POC command `-he` option to prepare the HE supported POC workspace with the `HEBuilder`:"
 ]
 },
diff --git a/examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb b/examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb
index 9d9b7c10e1..ec6a5587d9 100644
--- a/examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb
+++ b/examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb
@@ -7,7 +7,7 @@
 "source": [
 "# FedAvg with MLflow tracking\n",
 "\n",
- "In this example, we will demonstrate the FegAvg using the CIFAR10 dataset with MLflow tracking. \n",
+ "In this example, we will demonstrate the FedAvg algorithm using the CIFAR10 dataset with MLflow tracking. \n",
 "\n",
 "We will show how to add tracking capabilities to the previous example [FedAvg with SAG workflow](../sag/sag.ipynb#title), specifically we will show how to add MLflow in this example.\n",
 "\n",
diff --git a/examples/hello-world/step-by-step/cifar10/stats/image_stats.ipynb b/examples/hello-world/step-by-step/cifar10/stats/image_stats.ipynb
index 48e3dd7608..e857aa5e8d 100644
--- a/examples/hello-world/step-by-step/cifar10/stats/image_stats.ipynb
+++ b/examples/hello-world/step-by-step/cifar10/stats/image_stats.ipynb
@@ -7,7 +7,7 @@
 "source": [
 "# Calculate CIFAR10 Image Histogram\n",
 "\n",
- "Before one training the image classifer, the pytorch example follows the following steps: \n",
+ "Before training the image classifier, the PyTorch example follows these steps: \n",
 "\n",
 "* **Prepare Data**\n",
 " * Load and normalize the CIFAR10 training and test datasets using torchvision\n",
@@ -18,7 +18,7 @@
 " * Train the network on the training data\n",
 " * Test the network on the test data\n",
 " \n",
- "We will add another step to calculate the data historgram and compare the local (site) histogram and global historgrams. So the above steps become\n",
+ "We will add another step to calculate the data histogram and compare the local (site) histogram and global histograms. 
So the above steps become\n",
 "\n",
 "\n",
 "* **Prepare Data**\n",
@@ -84,7 +84,7 @@
 "source": [
 "## Prepare Data\n",
 "\n",
- "Generally, when you have to deal with image, text, audio or video data, you can use standard python packages that load data into a numpy array. Then you can convert this array into a torch.*Tensor. Torch provied a package called torchvision, that has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc. and data transformers for images, viz., torchvision.datasets and torch.utils.data.DataLoader.\n",
+ "Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages that load data into a NumPy array. Then you can convert this array into a torch.*Tensor. Torch provides a package called torchvision, which has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc. and data transformers for images, viz., torchvision.datasets and torch.utils.data.DataLoader.\n",
 "\n",
 "For CIFAR10 dataset, it has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. \n",
 "The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.\n",
@@ -158,7 +158,7 @@
 "id": "ef5a2400-6c8c-4a19-af1b-3cc8ad526a77",
 "metadata": {},
 "source": [
- "Lets explore the data, first"
+ "Let's explore the data first"
 ]
 },
 {
@@ -239,7 +239,7 @@
 "source": [
 "## Create Local Image Intensity Histogram Calculator\n",
 "\n",
- "We ignored all other statistics calculations (mean, stddev etc. as they don't apply). all methods have default implementations.\n"
+ "We ignored all other statistical calculations (mean, standard deviation, etc.) as they do not apply. All methods have default implementations.\n"
 ]
 },
 {
@@ -408,7 +408,7 @@
 "id": "ec38287e-c425-4c31-8760-f10e3a1edc8b",
 "metadata": {},
 "source": [
- "The code is working. Let's setup NVFLARE job in federated computing. "
+ "The code is working. Let's set up an NVFLARE job for federated computing."
 ]
 },
 {
@@ -482,7 +482,7 @@
 "\n",
 "**Examine the result**\n",
 "\n",
- "Notice the result is written at \n",
+ "Note that the result is written at \n",
 "\n",
 "**/tmp/nvflare/image_stats/server/simulate_job/statistics/image_statistics.json**"
 ]
@@ -574,7 +574,7 @@
 "tags": []
 },
 "source": [
- "The global and local histograms differences are none as we are using the same dataset for all clients. \n",
+ "The global and local histograms show no differences because we are using the same dataset for all clients.\n",
 "\n",
 "## We are done !\n",
 "Congratulations! you have just completed the federated stats image histogram calulation. \n",
diff --git a/examples/hello-world/step-by-step/cifar10/swarm/swarm.ipynb b/examples/hello-world/step-by-step/cifar10/swarm/swarm.ipynb
index b5219b1ed4..c6147874e1 100644
--- a/examples/hello-world/step-by-step/cifar10/swarm/swarm.ipynb
+++ b/examples/hello-world/step-by-step/cifar10/swarm/swarm.ipynb
@@ -13,10 +13,10 @@
 "\n",
 "\"swarm\n",
 "\n",
- "Swarm Learning is a decentralized Federated Averaging algorithm where the key difference is that the server is not trusted with any sensitive information. 
The server is now only responsible for job health and lifecycle management via the `SwarmServerController`, while the clients are now responsible for training and aggregration logic via the swarm client-controlled `SwarmClientController`.\n",
+ "Swarm Learning is a decentralized Federated Averaging algorithm where the key difference is that the server is not trusted with any sensitive information. The server is now only responsible for job health and lifecycle management via the `SwarmServerController`, while the clients are now responsible for training and aggregation logic via the swarm client-controlled `SwarmClientController`.\n",
 "\n",
 "- `SwarmServerController`: manages swarm job lifecycle and configurations such as `aggr_clients` and `train_clients`\n",
- "- `SwarmClientController`: sends `learn_task` to all training clients to invoke their executors for `train` task each round, and sends results to designated `aggr_client` for aggregration.\n",
+ "- `SwarmClientController`: sends `learn_task` to all training clients to invoke their executors for the `train` task each round, and sends results to the designated `aggr_client` for aggregation.\n",
 "\n",
 "Required tasks: `train`\n",
 "\n",
diff --git a/examples/tutorials/flare_simulator.ipynb b/examples/tutorials/flare_simulator.ipynb
index 6c3b56f754..10d5630e4b 100644
--- a/examples/tutorials/flare_simulator.ipynb
+++ b/examples/tutorials/flare_simulator.ipynb
@@ -10,7 +10,7 @@
 "The [FL Simulator](https://nvflare.readthedocs.io/en/latest/user_guide/nvflare_cli/fl_simulator.html) runs a local simulation of a running NVFLARE FL deployment. This allows researchers to test and debug an application without provisioning a real, distributed FL project. The FL Simulator runs a server and multiple clients in the same local process, with communication that mimics a real deployment. This allows researchers to more quickly build out new components and jobs that can be directly used in a production deployment.\n",
 "\n",
 "### Setup\n",
- "The NVFlare [Getting Started Guide](https://nvflare.readthedocs.io/en/main/getting_started.html) provides instructions for setting up FLARE on a local system or in a Docker image. We've also cloned the NVFlare GitHub in our top-level working directory."
+ "The NVFlare [Getting Started Guide](https://nvflare.readthedocs.io/en/main/getting_started.html) provides instructions for setting up NVFlare on a local system or in a Docker image. We've also cloned the NVFlare GitHub repository into our top-level working directory."
 ]
 },
 {
diff --git a/examples/tutorials/job_cli.ipynb b/examples/tutorials/job_cli.ipynb
index b2395368d0..84ac4226f2 100644
--- a/examples/tutorials/job_cli.ipynb
+++ b/examples/tutorials/job_cli.ipynb
@@ -42,7 +42,7 @@
 "\n",
 "## Step-by-step walk-through: from creating a job to running a job\n",
 "\n",
- "Taking the converted CIFAR10 with pytorch training code for a 2-client federated learning [program](https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-world/step-by-step/cifar10/code), we can use the standard Scatter and Gatter (SAG) workflow pattern to demonstrate the features of the Job CLI. \n",
+ "Taking the converted CIFAR10 PyTorch training code for a 2-client federated learning [program](https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-world/step-by-step/cifar10/code), we can use the standard Scatter and Gather (SAG) workflow pattern to demonstrate the features of the Job CLI. 
\n", "\n", "Now, we would like to see what are the available pre-configured job templates the user can use and modify. \n", "\n", @@ -1287,14 +1287,14 @@ "source": [ "### Modify custom configuration\n", "\n", - "Up to so far, we have been discussion how to create and modify NVFLARE specific configurations such as meta.conf, config_fed_client.conf and config_fed_server.conf. How about custom configuration that needed by the training code. The custom configuration files are located in the custom directory of the app. \n", + "Up to this point, we have been discussing how to create and modify NVFLARE-specific configurations, such as meta.conf, config_fed_client.conf, and config_fed_server.conf. What about custom configurations needed by the training code? The custom configuration files are located in the custom directory of the app. \n", "\n", "such as \n", " app1/custom/my_config.yaml\n", " app2/custom/my_config.yaml\n", " app_server/custom/my_config.yaml\n", " \n", - "in such cases, the config file name is arbitary, the file format could be any one of the JSON, PYHOCON or OmegaConf. We can still modify these files. Before jump into the specifics. Let us see what format the CLI offers to input files. We can specify the config file in one of the following ways: \n", + "In such cases, the configuration file name is arbitrary, and the file format can be any one of JSON, PYHOCON, or OmegaConf. We can still modify these files. Before jumping into the specifics, let's see what format the CLI offers for input files. We can specify the config file in one of the following ways: \n", "\n", "```\n", "\n", @@ -1378,7 +1378,7 @@ "id": "bc7195b6", "metadata": {}, "source": [ - "Noticed that the weight_decay for the app_server megatron_gpt_peft_tuning_config.yaml value is updated to 0.02. We can also look at the file saved. " + "Notice that the weight_decay value in the app_server's megatron_gpt_peft_tuning_config.yaml is updated to 0.02. We can also look at the file saved. " ] }, { @@ -1400,7 +1400,7 @@ "tags": [] }, "source": [ - "This works! Alternatively, you can spell out the variable path, this can be used to update the exact variable in case duplicated variables ( such as two weight_decay under different paths). Let's do it again " + "This works! Alternatively, you can specify the variable path, which can be used to update the exact variable in cases where there are duplicate variables (e.g., two weight_decay variables under different paths). Let's do it again " ] }, { diff --git a/examples/tutorials/setup_poc.ipynb b/examples/tutorials/setup_poc.ipynb index 0a64706a5d..0454f1542f 100644 --- a/examples/tutorials/setup_poc.ipynb +++ b/examples/tutorials/setup_poc.ipynb @@ -12,7 +12,7 @@ "POC mode allows users to test the features of a full FLARE deployment on a single machine, without the overhead of a true distributed deployment.\n", "Compared to the FL Simulator, where the job run is automated on a single system, POC mode allows you to establish and connect distinct server and client \"systems\" which can then be orchestrated using the FLARE Console. Users can also experiment with various deployment options (project.yml), making POC mode a useful tool in preparation for a distributed deployment.\n", "\n", - ">It is ideal to start your NVFLARE system in POC mode from a **terminal**, not from a notebook. The terminal's virual env. must match the kernel's virtual env. 
In our case, we are using 'nvflare_example'.\n",
+ ">It is ideal to start your NVFLARE system in POC mode from a **terminal**, not from a notebook. The terminal's virtual environment must match the kernel's virtual environment. In our case, we are using 'nvflare_example'.\n",
 "\n",
 "To get started, let's look at the NVFlare CLI usage for the ``poc`` subcommand:"
 ]
diff --git a/nvflare/app_common/psi/README.md b/nvflare/app_common/psi/README.md
index f722aa1c20..6e31db8e76 100644
--- a/nvflare/app_common/psi/README.md
+++ b/nvflare/app_common/psi/README.md
@@ -7,7 +7,7 @@ for two-party. We took the two-party direct communication PSI protocol and
 extended to Federated Computing setting where all exchanges are funneled via a central FL server.
 We supported multi-party PSI via pair-wise approach.
 
-Here is the detailed Sequence diagrams for DH PSI.
+Here is the detailed sequence diagram for DH PSI.
 
 ```mermaid
@@ -71,7 +71,7 @@ sequenceDiagram
 
 * Note each site/client is both a PSI Client and PSI Server.
 * Initially, the items() is the original data items
-* Once the client has get the intersection from the previous Clients' intersect operation, the items will be
+* Once the client has received the intersection from the previous clients' intersect operation, the items will be
 * the intersection instead of original items.
 
 ```mermaid
diff --git a/tests/README.md b/tests/README.md
index c8c232d6a3..eeae2c86f1 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -1,9 +1,9 @@
 # NVIDIA Flare Test
 
-This file introduces how the tests in NVIDIA FLARE is organized.
+This file describes how the tests in NVIDIA FLARE are organized.
 
-We divide tests into unit test and integration test.
+We divide tests into unit tests and integration tests.
 
 ```commandline
 tests:
@@ -17,7 +17,7 @@ tests:
 
 The structure of unit test is organized as parallel directories of the production code.
 
-Each directory in `test/unit_test` is mapping to their counterparts in `nvflare`.
+Each directory in `test/unit_test` maps to its counterpart in `nvflare`.
 
 For example, we have `test/unit_test/app_common/job_schedulers/job_scheduler_test.py`
 that tests `nvflare/app_common/job_schedulers/job_scheduler.py`.
 
diff --git a/tests/integration_test/README.md b/tests/integration_test/README.md
index 27695717b7..41f42fcebd 100644
--- a/tests/integration_test/README.md
+++ b/tests/integration_test/README.md
@@ -21,14 +21,14 @@ You can also choose to run just one set of tests using "-m" option.
 **NOTE** There are 7 options: numpy, tensorflow, pytorch, ha, auth, overseer, preflight.
 
-The overseer and preflight tests have their own entry file.
+The overseer and preflight tests have their own entry files.
 All other options share the same test entry file `tests/integration_test/system_test.py`
 
 ---
 
 ## Test structure
 
-The integration tests has 3 entry files:
+The integration tests have 3 entry files:
 
 - The integration tests entry file is `tests/integration_test/system_test.py`.
 It will read all test configurations from `./test_configs.yml`.
@@ -39,7 +39,7 @@
 
 ### Test configuration
 
-Each test configuration yaml define a whole FL system.
+Each test configuration YAML defines a whole FL system.
 
 The `system_test.py` will read and parse the config to determine which
 `SiteLauncher` to use to set up the whole system.
@@ -85,7 +85,7 @@ Each test case has the following attributes:
 
 | teardown (optional) | What shell command to run after this test case. 
|
 
-The most important part is "event_sequence", it will be trigger one by one.
+The most important part is the "event_sequence", whose events are triggered one by one.
 
 After all events in event_sequence is triggered, then this test case is done.
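To make the event-sequence behavior concrete, here is a small, self-contained Python sketch of the dataflow. This is a simplified model for illustration only; the actual runner and its richer event schema live in `system_test.py` and `test_configs.yml`:

```python
from typing import Any, Callable, Dict, List

# Simplified model of an event sequence: each event is triggered one by one,
# in order, and the test case is done only after all events have fired.
# Illustrative only; not NVFlare's actual integration-test code.
def run_event_sequence(events: List[Dict[str, Any]], state: Dict[str, Any]) -> bool:
    for event in events:
        trigger: Callable[[Dict[str, Any]], bool] = event["trigger"]
        if not trigger(state):   # check the event's trigger condition
            return False         # the sequence stops if a trigger never fires
        event["action"](state)   # run the event's actions against the system
    return True                  # all events triggered: the test case is done

# Example: the second event's trigger depends on the first event's action.
state = {"jobs_submitted": 0}
events = [
    {"trigger": lambda s: True, "action": lambda s: s.update(jobs_submitted=1)},
    {"trigger": lambda s: s["jobs_submitted"] == 1, "action": lambda s: None},
]
print(run_event_sequence(events, state))  # True
```

Each event gates on its trigger before its actions run, which is why a test case only completes after every event in the sequence has fired.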