Commit

style: fix pre-commit
Gaiejj committed Sep 3, 2024
1 parent 6f2f365 commit 0a24402
Showing 1 changed file with 20 additions and 6 deletions.
26 changes: 20 additions & 6 deletions docs/source/start/efficiency.rst
@@ -24,15 +24,29 @@ To demonstrate the effectiveness and resource utilization of OmniSafe as a SafeRL


In our comparative experiments, we rigorously ensure uniformity across all experimental settings.
More specifically, PPOLag and CPO implement early-stopping techniques that vary the number of
updates based on the KL divergence between the current and reference policies, which introduces
randomness into the time measurements. To keep the variables consistent, we fixed
``update_iters`` at 1, ``steps_per_epoch`` at 20,000, and ``batch_size`` at 64 (sketched in the
snippet after the device list), and conducted the tests on the same machine with no other
processes running. The specific device parameters are:

- **CPU**: AMD Ryzen Threadripper PRO 3975WX 32-Cores
- **GPU**: NVIDIA GeForce RTX 3090, Driver Version: 535.154.05
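
As a rough illustration, these settings map onto OmniSafe's ``custom_cfgs`` dictionary. The
following is a minimal sketch, assuming the ``omnisafe.Agent`` entry point and the ``algo_cfgs``
key names used in recent OmniSafe releases; the environment ID ``SafetyPointGoal1-v0`` is an
arbitrary example, not necessarily the exact benchmark task.

.. code-block:: python

   import omnisafe

   # Sketch of the fixed experimental settings (key names are assumed).
   custom_cfgs = {
       'algo_cfgs': {
           'update_iters': 1,        # fixed to remove early-stopping randomness
           'steps_per_epoch': 20000,
           'batch_size': 64,
       },
   }

   agent = omnisafe.Agent('PPOLag', 'SafetyPointGoal1-v0', custom_cfgs=custom_cfgs)
   agent.learn()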

Under these consistent conditions, **OmniSafe achieved the lowest computational time consumption
on the same baseline algorithms**, which we attribute to three factors: *vectorized environment
parallelism* for accelerated data collection, `asynchronous agent parallelism <https://arxiv.org/abs/1602.01783>`_ for parallelized learning, and *GPU resource utilization* for supporting large
networks. We elaborate below on how each of these features contributes to OmniSafe's computational efficiency.

**Vectorized Environment Parallelism**: OmniSafe and SafePO support vectorized environment
interfaces and buffers. In this experiment, we set the number of vectorized environments to 10,
meaning that a single agent simultaneously generates 10 actions from 10 vectorized observations
and performs batch updates through a vectorized buffer, as sketched below. This feature enhances
the efficiency of agents' data sampling from environments.
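
A minimal sketch of enabling this feature, assuming the ``vector_env_nums`` key under
``train_cfgs`` controls the number of parallel environment copies (the name used in recent
OmniSafe releases):

.. code-block:: python

   import omnisafe

   # Sketch: one agent interacting with 10 vectorized environment copies.
   custom_cfgs = {
       'train_cfgs': {'vector_env_nums': 10},  # assumed key name
   }
   agent = omnisafe.Agent('PPOLag', 'SafetyPointGoal1-v0', custom_cfgs=custom_cfgs)
   agent.learn()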

**Asynchronous Agent Parallelism**: OmniSafe supports *Asynchronous Advantage Actor-Critic (A3C)*
parallelism based on the distributed framework ``torch.distributed``. In this experiment, we set
the number of asynchronous agents to 2, meaning that two agents were instantiated to sample and
learn simultaneously, synchronizing their neural network parameters at the end of each epoch.
This feature further enhances the efficiency of agent sampling and updating.
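
A corresponding sketch, assuming the ``parallel`` key under ``train_cfgs`` sets the number of
asynchronous agents that OmniSafe launches via ``torch.distributed``:

.. code-block:: python

   import omnisafe

   # Sketch: two asynchronous agents sampling and learning in parallel,
   # synchronizing network parameters at the end of each epoch.
   custom_cfgs = {
       'train_cfgs': {'parallel': 2},  # assumed key name
   }
   agent = omnisafe.Agent('PPOLag', 'SafetyPointGoal1-v0', custom_cfgs=custom_cfgs)
   agent.learn()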

**GPU Resource Utilization**: Since only OmniSafe and SafePO utilize GPU computing resources, we
used the NVIDIA GeForce RTX 3090 as the computing device in this experiment. As shown in
:ref:`Table 1 <appendix_f>`, when the hidden layer parameters increased from 64 x 64 to
1024 x 1024, the runtime of RL-Safety-Algorithms and Safety-starter-agents increased
significantly, whereas the runtime increase for OmniSafe and SafePO was comparatively small. This
trend is particularly notable for the CPO algorithm, which must compute a Hessian matrix during
updates: on a CPU, this overhead grows with the number of neural network parameters, whereas
OmniSafe and SafePO, which support GPU acceleration, are almost unaffected.
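
Selecting the GPU follows the same configuration pattern; a sketch, assuming the ``device`` key
under ``train_cfgs`` accepts a standard PyTorch device string:

.. code-block:: python

   import omnisafe

   # Sketch: place networks and updates on the first CUDA device.
   custom_cfgs = {
       'train_cfgs': {'device': 'cuda:0'},  # assumed key name
   }
   agent = omnisafe.Agent('CPO', 'SafetyPointGoal1-v0', custom_cfgs=custom_cfgs)
   agent.learn()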
