From ecf7cb97207d3432aeada3cb36c96067c7ed15ee Mon Sep 17 00:00:00 2001
From: stnolting <stnolting@gmail.com>
Date: Mon, 3 Feb 2025 22:30:25 +0100
Subject: [PATCH] [docs] update section "memory coherence"

---
 docs/datasheet/soc.adoc | 54 +++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

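Reviewer note: the following is a minimal behavioral sketch of the synchronization steps described in the updated "Memory Coherence" section. It is illustrative only and is not NEORV32 software or RTL. It assumes word-granular write-back caches without snooping and treats the XBUS cache as a level-2 cache shared by the instruction and data paths. `Memory`, `Cache`, `fence`, `fence_i` and `build` are hypothetical names. A plain `mem.read()` stands in for an access that bypasses all caches, similar to the direct accesses used by atomics.

```python
# Behavioral sketch only: hypothetical names, not NEORV32 software or RTL.


class Memory:
    """Last memory level (e.g. internal DMEM or external memory)."""

    def __init__(self, init=None):
        self.cells = dict(init or {})

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cells[addr] = value

    def sync(self):
        pass  # nothing left to flush at the last level


class Cache:
    """Write-back cache without snooping (a simplifying assumption)."""

    def __init__(self, next_level):
        self.next_level = next_level
        self.lines = {}     # addr -> cached value
        self.dirty = set()  # addresses modified locally

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from the next-higher level
            self.lines[addr] = self.next_level.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value  # stays local until the cache is flushed
        self.dirty.add(addr)

    def invalidate(self):
        self.lines.clear()
        self.dirty.clear()

    def sync(self):
        for addr in sorted(self.dirty):  # flush local modifications
            self.next_level.write(addr, self.lines[addr])
        self.invalidate()                # next access will miss
        self.next_level.sync()           # forward the request


def fence(dcache):
    dcache.sync()


def fence_i(dcache, icache):
    fence(dcache)        # all steps of `fence` first
    icache.invalidate()  # then drop possibly stale instruction words


def build():
    mem = Memory({0x100: "old"})
    xbus_cache = Cache(mem)  # optional level-2 cache shared by both paths
    return mem, Cache(xbus_cache), Cache(xbus_cache)


# Data path: a store is invisible to a cache-bypassing access until `fence`.
mem, dcache, icache = build()
dcache.write(0x100, "new")
print(mem.read(0x100))  # old
fence(dcache)
print(mem.read(0x100))  # new

# Instruction path: patched code is fetched only after `fence.i`.
mem, dcache, icache = build()
icache.read(0x100)          # instruction word is now cached
dcache.write(0x100, "new")  # program modifies its own code
icache.invalidate()         # invalidating the I-cache alone is not enough
print(icache.read(0x100))   # old (the store still sits in the D-cache)
fence_i(dcache, icache)
print(icache.read(0x100))   # new
```

In the second scenario, invalidating only the instruction cache still returns the old word. The modified data has not yet left the data cache, which is why `fence.i` performs all `fence` steps first.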
diff --git a/docs/datasheet/soc.adoc b/docs/datasheet/soc.adoc
index 30808c046..dbe2a0e6a 100644
--- a/docs/datasheet/soc.adoc
+++ b/docs/datasheet/soc.adoc
@@ -1,5 +1,4 @@
-
-// ####################################################################################################################
+<<<
 :sectnums:
 == NEORV32 Processor (SoC)
 
@@ -595,7 +594,7 @@ content of the addresses memory cell) is sent back to the requesting CPU.
 .Direct Access
 [IMPORTANT]
 Atomic operations **always bypass** the CPU's <<_processor_internal_data_cache_dcache, data cache>>
-using direct/uncached accesses. Care must be taken to maintain data <<_cache_coherency>>.
+using direct/uncached accesses. Care must be taken to maintain <<_memory_coherence,memory coherence>>.
 
 .Physical Memory Attributes
 [NOTE]
@@ -610,43 +609,50 @@ cannot be interrupted. Hence, they execute in an atomic way.
 
 
 :sectnums:
-==== Cache Coherency
+==== Memory Coherence
 
-In total the NEORV32 Processor provides up to three optional caches organized in two levels. Level-1
-caches are closer to the CPU while level-2 caches are closer to main memory (however, this highly depends
-on the the actual cache configurations).
+Depending on the configuration, the NEORV32 processor provides several _layers_ of memory consisting
+of caches, buffers and storage:
 
+* The CPU instruction prefetch buffer ("level-0")
 * The <<_processor_internal_data_cache_dcache>> (level-1)
 * The <<_processor_internal_instruction_cache_icache>> (level-1)
 * The cache of the <<_processor_external_bus_interface_xbus>> (level-2)
+* Internal and external memories
 
-As all caches operate transparently for the software, special attention must therefore be paid to coherence.
-Note that coherence and cache _synchronization_ is **not** performed by the hardware itself (there is no
-snooping implemented).
+All caches and buffers operate transparently for the software. Hence, special attention must be paid to
+maintaining coherence. Note that coherence and cache _synchronization_ are **not** performed automatically
+by the hardware, as no snooping is implemented.
 
-The NEORV32 uses two instructions for manual cache synchronization (both instructions are always available
-regardless of the actual CPU/ISA configuration):
+The NEORV32 provides two instructions for manual memory synchronization. Both are always available,
+regardless of the actual CPU/ISA configuration:
 
 * `fence` (<<_i_isa_extension>> / <<_e_isa_extension>>)
 * `fence.i` (<<_zifencei_isa_extension>>)
 
-By executing the "data" `fence` instruction the CPU's data cache is synchronized in four steps:
+Executing the "data" `fence` instruction orders and synchronizes the CPU's load/store operations
+across the entire system:
 
 [start=1]
-. The CPU data cache is flushed: all local modifications are copied to the next higher memory level;
-this can be the XBUS cache or main memory.
-. The CPU data cache is cleared invalidating all local entries.
-. The synchronization request is sent to the next-higher memory level (for example to the XBUS cache
-so it can perform the same synchronization steps).
-. The CPU data cache is reloaded with up-to-date data from the next higher memory level.
+. The CPU data cache (if implemented) is flushed: all local modifications are copied to
+the next-higher memory level (for example the internal DMEM or the XBUS cache).
+. The CPU data cache is invalidated so that the next load/store access causes a cache miss,
+which fetches up-to-date data from the memory system.
+. The synchronization request is forwarded to the next-higher memory level. If the XBUS cache is implemented,
+it is also flushed and invalidated.
 
-By executing the "instruction" `fence.i` instruction the CPU's instruction cache is synchronized in three steps:
+Executing the "instruction" `fence.i` instruction orders and synchronizes the CPU's instruction fetches
+across the entire system:
 
 [start=1]
-. The synchronization request is sent to the next-higher memory level (for example to the XBUS cache
-so it can perform the same synchronization steps).
-. The CPU instruction cache is cleared invalidating all local entries.
-. The CPU instruction cache is reloaded with up-to-date data from the next higher memory level.
+. All steps of the `fence` instruction are performed first.
+. The CPU instruction cache is invalidated (all local entries are cleared) so that the next instruction fetch
+causes a cache miss, which fetches up-to-date instructions from the memory system.
+
+.CPU Stall While Synchronizing
+[IMPORTANT]
+Executing either fence instruction stalls the CPU until all requested ordering/synchronization
+steps have completed.
 
 
 <<<