Overview
Ibex has a feed-through path that connects data_rvalid_i and data_err_i to data_req_o. This can cause timing issues in the memory system Ibex is connected to and may be an outright violation of the specification for various buses (e.g. AXI), which complicates the creation of fully compliant bus adapters. You have to choose between violating the bus specification and introducing latency to break the feed-through path.
The reason this path exists is to provide precise exception handling for bus errors on loads and stores without compromising performance. To give precise exception behaviour on loads and stores the CPU has to wait for a response before it can send out the request for a load or store that follows. Imagine there are two back to back loads, the first to address X and the second to address Y. Were the request for address Y to be sent before the response for address X has been seen, and the response to X turned out to be a bus error, any exception we take cannot be precise. Its mepc would point to the load instruction for address X, but the following load to address Y would already have at least partially executed (in this scenario you could drop the load data for Y when it returns so no RF changes would happen, but the load itself can have observable effects on the wider system).
Then to provide good performance we want to send out the request for a pending load as soon as we have seen the response, which results in a combinational path from data_rvalid_i to data_req_o. Without this path we could only send the new request the cycle following the response resulting in a stall cycle every time we have back to back memory accesses.
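To make the path concrete, here is a minimal sketch of a request gate that waits for the outstanding response. It is not the Ibex RTL: the module name feedthrough_sketch, the lsu_req_i input, the data_gnt_i grant input and the outstanding_q flop are assumptions made for illustration, and it tracks only a single outstanding access.

```systemverilog
// Illustrative only, not the Ibex RTL. Gating a request on "nothing is
// outstanding, or the outstanding response arrives this cycle" puts
// data_rvalid_i in the combinational cone of data_req_o.
module feedthrough_sketch (
  input  logic clk_i,
  input  logic rst_ni,
  input  logic lsu_req_i,      // hypothetical: a load/store wants to issue
  input  logic data_gnt_i,     // assumed grant from the bus
  input  logic data_rvalid_i,  // response valid from the bus
  output logic data_req_o
);
  logic outstanding_q;  // one access is in flight

  // Blocked only while an access is outstanding and not completing now.
  assign data_req_o = lsu_req_i & ~(outstanding_q & ~data_rvalid_i);

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      outstanding_q <= 1'b0;
    end else if (data_req_o & data_gnt_i) begin
      // A new access was granted (possibly in the same cycle the
      // previous one completed), so one access remains in flight.
      outstanding_q <= 1'b1;
    end else if (data_rvalid_i) begin
      outstanding_q <= 1'b0;
    end
  end
endmodule
```

In this sketch, replacing the gate with `lsu_req_i & ~outstanding_q` takes data_rvalid_i out of the cone of data_req_o, at the cost of the stall cycle described above.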
Within OpenTitan Earl Grey this feed-through has been implementable though it has appeared in various timing paths that have been fixed in other ways.
This issue lays out some routes to making this feed-through optional. I think it's important it remains an option as it is useful and, as demonstrated in Earl Grey, feasible for certain implementations.
The proposal is for a new bus adapter where you can make decisions about how bus errors work and whether or not to have the feed-through. This will have the side effect of allowing a family of bus adapters providing Ibex with multiple bus interface options which it currently lacks.
Feed-through details
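The feed-through itself is created by two separate pieces of logic within Ibex. The first is the outstanding_memory_access signal within ibex_id_stage.sv (ibex/rtl/ibex_id_stage.sv, lines 912 to 914 in 60fbb6b):

    // Is a memory access ongoing that isn't finishing this cycle
    assign outstanding_memory_access = (outstanding_load_wb_i | outstanding_store_wb_i) &
                                       ~lsu_resp_valid_i;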
You can see that lsu_resp_valid_i (which in turn derives from the data_rvalid_i top-level bus signal and is calculated in the ibex_load_store_unit module) is directly used here. When asserted, outstanding_memory_access prevents an instruction from executing and stops any request going out on to the data memory port. So when an outstanding response appears, outstanding_memory_access drops and allows a pending load or store to send its request, all in the same cycle.
The second is in the exception logic (ibex/rtl/ibex_controller.sv, line 268 in 60fbb6b). The load_err_i and store_err_i signals are derived from the data_err_i top-level bus signal and are calculated in the ibex_load_store_unit module. The wb_exception_o signal sets the instr_kill signal in ibex_id_stage, which does various things but in particular stops data_req_o from being asserted if there's a pending load or store in the ID stage.
(Side note: the eagle eyed may notice load_err_i and store_err_i aren't factored into id_exception_o in the two pipeline stage configuration, which has no writeback exceptions. The bus errors do still trigger exceptions here, however they don't raise id_exception_o. The id_exception_o signal is purely used to calculate the instr_kill signal, and in the two stage config there's nothing that needs killing in the bus error case. The load or store that caused the bus error has already executed so there's nothing to kill; it's just sitting in the ID stage waiting to retire. Perhaps id_exception_o should be renamed, as it's not quite the general purpose signal the name indicates.)
We can remove the feed-through entirely with some modest changes. First, alter outstanding_memory_access in ibex_id_stage so it doesn't drop in the cycle we get a valid response:
    - // Is a memory access ongoing that isn't finishing this cycle
    - assign outstanding_memory_access = (outstanding_load_wb_i | outstanding_store_wb_i) &
    -                                    ~lsu_resp_valid_i;
    + // Is a memory access ongoing
    + assign outstanding_memory_access = (outstanding_load_wb_i | outstanding_store_wb_i);
Then alter the computation of load_err_o and store_err_o in ibex_load_store_unit. These signals also indicate PMP errors, so we need to refactor them to avoid combining the response valid with the PMP error.
This second change shouldn't produce any functional difference.
Finally hard-wire the top-level data_err_i input to 0.
This gives you an Ibex core without the data_rvalid_i to data_req_o feed-through. However it also ignores bus errors and cannot do back to back memory instructions without a stall cycle in between (note I've checked this removes the feed-through in synthesis, but I haven't checked the core is indeed functional and continues to pass verification with these changes).
Feed-through removal design choices
Clearly one could add a further parameter to Ibex so you can choose if the feed-through is present. I think this is a bad idea, though, as it adds another configuration option, and one we should probably test in all of our existing supported configurations, giving us 2x the number of configurations to test. Instead we should consider ways to adjust the top-level memory IO so we can push the feed-through up to the top-level. You could provide a number of bus adapters, some of which have the feed-through and some of which don't, and each adapter can implement a different top-level bus interface. The main verification environment would connect directly to the Ibex bus interface (rather than the interface exposed by any particular adapter) and verify all of its behaviours.
This gives you a verified design where you can easily switch out different bus adapters to give full flexibility on the presence of the feed-through (and hence on the possible behaviours around bus errors) and full flexibility on the top-level memory interface without the need to separately verify each option. The bus adapters will be small pieces of logic that would be amenable to formal verification and would get verified separately.
One option is just to provide two versions of the data_rvalid_i and data_err_i signals: call them data_rvalid_i and data_rvalid_ft_i, and data_err_i and data_err_ft_i. In ibex_id_stage, for outstanding_memory_access we'd use a version of lsu_resp_valid_i derived from data_rvalid_ft_i and not data_rvalid_i. Then when we don't want the feed-through we hard-wire data_rvalid_ft_i to 0. More modification would be needed to handle data_err_i and data_err_ft_i, but the general idea is that asserting data_err_i should prevent writeback of any load data into the register file and trigger an exception but not instantly kill any on-going instruction, whereas data_err_ft_i would give that instant kill.
A downside here is that explaining what data_rvalid_ft_i and data_err_ft_i are in the context of a bus protocol specification, without reference to Ibex micro-architectural details, is fiddly/impossible. However you can come up with a fairly clean interface spec around them (data_rvalid_ft_i can only be asserted if data_rvalid_i is asserted; similarly data_err_ft_i can only be asserted if data_err_i is asserted, and data_err_ft_i cannot be asserted if data_rvalid_i isn't asserted).
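As a sketch of how those interface rules might be written down, the three constraints can be expressed as concurrent assertions. The module name data_ft_if_checks and the assertion labels are made up for illustration; the signal names are the ones proposed above, and the second rule assumes "similarly" means data_err_ft_i implies data_err_i.

```systemverilog
// Hypothetical checker for the proposed *_ft_i signals. It only encodes
// the three rules stated in the text and nothing about bus timing.
module data_ft_if_checks (
  input logic clk_i,
  input logic rst_ni,
  input logic data_rvalid_i,
  input logic data_rvalid_ft_i,
  input logic data_err_i,
  input logic data_err_ft_i
);
  // data_rvalid_ft_i may only accompany a real response.
  RvalidFtImpliesRvalid_A: assert property (
    @(posedge clk_i) disable iff (!rst_ni)
    data_rvalid_ft_i |-> data_rvalid_i);

  // data_err_ft_i may only accompany a real error (assumed reading).
  ErrFtImpliesErr_A: assert property (
    @(posedge clk_i) disable iff (!rst_ni)
    data_err_ft_i |-> data_err_i);

  // data_err_ft_i is never asserted without a valid response.
  ErrFtImpliesRvalid_A: assert property (
    @(posedge clk_i) disable iff (!rst_ni)
    data_err_ft_i |-> data_rvalid_i);
endmodule
```

A testbench that randomly chooses whether to assert the feed-through variants alongside data_rvalid_i and data_err_i would satisfy all three by construction.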
With these signals if you don't want the feed-through simply hard-wire them to 0. For verification you can randomly choose whether they'll get asserted with data_rvalid_i and data_err_i as appropriate. The downside is with this setup you can't do back to back accesses without stalls unless you have the feed-through (as outstanding_memory_access will stay asserted the cycle the response comes back so you cannot send out the next request until the following cycle).
One way around this for the non feed-through case is to hard-wire data_rvalid_ft_i to 1 rather than 0. As we know what it's connected to internally in Ibex, we know this is a safe thing to do and will have the effect of forcing outstanding_memory_access to 0 always. We'll be able to start a new memory access before the response has returned from a previous one (or on the same cycle the response comes back if that's how the bus timing works out) and we don't have the feed-through.
The main downside here is this is very micro-architecture specific. It works because we know exactly how this signal is used internally. Coming up with a specific specification and sensible assertions for the interface is far harder and there may be micro-architectural changes inside Ibex that mean we need to redo the top-level interface to keep the same possibilities.
Another solution is to move the responsibility for blocking new requests whilst there is still an outstanding one into the bus adapter itself. The adapter could block the request from going out whilst there is still an ongoing memory request. Optionally the adapter could allow a new request the same cycle a response comes back, or it could always let them through (giving the back to back performance without the feed-through). In this solution you'd drop the data_rvalid_ft_i signal, though note you still need something like data_err_i and data_err_ft_i. We don't want to write load data to the register file if a bus error has been observed, so we need an error signal that suppresses the write (and triggers the exception), and we still need the separate signal to avoid a feed-through where it's not wanted.
Having the blocking implemented in the adapter itself also allows the adapter to implement different behaviours with different PMAs (physical memory attributes). E.g. for device/non-idempotent memory you could disallow the request until an ongoing request is complete, but for normal/idempotent memory you can allow overlapping requests. Clearly that can be implemented in the Ibex core itself as well, but by doing it in the bus adapter you get full flexibility on how this works and how the feed-through is/isn't employed here without having to add new configuration options to Ibex itself.
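A PMA-dependent gate of this kind might look roughly like the sketch below. Everything in it is hypothetical: the module name, the DeviceBase/DeviceMask address window, and the access_ongoing_i input (which would come from whatever outstanding-access tracking the adapter keeps).

```systemverilog
// Hypothetical PMA-based request gate for a bus adapter. The address
// window is an invented example, not a real memory map.
module pma_gate_sketch #(
  parameter logic [31:0] DeviceBase = 32'h4000_0000,  // assumed device region
  parameter logic [31:0] DeviceMask = 32'hF000_0000   // assumed region mask
) (
  input  logic [31:0] core_addr_i,       // address of the new request
  input  logic        access_ongoing_i,  // an earlier access has no response yet
  output logic        allow_req_o        // new request may be forwarded
);
  logic is_device;

  // Treat addresses inside the window as device/non-idempotent memory.
  assign is_device = (core_addr_i & DeviceMask) == (DeviceBase & DeviceMask);

  // Device accesses wait for earlier accesses to complete; accesses to
  // normal memory are allowed to overlap.
  assign allow_req_o = ~(is_device & access_ongoing_i);
endmodule
```

With this arrangement only device accesses pay the stall cycle, and there is no path from the response valid to the request at all.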
There's an extra complexity to the "bus adapter blocks requests whilst one is on-going" solution due to unaligned accesses. In the existing Ibex design, where there is an unaligned access that gets split into two requests we allow the second request to be sent before we get a response to the first. If we blocked a new request in the adapter whilst existing requests are on-going we cannot replicate this behaviour, as the adapter doesn't know if any particular request is the second half of an unaligned access or an entirely new request from a new load or store instruction.
To work around this you can add another signal data_req_allow_overlap_o or similar that would bypass the blocking. This would get set on all second halves of unaligned requests.
Outline Proposal
Given the discussion above here's one proposal for refactoring the bus interface and microarchitecture so the feed-through can be dealt with in a bus wrapper without adding new configuration parameters to the Ibex core.
Remove logic from ibex_id_stage that prevents loads/stores from progressing where there is an outstanding memory access (this is probably just removing the outstanding_memory_access signal).
Remove logic from ibex_id_stage (and maybe elsewhere) that prevents a load or store from sending a request if a bus error is seen.
Add a new top-level signal data_req_allow_overlap_o that is asserted on the second half of an unaligned load/store.
It is then the responsibility of the bus interface to decide when to forward a new request. It could always forward them (in which case we don't get precise exception behaviour on bus errors), it could only forward them if there are no outstanding requests (providing precise exception behaviour but stall cycles between back to back accesses), or it could allow forwarding with a single outstanding request provided the response is available (providing the existing Ibex behaviour with the feed-through). It could also use the address of the request to determine an appropriate action, allowing precise exception behaviour on non-idempotent memory.
The data_req_allow_overlap_o signal would bypass request blocking where there is an on-going request, though it would be optional for a bus adapter to do this (it could choose to ignore it, and then we have the property that the second half of an unaligned access isn't sent until we know the first has completed successfully).
To handle bus errors, an extra data_err_ft_i as proposed above isn't added. Instead the bus adapter handles suppressing a request in the scenario where the feed-through is used (so it can send a new request the same cycle a response comes back) and there is a potential request the same cycle a bus error response is seen.
The data_err_i then prevents received load data from being written back and also triggers a suitable exception, but doesn't suppress any new requests. There's no internal feed-through if it doesn't factor into the data_req_o signal. It does need careful handling where the bus adapter allows a new request out the same cycle an error response is seen. The data_err_i will cause a flush of a load or store in the ID stage, however if that load or store has just had its request granted we don't want to flush the instruction. The solution here is to only flush it if the request hasn't been granted.
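The forwarding choices in this proposal could be sketched in an adapter roughly as follows. This is an illustration under assumptions, not a verified design: the package, module, port and parameter names are all invented, address/data pass-through is omitted, and the parameter lives on the adapter rather than on the Ibex core (separate adapter modules per policy would work equally well).

```systemverilog
package adapter_sketch_pkg;
  typedef enum logic [1:0] {
    FwdAlways,     // never block: bus-error exceptions are imprecise
    FwdWhenIdle,   // block while anything is outstanding: precise, one stall cycle
    FwdOnResponse  // also allow in the cycle the response arrives: feed-through
  } fwd_policy_e;
endpackage

module data_req_gate_sketch #(
  parameter adapter_sketch_pkg::fwd_policy_e Policy = adapter_sketch_pkg::FwdWhenIdle,
  parameter bit HonourAllowOverlap = 1'b1
) (
  input  logic clk_i,
  input  logic rst_ni,
  // Core side
  input  logic core_req_i,
  input  logic core_allow_overlap_i,  // data_req_allow_overlap_o from the core
  output logic core_gnt_o,
  // Bus side
  output logic bus_req_o,
  input  logic bus_gnt_i,
  input  logic bus_rvalid_i,
  input  logic bus_err_i
);
  import adapter_sketch_pkg::*;

  logic [1:0] cnt_q, cnt_d;  // number of accesses awaiting a response
  logic       busy, full, allow, accept;

  assign busy = (cnt_q != '0);
  assign full = &cnt_q;      // never let the counter overflow

  always_comb begin
    case (Policy)
      FwdAlways:     allow = 1'b1;
      FwdWhenIdle:   allow = ~busy;
      // One access outstanding and its response is here without error.
      // An error response in this cycle suppresses the new request.
      FwdOnResponse: allow = ~busy | ((cnt_q == 2'd1) & bus_rvalid_i & ~bus_err_i);
      default:       allow = ~busy;
    endcase
    // Second half of an unaligned access bypasses the blocking.
    if (HonourAllowOverlap && core_allow_overlap_i) allow = 1'b1;
  end

  assign bus_req_o  = core_req_i & allow & ~full;
  assign accept     = bus_req_o & bus_gnt_i;
  assign core_gnt_o = accept;

  always_comb begin
    cnt_d = cnt_q;
    case ({accept, bus_rvalid_i})
      2'b10:   cnt_d = cnt_q + 2'd1;
      2'b01:   cnt_d = cnt_q - 2'd1;
      default: ;  // both or neither: count unchanged
    endcase
  end

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) cnt_q <= '0;
    else         cnt_q <= cnt_d;
  end
endmodule
```

Only the FwdOnResponse branch uses bus_rvalid_i and bus_err_i to form bus_req_o, so with either of the other policies a synthesis tool should find no response-to-request path in this sketch.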
Other points of note
With the proposal above it will be possible for a load or store in ID to have its request granted and not move into the writeback stage (because there's another load or store stalled there awaiting a response). There's an assumption in the microarchitecture that once a request is granted the load or store will immediately move into writeback (i.e. if a response appears it's always related to the instruction in the writeback stage). This will need examining, and the scenario where we have loads/stores in both ID and writeback, both with outstanding requests, needs to be correctly handled (with new cover points written and hit in DV).
You need to consider the most appropriate handling of bus errors in the non feed-through case where they do not give precise exceptions. As described above they will still appear as a normal exception. They instead could be routed as an interrupt (typically an NMI). In this case I think you need a new signal to go with data_err_i, e.g. data_write_suppress_i. This one would simply prevent writing results into the register file but not do anything else (flushing pipeline/triggering exceptions etc) and then the interrupt will trigger the jump into the appropriate handler.
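The difference between the two error inputs could be sketched as below. The module, the rf_we_o and bus_err_exc_o outputs and the lsu_resp_valid_i input are illustrative stand-ins, not the actual Ibex writeback logic.

```systemverilog
// Hypothetical writeback gating for load responses.
module load_wb_gate_sketch (
  input  logic lsu_resp_valid_i,       // a load response has arrived
  input  logic data_err_i,             // suppress the write and raise an exception
  input  logic data_write_suppress_i,  // proposed: suppress the write only
  output logic rf_we_o,                // register file write enable for load data
  output logic bus_err_exc_o           // request a synchronous bus-error exception
);
  // Either signal blocks the register file write.
  assign rf_we_o = lsu_resp_valid_i & ~data_err_i & ~data_write_suppress_i;

  // Only data_err_i takes the exception path; with data_write_suppress_i
  // the error is expected to be reported separately, e.g. via an NMI.
  assign bus_err_exc_o = lsu_resp_valid_i & data_err_i;
endmodule
```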
It's a slight shame to replicate outstanding memory access tracking in the bus adapter when you've also got this information available in the pipeline stages. An extra memory_access_ongoing_o output could be created to avoid replicating the tracking logic. However it's only a single flop and a handful of gates, so it's perhaps better to avoid the interface complexity. Some assertion should check the bus adapter always agrees with the pipeline as to whether or not a request is ongoing.
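Such an assertion might be as small as the following sketch. Both inputs are hypothetical probes: one from the adapter's own tracking, the other from the pipeline's view of whether a load or store is awaiting a response.

```systemverilog
// Hypothetical consistency check between adapter and pipeline tracking.
module ongoing_agree_check (
  input logic clk_i,
  input logic rst_ni,
  input logic adapter_access_ongoing_i,  // adapter's tracking (assumed name)
  input logic core_access_ongoing_i      // pipeline's view (assumed probe)
);
  AdapterCoreOngoingAgree_A: assert property (
    @(posedge clk_i) disable iff (!rst_ni)
    adapter_access_ongoing_i == core_access_ongoing_i);
endmodule
```

A checker like this would typically be attached in the verification environment rather than added to the adapter's ports.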
Co-simulation checking will be made more complex with bus errors as, depending on how the top-level interface behaves, they may or may not be precise! One way to deal with this would be some probing into ID to determine which instruction gets the bus error, along with some backup error checking that ensures a bus error is actually taken. You could also randomly decide at the beginning of any test if you will always have precise exceptions or have a mix. With the former option you can retain the existing checking. This is important as it checks precise exceptions do indeed always work properly. With looser checking you might have an imprecise exception that passes the checks where it should have been precise.
Thanks Greg, I've read through this text and I think it makes sense to me. I like the idea of making the decision of feed-through dependent on the instantiation of Ibex instead of making the decision statically.