[tlul,rtl] Correct the calculation of data_intg in tlul_adapter_sram #25925

Merged: 1 commit, Jan 17, 2025

rswarbrick
Contributor

This is the value that gets supplied as a user field that provides data integrity. If we are responding with an error response, we need to make sure we send integrity bits that correspond.

The logic that was previously here detected this condition with

(vld_rd_rsp && reqfifo_rdata.error)

but that is wrong because it shouldn't depend on vld_rd_rsp. Imagine we send a TL write with a TL error. When we read the response, the d_error flag will be high (because u_reqfifo contains the faulty TL write) and d_data will be error_blanking_data. But the integrity bits will be SecdedInv3932ZeroEcc because vld_rd_rsp is false (we haven't seen a TL read at all!)

This is also possible to trigger using only reads. Suppose we send a TL read with a TL error and then, a few cycles later, read the TL response.

When we read the response, the d_error flag will again be high (because u_reqfifo contains the faulty TL read). Again, d_data will (correctly) be error_blanking_data. Again, we should be using error_blanking_integ for the integrity bits but we actually use SecdedInv3932ZeroEcc.

Dropping the vld_rd_rsp term will fix the behaviour in both cases.
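
The two scenarios above can be sketched with a small behavioural model. This is a minimal Python sketch, not the RTL: the integrity values are symbolic strings rather than real bit patterns, the `Resp` class and the `sram_intg` field are hypothetical names, and the middle branch (passing through integrity bits that come back with read data) is an assumption about the surrounding logic. Only the first condition of each function is taken from the description.

```python
from dataclasses import dataclass

# Symbolic stand-ins for the RTL values; the real bit patterns are not modelled.
ERROR_BLANKING_INTEG = "error_blanking_integ"
ZERO_ECC = "SecdedInv3932ZeroEcc"


@dataclass
class Resp:
    vld_rd_rsp: bool  # a valid read response is available
    req_error: bool   # reqfifo_rdata.error: the queued request had a TL error
    sram_intg: str = "rspfifo_data_intg"  # hypothetical: integrity returned with read data


def data_intg_old(r: Resp) -> str:
    """Selection before the fix: the error case is gated on vld_rd_rsp."""
    if r.vld_rd_rsp and r.req_error:
        return ERROR_BLANKING_INTEG
    if r.vld_rd_rsp:  # assumed pass-through branch
        return r.sram_intg
    return ZERO_ECC


def data_intg_new(r: Resp) -> str:
    """Selection after the fix: the vld_rd_rsp term is dropped from the error case."""
    if r.req_error:
        return ERROR_BLANKING_INTEG
    if r.vld_rd_rsp:  # assumed pass-through branch
        return r.sram_intg
    return ZERO_ECC


# Faulty write or faulty read: d_error is high but vld_rd_rsp is low.
faulty = Resp(vld_rd_rsp=False, req_error=True)
print(data_intg_old(faulty))  # SecdedInv3932ZeroEcc (mismatched with d_data)
print(data_intg_new(faulty))  # error_blanking_integ
```

In this model the two functions differ only when `req_error` is set while `vld_rd_rsp` is low, which is exactly the case described for both the faulty write and the faulty read.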

To stay in sync with the RTL, we also drop a conditional coverage exclusion for rom_ctrl. Tracking down how the excluded condition could actually be hit is what led us to the design change.

@KinzaQamar: Thanks for helping me understand this in the first place.

Signed-off-by: Rupert Swarbrick <[email protected]>
@rswarbrick rswarbrick requested review from vogelpi and alees24 January 17, 2025 13:21
@rswarbrick rswarbrick requested a review from a team as a code owner January 17, 2025 13:21
@rswarbrick rswarbrick requested review from marnovandermaas and removed request for a team January 17, 2025 13:21
@vogelpi vogelpi requested a review from nasahlpa January 17, 2025 13:40
Contributor

@vogelpi vogelpi left a comment

Thanks @rswarbrick for the fix and the carefully written description, that's very useful!

IIRC, the issue also affected reads because during a read generating a TL error, the request is not actually sent out to the SRAM and thus no response is obtained from the SRAM and u_rspfifo remains empty, meaning rspfifo_rvalid and thus vld_rd_rsp remain deasserted. Is this correct?
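
The read scenario in this comment can be sketched as follows. This is a rough Python model under stated assumptions: the FIFOs are plain deques, `read_response_flags` is a hypothetical helper, and `vld_rd_rsp` is reduced to "the response FIFO has an entry and the queued request is a read", which is a simplification of whatever the RTL actually computes.

```python
from collections import deque


def read_response_flags(tl_error: bool):
    """Return (d_error, vld_rd_rsp) for a single TL read in this simplified model."""
    # The request is always recorded in the request FIFO, error or not.
    reqfifo = deque([{"op": "read", "error": tl_error}])
    rspfifo = deque()

    # A read with a TL error is not sent to the SRAM, so nothing comes back.
    if not tl_error:
        rspfifo.append("sram_read_data")

    rspfifo_rvalid = bool(rspfifo)
    # Simplified: a valid read response needs an entry in the response FIFO.
    vld_rd_rsp = rspfifo_rvalid and reqfifo[0]["op"] == "read"
    d_error = reqfifo[0]["error"]
    return d_error, vld_rd_rsp


print(read_response_flags(tl_error=True))   # (True, False)
print(read_response_flags(tl_error=False))  # (False, True)
```

With a TL error the model reports `d_error` high and `vld_rd_rsp` low at the same time, which is the combination that the old `(vld_rd_rsp && reqfifo_rdata.error)` check could never match.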

@vogelpi
Contributor

vogelpi commented Jan 17, 2025

@nasahlpa according to Rupert's description, we have been generating ECC errors upon TL errors for the SRAMs in the past. Is this in line with your recollection?

@vogelpi
Contributor

vogelpi commented Jan 17, 2025

CHANGE AUTHORIZED: hw/ip/tlul/rtl/tlul_adapter_sram.sv

This PR fixes a bug in the error reporting. Previously, the adapter inserted an ECC error when experiencing a TL-UL error. Fixing this is a good thing.

@rswarbrick
Contributor Author

@vogelpi: Yep, I think so. Indeed, that's what I originally saw when reasoning about things. See the text starting with "This is also possible to trigger".

Member

@nasahlpa nasahlpa left a comment

Thanks for spotting & fixing this!

I've created an issue #25927 as we def. should check the error response of the SRAM controller more carefully.

Contributor

@marnovandermaas marnovandermaas left a comment

I think I ran into this issue on Sonata at some point as well, I just didn't know how to fix it. Thanks for this!

@marnovandermaas
Contributor

CHANGE AUTHORIZED: hw/ip/tlul/rtl/tlul_adapter_sram.sv

@rswarbrick rswarbrick merged commit 36dfa5f into lowRISC:master Jan 17, 2025
28 checks passed
@rswarbrick rswarbrick deleted the tlul-adapter-sram-data-intg branch January 17, 2025 15:41