
Sram throughput #279

Closed · wants to merge 3 commits

Conversation

@GregAC (Contributor) commented Oct 28, 2024

This allows us to have back-to-back accesses to SRAM without any stall cycles.

Sadly it increases timing pressure. The difference shouldn't be big: the response FIFO in the SRAM now has to choose between two flopped responses and the response coming directly from SRAM, whereas previously it chose between a single flopped response and the direct SRAM response. That adds at least a logic level, which I guess has pushed a few paths closer to the edge.
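
To make that concrete, here is a minimal sketch of the response-select path implied above; the module, signal names and widths are assumptions for illustration only, not the actual sonata-system RTL:

```systemverilog
// Illustrative sketch only: with a depth-2 response buffer the read-data
// return path picks between two buffered responses plus the live SRAM
// output (a 3:1 select), where previously it was a 2:1 select, hence
// roughly one extra logic level on this path.
module example_rsp_select (
  input  logic [31:0] rsp_buf_q_0_i,  // older buffered response (hypothetical name)
  input  logic [31:0] rsp_buf_q_1_i,  // newer buffered response (hypothetical name)
  input  logic [31:0] sram_rdata_i,   // response direct from the SRAM macro
  input  logic [1:0]  rsp_sel_i,      // which source to return this cycle
  output logic [31:0] rvalid_data_o
);
  always_comb begin
    case (rsp_sel_i)
      2'd0:    rvalid_data_o = rsp_buf_q_0_i;
      2'd1:    rvalid_data_o = rsp_buf_q_1_i;
      default: rvalid_data_o = sram_rdata_i;  // bypass straight from SRAM
    endcase
  end
endmodule
```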

When you build this under Vivado it sometimes reports a timing failure but when you ask for a detailed timing report it changes its mind and says timing is fine. Possibly something to do with the over-constraining @elliotb-lowrisc put in? Or maybe the reported timing is based on a rough timing analysis and the more detailed one shows it's actually fine.

I think this is important to get in: it takes CoreMark from 1.34 to 1.47 (almost a 10% improvement) and it's clearly generally applicable.

Another change to consider is switching to a single-cycle multiplier. Unfortunately that also adds timing pressure, so I've left it off for now (but I did add the Ibex changes so we can choose which multiplier we want). It does still meet timing but pushes synthesis time up yet further.
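
As a rough sketch of what that choice looks like once the vendored Ibex changes are in, assuming the upstream Ibex RV32M parameterisation via ibex_pkg::rv32m_e (the wrapper below is hypothetical, not the actual sonata-system hierarchy):

```systemverilog
// Illustrative only: the multiplier implementation is selected by an
// RV32M parameter fed down from the top level, so switching between the
// fast multi-cycle multiplier and the single-cycle (DSP-based) one is a
// one-line change at the instantiation site.
module example_core_wrapper #(
  parameter ibex_pkg::rv32m_e RV32M = ibex_pkg::RV32MFast  // or ibex_pkg::RV32MSingleCycle
) (
  input logic clk_i,
  input logic rst_ni
);
  // A real wrapper would instantiate the core here and pass .RV32M(RV32M)
  // through; ports and other parameters are omitted in this sketch.
endmodule
```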

@elliotb-lowrisc (Contributor)

Where in the output files is it reporting this timing failure? There are definitely some points in the flow where it will give bad timing estimates. I've had a go at building this with Vivado v2021.1 (64-bit) and couldn't see anything concerning at a glance at the high-level results.

Incidentally, do you know what the register u_sonata_system/u_top_tracing/u_ibex_top/u_ibex_core/cs_registers_i/gen_cntrs[1].gen_imp.mcounters_variable_i/counter_q_reg[*] is for? It keeps popping up as the endpoint of critical paths, but looking in ibex_cs_registers.sv seems to show it as some sort of reserved counter.

@GregAC (Contributor, Author) commented Oct 28, 2024

> Where in the output files is it reporting this timing failure?

When you open the Vivado GUI it's listed in the 'Timing' section of the 'Project Summary' window, with the negative slacks noted in the columns of the 'Design Runs' tab. There's also a critical warning output from the 'Route Design' stage noting 'The design failed to meet timing requirements'.

> Incidentally, do you know what the register u_sonata_system/u_top_tracing/u_ibex_top/u_ibex_core/cs_registers_i/gen_cntrs[1].gen_imp.mcounters_variable_i/counter_q_reg[*] is for?

Yes, it's a performance counter; we could flop the increment signal for these. One cycle of latency on the performance counters is no big deal (other than for the poor person trying to get a cycle-accurate match in a simulator, but we don't do that!). I can have a look at doing this in our cheriot-ibex fork.
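
A minimal sketch of what flopping the increment could look like, assuming a simple 64-bit counter; the module and signal names are hypothetical, not the actual cheriot-ibex code:

```systemverilog
// Illustrative sketch only: registering the increment request costs one
// cycle of counter latency but removes the increment-generation logic
// from the timing path that ends at the counter register.
module example_counter_inc_flop (
  input  logic        clk_i,
  input  logic        rst_ni,
  input  logic        counter_inc_i,  // increment request (hypothetical name)
  output logic [63:0] counter_q_o
);
  logic counter_inc_q;

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      counter_inc_q <= 1'b0;
      counter_q_o   <= '0;
    end else begin
      counter_inc_q <= counter_inc_i;                     // flop the increment
      counter_q_o   <= counter_q_o + 64'(counter_inc_q);  // count one cycle later
    end
  end
endmodule
```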

I've got another easy timing fix I should really get in as well; it didn't have a dramatic effect on fmax but could help overall timing pressure.

@elliotb-lowrisc (Contributor)

> 'The design failed to meet timing requirements'

Odd, that should be correct as it's output after post-route optimisation (in the logs at least). The Timing section of the Project Summary window seems clean in my build + tool-version, so perhaps it's a version-specific bug (not uncommon in my experience). Could also be something to do with the stage-specific clock over-constraining I put in I suppose, if it was not being cleared properly for some reason.

Took me a while to find the Project Summary as I had been opening the Design Checkpoints (.dcp) files. Nice to know there's this nice overview too.

@GregAC (Contributor, Author) commented Oct 28, 2024

> Took me a while to find the Project Summary as I had been opening the Design Checkpoints (.dcp) files. Nice to know there's this nice overview too.

I use the build-gui target of the fusesoc Makefile:

```
make -C ./build/lowrisc_sonata_system_0/synth-vivado/ build-gui
```

That provides the design summary when it opens.

@marnovandermaas (Contributor) left a comment


I hadn't realized this timing failure also occurs without the single-cycle multiplier. I would be hesitant to merge this without first understanding why that negative slack is reported and fixing the constraints.

@GregAC (Contributor, Author) commented Oct 29, 2024

I would guess it's a tooling bug; it seems not to occur in earlier Vivado versions (such as the v2021.1 Elliot is using). I'll download the latest release and try it today.

I would rather not block v1.0 on getting to the bottom of a tooling bug, given that the actual timing analysis reports a pass and we can always use v2021.1 if we're concerned by this. I am keen to get this in for v1.0 as it gives a noticeable performance improvement.

@GregAC (Contributor, Author) commented Oct 29, 2024

The same behaviour (negative slack that disappears when you run a full timing analysis) is seen on v2024.1, but I also see this with current main. We should investigate in more detail, but for now let's just use v2021.1 to create releases: see #289.

@marnovandermaas are you happy to accept this, given it builds fine under v2021.1 and our current main sees a similar failure under v2024.1 (i.e. whatever this issue is, it's already present in sonata-system and not inherent to this change)?

@marnovandermaas dismissed their stale review October 29, 2024 14:36

Also broken on main

@marnovandermaas (Contributor)

I'm not happy that current main is broken on 2024.1, but that is not a problem with this PR.

@GregAC (Contributor, Author) commented Oct 29, 2024

Just done a build of this on v2021.1, rebased on top of main. As reported by @elliotb-lowrisc, it builds fine and reports no timing issues.

@marnovandermaas are you happy to approve this?

@GregAC mentioned this pull request Oct 30, 2024
@marnovandermaas (Contributor)

I am using Vivado 2024.1 and main is passing for me. It is not passing with this PR, even when I open the more detailed timing report.

Need a minimum of 2 (this is what is used in OpenTitan) to enable
back-to-back requests without stall cycles.
Update code from upstream repository
https://github.com/lowrisc/cheriot-ibex.git to revision
ea2df9db3bcea776f0dc72d6d89c31c73798ecd4

* Feed RV32M through ibexc_top_tracing/ibexc_top (Greg Chadwick)
* Switch to no bitmanip by default (Greg Chadwick)
* Feed RV32B through in ibexc_top (Greg Chadwick)

Signed-off-by: Greg Chadwick <[email protected]>
This is effectively a no-op change. Before the latest Ibex was vendored
we had no bitmanip (the RV32BFull parameter was not fully passed
through) and RV32M was the fast multiplier.

Sadly the single-cycle multiplier seems to increase timing pressure. It
does just meet timing but greatly increases synthesis times. As it's
implemented with the FPGA's built-in DSP blocks it shouldn't be a big
issue to use, so this is worth examining later, but for now leave
things as they are.

@marnovandermaas (Contributor) left a comment


Even after a rebase this reports a timing failure on 2024.1 while main passes fine. I'm marking this as 'request changes' until I test it out on 2021.1.

@marnovandermaas (Contributor)

I just confirmed that timing is passing on 2021.1, but I would still prefer we figure out why timing is failing from main to this PR on 2024.1 before merging.

@marnovandermaas (Contributor)

My last force push is a rebase.

@alees24 (Contributor) left a comment


I've built this cleanly on top of latest main this evening using Vivado 2022.2 and tests/test_runner passes (modulo removal of pinmux loopback wire-dependent checks).

@GregAC (Contributor, Author) commented Oct 31, 2024

> but I would still prefer we figure out why timing is failing from main to this PR on 2024.1 before merging.

My view (as stated above) is that this is an important performance improvement that we want for 1.0. It is unfortunate it's having issues with 2024.1, but 2021.1 is the agreed sign-off tool and it works fine under that. We can investigate what's going on with 2024.1 (it could just be changes in the Tcl scripting environment meaning @elliotb-lowrisc's flow improvements aren't working properly), but I don't think it's worth spending time on right now and it certainly shouldn't be required for the 1.0 release.

@GregAC (Contributor, Author) commented Nov 3, 2024

Closing in favour of #314

@GregAC closed this Nov 3, 2024