
Fast (last) vector dimension being used by different threads might cause false cache sharing #198

Open
hdante opened this issue Sep 3, 2024 · 2 comments
Labels
bug Something isn't working



hdante commented Sep 3, 2024

Hello, this is a report of a conceptual performance issue in the code, currently without practical effects: when the locChi2 and locInd multi-dimensional vectors are declared, the dimension used for splitting the work across threads is the last one. This means contiguous elements of the same vector are accessed by different hardware threads. Since typical cache lines are 64 bytes wide, the same cache line is requested and "locked" by multiple hardware threads at once, which in practice serializes access to the chi2 values, slowing down and defeating the parallelism of the loops that use these vectors.

Instead, the dimension used for distributing work across threads should not be the last one, so that values dispatched to different threads are kept in different cache lines.

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.
@hdante hdante added the bug Something isn't working label Sep 3, 2024
@johannct
Member

johannct commented Sep 4, 2024

Do you think you can help us investigate this and the other performance issue you flagged? It would be great to have you in the developer team, given the input you provided so far.

@hdante
Author

hdante commented Sep 4, 2024

Hello, Johann, I can help you, but these are not real performance issues. For this issue, the relevant loop is already fast and small, and changing it won't noticeably impact performance, so it is a matter of recommended practice and avoiding future problems. I've reported two other issues that might be potential multi-threading bugs. I still need to report a real performance issue found in the hot loop, where dictionary lookups consume a noticeable fraction of the runtime; the expected speed-up there is around 10%.

I'll leave the suggestions for the reported issues as comments. For this issue, I'd solve it by changing the order of the affected vector's dimensions while assigning the first dimension for the thread ID:

// The thread index becomes the FIRST dimension, so each thread owns a
// contiguous block (number_threads and dimzg are defined earlier in the file):
vector<vector<vector<double>>> locChi2(number_threads, vector<vector<double>>(3, vector<double>(dimzg, 1.e9)));
vector<vector<vector<int>>> locInd(number_threads, vector<vector<int>>(3, vector<int>(dimzg, -1)));

Doing only this change does not guarantee zero false sharing: with the default dynamic memory allocator, the separate 1-D vectors of length dimzg might be aligned only on 16-byte boundaries, which is smaller than the 64-byte cache line, so two threads' vectors can still begin or end in the same line. But it already moves away from the previous situation, where false sharing was 100% guaranteed by design because values belonging to different threads were contiguous 8-byte elements. Another possible change would be to flatten this 3-D vector into a large 1-D vector, but given this partial solution, I wouldn't go any further to optimize an already low-cost loop.
