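I tried a variety of approaches, debugging while reading the generated assembly on godbolt.org. Changing the array index type from int to size_t gave a small improvement. Adding -ffast-math and -march=native gave a clear improvement. Converting the star struct from AoS (array of structures) to SoA (structure of arrays) helped a lot. I also turned some temporary variables into arrays, trading space for time. For alignas I tried 8, 16, 32, 64, and 128: 32 and 64 performed about equally well, while 128 was somewhat slower. Splitting some of the more complex loops into several smaller loops gave a slight improvement. I put #pragma omp simd before every loop, but my tests suggest it only needs to go before the outermost loop; adding it to the inner loops makes no difference. Looking at the assembly, a chained product of three or more factors seems to produce more complex code than a product of two, and I'm not sure whether that slows things down.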
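A minimal C++ sketch of the AoS-to-SoA change combined with alignas(64) and a #pragma omp simd on the outermost loop, assuming a direct-summation gravity kernel. The field names, the G = 1 units, the softening parameter eps2, and the functions to_soa and accelerations are hypothetical, not taken from the original code. Compile with -fopenmp-simd (or -fopenmp) so the pragma is honored; without either flag GCC and Clang ignore it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// AoS: one struct per star (hypothetical fields). The x values of
// consecutive stars are 56 bytes apart, so filling a SIMD register
// with them needs gathers or shuffles.
struct StarAoS {
    double x, y, z;
    double vx, vy, vz;
    double m;
};

// SoA: one contiguous array per field. alignas(64) starts each array on a
// 64-byte boundary (a cache line on typical x86-64), allowing aligned loads.
template <std::size_t N>
struct StarsSoA {
    alignas(64) double x[N];
    alignas(64) double y[N];
    alignas(64) double z[N];
    alignas(64) double vx[N];
    alignas(64) double vy[N];
    alignas(64) double vz[N];
    alignas(64) double m[N];
};

// Copy an AoS array into the SoA layout.
template <std::size_t N>
void to_soa(const StarAoS (&in)[N], StarsSoA<N>& out) {
    for (std::size_t i = 0; i < N; ++i) {
        out.x[i] = in[i].x;   out.y[i] = in[i].y;   out.z[i] = in[i].z;
        out.vx[i] = in[i].vx; out.vy[i] = in[i].vy; out.vz[i] = in[i].vz;
        out.m[i] = in[i].m;
    }
}

// Direct-summation gravitational acceleration with G = 1 (assumed units).
// eps2 is an assumed softening term; it must be > 0 so the j == i term
// evaluates to finite * 0 = 0 instead of inf * 0 = NaN.
template <std::size_t N>
void accelerations(const StarsSoA<N>& s, double eps2,
                   double* ax, double* ay, double* az) {
    assert(eps2 > 0.0);
    // Ask the compiler to vectorize the outer loop: each SIMD lane works on
    // a different star i while the inner j loop runs in lockstep.
    #pragma omp simd
    for (std::size_t i = 0; i < N; ++i) {
        double axi = 0.0, ayi = 0.0, azi = 0.0;
        for (std::size_t j = 0; j < N; ++j) {
            const double dx = s.x[j] - s.x[i];
            const double dy = s.y[j] - s.y[i];
            const double dz = s.z[j] - s.z[i];
            const double r2 = dx * dx + dy * dy + dz * dz + eps2;
            const double inv_r = 1.0 / std::sqrt(r2);
            // Parsed as (inv_r * inv_r) * inv_r: two dependent multiplies.
            const double inv_r3 = inv_r * inv_r * inv_r;
            const double f = s.m[j] * inv_r3;
            axi += f * dx;
            ayi += f * dy;
            azi += f * dz;
        }
        ax[i] = axi;
        ay[i] = ayi;
        az[i] = azi;
    }
}
```

On the three-factor product: in C++ a * b * c is evaluated left to right as (a * b) * c, so the extra instruction in the assembly is just a second multiplication. It is typically much cheaper than the division and square root computed just before it.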
Baseline results:
Initial energy: -8.571528
Final energy: -8.511633
Time elapsed: 1546 ms
After optimization:
simd optimized
Initial energy: -9.936085
Final energy: -9.926659
Time elapsed: 198 ms
That is roughly a 7.8x speedup (1546 ms / 198 ms ≈ 7.8).
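The speedup can be checked with a small helper. The relative energy drift |E_final - E_initial| / |E_initial| is an added metric (an assumption, not something the program prints) computed from the printed initial and final energies; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Speedup = baseline time / optimized time.
double speedup(double baseline_ms, double optimized_ms) {
    assert(optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}

// Relative energy drift |E_final - E_initial| / |E_initial|.
double relative_energy_drift(double e_initial, double e_final) {
    assert(e_initial != 0.0);
    return std::fabs(e_final - e_initial) / std::fabs(e_initial);
}
```

With the numbers above, speedup(1546, 198) is about 7.81; the baseline drift is about 0.0070 (0.70%) and the optimized run's drift is about 0.00095 (0.095%).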