Improve rt ffs providing another find least non-0 bit position method with no memory requirement #9729
base: master
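Thanks for submitting the PR. There is no need to deal with the errors reported by CI; I will reorganize your PR.
CI changes do not need to be committed; just trigger the run manually.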
Some compilers warn about `value & (value - 1) ^ value`; it is better written with explicit parentheses as `(value & (value - 1)) ^ value`.
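For readers unfamiliar with the expression in the comment above, here is a minimal sketch of what it computes. In C, `&` binds tighter than `^`, so both spellings mean the same thing; the parentheses only make that explicit and silence the warning. The helper name `lowest_set_bit` is hypothetical and not taken from the PR.

```c
#include <stdint.h>

/* Hypothetical helper (not from the PR): isolate the lowest set bit.
 * value & (value - 1) clears the lowest set bit of value;
 * XOR-ing that with value leaves only the bit that was cleared. */
static uint32_t lowest_set_bit(uint32_t value)
{
    return (value & (value - 1u)) ^ value;
}
```

For example, an input of 12 (binary 1100) yields 4 (binary 0100), and an input of 0 yields 0.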
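About the CLA check above: it seems that I used Visual Studio Community on my local machine to test performance, and when I committed the code afterwards, the Microsoft account configured in Visual Studio Community was recorded. That added an extra account which is not a GitHub account, so it cannot sign the agreement.
Rebase to change the author, then push again to the branch of the same name.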
b9c8f95 to 4ba2594
Thanks, that is resolved.
Static code analysis reports an error at kservice.c:404:5, uninitialized 'pc', which seems unrelated to rt_ffs. I'll check to confirm later.
File format / license checking: will confirm later.
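Please first confirm that the core algorithm has no problems. If it does not, I will take care of the Kconfig and switch-macro issues. I will also handle the CI failures.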
OK, I'll re-examine the CI errors. The results of the new algorithm have been checked against the several original implementations on every possible value, and the time usage was then compared. The condition that detects whether a compiler built-in function exists may be wrong in some build environments; I'll check and report progress.
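An exhaustive comparison like the one described above could look roughly like the following sketch. All names here (`ffs_fn`, `loop_ffs`, `zero_based_ffs`, `count_mismatches`) are hypothetical and not taken from the PR. It assumes the usual ffs convention: a 1-based index of the lowest set bit, and 0 for an input of 0.

```c
#include <stdint.h>

typedef int (*ffs_fn)(uint32_t);

/* Reference: plain bit-by-bit scan, 1-based result, 0 for input 0. */
static int loop_ffs(uint32_t value)
{
    int pos = 1;
    if (value == 0) return 0;
    while ((value & 1u) == 0) { value >>= 1; pos++; }
    return pos;
}

/* Deliberately wrong variant (0-based index), present only to show
 * that the checker below detects a convention mismatch. */
static int zero_based_ffs(uint32_t value)
{
    return value == 0 ? 0 : loop_ffs(value) - 1;
}

/* Count inputs in [first, last] where the two implementations disagree.
 * The loop exits on v == last, so last = UINT32_MAX does not wrap. */
static uint64_t count_mismatches(ffs_fn ref, ffs_fn cand,
                                 uint32_t first, uint32_t last)
{
    uint64_t mismatches = 0;
    uint32_t v = first;
    for (;;) {
        if (ref(v) != cand(v)) mismatches++;
        if (v == last) break;
        v++;
    }
    return mismatches;
}
```

Calling `count_mismatches(loop_ffs, candidate, 0, UINT32_MAX)` would cover every 32-bit value; a result of 0 means the candidate agrees with the reference everywhere.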
One by one, first:
RTduino/Arduino Libraries (Raspberry Pico)
stm32l4_f0_f1
stm32_f2_f4
llvm-arm
ESP32C3
Static code analysis: `#ifdef __GNUC__` ... `#else` ... `#endif /* __GNUC__ */` ... `#endif /* RT_HW_BACKTRACE_FRAME_GET_SELF */`
Checking complete; the errors do not seem to come from compiling the updated algorithm.
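Pull/merge request description: provide two more algorithm options for finding the lowest non-zero bit
Latest update: I learned that many compilers provide built-in functions that use processor-specific instructions wherever possible to compute the position of the lowest non-zero bit. In testing, this is faster than both the original ffs and the tiny_ffs improved by the original author, so I added a method that uses the GCC/KEIL/IAR built-in functions. (28% faster on my physical Intel machine at home, 35% faster on GitHub; some ARM targets could be tested when there is a chance, since the newer ones all have dedicated instructions.)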
ffs execution time: 14580990 microseconds
tiny_ffs execution time: 13520661 microseconds
puny_ffs execution time: 20757811 microseconds
builtin_ffs execution time: 8865771 microseconds
By default, the Kconfig option prefers the compiler built-in function for this computation.
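As a rough illustration of the built-in approach, the sketch below uses GCC's `__builtin_ffs` when it is available and falls back to a plain scan otherwise. The function name `builtin_ffs_sketch` and the fallback are assumptions for illustration, not the PR's code; the Keil and IAR intrinsics mentioned in the description are not shown here.

```c
#include <stdint.h>

/* Hypothetical wrapper (not the PR's code). Returns the 1-based index of
 * the lowest set bit, or 0 when value is 0. */
static int builtin_ffs_sketch(uint32_t value)
{
#if defined(__GNUC__)
    /* __builtin_ffs takes an int; only the bit pattern matters here.
     * GCC and Clang typically lower this to a single instruction
     * where the target has one. */
    return __builtin_ffs((int)value);
#else
    /* Portable fallback: scan upward from bit 0. */
    int pos = 1;
    if (value == 0) return 0;
    while ((value & 1u) == 0) { value >>= 1; pos++; }
    return pos;
#endif
}
```

Guarding on `__GNUC__` also covers Clang, which defines the same macro; other toolchains would need their own branch.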
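I saw on WeChat that the original author shared his algorithm experience and improved the algorithm in rt-thread for finding the lowest non-zero bit. I thought it was great, and also felt it could be improved further.
This version uses no extra space and the algorithm is simple and direct, so I am offering it as one more option for squeezing memory usage to the limit.
The algorithm works on the binary-search principle to locate the non-zero bit; a 32-bit value needs 5 consecutive rounds of comparison.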
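The binary-search idea described above can be sketched as follows. This is an illustrative reconstruction under the usual ffs convention (1-based result, 0 for an input of 0), not the PR's actual `puny_ffs` code; the name `puny_ffs_sketch` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch of a table-free, binary-search ffs.
 * Each of the five rounds tests whether the lower half of the remaining
 * window is all zero; if so, it shifts the upper half down and adds the
 * half-width to the position. */
static int puny_ffs_sketch(uint32_t value)
{
    int pos = 1;

    if (value == 0) return 0;

    if ((value & 0xFFFFu) == 0) { value >>= 16; pos += 16; } /* round 1 */
    if ((value & 0xFFu)   == 0) { value >>= 8;  pos += 8;  } /* round 2 */
    if ((value & 0xFu)    == 0) { value >>= 4;  pos += 4;  } /* round 3 */
    if ((value & 0x3u)    == 0) { value >>= 2;  pos += 2;  } /* round 4 */
    if ((value & 0x1u)    == 0) { pos += 1; }                /* round 5 */

    return pos;
}
```

No lookup table is needed, at the cost of up to five dependent compare-and-shift steps per call, which is consistent with the slower timings reported for the table-free variant.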
Comparison of the three algorithms, running each once over every 32-bit integer:
ffs execute 4294967295 times execution time: 9307483 microseconds
tiny_ffs execute 4294967295 times execution time: 9487902 microseconds
puny_ffs execute 4294967295 times execution time: 16202501 microseconds
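The variant that uses no extra memory is somewhat slower.
The figures above were measured on my own Intel machine: roughly equal at 8 bits, about 33% slower at 16 bits, and about 67% slower at 32 bits;
I also just tested on GitHub, where it is about 25% slower at 32 bits.
The Action in my repository compiled and linked successfully: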
https://github.com/pegasusplus/rt-thread/actions/runs/12099488207
Intent for your PR
Choose one (Mandatory):
Code Quality:
As part of this pull request, I've considered the following:
All redundant code is removed and cleaned up (no `#if 0` code and no commented-out code)