Improve rt ffs providing another find least non-0 bit position method with no memory requirement #9729

Open
wants to merge 10 commits into
base: master
@pegasusplus pegasusplus commented Nov 30, 2024

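Pull/merge request description: provide two more algorithm options for finding the lowest non-zero bit.

Latest update: I learned that many compilers provide built-in functions that, wherever possible, use processor-specific instructions to compute the position of the lowest non-zero bit. In my tests this is faster than both the original ffs and the original author's improved tiny_ffs, so I added a method that uses the GCC/KEIL/IAR built-in functions. (It is 28% faster on my physical Intel machine at home and 35% faster on GitHub; when there is a chance, some ARM targets could be tested, since the newer ones all have dedicated instructions.)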

ffs execution time: 14580990 microseconds
tiny_ffs execution time: 13520661 microseconds
puny_ffs execution time: 20757811 microseconds
builtin_ffs execution time: 8865771 microseconds

The KConfig default is to prefer the compiler built-in function for this computation.
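A minimal sketch of the built-in approach, not the code from this PR. It assumes the usual ffs convention (1-based position, 0 for an input of 0) and only uses GCC/Clang's `__builtin_ffs`; the function name `builtin_ffs_sketch` and the fallback loop are hypothetical, and the Keil/IAR intrinsics mentioned above are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical helper name; illustrates the idea only. */
static inline int builtin_ffs_sketch(uint32_t value)
{
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_ffs returns 1 + index of the lowest set bit, or 0 if value is 0. */
    return __builtin_ffs((int)value);
#else
    /* Portable fallback: shift right until the lowest bit is set. */
    int position = 1;

    if (value == 0)
    {
        return 0;
    }
    while ((value & 1u) == 0)
    {
        value >>= 1;
        position++;
    }
    return position;
#endif
}
```

With this convention, `builtin_ffs_sketch(8)` yields 4, because bit 3 is the lowest set bit and positions are counted from 1.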

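On WeChat I saw the original author share his experience with algorithms and his improvement to the algorithm that finds the lowest non-zero bit in rt-thread. I thought it was great, and also felt it could be improved further.
This version uses no extra memory and the algorithm is simple and direct, so I am providing it as one more option for squeezing memory usage to the limit.

The algorithm uses binary search (halving) to locate the non-zero bit; a 32-bit value needs 5 successive rounds of comparison.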
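The halving idea can be sketched as follows. This is an illustration under assumptions, not the PR's actual puny_ffs: the name `puny_ffs_sketch` is hypothetical, and the 1-based return value with 0 for a zero input is assumed from the ffs convention.

```c
#include <stdint.h>

/* Hypothetical sketch: table-free search for the lowest set bit. */
static int puny_ffs_sketch(uint32_t value)
{
    int position = 1;

    if (value == 0)
    {
        return 0;
    }

    /* Five rounds: each test halves the range that can hold the lowest set bit. */
    if ((value & 0xFFFFu) == 0) { value >>= 16; position += 16; }
    if ((value & 0xFFu) == 0)   { value >>= 8;  position += 8;  }
    if ((value & 0xFu) == 0)    { value >>= 4;  position += 4;  }
    if ((value & 0x3u) == 0)    { value >>= 2;  position += 2;  }
    if ((value & 0x1u) == 0)    { position += 1; }

    return position;
}
```

No lookup table is needed, which is the memory saving; the cost is the fixed chain of five dependent tests for a 32-bit input.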
Comparison of the three algorithms, each run once over every 32-bit integer:

ffs execute 4294967295 times execution time: 9307483 microseconds
tiny_ffs execute 4294967295 times execution time: 9487902 microseconds
puny_ffs execute 4294967295 times execution time: 16202501 microseconds

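The version that uses no extra memory is somewhat slower.
The figures above were measured on my own Intel machine: roughly equal for 8 bits, about 33% slower for 16 bits, and about 67% slower for 32 bits.
I have just tested again on GitHub, where the 32-bit case is about 25% slower.

The Action build and link succeeded in my repository: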
https://github.com/pegasusplus/rt-thread/actions/runs/12099488207

Intent for your PR

Choose one (Mandatory):

  • This PR is for a code-review and is intended to get feedback
  • This PR is mature, and ready to be integrated into the repo

Code Quality:

As part of this pull request, I've considered the following:

  • Already checked the difference between the PR and the old code
  • Style guide is adhered to, including spacing, naming and other styles
  • All redundant code is removed and cleaned up (no #if 0 code, no commented-out code)
  • All modifications are justified and do not affect other components or BSPs
  • I've commented appropriately where code is tricky
  • Code in this PR is of high quality
  • This PR complies with the RT-Thread code specification (checked with a source formatting tool)
  • If a new BSP is added, a CI check has been added to .github/workflows/bsp_buildings.yml; for details see the BSP self-check link

@CLAassistant
CLAassistant commented Nov 30, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added Kernel PR has src relate code action github action yml imporve labels Nov 30, 2024
@pegasusplus pegasusplus changed the title Improve rt ffs Improve rt ffs providing another find least non-0 bit position method with no memory requirement Nov 30, 2024
@mysterywolf
Member

Thank you for submitting the PR. You do not need to deal with the errors reported by CI; I will reorganize your PR.

@supperthomas
Member

The CI changes do not need to be committed; just trigger the workflow manually.

Some compilers complain about value & (value - 1) ^ value.
It is better to write (value & (value - 1)) ^ value.
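For context, `&` binds more tightly than `^` in C, so both spellings compute the same result; the parentheses only make the grouping explicit. A small sketch of what the expression does (the function name `lowest_set_bit` is hypothetical, not from the PR):

```c
#include <stdint.h>

/* Hypothetical helper: isolates the lowest set bit of value. */
static uint32_t lowest_set_bit(uint32_t value)
{
    /* value & (value - 1) clears the lowest set bit;
     * XOR with the original value leaves only that bit. */
    return (value & (value - 1u)) ^ value;
}
```

For example, 12 is binary 1100, `12 & 11` is 1000, and `1000 ^ 1100` is 0100, so the result is 4.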
@pegasusplus
Author

pegasusplus commented Dec 2, 2024

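Latest update: I learned that many compilers provide built-in functions that, wherever possible, use processor-specific instructions to compute the position of the lowest non-zero bit. In my tests this is faster than both the original ffs and the original author's improved tiny_ffs, so I added a method that uses the GCC/KEIL/IAR built-in functions. (It is 28% faster on my physical Intel machine at home and 35% faster on GitHub; when there is a chance, some ARM targets could be tested, since the newer ones all have dedicated instructions.)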

ffs execution time: 14580990 microseconds
tiny_ffs execution time: 13520661 microseconds
puny_ffs execution time: 20757811 microseconds
builtin_ffs execution time: 8865771 microseconds

@pegasusplus
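Author

About the CLA check above: it seems I used Visual Studio Community on my local machine to test performance, and when I committed code afterwards, the Microsoft account configured in Visual Studio Community was recorded as the author. That added an extra account which is not a GitHub account, so it cannot sign the agreement.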

@aozima
Member

aozima commented Dec 2, 2024

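About the CLA check above: it seems I used Visual Studio Community on my local machine to test performance, and when I committed code afterwards, the Microsoft account configured in Visual Studio Community was recorded as the author. That added an extra account which is not a GitHub account, so it cannot sign the agreement.

Rebase to change the author, then push again to the branch of the same name: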

push --force-with-lease

@pegasusplus
Author

Thanks, it is resolved.

@pegasusplus
Author

Static code analysis reports an error at kservice.c:404:5, uninitialized 'pc', which seems unrelated to rt_ffs. I'll check and confirm later.

@pegasusplus
Author

File format / license checking:
[file_check.py 178 INFO] [src/kservice.c]: encoding check success.
[file_check.py 186 ERROR] files format check fail.
[file_check.py 285 ERROR] file format check or license check fail.

Will confirm later.

@mysterywolf
Member

mysterywolf commented Dec 19, 2024

Please first confirm that the core algorithm is correct. If it is, I will take care of the Kconfig and switch-macro issues. I will also handle the CI failures.

@pegasusplus
Author

OK, I'll re-examine the CI errors. The results of the new algorithms have been checked against the original implementations for every possible value, and the time usage was then compared. The condition that detects whether a compiler built-in function exists may be wrong in some build environments; I'll check and report progress.
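A check of this kind could look roughly like the sketch below. It is an assumption about the shape of such a test, not the harness actually used: `reference_ffs`, `candidate_ffs` and `count_mismatches` are hypothetical names, and the candidate here is a loop form of the halving search.

```c
#include <stdint.h>

typedef int (*ffs_fn)(uint32_t);

/* Reference: straightforward bit-by-bit scan (1-based, 0 for input 0). */
static int reference_ffs(uint32_t value)
{
    int position = 1;

    if (value == 0)
    {
        return 0;
    }
    while ((value & 1u) == 0)
    {
        value >>= 1;
        position++;
    }
    return position;
}

/* Candidate: halving search written as a loop over widths 16, 8, 4, 2, 1. */
static int candidate_ffs(uint32_t value)
{
    int position = 1;
    unsigned width;

    if (value == 0)
    {
        return 0;
    }
    for (width = 16; width > 0; width >>= 1)
    {
        uint32_t low_mask = (1u << width) - 1u;

        if ((value & low_mask) == 0)
        {
            value >>= width;
            position += (int)width;
        }
    }
    return position;
}

/* Counts inputs in [first, last] where the two functions disagree.
 * Requires first <= last; the loop shape avoids wrap-around at UINT32_MAX. */
static uint64_t count_mismatches(ffs_fn candidate, ffs_fn reference,
                                 uint32_t first, uint32_t last)
{
    uint64_t mismatches = 0;
    uint32_t value = first;

    for (;;)
    {
        if (candidate(value) != reference(value))
        {
            mismatches++;
        }
        if (value == last)
        {
            break;
        }
        value++;
    }
    return mismatches;
}
```

Calling `count_mismatches(candidate_ffs, reference_ffs, 0u, 0xFFFFFFFFu)` would cover every 32-bit value; a smaller range keeps a quick smoke test short.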

@pegasusplus
Author

Going through them one by one. First:
RTduino/Arduino Libraries (STM32F412 Nucleo)
This seems unrelated to compiling the code:
'''
scons: done building targets.
scons -c
Error: Process completed with exit code 7.
0s
Run curl -X POST -H "Authorization: token "
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
{
100 272 100 109 100 163 1303 1948 --:--:-- --:--:-- --:--:-- 3277
"message": "Bad credentials",
"documentation_url": "https://docs.github.com/rest",
"status": "401"
}
'''

@pegasusplus
Author

RTduino/Arduino Libraries (Raspberry Pico)
Similar to the previous one.
'''
/home/runner/work/rt-thread/rt-thread/bsp/raspberry-pico/tools/elf2uf2 rtthread-pico.elf rtthread-pico.uf2
scons: done building targets.
scons -c
Error: Process completed with exit code 8.
0s
Run curl -X POST -H "Authorization: token "
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 272 100 109 100 163 570 853 --:--:-- --:--:-- --:--:-- 1431
{
"message": "Bad credentials",
"documentation_url": "https://docs.github.com/rest",
"status": "401"
}
'''

@pegasusplus
Author

stm32l4_f0_f1
A similar credential problem?
'''
arm-none-eabi-size rt-thread.elf
text data bss dec hex filename
47840 1556 2292 51688 c9e8 rt-thread.elf
scons: done building targets.
scons -c
Error: Process completed with exit code 15.
0s
Run curl -X POST -H "Authorization: token "
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 272 100 109 100 163 1339 2002 --:--:-- --:--:-- --:--:-- 3358
{
"message": "Bad credentials",
"documentation_url": "https://docs.github.com/rest",
"status": "401"
}
'''

@pegasusplus
Author

stm32_f2_f4
stm32_f7_g0_h7_mp15_u5_h5_wb5
Infineon_TI_microchip
nordic(yml)
Same as above.

llvm-arm
fatal: unable to access 'https://github.com/RT-Thread-packages/stm32l4_cmsis_driver.git/': The requested URL returned error: 502

ESP32C3
scons: *** [build/drivers/board.o] Error 1
drivers/drv_gpio.c:14:10: fatal error: driver/gpio.h: No such file or directory
14 | #include "driver/gpio.h"
| ^~~~~~~~~~~~~~~
compilation terminated.
scons: *** [build/drivers/drv_gpio.o] Error 1
scons: building terminated because of errors.
scons -c
Error: build ESP32_C3 devices.wifi failed.

@pegasusplus
Author

pegasusplus commented Dec 19, 2024

Static code analysis
This error really is in src/kservice.c. Checking the source, I found that the original macro RT_HW_BACKTRACE_FRAME_GET_SELF appears to use an uninitialized variable, pc; it needs to be set to some definite value, probably 0.

'''
#ifndef RT_HW_BACKTRACE_FRAME_GET_SELF

#ifdef __GNUC__
#define RT_HW_BACKTRACE_FRAME_GET_SELF(frame) do { \
    (frame)->fp = (rt_uintptr_t)__builtin_frame_address(0U); \
    /* error line 97: */ \
    (frame)->pc = ({__label__ pc; pc: (rt_uintptr_t)&&pc;}); \
} while (0)

#else
#define RT_HW_BACKTRACE_FRAME_GET_SELF(frame) do { \
    (frame)->fp = 0; \
    (frame)->pc = 0; \
} while (0)

#endif /* __GNUC__ */

#endif /* RT_HW_BACKTRACE_FRAME_GET_SELF */
'''
Scan code format and license
[file_check.py 148 ERROR] components/drivers/rtc/dev_soft_rtc.c line[278]: the RT-Thread error code should return negative value. e.g. return -RT_ERROR
This one is in drivers/rtc/dev_soft_rtc.c and is probably unrelated to the files updated in this PR.

Checking is complete; the errors do not appear to come from compiling the updated algorithm.
