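Our working conventions

Getting a package to build does not by itself count as a fix: we have some unwritten conventions about what goes into a patch. This article describes the general fixes we have used for particular errors in the past, as well as some of our requirements.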

XX is not available for the 'riscv64' architecture

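This happens because the arch array in the PKGBUILD does not include riscv64. Do not change arch to any: any means the built package is binary-identical across architectures.

You can use setconf to reset the value: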

setconf PKGBUILD arch '(riscv64 x86_64)'

Also, please do not commit the change to arch.
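Before reaching for setconf, it can help to check whether the PKGBUILD already lists riscv64. The following is a minimal sketch, assuming bash; `pkgbuild_has_arch` is a hypothetical helper name, not part of any Arch tool. It sources the PKGBUILD in a subshell, so only run it on PKGBUILDs you trust.

```shell
# Hypothetical helper: succeed if the PKGBUILD's arch array contains the
# requested architecture (or 'any', which is installable everywhere).
# The PKGBUILD is sourced in a subshell so its variables do not leak out.
pkgbuild_has_arch() {
    local wanted="$1" pkgbuild="${2:-./PKGBUILD}"
    (
        source "$pkgbuild" || exit 2
        for a in "${arch[@]}"; do
            [[ "$a" == "$wanted" || "$a" == any ]] && exit 0
        done
        exit 1
    )
}

# Example: only touch arch locally when riscv64 is really missing.
# pkgbuild_has_arch riscv64 || setconf PKGBUILD arch '(riscv64 x86_64)'
```

The commented usage line reuses the setconf command from above; the change stays local and is not part of the submitted patch.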

Fix the code first, or change the compile flags first?

Fix the code first.

Reference: https://build.opensuse.org/package/view_file/devel:ARM:Factory:Contrib:ILP32/presage/presage-0.9.1-gcc11.patch?expand=0

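What to do when the tarball URL has changed

File a bug with Arch.

Version numbers

Do not let the version number get ahead of mainline; prefer pushing for an update instead. If that really is not possible, add a comment noting upstream's progress so that future maintainers have a reference.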

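Circular dependencies

Some packages can be built together, but the status page shows each as missing the other. These need to be bootstrapped.

For details, see the ghc bootstrap patch: https://github.com/felixonmars/archriscv-packages/pull/17/commits/36279fff5314fbcd212549ede73db5257ef10b1d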

patch hunk offset

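Q: When updating a patch, if some hunks have drifted so that they still apply but with an offset, does the patch need to be refreshed to the latest line numbers?

A: No.

Tests

If a package does not care whether its tests pass, e.g. test || echo, you can simply comment the tests out.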

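Accessing files that get deleted during the build

There is a somewhat dirty trick: add a bash -i line to the PKGBUILD, then use that subshell to interact with the build environment.

About upstream

If the upstream maintainer is 肥猫 (felixonmars), you can @ him directly in the group chat.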

Python 2to3

For Python packages, run 2to3 once regardless of the major version.

qemu-user-blacklist.txt

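A package may be added to this list if and only if its failure is caused by a bug in QEMU user itself and the bug cannot be reproduced on other riscv machines.

Why don't we try to fix qemu user?

We have run into several different qemu hangs so far, and we maintain several patched versions: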

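1. First there is the always-malloc issue. We hand-built a glib, which mitigates it somewhat: https://github.com/ZenithalHourlyRate/glib
2. Next, qemu does not translate the futex pi syscall properly, which makes pthread hang under some glibc versions (this affects every qemu-user target, not just riscv). We put together a rough fix: https://github.com/ZenithalHourlyRate/qemu
3. We also found that in some cases processes end up in the Z (zombie) state, for example when building rust. We suspect it is related to this issue: https://gitlab.com/qemu-project/qemu/-/issues/140 , but it looks very non-trivial to fix. Let's call it the vfork issue.
4. Even after the always-malloc fix, we still hit a futex hang like the one in 1 (when building go). We have not investigated it yet.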

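Why maintain the noqemu blacklist by hand, which looks inefficient?

According to our tests, build speed ranks QEMU-User > RISC-V board > QEMU-System, so maintaining the noqemu blacklist by hand is actually the more efficient option.

Editing qemu-user-blacklist.txt

The file should be kept in lexicographic order. You can add a new package to it with the following command: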

echo "pkgname" | LC_ALL=C sort -o qemu-user-blacklist.txt - qemu-user-blacklist.txt

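Note that LC_ALL=C is required. Without it, sort orders lines according to the current system locale, which may ignore characters other than letters and digits and so produce a different final order. See man sort.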
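The command above can be wrapped so that the ordering is also easy to verify before committing. This is only a sketch: `add_to_blacklist` and `check_blacklist` are hypothetical names, and the `-u` flag (dropping duplicate entries) is an addition that the original command does not have.

```shell
# Hypothetical helper: insert a package name and keep the list in
# C-locale (byte-wise) order. -u drops the line if it is already listed.
add_to_blacklist() {
    pkg="$1"
    list="${2:-qemu-user-blacklist.txt}"
    printf '%s\n' "$pkg" | LC_ALL=C sort -u -o "$list" - "$list"
}

# Succeeds only if the list is already sorted and duplicate-free
# in the C locale; sort -c checks without rewriting the file.
check_blacklist() {
    LC_ALL=C sort -c -u "${1:-qemu-user-blacklist.txt}"
}
```

Running `check_blacklist` before committing catches a list that was edited by hand under a different locale.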

Unknown public key....

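This is because the local gpg keyring does not have this developer's public key. You can download and import the key with gpg --recv-key keyid.

Arch Linux recently accepted an RFC to store keys in SVN: a key folder next to the PKGBUILD holds these gpg keys. If you cannot find the key there, open a bug report upstream asking them to put the key in git. If you have found the key and upstream is unresponsive, you can temporarily submit the key together with riscv64.patch to 肥猫's repository.

If you get an error such as gpg: keyserver receive failed: No data, the keyserver has most likely been shut down.

The sks pool is append-only, so it cannot satisfy the GDPR's right to be forgotten, and it was taken down under EU law.

You can first check whether the key can be found on one of these sks mirrors: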

https://pgp.surfnet.nl/
pgp.uni-mainz.de 11370
pgp.circl.lu 11370
keys.andreas-puls.de 11370

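Use gpg --keyserver https://example.com --recv-key keyid to specify the keyserver to download from.

If you still cannot find it, look at the project home page, team page, personal blogs, and similar pages. For a GitHub project, go to the developer's profile page and append .gpg to the URL to view their key.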

https://github.com/user.gpg
                       ^^^^

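Then import it with gpg --import: curl https://github.com/user.gpg | gpg --import. Once you have the key, you can also upload a copy to other public keyservers.

If you really cannot find the key but urgently need to test, use the --skippgpcheck option to skip the check temporarily.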

extra-riscv64-build -- -d "..." -- --skippgpcheck
                                 # ^^^^^^^^^^^^^^^^^ use an extra -- to pass skippgpcheck down to the underlying makepkg

Or see this: https://github.com/archlinuxcn/lilac/blob/master/recv_gpg_keys
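Trying several keyservers by hand gets tedious, so the lookup can be scripted roughly as follows. This is a sketch under assumptions: `recv_key_from_mirrors` is a hypothetical helper, and the keyserver URLs in the usage line are placeholders, not servers known to hold any particular key.

```shell
# Hypothetical helper: try each keyserver in turn until one has the key.
# Usage: recv_key_from_mirrors KEYID SERVER [SERVER...]
recv_key_from_mirrors() {
    keyid="$1"
    shift
    for server in "$@"; do
        # gpg exits non-zero when the server is down or lacks the key
        if gpg --keyserver "$server" --recv-keys "$keyid"; then
            echo "imported $keyid from $server"
            return 0
        fi
    done
    echo "no keyserver had $keyid" >&2
    return 1
}

# Example (placeholder servers):
# recv_key_from_mirrors keyid hkps://keys.example.org hkps://pgp.example.net
```

If every server fails, fall back to the project pages or the GitHub .gpg URL described above.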

Rust

error: failed to run `rustc` to learn about target-specific information

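Solution: remove the --target "$CARCH-unknown-linux-gnu" argument from the PKGBUILD.

The problem comes from Rust and LLVM defining target triples differently. Rust follows a GCC-like rv64<ext> style, so --target expects riscv64gc-unknown-linux-gnu. CARCH does not follow that style, so the string actually passed in is riscv64-unknown-linux-gnu.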

References:

https://github.com/sodiumoxide/sodiumoxide/pull/474/commits/6dd994493ac592cf59798df75c293c6212c973a9#diff-6651588311ef4e5d49026d64fe43063739e047195958490a79c64146a719bd28R152

https://github.com/rust-lang/rust/issues/62117

https://github.com/lowRISC/riscv-llvm/blob/master/0002-RISCV-Recognise-riscv32-and-riscv64-in-triple-parsin.patch
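Removing the argument can be done with a one-line sed on the working copy of the PKGBUILD. This is a sketch that assumes the flag is spelled exactly as quoted above and that GNU sed is available; `strip_cargo_target` is a hypothetical name.

```shell
# Hypothetical helper: drop every ' --target "$CARCH-unknown-linux-gnu"'
# from the given PKGBUILD (default: ./PKGBUILD). The \$ keeps sed from
# treating the dollar sign as an end-of-line anchor.
strip_cargo_target() {
    sed -i 's/ --target "\$CARCH-unknown-linux-gnu"//g' "${1:-PKGBUILD}"
}
```

Without --target, cargo places its output under target/release rather than target/<triple>/release, so any install paths in package() that mention the triple may need the same treatment.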

error: failed to run custom build command for `ring v0.16.20`

Solution: use the ring patched by 肥猫. Add the following two lines to the prepare function of the PKGBUILD (add the function yourself if it does not exist):

echo -e "\n[patch.crates-io]\nring = { git = 'https://github.com/felixonmars/ring', branch = '0.16.20' }" >> Cargo.toml
cargo update -p ring

The jemallocator dependency fails to build

Solution: send a PR upstream that replaces the jemallocator crate with tikv-jemallocator.

Reference PR:

https://github.com/meilisearch/MeiliSearch/pull/1692

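Building a Rust program under QEMU-user hangs

Open htop and press F4 to filter by the name of the compiler process; use the arrow keys to move. Check the process state in the S column: if it is Z, i.e. zombie, the compiler has hung and needs to be killed and restarted. Select the process and press k; a Send Signal panel appears on the left. Choose SIGABRT and press Enter to kill the process, then retry.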
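The same check can be done without htop. The sketch below lists processes whose state starts with Z; `filter_zombies` and `list_zombies` are hypothetical names, and the `ps` output format assumes procps as found on Arch.

```shell
# Keep only rows whose state column starts with Z (zombie).
# Expected input: "PID STAT COMMAND", one process per line.
filter_zombies() {
    awk '$2 ~ /^Z/ { print $1, $3 }'
}

# List the PID and command name of every zombie process.
list_zombies() {
    ps -eo pid=,stat=,comm= | filter_zombies
}

# Example: after finding the stuck PID, send SIGABRT as described above.
# kill -ABRT <pid>
```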

Cargo component is not applicable...

error:
  the 'cargo' binary, normally provided by the 'cargo' component,
  is not applicable to the 'stable-2020-12-31-x86_64-unknown-linux-gnu' toolchain

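This error occurs when rustup has only partially updated the toolchain, because the update was interrupted or hit some other error. It is not easy to repair in place; my suggestion is to reinstall the cargo toolchain, which resolves the problem faster.

xxx has different source paths...

We once considered sending upstream a PR with architecture-specific dependency sources. For example, as below: use version 4.4.0 in a normal environment, and my forked, fixed version on riscv64.

[dependencies.ffmpeg-sys-next]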
version="4.4.0-next.2"
default-features=false

[target.'cfg(target_arch="riscv64")'.dependencies.ffmpeg-sys-next]
git="https://github.com/Avimitin/rust-ffmpeg-sys"
branch="risc-v"

But this does not build:

Caused by:
  Dependency 'ffmpeg-sys-next' has different source paths depending on the build target.
  Each dependency must have a single canonical source path irrespective of build target.

Cargo does not support using different sources for the same dependency. You can write this:

[dependencies]
A
B

[target.'cfg(target_arch="riscv64")'.dependencies]
C
D

But you cannot write this:

[dependencies]
A = { git = "https://github.com/UserA/crate" }

[target.'cfg(target_arch="riscv64")'.dependencies]
A = { git = "https://github.com/UserB/crate" }

Features specified on a git dependency have no effect

When a dependency is declared like this, the specified features are not compiled in:

[dependencies.dashmap]
git = "https://github.com/Avimitin/dashmap"
branch = "risc-v"
features = ["serde"]

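The cause is not known yet. It leads to compile errors such as a trait not being implemented for a struct, so that passing arguments fails.

If a module cannot get its features compiled in, prefer fixing that crate and sending a PR upstream.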

Golang

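Replacing a dependency

Edit go.mod and use the replace directive (it can go at the very end of the file).

Example: bump modernc.org/libc from v1.4.1 to v1.14.11

replace modernc.org/libc v1.4.1 => modernc.org/libc v1.14.11

Then turn this change into a patch, and add the following to the prepare function of the PKGBUILD: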

patch -Ni ../fix-libc.patch
go get -d modernc.org/libc

Finally, put riscv64.patch together with the new patch and submit a PR.

Patch `go dep`

go mod edit -replace github.com/prometheus/client_golang=github.com/prometheus/client_golang@efe7aa7
go mod download github.com/prometheus/client_golang

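# Added because of the error reported when go get is left out.
# Not needed if there is no error; note that the version (the part after @) must be dropped.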
go get github.com/prometheus/client_golang/prometheus

go generate  # if necessary
cd cmd/traefik
go build

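For details, see: https://github.com/felixonmars/archriscv-packages/pull/346/files

The right way to patch a dependency (of a dependency) of a Go package

Go is a bit special: it installs dependencies at build time. Suppose there is a dependency chain a->b->c->d->e. If d needs fixing, then b, c, and d all have to be forked; once d is fixed, b and c are changed to depend on the forks.

Go can also replace a dependency with the replace directive in go.mod: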

replace github.com/creack/goselect => github.com/creack/goselect v0.1.2

For details, see: https://github.com/felixonmars/archriscv-packages/pull/387/commits/ad0e66c31b21efd93fff93be8cce5b9e13b59953

Builds of golang packages hang frequently

This is QEMU's fault; just kill the build and start over.

bad pointer

runtime: bad pointer in frame ... at 0x...
fatal error: invalid pointer found on stack

You can follow this issue: https://github.com/golang/go/issues/51101

From now on, add the flag -gcflags=all="-N -l" to go build, and report the case on the issue above.

C/C++

config.guess: unable to guess system type

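Solution:

1. First, report to upstream that their config.guess and config.sub files are too old. Be sure to report it upstream! Upstream is responsible for keeping config.sub up to date.
2. Then try running autoreconf -fi in the PKGBUILD to update config.guess.
3. If upstream's config.guess is too old to be updated that way, copy config.guess and config.sub from /usr/share/autoconf/build-aux/ into the source tree.
4. Submit a PR and include the link to your upstream report.
5. Keep tracking your upstream report, and remove the patch from the repository once upstream has updated and made a release.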
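The copy step can be sketched as a small function to call from prepare(). `refresh_config_scripts` is a hypothetical helper, not something makepkg or autoconf provides; it overwrites every config.guess and config.sub below the current directory with the copies from the autoconf directory named above.

```shell
# Hypothetical helper for prepare(): replace all bundled config.guess and
# config.sub files under the current directory with up-to-date copies.
# The source directory defaults to the autoconf path mentioned above.
refresh_config_scripts() {
    aux_dir="${1:-/usr/share/autoconf/build-aux}"
    for name in config.guess config.sub; do
        find . -type f -name "$name" -exec cp "$aux_dir/$name" {} \;
    done
}
```

Using find rather than a fixed path covers projects that keep the scripts in a subdirectory such as build-aux/, at the cost of also touching vendored copies.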

Executable "wasm-ld" doesn't exist! (STUCK)

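You need to install lld; wasm-ld is only shipped in lld. For Firefox, pulling in lld causes other problems. lld has to stay off by default; the problem comes from -mno-relax versus -mrelax.

See this PR for details: https://github.com/felixonmars/archriscv-packages/pull/139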

pthread related issue

Standard answer: PR #1035

gcc atomic

gcc's 1-byte and 2-byte atomics are problematic on riscv64. They have to go through libatomic, so whether they are lock-free cannot be determined at compile time.

ref: https://code.videolan.org/videolan/vlc/-/issues/20683

The GCC docs make it clear that -pthread is the correct way to use the pthread library on POSIX systems. Not only does it add extra necessary linker options for some targets, but it also adds some extra necessary preprocessor options for many targets. There is also the issue that the library is called -lpthread on some systems, and -lpthreads on others. The -pthread option handles this correctly.

It is a known problem that the RISC-V gcc atomic support needs more work. RISC-V only supports 4 and 8 byte atomic operations in hardware. 1 and 2 byte operations are implemented using instruction sequences with locks, and this is currently done via a call into libatomic. However, mixing lock free and non-locking atomic sequences can potentially cause run-time failures. We need to instead emit instruction sequences using 4 and/or 8 byte atomic instructions without locks. This is on the list of things that need to be fixed, but there are a lot of things that need to be fixed and we haven't gotten around to fixing this one yet. This definitively needs to be fixed before gcc 9, and hopefully much sooner than that.

While the RISC-V gcc port does need to be fixed, it is still true that you should be using -pthread instead of -lpthread.

Beyond that, there is one more very odd thing: in gcc, std::atomic<bool>::is_always_lock_free is false while std::atomic<int>::is_always_lock_free is true. But if at run time you write std::atomic<bool> b; std::atomic_is_lock_free(&b), you will find that it is true, and it is true no matter how or where you try it.

In gcc, ATOMIC_BOOL_LOCK_FREE is apparently 1 (1 for the built-in atomic types that are sometimes lock-free).

ref: https://www.mail-archive.com/[email protected]/msg664254.html

tldr: we can only wait for the gcc folks to implement sub-word lock-free atomics.

All atomic types except for std::atomic_flag may be implemented using mutexes or other locking operations, rather than using the lock-free atomic CPU instructions. Atomic types are also allowed to be sometimes lock-free, e.g. if only aligned memory accesses are naturally atomic on a given architecture, misaligned objects of the same type have to use locks.

Thanks for the detailed writeup!

Your analysis is basically correct, but I would add that ICC's behavior here is unsound (it only appears to "implement[] this behavior correctly" in simple cases) whereas the GCC/Clang behavior is sound but surprising. For GCC, ICC, and Clang, identity<fp> and identity<float> are the same type. Therefore it's not possible for identity<fp>::type and identity<float>::type to have different alignments, because they're the same type. So, for an example such as this:

typedef float fp  __attribute__((aligned(16)));
std::cout << alignof(typename identity<fp>::type) << std::endl;
std::cout << alignof(typename identity<float>::type) << std::endl;

under GCC and Clang, both lines print out 4, whereas under ICC, they either both print out 4 or both print out 16 depending on which one happens to appear first in the program (and in general you can encounter ODR violations when using ICC despite there being nothing wrong at the source level).

Fundamentally, the 'aligned' attribute is a GCC extension, so the GCC folks get to define how it works. And they chose that instead of it resulting in a different type that's almost like the original type (for example, treating it as a type qualifier), it results in the same type, but that in some contexts that same type behaves differently. That's a semantic disaster, but it's what we live with. And in particular, the only way for template instantiation to be sound in the presence of this semantic disaster is for it to ignore all such attributes on template type arguments.

Some reference solutions:
- https://github.com/savoirfairelinux/opendht/pull/598/files
- How to check for libatomic: https://github.com/danvratil/qcoro/pull/52/commits/312f2fca861b4c623481da58241a1139e013ef83
Example explanations written for upstream:
- https://github.com/savoirfairelinux/opendht/pull/598#issuecomment-1086945416
- https://github.com/telegramdesktop/tdesktop/pull/24275
- Reference documentation that establishes the direct link between "you explicitly use atomics" and "you have to check for libatomic": https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html

-fno-common

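Q: -fno-common has replaced -fcommon as the default. In this case, should we prefer passing -fcommon or patching the source?

A: Prefer patching the source. In general, avoid touching flags whenever possible. A rare exception would be, say, a project that has been dead for ten years and whose repo cannot even be found.

That said, for something like -fcommon, the recommended approach is to report it upstream first and carry the patch ourselves in the meantime. That way, the next time upstream releases and Arch updates, things simply start working: they do not have to do anything, and we only have to delete the patch.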

-Wno-format

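If you see cc1: error: -Wformat-security ignored without -Wformat [-Werror=format-security], it is because the code is compiled with an extra -Wno-format in CFLAGS. That option turns off -Wformat, while -Werror=format-security is still in effect.

The first choice is to have upstream fix it and then pull in the patch; see the section above.

For a repository that has not been updated in ages, you may, at your discretion, remove the option with ${CFLAGS/-Werror=format-security/} or sed -i 's/ -Werror=format-security//' Makefile.

But try to discuss it in the group first and ask the more experienced members for their opinion.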

-Werror=format-security

-Wformat-security
   If -Wformat is specified, also warn about uses of format functions that represent possible security problems.  At present, this warns about calls to "printf" and "scanf" functions where the
   format string is not a string literal and there are no format arguments, as in "printf (foo);".  This may be a security hole if the format string came from untrusted input and contains %n.
   (This is currently a subset of what -Wformat-nonliteral warns about, but in future warnings may be added to -Wformat-security that are not included in -Wformat-nonliteral.)

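First, prefer sending a patch upstream. As the manpage excerpt shows, it is enough to change calls of the form printf(foo) into printf("%s", foo).

Only when upstream has been silent for ten-plus years may you choose to drop the option. You can use: CFLAGS=${CFLAGS/-Werror=format-security/}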
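To see why the `/` form of parameter expansion is the one to use here, compare it with the `#` form in bash. The flag values below are made up for illustration.

```shell
# Illustrative flag set (hypothetical values).
flags="-O2 -pipe -Werror=format-security -fno-plt"

# '#' only strips a match at the very start of the string, so this
# leaves the flags unchanged because the option is not the first word.
echo "${flags#-Werror=format-security}"

# '/' replaces the first match anywhere in the string; with an empty
# replacement the option is removed wherever it appears.
echo "${flags/-Werror=format-security/}"
```

The removal leaves a doubled space behind, which the compiler ignores.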

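If upstream is hard to communicate with, or the upstream code is hard to fix, prefer fixing it yourself, then discuss the solution with the more experienced members in the group.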

Reference: https://github.com/felixonmars/archriscv-packages/pull/559

SSE Related issue

~~Run it on a riscv board, then mark it noqemu~~

SSE2 can now be emulated with simde2, which provides SSE2 support; see this PR for details: #3011.

Intel related

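Some Intel packages ship no generic implementation, only x86 SIMD ones, e.g. hyperscan. Such packages are simply skipped.

Qmake

A package built with qmake fails with Could not find qmake spec 'default'.

For now, such packages have to be built on a board.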

msse

Error: c++: error: unrecognized command-line option '-msse'

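This is because the -msse compiler option is currently not supported on architectures other than x86. Check the project's CMakeLists.txt options and turn it off for now.

For example, cmake -DSSE=OFF.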

gcc nostdlib

-nostdlib overrides -pthread

glibc optimization

error: glibc cannot be compiled without optimization

solution: add -O1

unguarded-availability-new

Error:

cc1plus: error: ‘-Werror=unguarded-availability-new’: no option ‘-Wunguarded-availability-new’

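This option exists only in clang. If the project does not force the GNU toolchain, you can set the environment variables CC=clang CXX=clang++ to force compilation with clang.

If the project uses cmake, you can add two arguments: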

"-DCMAKE_C_COMPILER=clang"
"-DCMAKE_CXX_COMPILER=clang++"

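Also remember to file a bug upstream asking them to build with clang by default.

Electron

Packages whose v8 is older than 9.0 are not worth fixing; experimental risc-v support starts with 9.0.

node-sass

node-sass v6 simply does not support node v17. node-sass support for nodejs v17 starts with version 7.0.0, and 6.x supports nodejs versions 12, 14, 15, and 16. So either change the dependency from nodejs to nodejs-lts-gallium, or push upstream to bump the node-sass version in package.json, which is nonsense, because a major version bump usually brings incompatibilities.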

invalid or corrupted package (pgp signature)

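If you hit this while building an any package, it may be a cache conflict. any packages have the same file name on x86_64 and riscv64, but their signatures differ. If someone built packages without setting up their own cache directory and did not clean up afterwards, a later build that picks up their cache will run into this problem.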