Perl regex: lookbehind too long when using greedy quantifiers #25

Open
hexh250786313 opened this issue Sep 23, 2022 · 0 comments

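While writing some text-processing scripts recently, I ran into Perl reporting the error "Lookbehind longer than 255 not implemented in regex". It is not a big problem, and the answer can be found on StackOverflow, but I could not find anything about it on the Chinese-language internet, so I am noting it down here.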

Related

Background

My script calls perl to do text processing, and one of its regexes uses a negative lookbehind assertion:

perl -0777 -i -pe "s/(?<!(.*\\S.*))name:.*/name:\ 'hexh'/gi" ./test.txt

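The goal is to change the value of name to hexh only where name is not preceded by any non-whitespace character on the same line, for example: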

// source:
  name: 'hexuhua'
  fullname: 'hexuhua'

// expected:
  name: 'hexh'
  fullname: 'hexuhua'

But it fails with an error:

Lookbehind longer than 255 not implemented in regex m/(?<!(.*\S.*))name:.*/ at -e line 1.

Solution

According to this blog post: http://blogs.perl.org/users/tom_wyant/2019/03/native-variable-length-lookbehind.html

Now, there is at least one restriction. No lookbehind assertion can be more than 255 characters long. This limit has been around, as nearly as I can tell, ever since lookaround assertions were introduced in 5.005. But it has been lightly documented until now. This restriction means you can not use quantifiers * or +. But bracketed quantifiers are OK, as is ?.

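In other words: no lookbehind assertion may match more than 255 characters, and this limit has existed ever since lookaround assertions were introduced in 5.005. As a consequence, the unbounded quantifiers * and + cannot be used inside a lookbehind. Note that the issue is the unbounded length, not greediness, so making them lazy (.*? or .+?) does not help. Bounded brace quantifiers such as {0,255}, and ?, are fine.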

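So for the regex above, s/(?<!(.*\\S.*))name:.*/name:\ 'hexh'/gi, the problem lies in the lookbehind (?<!(.*\\S.*)): Perl requires that a lookbehind be able to match at most 255 characters, which rules out the unbounded .* inside it.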

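Based on this, the lookbehind can be rewritten along these lines: (?<!(.{0,127}\\S.{0,127})), which keeps the maximum length the parentheses can match at 255 characters (127 + 1 + 127).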

The final command:

perl -0777 -i -pe "s/(?<!(.{0,127}\\S.{0,127}))name:.*/name:\ 'hexh'/gi" ./test.txt

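Worth mentioning

Variable-length lookbehind (here with a capturing group inside it) is still an experimental feature in Perl, so every time such an assertion is used, Perl prints a warning like this: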

Variable length negative lookbehind with capturing is experimental in regex;
