People's Daily Database/人民日报图文数据库: access via library VPN #354

Closed
2 tasks done
doubanchan opened this issue Jun 20, 2024 · 3 comments
Labels
bug Something isn't working

doubanchan commented Jun 20, 2024

What problem did you run into? [Required]

  • Item not recognized
  • Article author extraction

Problem description [Required]
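1. When the People's Daily database (人民日报图文数据库) is accessed through the library's VPN system, the http://data.people.com.cn prefix of the URL is replaced with the library's proxy address; the rest of the page format is unchanged.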
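2. Author extraction:
(2.1) The 1st article on page 17 of the 2024-06-20 issue has an author, but not in the format “【作者:本报评论员】” (the format used in the existing test case, article 5 on page 1 of 2023-12-31); only the author's name is given.
(2.2) In the 2nd article on page 1 of the 2024-06-20 issue, the author is “李林蔚”, but the final character “蔚” is cut off because of slice(4, -1).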
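A minimal Python sketch of the truncation in (2.2), assuming the author text arrives either as “【作者:name】” or without its closing 】 (the exact markup on the 2024-06-20 pages is not shown in the issue, so the sample strings below are hypothetical). JavaScript's raw.slice(4, -1) behaves like Python's raw[4:-1]: it always drops the last character, whether or not that character is a 】. The hypothetical extract_author helper instead strips an optional 【, an optional 作者: label and an optional 】, so a bare name as in (2.1) passes through unchanged.

```python
import re

# Hypothetical raw author strings; the real page markup is an assumption.
WITH_BRACKETS = "【作者:本报评论员】"  # format from the existing test case
WITHOUT_CLOSING = "【作者:李林蔚"      # assumed shape that loses its last character
BARE_NAME = "李林蔚"                   # assumed shape for case (2.1): name only


def slice_author(raw: str) -> str:
    # Equivalent of JavaScript raw.slice(4, -1): skip the first four
    # characters (assumed "【作者:") and unconditionally drop the last one.
    return raw[4:-1]


# Optional "【", optional "作者" label with ASCII or full-width colon,
# the name itself (lazy), then an optional closing "】".
AUTHOR_RE = re.compile(r"【?\s*(?:作者\s*[::]\s*)?(.+?)\s*】?")


def extract_author(raw: str) -> str:
    text = raw.strip()
    match = AUTHOR_RE.fullmatch(text)
    return match.group(1) if match else text


if __name__ == "__main__":
    print(slice_author(WITHOUT_CLOSING))    # 李林  (last character lost)
    print(extract_author(WITHOUT_CLOSING))  # 李林蔚
    print(extract_author(BARE_NAME))        # 李林蔚
```

Optional delimiters in a single regex are one design choice; checking explicitly for the 【作者: prefix and the 】 suffix before slicing would fix the same bug.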
Your expected result
Items are recognized correctly.

doubanchan added the bug (Something isn't working) label Jun 20, 2024
doubanchan mentioned this issue Jun 20, 2024
doubanchan (Author)

My personal temporary workaround for page recognition: change the target to ^https?://.*(data\\.people\\.com\\.cn)?/rmrb
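A quick Python check of how broad the workaround target is. The doubled backslashes above are presumably there because the pattern is copied from the translator's JSON metadata header; as a regex it is ^https?://.*(data\.people\.com\.cn)?/rmrb. Because .* already consumes any host, the optional group never constrains anything, so the pattern accepts any http(s) URL containing /rmrb. All URLs below, including the proxy host, are hypothetical examples.

```python
import re

# Workaround target from the thread, as a raw Python string (single backslashes).
WORKAROUND_TARGET = re.compile(r"^https?://.*(data\.people\.com\.cn)?/rmrb")


def matches_target(url: str) -> bool:
    # True if the translator would consider this URL a candidate page.
    return WORKAROUND_TARGET.match(url) is not None


if __name__ == "__main__":
    samples = [
        "http://data.people.com.cn/rmrb/20240620/1",               # direct access (illustrative path)
        "https://vpn.example-library.cn/session-token/rmrb/20240620/17",  # hypothetical proxy
        "https://unrelated.example.com/rmrb/index.html",           # unrelated site, still matches
        "http://data.people.com.cn/other",                         # no /rmrb segment
    ]
    for url in samples:
        print(matches_target(url), url)
```

This over-breadth is the practical side of the maintainer's objection below: a target loose enough to cover arbitrary VPN hosts also matches pages the translator was never written for.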

jiaojiaodubai (Collaborator)

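Changing the target is not recommended. As mentioned in #347, VPN access should be handled through the Connector's Proxy feature; otherwise there is no way we could cover every institution's VPN with regular expressions.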

doubanchan (Author)

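@jiaojiaodubai The Shenzhen Library VPN proxy URL gets a new suffix every time I log in, so it seems it cannot be added to the Proxy settings. If the proxy address were fixed, it should work.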
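A simplified Python model of why a per-login suffix defeats a saved proxy rule. This is not Zotero's proxy implementation: the scheme syntax is reduced to literal text plus a %p placeholder, assumed here to stand for the original path in the style of Zotero proxy schemes, and the VPN host and session tokens are hypothetical. A scheme recorded during one session contains that session's token as literal text, so it no longer matches once the token changes.

```python
import re
from typing import Optional


def scheme_to_regex(scheme: str) -> "re.Pattern[str]":
    # Hypothetical simplification: everything is literal except %p,
    # which becomes a capture group for the original path.
    literal_parts = scheme.split("%p")
    body = "(.+)".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + body + "$")


def original_path(scheme: str, proxied_url: str) -> Optional[str]:
    # Return the de-proxied path, or None if the URL does not fit the scheme.
    match = scheme_to_regex(scheme).match(proxied_url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Scheme saved while logged in with session token "token-A" (hypothetical).
    saved = "https://vpn.example.cn/s/token-A/%p"
    first_login = "https://vpn.example.cn/s/token-A/rmrb/20240620/17"
    next_login = "https://vpn.example.cn/s/token-B/rmrb/20240620/17"
    print(original_path(saved, first_login))  # rmrb/20240620/17
    print(original_path(saved, next_login))   # None
```

Under this model, a fixed proxy address makes the saved scheme reusable, which matches the author's observation that a stable proxy URL should work with the Proxy feature.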
