Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config fuzzy watch support #11856

Conversation

stone-98
Copy link
Contributor

@stone-98 stone-98 commented Mar 20, 2024

What is the purpose of the change

Nacos supports fuzzy config listening.

Config a fuzzy listen scheme(配置模糊监听方案)

Brief changelog

Nacos supports fuzzy config listening.

Verifying this change

XXXX

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a Github issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a Github issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test to verify your logic correction, more mock a little better when cross module dependency exist. If the new feature or significant change is committed, please remember to add integration-test in test module.
  • Run mvn -B clean package apache-rat:check findbugs:findbugs -Dmaven.test.skip=true to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit-test pass. Run mvn clean test-compile failsafe:integration-test to make sure integration-test pass.

@codecov-commenter
Copy link

codecov-commenter commented Mar 20, 2024

Codecov Report

Attention: Patch coverage is 15.03532% with 842 lines in your changes are missing coverage. Please review.

Project coverage is 66.61%. Comparing base (64b9e8d) to head (7f84376).
Report is 13 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             develop   #11856      +/-   ##
=============================================
- Coverage      67.81%   66.61%   -1.20%     
- Complexity      8893     8945      +52     
=============================================
  Files           1237     1254      +17     
  Lines          40443    41399     +956     
  Branches        4286     4422     +136     
=============================================
+ Hits           27425    27580     +155     
- Misses         11046    11805     +759     
- Partials        1972     2014      +42     
Files Coverage Δ
...acos/client/config/impl/ConfigTransportClient.java 86.11% <ø> (ø)
...n/java/com/alibaba/nacos/api/common/Constants.java 53.84% <0.00%> (-4.49%) ⬇️
...emote/response/ConfigBatchFuzzyListenResponse.java 0.00% <0.00%> (ø)
...mote/response/FuzzyListenNotifyChangeResponse.java 0.00% <0.00%> (ø)
...remote/response/FuzzyListenNotifyDiffResponse.java 0.00% <0.00%> (ø)
...m/alibaba/nacos/config/server/model/CacheItem.java 76.38% <0.00%> (-3.33%) ⬇️
...java/com/alibaba/nacos/client/utils/ParamUtil.java 86.17% <62.50%> (-5.88%) ⬇️
...onfig/server/configuration/ConfigCommonConfig.java 72.72% <40.00%> (-27.28%) ⬇️
...mote/request/AbstractFuzzyListenNotifyRequest.java 0.00% <0.00%> (ø)
...acos/config/server/service/ConfigCacheService.java 75.34% <0.00%> (-1.09%) ⬇️
... and 17 more

... and 32 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64b9e8d...7f84376. Read the comment docs.

@KomachiSion KomachiSion added the Nacos3.0 Nacos 3.0 Architecture Evolution label Mar 25, 2024
@KomachiSion
Copy link
Collaborator

Is this PR should be merged into Nacos 3.0 branch? @shiyiyue1102

// Check if the context is consistent with the server
if (context.getIsConsistentWithServer().get()) {
// Skip if a full synchronization is not needed
if (!needAllSync) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么不在前面直接返回,要在for循环来continue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为ClientWorker中包含所有的模糊监听上下文(多个模糊监听表达式),每个都有属于自身的isConsistentWithServer字段,所以需要在for循环里面continue。(可能存在有些已经与服务端同步的,有些没有同步的)

/**
* Flag indicating whether the context is discarded.
*/
private volatile boolean isDiscard = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDiscard字段的存在是不是让逻辑更复杂了,取消订阅的时候直接删除订阅的数据是不是就可以了

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDiscard如果为空时,代表此模糊监听上下文已经没有Listener,需要通知服务端取消订阅,让服务端不再进行通知。
如果去除该字段,直接用listener == null判断其实也可行。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不过原有普通订阅也有这个字段,保持一致没毛病

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

// Get the connection for the client
Connection connection = connectionManager.getConnection(event.getClientId());
if (connection == null) {
// If connection is not available, return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里增加一些日志更好,不然可能客户端任务订阅成了,但是服务端这边又没有日志验证

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ConfigExecutor.getClientConfigNotifierServiceExecutor()
.schedule(retryTask, retryTask.tryTimes * 2L, TimeUnit.SECONDS);
} else {
// Client is already offline, ignore the task.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议补充client 离线的 log,方便后续问题排查

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

hujun-w-2
hujun-w-2 previously approved these changes Apr 1, 2024
// Execute the fuzzy listen operation
ConfigBatchFuzzyListenResponse listenResponse = (ConfigBatchFuzzyListenResponse) requestProxy(
rpcClient, configBatchFuzzyListenRequest);
if (listenResponse != null && listenResponse.isSuccess()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的设计是只要请求服务端成功就认为本地的context列表与服务端一致?

Copy link
Contributor Author

@stone-98 stone-98 Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FuzzyListenContext的字段如下:
isConsistentWithServer:是否与服务端同步
isInitializing:是否在初始化中
当请求服务端成功,isConsistentWithServer = true,代表本地的context与服务端同步了(但是列表未一致),但是此时isInitializing = true,需要等到服务端将所有的配置推送到客户端后,才将isInitializing = false,代表已经完成了初始化,此时本地的context的配置列表才与服务端一致。
然后在断开连接时,会将isConsistentWithServer = false,代表需要重新同步。

@hujun-w-2
Copy link
Collaborator

目前的实现是垮了 namespace ,这个与 Nacos 的隔离理念是否不符

@stone-98
Copy link
Contributor Author

stone-98 commented Apr 3, 2024

目前的实现是垮了 namespace ,这个与 Nacos 的隔离理念是否不符

大佬,我没太理解这句话的意思,目前的实现客户端是不跨namespace的,服务端是跨namespace的,只是说request带了namespace。

@hujun-w-2
Copy link
Collaborator

目前的实现是垮了 namespace ,这个与 Nacos 的隔离理念是否不符

大佬,我没太理解这句话的意思,目前的实现客户端是不跨namespace的,服务端是跨namespace的,只是说request带了namespace。

嗯,我是担心不同namespace的监听数据冲突,我刚刚又仔细看了一下,确实客户端维护的 fuzzyListenContextMap 是每个ns都是单独的,服务端的唯一key也包含了namespace。

@hujun-w-2
Copy link
Collaborator

@KomachiSion 是否考虑先合并到2.x呢?

@hujun-w-2
Copy link
Collaborator

以前单个配置监听定时批量监听的时候会返回change的配置列表,相当于有个兜底的与服务端对账,模糊监听的当时有考虑这个吗

}

// Execute fuzzy listen operation for addition
doExecuteConfigFuzzyListen(addContextMap, true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里先分类成 addContextMap和removeContextMap,再调用两次doExecuteConfigFuzzyListen 看着有点绕,为什么不直接遍历fuzzyListenContextMap,执行一次doExecuteConfigFuzzyListen呢,反正doExecuteConfigFuzzyListen内部都会判断分类

Copy link
Contributor Author

@stone-98 stone-98 Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确实,仔细看了下,这个确实有点复杂了,后续我更新下。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

/**
* Set of fuzzy listening contexts.
*/
private Set<Context> contexts = new HashSet<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的context为什么不直接用FuzzyListenContext ,相当于两个类做同样的事情

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里理论上使用FuzzyListenContext是ok的,但是FuzzyListenContext 不单单是一个实体,还承载着通知listener的能力,所以这个请求里面就直接使用了一个内部类去存关键字段,这里是否有必要直接使用FuzzyListenContext我也有点纠结......

@stone-98
Copy link
Contributor Author

stone-98 commented Apr 6, 2024

以前单个配置监听定时批量监听的时候会返回change的配置列表,相当于有个兜底的与服务端对账,模糊监听的当时有考虑这个吗

这里的ConfigBatchFuzzyListenRequest分别有两个作用:

  • 创建以及取消模糊监听
  • 对账(客户端此上下文匹配的配置与服务端此上下文匹配的配置进行对账,然后将变更的配置会推送至SDK)

@stone-98 stone-98 changed the title Nacos supports fuzzy config listening. Nacos supports fuzzy config listen. Apr 7, 2024
@hujun-w-2
Copy link
Collaborator

以前单个配置监听定时批量监听的时候会返回change的配置列表,相当于有个兜底的与服务端对账,模糊监听的当时有考虑这个吗

这里的ConfigBatchFuzzyListenRequest分别有两个作用:

  • 创建以及取消模糊监听
  • 对账(客户端此上下文匹配的配置与服务端此上下文匹配的配置进行对账,然后将变更的配置会推送至SDK)

目前对账的逻辑是每隔5分钟服务端会把所有配置重新推一遍,这个开销是不是太大了,如果多个客户端同时监听服务端多个配置,那推送的量是很大的,这块感觉需要优化一下

@hujun-w-2
Copy link
Collaborator

以前单个配置监听定时批量监听的时候会返回change的配置列表,相当于有个兜底的与服务端对账,模糊监听的当时有考虑这个吗

这里的ConfigBatchFuzzyListenRequest分别有两个作用:

  • 创建以及取消模糊监听
  • 对账(客户端此上下文匹配的配置与服务端此上下文匹配的配置进行对账,然后将变更的配置会推送至SDK)

目前对账的逻辑是每隔5分钟服务端会把所有配置重新推一遍,这个开销是不是太大了,如果多个客户端同时监听服务端多个配置,那推送的量是很大的,这块感觉需要优化一下

优化思路:客户端批量对比的时候将同一个groupPattern下的配置 md5 集合再计算一个 md5 提供给服务端,服务端如果对比发现有变更的话告诉客户端需要变更,客户端再将同一个groupPattern下的所有 md5 传给服务端,服务端再将不一致的重新推一编。这个只是一个简单的思路,应该还有更好的方案

@stone-98
Copy link
Contributor Author

以前单个配置监听定时批量监听的时候会返回change的配置列表,相当于有个兜底的与服务端对账,模糊监听的当时有考虑这个吗

这里的ConfigBatchFuzzyListenRequest分别有两个作用:

  • 创建以及取消模糊监听
  • 对账(客户端此上下文匹配的配置与服务端此上下文匹配的配置进行对账,然后将变更的配置会推送至SDK)

目前对账的逻辑是每隔5分钟服务端会把所有配置重新推一遍,这个开销是不是太大了,如果多个客户端同时监听服务端多个配置,那推送的量是很大的,这块感觉需要优化一下

优化思路:客户端批量对比的时候将同一个groupPattern下的配置 md5 集合再计算一个 md5 提供给服务端,服务端如果对比发现有变更的话告诉客户端需要变更,客户端再将同一个groupPattern下的所有 md5 传给服务端,服务端再将不一致的重新推一编。这个只是一个简单的思路,应该还有更好的方案

后续有时间会处理一下。

@shiyiyue1102 shiyiyue1102 changed the title Nacos supports fuzzy config listen. Config fuzzy watch support Dec 26, 2024
@shiyiyue1102 shiyiyue1102 added area/Config area/Client Related to Nacos Client SDK labels Dec 26, 2024
@shiyiyue1102 shiyiyue1102 changed the base branch from develop to v3.0-develop December 26, 2024 10:25
@shiyiyue1102 shiyiyue1102 changed the base branch from v3.0-develop to v3.0-develop-support-fuzzy-watch December 27, 2024 02:07
@shiyiyue1102 shiyiyue1102 merged commit 17bd04e into alibaba:v3.0-develop-support-fuzzy-watch Dec 27, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/Client Related to Nacos Client SDK area/Config Nacos3.0 Nacos 3.0 Architecture Evolution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants