
Remove FileWatcher from KNN. #2182

Merged

Conversation

0ctopus13prime
Contributor

Description

Design doc: #1885 (comment)

This PR removes FileWatcher from KNN.
FileWatcher periodically monitors a vector index file and cleans up its resources once the file is deleted.
However, as outlined in the doc above, releasing allocated native memory is already handled in the 80DV Producer. Therefore, we can safely remove FileWatcher.
Since the newest version now uses the vector format, I've added the same cleanup logic there as well.
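
To illustrate the pattern (a minimal sketch, assuming hypothetical names; the actual k-NN producer classes differ): instead of a watcher polling for file deletion, the producer releases its native memory when Lucene closes it.

```java
import java.io.Closeable;

// Illustrative sketch only: all names below are assumptions, not the
// actual opensearch-project/k-NN API.
final class VectorIndexProducer implements Closeable {

    /** Stand-in for the plugin's native-memory bookkeeping. */
    interface NativeMemory {
        void free();
    }

    private final NativeMemory nativeIndexMemory;

    VectorIndexProducer(NativeMemory nativeIndexMemory) {
        this.nativeIndexMemory = nativeIndexMemory;
    }

    @Override
    public void close() {
        // When Lucene drops a segment (e.g., after a merge or deletion),
        // it closes the producer; the off-heap index is released here
        // instead of a watcher polling for the file to disappear.
        nativeIndexMemory.free();
    }
}
```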

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

@0ctopus13prime
Contributor Author

@jmazanec15
Please note that this is the initial PR, posted for early code sharing.
Will add integration tests + unit tests shortly.

@0ctopus13prime
Contributor Author

Memory usage monitoring

During the benchmark, the memory usage stayed pretty much the same as the baseline.

Candidate

[Screenshot 2024-10-15 at 9:57:05 AM: candidate memory usage]

Baseline

[Screenshot kill-file-watcher-baseline: baseline memory usage]

Collaborator

@shatejas left a comment


Do verify the latencies as well, since you are already benchmarking.

@0ctopus13prime
Contributor Author

> Do verify the latencies as well, since you are already benchmarking.

Yeah, that's the plan.
I've been stuck fixing some subtle issues, and I think it's now good to go for benchmarking.
I'm not expecting a severe performance regression, but I will post the comparison in this issue.

@0ctopus13prime
Contributor Author

Please note that NativeIndexWriter still has an FSDirectory dependency.
This will be dealt with in the next PR, as part of introducing a writing layer in native engines.

@0ctopus13prime
Contributor Author

Found a few test cases that were not updated according to this change.
Will fix the test cases and update the PR.

@0ctopus13prime
Contributor Author

Performance Benchmark Comparison

Conclusion: I could not find any indication that this change degrades performance.

| Metric | Task | Baseline | Candidate | (Candidate - Baseline) / Baseline | Unit |
| --- | --- | --- | --- | --- | --- |
| Cumulative indexing time of primary shards | | 30.4969 | 30.3151 | -0.005961262 | min |
| Min cumulative indexing time across primary shards | | 0.00015 | 0.000133333 | -0.111113333 | min |
| Median cumulative indexing time across primary shards | | 15.2484 | 15.1575 | -0.005961281 | min |
| Max cumulative indexing time across primary shards | | 30.4967 | 30.315 | -0.005958022 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 175.918 | 172.859 | -0.017388783 | min |
| Cumulative merge count of primary shards | | 117 | 116 | -0.008547009 | |
| Min cumulative merge time across primary shards | | 0 | 0 | 0 | min |
| Median cumulative merge time across primary shards | | 87.9589 | 86.4296 | -0.017386529 | min |
| Max cumulative merge time across primary shards | | 175.918 | 172.859 | -0.017388783 | min |
| Cumulative merge throttle time of primary shards | | 0.8007 | 0.89955 | 0.123454477 | min |
| Min cumulative merge throttle time across primary shards | | 0 | 0 | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0.40035 | 0.449775 | 0.123454477 | min |
| Max cumulative merge throttle time across primary shards | | 0.8007 | 0.89955 | 0.123454477 | min |
| Cumulative refresh time of primary shards | | 1.2548 | 1.44825 | 0.154167995 | min |
| Cumulative refresh count of primary shards | | 90 | 88 | -0.022222222 | |
| Min cumulative refresh time across primary shards | | 0.0003 | 0.0003 | 0 | min |
| Median cumulative refresh time across primary shards | | 0.6274 | 0.724125 | 0.154167995 | min |
| Max cumulative refresh time across primary shards | | 1.2545 | 1.44795 | 0.154204862 | min |
| Cumulative flush time of primary shards | | 12.2934 | 11.8517 | -0.035929849 | min |
| Cumulative flush count of primary shards | | 58 | 58 | 0 | |
| Min cumulative flush time across primary shards | | 0 | 0 | 0 | min |
| Median cumulative flush time across primary shards | | 6.14672 | 5.92586 | -0.035931359 | min |
| Max cumulative flush time across primary shards | | 12.2934 | 11.8517 | -0.035929849 | min |
| Total Young Gen GC time | | 0.332 | 0.413 | 0.243975904 | s |
| Total Young Gen GC count | | 18 | 18 | 0 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 29.8184 | 29.8183 | -3.35363E-06 | GB |
| Translog size | | 5.82E-07 | 5.82E-07 | 0 | GB |
| Heap used for segments | | 0 | 0 | 0 | MB |
| Heap used for doc values | | 0 | 0 | 0 | MB |
| Heap used for terms | | 0 | 0 | 0 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0 | 0 | 0 | MB |
| Segment count | | 2 | 2 | 0 | |
| Min Throughput | custom-vector-bulk | 7290.25 | 3034.92 | -0.583701519 | docs/s |
| Mean Throughput | custom-vector-bulk | 11204.8 | 11096 | -0.009710124 | docs/s |
| Median Throughput | custom-vector-bulk | 10741.7 | 10421.3 | -0.029827681 | docs/s |
| Max Throughput | custom-vector-bulk | 21616.1 | 20570.6 | -0.048366727 | docs/s |
| 50th percentile latency | custom-vector-bulk | 80.3333 | 80.4247 | 0.00113776 | ms |
| 90th percentile latency | custom-vector-bulk | 166.626 | 164.697 | -0.011576825 | ms |
| 99th percentile latency | custom-vector-bulk | 315.914 | 319.73 | 0.012079237 | ms |
| 99.9th percentile latency | custom-vector-bulk | 1505.69 | 1567.09 | 0.040778646 | ms |
| 99.99th percentile latency | custom-vector-bulk | 4159.22 | 4244.81 | 0.020578378 | ms |
| 100th percentile latency | custom-vector-bulk | 6280.65 | 7164.91 | 0.14079116 | ms |
| 50th percentile service time | custom-vector-bulk | 80.3333 | 80.4247 | 0.00113776 | ms |
| 90th percentile service time | custom-vector-bulk | 166.626 | 164.697 | -0.011576825 | ms |
| 99th percentile service time | custom-vector-bulk | 315.914 | 319.73 | 0.012079237 | ms |
| 99.9th percentile service time | custom-vector-bulk | 1505.69 | 1567.09 | 0.040778646 | ms |
| 99.99th percentile service time | custom-vector-bulk | 4159.22 | 4244.81 | 0.020578378 | ms |
| 100th percentile service time | custom-vector-bulk | 6280.65 | 7164.91 | 0.14079116 | ms |
| error rate | custom-vector-bulk | 0 | 0 | 0 | % |
| Min Throughput | force-merge-segments | 0 | 0 | 0 | ops/s |
| Mean Throughput | force-merge-segments | 0 | 0 | 0 | ops/s |
| Median Throughput | force-merge-segments | 0 | 0 | 0 | ops/s |
| Max Throughput | force-merge-segments | 0 | 0 | 0 | ops/s |
| 100th percentile latency | force-merge-segments | 6.17E+06 | 6.10E+06 | -0.011397476 | ms |
| 100th percentile service time | force-merge-segments | 6.17E+06 | 6.10E+06 | -0.011397476 | ms |
| error rate | force-merge-segments | 0 | 0 | 0 | % |
| Min Throughput | warmup-indices | 0.23 | 0.23 | 0 | ops/s |
| Mean Throughput | warmup-indices | 0.23 | 0.23 | 0 | ops/s |
| Median Throughput | warmup-indices | 0.23 | 0.23 | 0 | ops/s |
| Max Throughput | warmup-indices | 0.23 | 0.23 | 0 | ops/s |
| 100th percentile latency | warmup-indices | 4423.45 | 4382.95 | -0.009155749 | ms |
| 100th percentile service time | warmup-indices | 4423.45 | 4382.95 | -0.009155749 | ms |
| error rate | warmup-indices | 0 | 0 | 0 | % |
| Min Throughput | prod-queries | 0.67 | 0.67 | 0 | ops/s |
| Mean Throughput | prod-queries | 0.67 | 1 | 0.492537313 | ops/s |
| Median Throughput | prod-queries | 0.67 | 0.67 | 0 | ops/s |
| Max Throughput | prod-queries | 0.67 | 1.66 | 1.47761194 | ops/s |
| 50th percentile latency | prod-queries | 8.23135 | 8.26458 | 0.004037005 | ms |
| 90th percentile latency | prod-queries | 9.44003 | 9.03913 | -0.042468085 | ms |
| 99th percentile latency | prod-queries | 26.661 | 26.0028 | -0.024687746 | ms |
| 100th percentile latency | prod-queries | 1499.85 | 1482.8 | -0.011367803 | ms |
| 50th percentile service time | prod-queries | 8.23135 | 8.26458 | 0.004037005 | ms |
| 90th percentile service time | prod-queries | 9.44003 | 9.03913 | -0.042468085 | ms |
| 99th percentile service time | prod-queries | 26.661 | 26.0028 | -0.024687746 | ms |
| 100th percentile service time | prod-queries | 1499.85 | 1482.8 | -0.011367803 | ms |
| error rate | prod-queries | 0 | 0 | 0 | % |
| Mean recall@k | prod-queries | 0.3 | 0.29 | -0.033333333 | |
| Mean recall@1 | prod-queries | 0.37 | 0.28 | -0.243243243 | |
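
For reference, the relative-difference column is computed as (Candidate - Baseline) / Baseline. For example, for cumulative merge time of primary shards: (172.859 - 175.918) / 175.918 = -3.059 / 175.918 ≈ -0.01739, i.e., about 1.7% less merge time in the candidate.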

CHANGELOG.md (outdated review thread, resolved)
```diff
@@ -562,8 +562,10 @@ public static boolean isFaissAVX2Disabled() {

 public static boolean isFaissAVX512Disabled() {
     return Booleans.parseBoolean(
         KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED).toString(),
```
Contributor Author

KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED) can return null, so we should not call toString() on it without a null check.
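
A minimal null-safe sketch of that call might look like the following; the `false` default when the setting is unset is an assumption for illustration, not necessarily the plugin's actual fallback:

```java
// Sketch only: mirrors the snippet above; the default of `false` when the
// setting has no value is an assumption for illustration.
public static boolean isFaissAVX512Disabled() {
    Object disabled = KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED);
    if (disabled == null) {
        return false; // assumed default when the setting is unset
    }
    return Booleans.parseBoolean(disabled.toString(), false);
}
```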

@0ctopus13prime force-pushed the decomission-file-watcher branch 2 times, most recently from 1a52a1b to 6c4fe69 on October 21, 2024 18:11
Collaborator

@shatejas left a comment


Looks good, waiting for the CI to pass

Member

@jmazanec15 left a comment


LGTM

@jmazanec15 merged commit e5599aa into opensearch-project:main on Oct 21, 2024
29 of 30 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

```bash
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2182-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e5599aa151cf43700d0ea327ad536cc106e5bb61
# Push it to GitHub
git push --set-upstream origin backport/backport-2182-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
```

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2182-to-2.x.

@jmazanec15 added the Refactoring label (Improve the design, structure, and implementation while preserving its functionality) on Oct 21, 2024
@0ctopus13prime
Copy link
Contributor Author

Looks like auto backporting is failing. Will raise a backport PR shortly.

@0ctopus13prime deleted the decomission-file-watcher branch October 21, 2024 22:58
@0ctopus13prime restored the decomission-file-watcher branch October 21, 2024 22:58
0ctopus13prime added a commit to 0ctopus13prime/k-NN that referenced this pull request Oct 21, 2024
Signed-off-by: Dooyong Kim <[email protected]>

(cherry picked from commit e5599aa)
@0ctopus13prime
Contributor Author

Backport PR: #2225

0ctopus13prime added a commit to 0ctopus13prime/k-NN that referenced this pull request Oct 22, 2024
Signed-off-by: Dooyong Kim <[email protected]>

(cherry picked from commit e5599aa)
naveentatikonda pushed a commit that referenced this pull request Oct 22, 2024
Signed-off-by: Dooyong Kim <[email protected]>

(cherry picked from commit e5599aa)
Labels: backport 2.x, Refactoring