Recently, for native libraries, we introduced a change to interact with the files via IndexInput and IndexOutput. With this, we should be able to remove the custom CompoundFormat from our codec (see #2185).
However, when removing it, we get an error like:
Caused by: org.apache.lucene.index.CorruptIndexException: compound sub-files must have a valid codec header and footer: codec header mismatch: actual header=1232620912 vs expected header=1071082519 (resource=BufferedChecksumIndexInput(MemorySegmentIndexInput(path="/Users/jmazane/workspace/Opensearch/DockerRunner/k-NN-1/build/testclusters/integTest-0/data/nodes/0/indices/6zG12XzjQLaWyWiAL_OFaQ/0/index/_0_165_test_nested.test_vector.faiss")))
at org.apache.lucene.codecs.CodecUtil.verifyAndCopyIndexHeader(CodecUtil.java:287) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
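The two header values in the message can be decoded by hand, which helps show what Lucene actually found at the start of the file. The sketch below is only an illustration using plain Java (the class and method names are made up for this example). It checks that the expected value equals Lucene's CodecUtil.CODEC_MAGIC constant (0x3fd76c17) and prints the actual value as the four ASCII bytes it was read from. The result looks like a Faiss index type tag, which would suggest the file starts directly with the native library's own bytes, though that interpretation is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of k-NN or Lucene.
class HeaderDecode {
    // Same value as Lucene's CodecUtil.CODEC_MAGIC.
    static final int CODEC_MAGIC = 0x3fd76c17;

    // Render a big-endian int as the four ASCII bytes it was read from.
    static String asAscii(int header) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(header).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "expected header" from the error message
        System.out.println(CODEC_MAGIC == 1071082519); // prints "true"
        // "actual header" from the error message
        System.out.println(asAscii(1232620912));       // prints "IxMp"
    }
}
```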
We should get rid of the CompoundFormat so we can move towards just extending PerFieldKnnVectorsFormat. To do this, we need to write the header when creating the native index files, and make sure to read it before forwarding the file on to the underlying libraries.
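A rough sketch of the idea, using plain Java rather than the Lucene API so it runs on its own: write a header in front of the native bytes, then validate and consume that header on read so that only the native bytes are handed on. The class name, codec name, version, and simplified header layout are all assumptions made for illustration, and the footer is left out. In the real codec this would presumably go through CodecUtil.writeIndexHeader and CodecUtil.checkIndexHeader, whose on-disk layout differs from this one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified layout: magic | name length | name | version | native bytes.
class HeaderedNativeFile {
    static final int MAGIC = 0x3fd76c17; // same value as Lucene's CODEC_MAGIC
    static final byte[] CODEC_NAME =
        "NativeIndexSketch".getBytes(StandardCharsets.UTF_8); // made-up name
    static final int VERSION = 0;                             // made-up version

    // Write the header first, then the bytes produced by the native library.
    static byte[] write(byte[] nativePayload) {
        ByteBuffer out = ByteBuffer.allocate(
            4 + 1 + CODEC_NAME.length + 4 + nativePayload.length);
        out.putInt(MAGIC);
        out.put((byte) CODEC_NAME.length);
        out.put(CODEC_NAME);
        out.putInt(VERSION);
        out.put(nativePayload);
        return out.array();
    }

    // Validate and skip the header, returning only what the native library should see.
    static byte[] readPayload(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        int magic = in.getInt();
        if (magic != MAGIC) {
            throw new IllegalStateException("codec header mismatch: actual header="
                + magic + " vs expected header=" + MAGIC);
        }
        byte[] name = new byte[in.get()];
        in.get(name);
        if (!Arrays.equals(name, CODEC_NAME)) {
            throw new IllegalStateException("unexpected codec name");
        }
        int version = in.getInt();
        if (version != VERSION) {
            throw new IllegalStateException("unexpected version " + version);
        }
        byte[] payload = new byte[in.remaining()];
        in.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] nativeBytes = "IxMp-native-bytes".getBytes(StandardCharsets.US_ASCII);
        byte[] file = write(nativeBytes);
        // The reader sees exactly the native bytes once the header is consumed.
        System.out.println(Arrays.equals(nativeBytes, readPayload(file)));
    }
}
```

Passing a file without the header to readPayload fails on the magic check, which mirrors the mismatch reported in the error above.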
For backward compatibility (BWC), old codecs will need to keep the old CompoundFormat, but we should be able to remove it from new codecs.
The error occurs because the native index files are written with a footer but no header (see https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/nativeindex/NativeIndexWriter.java#L141-L150), while Lucene requires compound sub-files to have a valid codec header and footer. See the CompoundFormat interface: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/CompoundFormat.java#L41-L45.