-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
diskann support new data type(fp16/bf16) #393
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cqy123456 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@cqy123456 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:
For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”. Thanks for your efforts and contribution to the community!. |
309cf27
to
3d07dc7
Compare
src/index/diskann/diskann.cc
Outdated
@@ -32,7 +32,8 @@ | |||
namespace knowhere { | |||
template <typename DataType> | |||
class DiskANNIndexNode : public IndexNode { | |||
static_assert(std::is_same_v<DataType, fp32>, "DiskANN only support float"); | |||
static_assert(KnowhereFloatTypeCheck<DataType>::value, | |||
"DiskANN only support floating point data type(float32, float16, brain_float16)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use bfloat16
instead of brain_float16
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
tests/ut/test_diskann.cc
Outdated
@@ -355,29 +354,36 @@ TEST_CASE("Test DiskANN GetVectorByIds", "[diskann]") { | |||
json["data_path"] = kRawDataPath; | |||
json["max_degree"] = 5; | |||
json["search_list_size"] = kK; | |||
json["pq_code_budget_gb"] = sizeof(float) * dim * kNumRows * 0.03125 / (1024 * 1024 * 1024); | |||
json["pq_code_budget_gb"] = sizeof(DataType) * dim * kNumRows * 0.125 / (1024 * 1024 * 1024); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is 0.125
magic number and why it got changed from 0.03125
(1/32th) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.03125 make fp16/bf16 crash in small chunk num(<10), but this problem fixed in pr391
tests/ut/test_diskann.cc
Outdated
@@ -412,3 +418,15 @@ TEST_CASE("Test DiskANN GetVectorByIds", "[diskann]") { | |||
fs::remove_all(kDir); | |||
fs::remove(kDir); | |||
} | |||
|
|||
TEST_CASE("Test DiskANN Base ", "[diskann]") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe, a 3 tests for each of the data types instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove these cases to e2e test
tests/ut/test_diskann.cc
Outdated
BaseSearchTest<knowhere::fp16>(); | ||
} | ||
|
||
TEST_CASE("Test DiskANN GetVectorByIds", "[diskann]") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove these cases to e2e test
@@ -583,6 +585,12 @@ namespace diskann { | |||
template<typename T> | |||
float prepare_base_for_inner_products(const std::string in_file, | |||
const std::string out_file) { | |||
if (!knowhere::KnowhereFloatTypeCheck<T>::value) { | |||
std::stringstream stream; | |||
stream << "DiskANN currently only supports floating point data for IP." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would interpret this message that only float
is supported, but not fp16
or bf16
. Please mention fp16
and bf16
if they are also supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
if (!knowhere::KnowhereFloatTypeCheck<T>::value) { | ||
std::stringstream stream; | ||
stream | ||
<< "DiskANN currently only supports floating point data for Cosine." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same note for float
, fp16
and bf16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
thirdparty/DiskANN/src/distance.cpp
Outdated
float res = 0; | ||
for (size_t i = 0; i < size; i++) { | ||
res += ((float) x[i] - (float) y[i]) * ((float) x[i] - (float) y[i]); | ||
auto x_fp32 = std::make_unique<float[]>(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, every invokation of distance calculation will trigger two vector allocations? O_o
Should I add fp16 and bf16 distance calculations into Faiss instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fp16/bf16 distance function will implemented later, like faiss::fvec_inner_product and fvec_L2sqr.
thirdparty/DiskANN/src/distance.cpp
Outdated
float res = 0; | ||
for (size_t i = 0; i < size; i++) { | ||
res += (float) x[i] * (float) y[i]; | ||
auto x_fp32 = std::make_unique<float[]>(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
b5989fe
to
4c951e1
Compare
/lgtm |
@@ -583,6 +585,13 @@ namespace diskann { | |||
template<typename T> | |||
float prepare_base_for_inner_products(const std::string in_file, | |||
const std::string out_file) { | |||
if (!knowhere::KnowhereFloatTypeCheck<T>::value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do constexpr
4c951e1
to
bdbd137
Compare
2691198
to
1c14de3
Compare
/lgtm |
Signed-off-by: cqy123456 <[email protected]>
1c14de3
to
52f899f
Compare
/lgtm |
/kind feature |
issue: #287
related milvus pr:milvus-io/milvus#30716 , sift1m result also can see milvus-io/milvus#30716
knowhere-test pr: https://github.com/milvus-io/knowhere-test/pull/176