Description
Hello, I believe there may be a missing discount inversion in the CUDA implementation of LightGBM's LambdarankNDCG rank objective class. I'm unfortunately better at math than at optimized C++/CUDA, so I may be missing something simple. I'll link to the relevant code/lines below to justify my reasoning.
Within the normal, non-CUDA implementation of GetGradientsForOneQuery in rank_objective.hpp (here), there are two calls to DCGCalculator::GetDiscount to fetch the discounts. I believe those calls go to DCGCalculator::GetDiscount (here), which looks up pre-computed discounts. Those discounts are pre-computed here, during the DCGCalculator::Init call. Notably, each discount is 1.0 / std::log2(2.0 + i), which aligns with the math (i.e. the discounts are inverted).
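For reference, here is a minimal sketch of what I understand that pre-computation to look like. This is my own simplified reconstruction, not the actual LightGBM source; the names InitDiscounts and discount_table are made up for illustration:

```cpp
// Simplified reconstruction of the CPU-side discount table (not LightGBM code):
// the init step fills a lookup table with the *inverted* log2 discount, and
// GetDiscount(k) just reads it back.
#include <cmath>
#include <cstddef>
#include <vector>

static std::vector<double> discount_table;  // hypothetical stand-in for DCGCalculator's table

void InitDiscounts(size_t max_position) {
  discount_table.resize(max_position);
  for (size_t i = 0; i < max_position; ++i) {
    // Position i (0-based) gets 1 / log2(i + 2), i.e. the inverted discount.
    discount_table[i] = 1.0 / std::log2(2.0 + i);
  }
}

double GetDiscount(size_t k) { return discount_table[k]; }
```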
Within the CUDA implementation of GetGradientsKernel_LambdarankNDCG in cuda_rank_objective.cu (here), the discounts are not pre-computed. Instead, each discount (here and here) is computed directly as log2(2.0f + i), and the discount does not appear to be inverted later in the code. In fact, the non-CUDA and CUDA implementations are nigh-identical within that lambda gradient calculation loop, apart from the inversion discrepancy above and a pre-computed sigmoid lookup table.
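To illustrate why the inversion should matter, here is a small self-contained comparison of the textbook LambdaRank pairwise |ΔNDCG| term computed with the inverted discount versus the non-inverted one. This is my own sketch, not LightGBM code; the label gains, ranks, and inverse_max_dcg value are made-up example inputs. Because the ratio between the two discount forms depends on the ranks, the two variants weight the same pair differently, not just by a constant scale factor:

```cpp
// Hedged illustration (not LightGBM code): the pairwise |delta NDCG| swap term
// used by LambdaRank, with the textbook (inverted) discount 1/log2(2+rank)
// versus the non-inverted log2(2+rank).
#include <cmath>
#include <cstdio>

double DeltaNdcgInverted(double high_label_gain, double low_label_gain,
                         int high_rank, int low_rank, double inverse_max_dcg) {
  const double high_discount = 1.0 / std::log2(2.0 + high_rank);
  const double low_discount = 1.0 / std::log2(2.0 + low_rank);
  return std::fabs((high_label_gain - low_label_gain) *
                   (high_discount - low_discount)) * inverse_max_dcg;
}

double DeltaNdcgNotInverted(double high_label_gain, double low_label_gain,
                            int high_rank, int low_rank, double inverse_max_dcg) {
  const double high_discount = std::log2(2.0 + high_rank);
  const double low_discount = std::log2(2.0 + low_rank);
  return std::fabs((high_label_gain - low_label_gain) *
                   (high_discount - low_discount)) * inverse_max_dcg;
}

int main() {
  // Example pair: label gains 7 and 1, currently ranked at positions 0 and 5.
  std::printf("inverted:     %f\n", DeltaNdcgInverted(7.0, 1.0, 0, 5, 1.0));
  std::printf("not inverted: %f\n", DeltaNdcgNotInverted(7.0, 1.0, 0, 5, 1.0));
  return 0;
}
```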
Reproducible example
The code runs, trains, and seemingly works fine even without the inversion. So in all likelihood I might just be misunderstanding something.
Environment info
LightGBM Version 4.6.0
Additional Comments
Since the ranks should always be non-negative, I believe the bug can be fixed with simple inversions within the function:
const double high_discount = log2(2.0f + high_rank);
  -> const double high_discount = 1.0f / log2(2.0f + high_rank);
const double low_discount = log2(2.0f + low_rank);
  -> const double low_discount = 1.0f / log2(2.0f + low_rank);
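As a quick sanity check (my own sketch, not part of the repository), the proposed inverted CUDA-style discounts agree numerically with the CPU-side convention of 1.0 / std::log2(2.0 + i):

```cpp
// Verify that the proposed float-based inverted discount matches the CPU-side
// double-precision pre-computed table convention for the first 100 ranks.
#include <cassert>
#include <cmath>
#include <cstdio>

int main() {
  for (int rank = 0; rank < 100; ++rank) {
    const double cpu_style = 1.0 / std::log2(2.0 + rank);       // DCGCalculator::Init convention
    const double proposed = 1.0f / std::log2(2.0f + rank);      // proposed CUDA fix above
    assert(std::fabs(cpu_style - proposed) < 1e-6);
  }
  std::printf("all ranks match within tolerance\n");
  return 0;
}
```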